Removing duplicate rows in a Pandas DataFrame
Python Pandas Numpy: Exercise-18 with Solution
Remove duplicate rows from a Pandas DataFrame.
Sample Solution:
Python Code:
import pandas as pd
# Create a sample DataFrame with duplicate rows
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
        'Age': [25, 30, 25, 22, 30],
        'Salary': [50000, 60000, 50000, 45000, 60000]}
df = pd.DataFrame(data)
# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()
# Display the DataFrame without duplicates
print(df_no_duplicates)
Output:
       Name  Age  Salary
0      Ross   25   50000
1       Bob   30   60000
3  Geoffrey   22   45000
Explanation:
In the exercise above,
- We create a sample DataFrame (df) with columns 'Name', 'Age', and 'Salary'.
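- The df.drop_duplicates() method removes duplicate rows from the DataFrame. By default, it compares all columns and keeps the first occurrence of each duplicated row.
- The resulting DataFrame (df_no_duplicates) contains only unique rows and retains their original index labels (0, 1, and 3 here).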
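To see which rows are treated as duplicates, and how the choice of which copy to keep changes the result, the following minimal sketch reuses the sample data from above. The variable names (dup_mask, kept_last, no_repeats) are illustrative and not part of the original solution.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
    'Age': [25, 30, 25, 22, 30],
    'Salary': [50000, 60000, 50000, 45000, 60000],
})

# duplicated() flags every row that repeats an earlier row
dup_mask = df.duplicated()
print(dup_mask.tolist())  # [False, False, True, False, True]

# keep='last' retains the final occurrence instead of the first
kept_last = df.drop_duplicates(keep='last')
print(kept_last.index.tolist())  # [2, 3, 4]

# keep=False drops every row that has a duplicate anywhere
no_repeats = df.drop_duplicates(keep=False)
print(no_repeats['Name'].tolist())  # ['Geoffrey']
```

Checking df.duplicated() before dropping rows is a simple way to confirm that only the intended rows will be removed.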
You can also specify a subset of columns to consider when identifying duplicates using the subset parameter. For example, to remove duplicates based on the 'Name' column:
df_no_duplicates = df.drop_duplicates(subset='Name')
Adjust the column names and data to match the structure of your own DataFrame.
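In the sample data every repeated name also has the same age and salary, so the subset parameter does not change the result there. The sketch below uses a small hypothetical DataFrame (not from the exercise) in which two rows share a name but differ in age, to show the difference. It also assumes pandas 1.0 or later for the ignore_index parameter, which renumbers the result from 0.

```python
import pandas as pd

# Hypothetical data: 'Ross' appears twice with different ages
df = pd.DataFrame({
    'Name': ['Ross', 'Bob', 'Ross'],
    'Age': [25, 30, 26],
})

# Comparing whole rows: no two rows are identical, so nothing is dropped
full = df.drop_duplicates()
print(len(full))  # 3

# Comparing only 'Name': the second 'Ross' row is dropped
by_name = df.drop_duplicates(subset='Name', ignore_index=True)
print(by_name['Age'].tolist())    # [25, 30]
print(by_name.index.tolist())     # [0, 1]
```

To compare on several columns at once, pass a list to subset, for example subset=['Name', 'Age'].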