# Removing duplicate rows in a Pandas DataFrame

## Python Pandas Numpy: Exercise-18 with Solution

Remove duplicate rows from a Pandas DataFrame.

Sample Solution:

Python Code:

```python
import pandas as pd

# Create a sample DataFrame with duplicate rows
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
        'Age': [25, 30, 25, 22, 30],
        'Salary': [50000, 60000, 50000, 45000, 60000]}

df = pd.DataFrame(data)

# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()

# Display the DataFrame without duplicates
print(df_no_duplicates)
```

Output:

```
       Name  Age  Salary
0      Ross   25   50000
1       Bob   30   60000
3  Geoffrey   22   45000
```

Explanation:

In the exercise above,

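• We create a sample DataFrame (df) with columns 'Name', 'Age', and 'Salary'; rows 2 and 4 repeat rows 0 and 1.
• The df.drop_duplicates() method returns a new DataFrame with duplicate rows removed, keeping the first occurrence of each by default; df itself is unchanged.
• The resulting DataFrame (df_no_duplicates) contains only unique rows and keeps their original index labels (0, 1, and 3).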
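To see which rows are treated as duplicates, the following minimal sketch may help. It reuses the sample data from the exercise; the variable names mask, filtered, and renumbered are illustrative only, and the ignore_index argument assumes pandas 1.0 or later.

```python
import pandas as pd

# Same sample data as in the exercise above
df = pd.DataFrame({
    'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
    'Age': [25, 30, 25, 22, 30],
    'Salary': [50000, 60000, 50000, 45000, 60000],
})

# duplicated() marks every row that repeats an earlier row as True
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False, True]

# Filtering with the inverted mask selects the same rows as drop_duplicates()
filtered = df[~mask]
print(filtered.equals(df.drop_duplicates()))  # True

# ignore_index=True renumbers the remaining rows from 0 (assumes pandas >= 1.0)
renumbered = df.drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

Inspecting the mask first can be useful when you want to review the duplicates before discarding them.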

You can also specify a subset of columns to consider when identifying duplicates using the subset parameter. For example, to remove duplicates based on the 'Name' column:

    df_no_duplicates = df.drop_duplicates(subset='Name')

Adjust the column names and data to match the structure of your own DataFrame.
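The effect of subset is easier to see when rows share a name but differ elsewhere. The sketch below uses hypothetical data (the second 'Ross' row has a different age and salary, which is not part of the original exercise) and also shows the keep parameter, which controls which of the duplicated rows survives.

```python
import pandas as pd

# Hypothetical variation of the sample data: the second 'Ross' row
# differs in 'Age' and 'Salary', so it is not a full-row duplicate
df = pd.DataFrame({
    'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
    'Age': [25, 30, 26, 22, 30],
    'Salary': [50000, 60000, 52000, 45000, 60000],
})

# Full-row comparison: only the repeated 'Bob' row (index 4) is dropped
print(len(df.drop_duplicates()))  # 4

# Compare on 'Name' only, keeping the first occurrence (the default)
by_name_first = df.drop_duplicates(subset='Name')
print(by_name_first.index.tolist())  # [0, 1, 3]

# keep='last' retains the last occurrence of each name instead
by_name_last = df.drop_duplicates(subset='Name', keep='last')
print(by_name_last.index.tolist())  # [2, 3, 4]

# keep=False drops every row whose name appears more than once
by_name_none = df.drop_duplicates(subset='Name', keep=False)
print(by_name_none['Name'].tolist())  # ['Geoffrey']
```

subset also accepts a list of column names, for example subset=['Name', 'Age'], when duplicates should be judged on several columns at once.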


