Pandas: Keep only the rows with at least 2 non-NA values
Pandas Handling Missing Values: Exercise-8 with Solution
Write a Pandas program to keep the rows with at least 2 NaN values in a given DataFrame.
Test Data:
ord_no purch_amt ord_date customer_id 0 NaN NaN NaN NaN 1 NaN 270.65 2012-09-10 3001.0 2 70002.0 65.26 NaN 3001.0 3 NaN NaN NaN NaN 4 NaN 948.50 2012-09-10 3002.0 5 70005.0 2400.60 2012-07-27 3001.0 6 NaN 5760.00 2012-09-10 3001.0 7 70010.0 1983.43 2012-10-10 3004.0 8 70003.0 2480.40 2012-10-10 3003.0 9 70012.0 250.45 2012-06-27 3002.0 10 NaN 75.29 2012-08-17 3001.0 11 NaN NaN NaN NaN
Sample Solution:
Python Code :
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'ord_no':[np.nan,np.nan,70002,np.nan,np.nan,70005,np.nan,70010,70003,70012,np.nan,np.nan],
'purch_amt':[np.nan,270.65,65.26,np.nan,948.5,2400.6,5760,1983.43,2480.4,250.45, 75.29,np.nan],
'ord_date': [np.nan,'2012-09-10',np.nan,np.nan,'2012-09-10','2012-07-27','2012-09-10','2012-10-10','2012-10-10','2012-06-27','2012-08-17',np.nan],
'customer_id':[np.nan,3001,3001,np.nan,3002,3001,3001,3004,3003,3002,3001,np.nan]})
print("Original Orders DataFrame:")
print(df)
print("\nKeep the rows with at least 2 NaN values of the said DataFrame:")
result = df.dropna(thresh=2)
print(result)
Sample Output:
Original Orders DataFrame: ord_no purch_amt ord_date customer_id 0 NaN NaN NaN NaN 1 NaN 270.65 2012-09-10 3001.0 2 70002.0 65.26 NaN 3001.0 3 NaN NaN NaN NaN 4 NaN 948.50 2012-09-10 3002.0 5 70005.0 2400.60 2012-07-27 3001.0 6 NaN 5760.00 2012-09-10 3001.0 7 70010.0 1983.43 2012-10-10 3004.0 8 70003.0 2480.40 2012-10-10 3003.0 9 70012.0 250.45 2012-06-27 3002.0 10 NaN 75.29 2012-08-17 3001.0 11 NaN NaN NaN NaN Keep the rows with at least 2 NaN values of the said DataFrame: ord_no purch_amt ord_date customer_id 1 NaN 270.65 2012-09-10 3001.0 2 70002.0 65.26 NaN 3001.0 4 NaN 948.50 2012-09-10 3002.0 5 70005.0 2400.60 2012-07-27 3001.0 6 NaN 5760.00 2012-09-10 3001.0 7 70010.0 1983.43 2012-10-10 3004.0 8 70003.0 2480.40 2012-10-10 3003.0 9 70012.0 250.45 2012-06-27 3002.0 10 NaN 75.29 2012-08-17 3001.0
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Pandas program to drop the rows where all elements are missing in a given DataFrame.
Next: Write a Pandas program to drop those rows from a given DataFrame in which specific columns have missing values.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
Python: Tips of the Day
Creates a list of elements, grouped based on the position in the original lists:
Example:
def tips_zip(*args, fill_value=None): max_length = max([len(lst) for lst in args]) result = [] for i in range(max_length): result.append([ args[k][i] if i < len(args[k]) else fillvalue for k in range(len(args)) ]) return result print(tips_zip(['a', 'b'], [1, 2], [True, False])) print(tips_zip(['a'], [1, 2], [True, False])) print(tips_zip(['a'], [1, 2], [True, False], fill_value = '1'))
Output:
[['a', 1, True], ['b', 2, False]] [['a', 1, True], [None, 2, False]] [['a', 1, True], ['1', 2, False]]
- Weekly Trends
- Java Basic Programming Exercises
- SQL Subqueries
- Adventureworks Database Exercises
- C# Sharp Basic Exercises
- SQL COUNT() with distinct
- JavaScript String Exercises
- JavaScript HTML Form Validation
- Java Collection Exercises
- SQL COUNT() function
- SQL Inner Join
- JavaScript functions Exercises
- Python Tutorial
- Python Array Exercises
- SQL Cross Join
- C# Sharp Array Exercises