w3resource

Pandas: Split a dataset, group by one column, and remove the groups in which all values of a specific column are missing

Pandas Grouping and Aggregating: Split-Apply-Combine Exercise-28 with Solution

Write a Pandas program to split a given dataset, group it by one column, and remove every group in which all values of a specific column are missing (not available).

Test Data:

    school  class            name  date_Of_Birth  age  height  weight  address
S1    s001      V  Alberto Franco     15/05/2002   12     173      35  street1
S2    s002      V    Gino Mcneill     17/05/2002   12     192      32  street2
S3    s003     VI     Ryan Parkes     16/02/1999   13     186      33  street3
S4    s001     VI    Eesha Hinton     25/09/1998   13     167      30  street1
S5    s002      V    Gino Mcneill     11/05/2002   14     151      31  street2
S6    s004     VI    David Parkes     15/09/1997   12     159      32  street4

Sample Solution:

Python Code:

import pandas as pd

# Show all rows and columns when printing
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Sample data: 'height' is missing (None) for both rows of school s002
df = pd.DataFrame({
    'school_code': ['s001','s002','s003','s001','s002','s004'],
    'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
    'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': [12, 12, 13, 13, 14, 12],
    'weight': [173, 192, 186, 167, 151, 159],
    'height': [35, None, 33, 30, None, 32]},
    index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])
print("Original DataFrame:")
print(df)
print("\nGroup by one column and remove those groups if all the values of a specific columns are not available:")
# Mark rows that have a height value, then ask per school_code group whether
# any row has one; transform('any') repeats that answer on every row of the group
has_height = df['height'].notna().groupby(df['school_code']).transform('any')
# Keep only the rows that belong to groups with at least one non-missing height
result = df[has_height]
print(result)

Sample Output:

Original DataFrame:
   school_code class            name date_Of_Birth   age  weight  height
S1        s001     V  Alberto Franco     15/05/2002   12     173    35.0
S2        s002     V    Gino Mcneill     17/05/2002   12     192     NaN
S3        s003    VI     Ryan Parkes     16/02/1999   13     186    33.0
S4        s001    VI    Eesha Hinton     25/09/1998   13     167    30.0
S5        s002     V    Gino Mcneill     11/05/2002   14     151     NaN
S6        s004    VI    David Parkes     15/09/1997   12     159    32.0

Group by one column and remove those groups if all the values of a specific columns are not available:
   school_code class            name date_Of_Birth   age  weight  height
S1        s001     V  Alberto Franco     15/05/2002   12     173    35.0
S3        s003    VI     Ryan Parkes     16/02/1999   13     186    33.0
S4        s001    VI    Eesha Hinton     25/09/1998   13     167    30.0
S6        s004    VI    David Parkes     15/09/1997   12     159    32.0
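
The one-line filter in the solution can be easier to follow when broken into steps. The sketch below is not part of the original solution: it rebuilds the same data with only the two relevant columns, shows the intermediate boolean mask, and compares the result with an alternative based on `groupby(...).filter(...)`. The variable names (`present`, `group_has_value`, `via_transform`, `via_filter`) are illustrative choices.

```python
import pandas as pd

# Same values as the solution above, trimmed to the two columns that matter here.
df = pd.DataFrame({
    'school_code': ['s001', 's002', 's003', 's001', 's002', 's004'],
    'height': [35, None, 33, 30, None, 32]},
    index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])

# Step 1: True where height is present, False where it is missing.
present = df['height'].notna()

# Step 2: for each school_code, is at least one height present?
# transform('any') repeats the per-group answer on every row of that group,
# so the result lines up with df and can be used directly as a row mask.
group_has_value = present.groupby(df['school_code']).transform('any')

# Step 3: keep only the rows whose group has at least one height value.
via_transform = df[group_has_value]

# Alternative: filter() calls the function once per group and keeps the
# whole group when the function returns True.
via_filter = df.groupby('school_code').filter(
    lambda g: g['height'].notna().any())

print(group_has_value)
print(via_transform)
print(via_transform.equals(via_filter))
```

Both approaches should keep rows S1, S3, S4 and S6 and drop the two s002 rows. The `transform` version stays fully vectorized, while `filter` with a Python lambda tends to read more naturally but may be slower when there are many groups.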
