w3resource

Pandas: Groupby and aggregate over multiple lists

Pandas Grouping and Aggregating: Split-Apply-Combine Exercise-30 with Solution

Write a Pandas program to group the following dataset by its first column (student_id) and aggregate the lists in its second column (marks) by taking the element-wise mean across each group's lists.

Test Data:

  student_id         marks
0       S001  [88, 89, 90]
1       S001  [78, 81, 60]
2       S002  [84, 83, 91]
3       S002  [84, 88, 91]
4       S003  [90, 89, 92]
5       S003  [88, 59, 90]  

Sample Solution:

Python Code:

import pandas as pd
import numpy as np
# Show every row and column when printing.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
    'student_id': ['S001','S001','S002','S002','S003','S003'],
    'marks': [[88,89,90],[78,81,60],[84,83,91],[84,88,91],[90,89,92],[88,59,90]]})
print("Original DataFrame:")
print(df)
print("\nGroupby and aggregate over multiple lists:")
# Collect each student's lists into one list of lists, then average
# position by position (axis 0) across those lists.
result = df.set_index('student_id')['marks'].groupby('student_id').apply(list).apply(lambda x: np.mean(x,0))
print(result)
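
The chained expression in the sample solution can be easier to follow when split into its two steps. The sketch below is an illustration only; the names `stacked` and `s001_mean` are hypothetical and not part of the original solution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_id': ['S001', 'S001', 'S002', 'S002', 'S003', 'S003'],
    'marks': [[88, 89, 90], [78, 81, 60], [84, 83, 91],
              [84, 88, 91], [90, 89, 92], [88, 59, 90]]})

# Step 1: gather every list that belongs to the same student_id
# into one list of lists (hypothetical name: stacked).
stacked = df.set_index('student_id')['marks'].groupby('student_id').apply(list)
print(stacked['S001'])

# Step 2: treat the list of lists as rows of a 2-D array and
# average down the rows (axis=0), i.e. position by position.
s001_mean = np.mean(stacked['S001'], axis=0)
print(s001_mean)
```

For S001 the first step yields the two lists [88, 89, 90] and [78, 81, 60], and the second step averages them position by position: (88+78)/2, (89+81)/2, (90+60)/2.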

Sample Output:

Original DataFrame:
  student_id         marks
0       S001  [88, 89, 90]
1       S001  [78, 81, 60]
2       S002  [84, 83, 91]
3       S002  [84, 88, 91]
4       S003  [90, 89, 92]
5       S003  [88, 59, 90]

Groupby and aggregate over multiple lists:
student_id
S001    [83.0, 85.0, 75.0]
S002    [84.0, 85.5, 91.0]
S003    [89.0, 74.0, 91.0]
Name: marks, dtype: object
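
One possible alternative, sketched here under the assumption that every list in marks has the same length, is to expand the lists into separate columns and let groupby compute the mean directly. The names `wide` and `means` are hypothetical, and the result is a DataFrame with one column per list position rather than a Series of arrays.

```python
import pandas as pd

df = pd.DataFrame({
    'student_id': ['S001', 'S001', 'S002', 'S002', 'S003', 'S003'],
    'marks': [[88, 89, 90], [78, 81, 60], [84, 83, 91],
              [84, 88, 91], [90, 89, 92], [88, 59, 90]]})

# Expand each list into its own columns (0, 1, 2), indexed by student_id.
# Assumption: all lists have the same length.
wide = pd.DataFrame(df['marks'].tolist(), index=df['student_id'])

# Average each column within each student_id group (index level 0).
means = wide.groupby(level=0).mean()
print(means)
```

This form keeps the values in numeric columns instead of an object-dtype Series, which may be more convenient if further numeric processing is planned.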
