Pandas: Split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe
Pandas Grouping and Aggregating: Split-Apply-Combine Exercise-27 with Solution
Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.
Test Data:
salesman_id sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ 0 5002 150.50 250.50 150.50 150.50 130.50 150.50 1 5005 270.65 170.65 270.65 270.65 270.65 270.65 2 5001 65.26 15.26 65.26 95.26 65.26 45.26 3 5003 110.50 110.50 110.50 210.50 310.50 110.50 4 5002 948.50 598.50 948.50 948.50 948.50 948.50 5 5001 2400.60 1400.60 2400.60 2400.60 2400.60 3400.60 6 5001 1760.00 2760.00 5760.00 760.00 760.00 5760.00 7 5006 2983.43 1983.43 1983.43 1983.43 1983.43 983.43 8 5003 480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 5002 1250.45 250.45 250.45 250.45 250.45 250.45 10 5007 75.29 75.29 75.29 75.29 75.29 75.29 11 5001 1045.60 3045.60 3045.60 3045.60 3045.60 3045.60
sale_jul sale_aug sale_sep sale_oct sale_nov sale_dec 0 950.50 150.50 150.50 150.50 150.50 150.50 1 270.65 70.65 270.65 270.65 270.65 70.65 2 65.26 65.26 65.26 65.26 95.26 65.26 3 210.50 110.50 110.50 110.50 110.50 110.50 4 948.50 948.50 948.50 948.50 948.50 948.50 5 2400.60 400.60 200.60 2400.60 2400.60 2400.60 6 5760.00 5760.00 5760.00 5760.00 5760.00 5760.00 7 983.43 1983.43 1983.43 1983.43 1983.43 1983.43 8 2480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 250.45 250.45 250.45 250.45 250.45 250.45 10 75.29 75.29 75.29 75.29 75.29 75.29 11 3045.60 3045.60 3045.60 3045.60 3045.60 3045.60
Sample Solution:
Python Code :
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'salesman_id': [5002,5005,5001,5003,5002,5001,5001,5006,5003,5002,5007,5001],
'sale_jan':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 1760, 2983.43, 480.4, 1250.45, 75.29,1045.6],
'sale_feb':[250.5, 170.65, 15.26, 110.5, 598.5, 1400.6, 2760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_mar':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_apr':[150.5, 270.65, 95.26, 210.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_may':[130.5, 270.65, 65.26, 310.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jun':[150.5, 270.65, 45.26, 110.5, 948.5, 3400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jul':[950.5, 270.65, 65.26, 210.5, 948.5, 2400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_aug':[150.5, 70.65, 65.26, 110.5, 948.5, 400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_sep':[150.5, 270.65, 65.26, 110.5, 948.5, 200.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_oct':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_nov':[150.5, 270.65, 95.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_dec':[150.5, 70.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6]
})
print("Original Orders DataFrame:")
print(df)
print("\Result after group on salesman_id and apply different aggregate functions:")
df = df.groupby('salesman_id').agg(lambda x : x.sum() if x.name in ['sale_jan','sale_feb','sale_mar'] else x.mean())
print(df)
Sample Output:
Original Orders DataFrame: salesman_id sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ 0 5002 150.50 250.50 150.50 150.50 130.50 150.50 1 5005 270.65 170.65 270.65 270.65 270.65 270.65 2 5001 65.26 15.26 65.26 95.26 65.26 45.26 3 5003 110.50 110.50 110.50 210.50 310.50 110.50 4 5002 948.50 598.50 948.50 948.50 948.50 948.50 5 5001 2400.60 1400.60 2400.60 2400.60 2400.60 3400.60 6 5001 1760.00 2760.00 5760.00 760.00 760.00 5760.00 7 5006 2983.43 1983.43 1983.43 1983.43 1983.43 983.43 8 5003 480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 5002 1250.45 250.45 250.45 250.45 250.45 250.45 10 5007 75.29 75.29 75.29 75.29 75.29 75.29 11 5001 1045.60 3045.60 3045.60 3045.60 3045.60 3045.60 sale_jul sale_aug sale_sep sale_oct sale_nov sale_dec 0 950.50 150.50 150.50 150.50 150.50 150.50 1 270.65 70.65 270.65 270.65 270.65 70.65 2 65.26 65.26 65.26 65.26 95.26 65.26 3 210.50 110.50 110.50 110.50 110.50 110.50 4 948.50 948.50 948.50 948.50 948.50 948.50 5 2400.60 400.60 200.60 2400.60 2400.60 2400.60 6 5760.00 5760.00 5760.00 5760.00 5760.00 5760.00 7 983.43 1983.43 1983.43 1983.43 1983.43 1983.43 8 2480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 250.45 250.45 250.45 250.45 250.45 250.45 10 75.29 75.29 75.29 75.29 75.29 75.29 11 3045.60 3045.60 3045.60 3045.60 3045.60 3045.60 \Result after group on salesman_id and apply different aggregate functions: sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ salesman_id 5001 5271.46 7221.46 11271.46 1575.365000 1567.865 3062.865000 5002 2349.45 1099.45 1349.45 449.816667 443.150 449.816667 5003 590.90 2590.90 2590.90 1345.450000 1395.450 1295.450000 5005 270.65 170.65 270.65 270.650000 270.650 270.650000 5006 2983.43 1983.43 1983.43 1983.430000 1983.430 983.430000 5007 75.29 75.29 75.29 75.290000 75.290 75.290000 sale_jul sale_aug sale_sep sale_oct sale_nov \ salesman_id 5001 2817.865000 2317.865000 2267.865000 2817.865000 2825.365000 5002 716.483333 449.816667 449.816667 449.816667 449.816667 5003 1345.450000 1295.450000 1295.450000 1295.450000 1295.450000 5005 270.650000 70.650000 270.650000 270.650000 270.650000 5006 983.430000 1983.430000 1983.430000 1983.430000 1983.430000 5007 75.290000 75.290000 75.290000 75.290000 75.290000 sale_dec salesman_id 5001 2817.865000 5002 449.816667 5003 1295.450000 5005 70.650000 5006 1983.430000 5007 75.290000
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Pandas program to split a given dataset, group by two columns and convert other columns of the dataframe into a dictionary with column header as key.
Next: Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
It will be nice if you may share this link in any developer community or anywhere else, from where other developers may find this content. Thanks.
https://www.w3resource.com/python-exercises/pandas/groupby/python-pandas-groupby-exercise-27.php
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics