w3resource

Pandas: Split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe

Pandas Grouping and Aggregating: Split-Apply-Combine Exercise-27 with Solution

Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.

Test Data:

    salesman_id  sale_jan  sale_feb  sale_mar  sale_apr  sale_may  sale_jun  \
0          5002    150.50    250.50    150.50    150.50    130.50    150.50   
1          5005    270.65    170.65    270.65    270.65    270.65    270.65   
2          5001     65.26     15.26     65.26     95.26     65.26     45.26   
3          5003    110.50    110.50    110.50    210.50    310.50    110.50   
4          5002    948.50    598.50    948.50    948.50    948.50    948.50   
5          5001   2400.60   1400.60   2400.60   2400.60   2400.60   3400.60   
6          5001   1760.00   2760.00   5760.00    760.00    760.00   5760.00   
7          5006   2983.43   1983.43   1983.43   1983.43   1983.43    983.43   
8          5003    480.40   2480.40   2480.40   2480.40   2480.40   2480.40   
9          5002   1250.45    250.45    250.45    250.45    250.45    250.45   
10         5007     75.29     75.29     75.29     75.29     75.29     75.29   
11         5001   1045.60   3045.60   3045.60   3045.60   3045.60   3045.60   
    sale_jul  sale_aug  sale_sep  sale_oct  sale_nov  sale_dec  
0     950.50    150.50    150.50    150.50    150.50    150.50  
1     270.65     70.65    270.65    270.65    270.65     70.65  
2      65.26     65.26     65.26     65.26     95.26     65.26  
3     210.50    110.50    110.50    110.50    110.50    110.50  
4     948.50    948.50    948.50    948.50    948.50    948.50  
5    2400.60    400.60    200.60   2400.60   2400.60   2400.60  
6    5760.00   5760.00   5760.00   5760.00   5760.00   5760.00  
7     983.43   1983.43   1983.43   1983.43   1983.43   1983.43  
8    2480.40   2480.40   2480.40   2480.40   2480.40   2480.40  
9     250.45    250.45    250.45    250.45    250.45    250.45  
10     75.29     75.29     75.29     75.29     75.29     75.29  
11   3045.60   3045.60   3045.60   3045.60   3045.60   3045.60  

Sample Solution:

Python Code :

import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'salesman_id': [5002,5005,5001,5003,5002,5001,5001,5006,5003,5002,5007,5001],
'sale_jan':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 1760, 2983.43, 480.4,  1250.45, 75.29,1045.6],
'sale_feb':[250.5, 170.65, 15.26, 110.5, 598.5, 1400.6, 2760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_mar':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_apr':[150.5, 270.65, 95.26, 210.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_may':[130.5, 270.65, 65.26, 310.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jun':[150.5, 270.65, 45.26, 110.5, 948.5, 3400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jul':[950.5, 270.65, 65.26, 210.5, 948.5, 2400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_aug':[150.5, 70.65,  65.26, 110.5, 948.5, 400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_sep':[150.5, 270.65, 65.26, 110.5, 948.5, 200.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_oct':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_nov':[150.5, 270.65, 95.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6], 
'sale_dec':[150.5, 70.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6]
})
print("Original Orders DataFrame:")
print(df)
print("\Result after group on salesman_id and apply different aggregate functions:")
df = df.groupby('salesman_id').agg(lambda x : x.sum() if x.name in ['sale_jan','sale_feb','sale_mar'] else x.mean())
print(df)

Sample Output:

Original Orders DataFrame:
    salesman_id  sale_jan  sale_feb  sale_mar  sale_apr  sale_may  sale_jun  \
0          5002    150.50    250.50    150.50    150.50    130.50    150.50   
1          5005    270.65    170.65    270.65    270.65    270.65    270.65   
2          5001     65.26     15.26     65.26     95.26     65.26     45.26   
3          5003    110.50    110.50    110.50    210.50    310.50    110.50   
4          5002    948.50    598.50    948.50    948.50    948.50    948.50   
5          5001   2400.60   1400.60   2400.60   2400.60   2400.60   3400.60   
6          5001   1760.00   2760.00   5760.00    760.00    760.00   5760.00   
7          5006   2983.43   1983.43   1983.43   1983.43   1983.43    983.43   
8          5003    480.40   2480.40   2480.40   2480.40   2480.40   2480.40   
9          5002   1250.45    250.45    250.45    250.45    250.45    250.45   
10         5007     75.29     75.29     75.29     75.29     75.29     75.29   
11         5001   1045.60   3045.60   3045.60   3045.60   3045.60   3045.60   

    sale_jul  sale_aug  sale_sep  sale_oct  sale_nov  sale_dec  
0     950.50    150.50    150.50    150.50    150.50    150.50  
1     270.65     70.65    270.65    270.65    270.65     70.65  
2      65.26     65.26     65.26     65.26     95.26     65.26  
3     210.50    110.50    110.50    110.50    110.50    110.50  
4     948.50    948.50    948.50    948.50    948.50    948.50  
5    2400.60    400.60    200.60   2400.60   2400.60   2400.60  
6    5760.00   5760.00   5760.00   5760.00   5760.00   5760.00  
7     983.43   1983.43   1983.43   1983.43   1983.43   1983.43  
8    2480.40   2480.40   2480.40   2480.40   2480.40   2480.40  
9     250.45    250.45    250.45    250.45    250.45    250.45  
10     75.29     75.29     75.29     75.29     75.29     75.29  
11   3045.60   3045.60   3045.60   3045.60   3045.60   3045.60  
\Result after group on salesman_id and apply different aggregate functions:
             sale_jan  sale_feb  sale_mar     sale_apr  sale_may     sale_jun  \
salesman_id                                                                     
5001          5271.46   7221.46  11271.46  1575.365000  1567.865  3062.865000   
5002          2349.45   1099.45   1349.45   449.816667   443.150   449.816667   
5003           590.90   2590.90   2590.90  1345.450000  1395.450  1295.450000   
5005           270.65    170.65    270.65   270.650000   270.650   270.650000   
5006          2983.43   1983.43   1983.43  1983.430000  1983.430   983.430000   
5007            75.29     75.29     75.29    75.290000    75.290    75.290000   

                sale_jul     sale_aug     sale_sep     sale_oct     sale_nov  \
salesman_id                                                                    
5001         2817.865000  2317.865000  2267.865000  2817.865000  2825.365000   
5002          716.483333   449.816667   449.816667   449.816667   449.816667   
5003         1345.450000  1295.450000  1295.450000  1295.450000  1295.450000   
5005          270.650000    70.650000   270.650000   270.650000   270.650000   
5006          983.430000  1983.430000  1983.430000  1983.430000  1983.430000   
5007           75.290000    75.290000    75.290000    75.290000    75.290000   

                sale_dec  
salesman_id               
5001         2817.865000  
5002          449.816667  
5003         1295.450000  
5005           70.650000  
5006         1983.430000  
5007           75.290000  

Python Code Editor:


Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Pandas program to split a given dataset, group by two columns and convert other columns of the dataframe into a dictionary with column header as key.
Next: Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Python: Tips of the Day

Understanding slice notation:

It's pretty simple really:

a[start:stop]  # items start through stop-1
a[start:]      # items start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # a copy of the whole array

There is also the step value, which can be used with any of the above:

a[start:stop:step] # start through not past stop, by step

The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).

The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:

a[-1]    # last item in the array
a[-2:]   # last two items in the array
a[:-2]   # everything except the last two items

Similarly, step may be a negative number:

a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed

Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.

Relation to slice() object

The slicing operator [] is actually being used in the above code with a slice() object using the : notation (which is only valid within []), i.e.:

a[start:stop:step]

is equivalent to:

a[slice(start, stop, step)]

Slice objects also behave slightly differently depending on the number of arguments, similarly to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported. To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].

While the : -based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.

Ref: https://bit.ly/2MHaTp7