Examples
Start by creating a series with 8 one-minute timestamps.
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series
Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin.
series.resample('3T').sum()
Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left.
series.resample('3T', label='right').sum()
Downsample the series into 3 minute bins as above, but close the right side of the bin interval.
series.resample('3T', label='right', closed='right').sum()
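As a rough side-by-side sketch, the three downsampling calls above can be compared on the same data to see what label and closed each change. The alias '3min' is used here in place of '3T' on the assumption that it is accepted across pandas versions; the variable names are illustrative only.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Default: bins are closed on the left and labelled by their left edge.
left = series.resample('3min').sum()

# Same bins and sums, but each bin is labelled by its right edge.
right_label = series.resample('3min', label='right').sum()

# Bins closed on the right: a timestamp sitting on an edge
# belongs to the bin that ends there.
right_closed = series.resample('3min', label='right', closed='right').sum()

print(left.tolist())          # [3, 12, 13]
print(right_label.tolist())   # [3, 12, 13]
print(right_closed.tolist())  # [0, 6, 15, 7]
```

Changing label only moves the bin labels, while changing closed moves values between bins, which is why the last result has an extra bin and different sums.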
Upsample the series into 30 second bins.
series.resample('30S').asfreq()[0:5] # Select first 5 rows
Upsample the series into 30 second bins and fill the NaN values using the ffill (forward fill) method.
series.resample('30S').ffill()[0:5]
Upsample the series into 30 second bins and fill the NaN values using the bfill method.
series.resample('30S').bfill()[0:5]
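The three upsampling variants can be put next to each other in a minimal sketch. It assumes the lowercase alias '30s' is accepted by the installed pandas version; the names raw, forward and backward are illustrative.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

upsampled = series.resample('30s')

raw = upsampled.asfreq()[0:5]      # new half-minute slots are NaN
forward = upsampled.ffill()[0:5]   # each gap takes the previous value
backward = upsampled.bfill()[0:5]  # each gap takes the next value

print(raw.isna().tolist())  # [False, True, False, True, False]
print(forward.tolist())     # [0, 0, 1, 1, 2]
print(backward.tolist())    # [0, 1, 1, 2, 2]
```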
Pass a custom function via apply.
def custom_resampler(array_like):
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)
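To make the effect of the custom function concrete, the sketch below runs it on the same 8-point series and compares it with the built-in sum shifted by 5. The '3min' alias and the names via_apply and via_sum are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)


def custom_resampler(array_like):
    # Called once per bin with the values that fall into that bin.
    return np.sum(array_like) + 5


via_apply = series.resample('3min').apply(custom_resampler)
via_sum = series.resample('3min').sum() + 5

print(via_apply.tolist())  # [8, 17, 18]
print(via_sum.tolist())    # [8, 17, 18]
```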
For a Series with a PeriodIndex, the keyword convention can be used to control whether
the start or the end of rule is used.
Resample a year by quarter using ‘start’ convention. Values are assigned to the first
quarter of the period.
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
                                            freq='A',
                                            periods=2))
s
s.resample('Q', convention='start').asfreq()
Resample quarters by month using ‘end’ convention. Values are assigned to the last month of the period.
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
                                                  freq='Q',
                                                  periods=4))
q
q.resample('M', convention='end').asfreq()
For DataFrame objects, the keyword on can be used to specify a column to resample on
instead of the index.
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
          'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
                                    periods=8,
                                    freq='W')
df
df.resample('M', on='week_starting').mean()
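One way to read the on keyword is as a shortcut for moving the column into the index before resampling. The sketch below checks that reading on the same data; it uses a 14-day rule ('14D') rather than 'M' purely as an illustrative assumption, and the names via_on and via_index are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'price': [8, 9, 7, 11, 12, 16, 15, 17],
                   'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df['week_starting'] = pd.date_range('01/01/2019', periods=8, freq='W')

# Resample on the datetime column directly...
via_on = df.resample('14D', on='week_starting').mean()

# ...or move the column into the index first and resample as usual.
via_index = df.set_index('week_starting').resample('14D').mean()

print(via_on.equals(via_index))  # True
print(via_on['price'].tolist())  # [8.5, 9.0, 14.0, 16.0]
```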
For a DataFrame with MultiIndex, the keyword level can be used to specify on which level
the resampling needs to take place.
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
           'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
                   index=pd.MultiIndex.from_product([days,
                                                     ['morning',
                                                      'afternoon']]
                                                    ))
df2
df2.resample('D', level=0).sum()
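As a small check on what level=0 does here, the sketch below resamples on the datetime level and compares the totals with a plain groupby on the same level. Because the data already has exactly one date per daily bin, the two are expected to agree in this case; the names daily and grouped are illustrative.

```python
import pandas as pd

days = pd.date_range('1/1/2019', periods=4, freq='D')
df2 = pd.DataFrame(
    {'price': [8, 9, 7, 11, 12, 16, 15, 17],
     'volume': [40, 50, 30, 80, 40, 80, 30, 40]},
    index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']]),
)

# Resample on the datetime level; the 'morning'/'afternoon' level is dropped.
daily = df2.resample('D', level=0).sum()

# For comparison only: group rows by the same level and sum.
grouped = df2.groupby(level=0).sum()

print(daily['price'].tolist())   # [17, 18, 28, 32]
print(daily['volume'].tolist())  # [90, 110, 120, 70]
```

Unlike groupby, resample would also emit a row for any missing day in the range, so the two approaches differ once the dates have gaps.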