Operations between pandas data structures

Statistics :
Operations in general exclude missing data.
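Here is a minimal sketch of what excluding missing data means for a reduction such as mean(). It uses a small hand-made frame whose values are illustrative and not part of the notebook. The skipna parameter, which defaults to True, controls this behaviour.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing value in column "P"
df = pd.DataFrame({"P": [1.0, np.nan, 3.0], "Q": [4.0, 5.0, 6.0]})

# Default (skipna=True): NaN is ignored, so P's mean is (1 + 3) / 2
default_means = df.mean()

# skipna=False: the missing value propagates into P's result
strict_means = df.mean(skipna=False)

print(default_means)
print(strict_means)
```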

In [1]:
import numpy as np
import pandas as pd
In [2]:
dates = pd.date_range('20190101', periods=8)
In [3]:
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))
In [4]:
df.mean()
Out[4]:
P   -0.098712
Q    0.124697
R    0.026464
S   -0.178634
dtype: float64

Same operation on the other axis (one mean per row):

In [5]:
df.mean(axis=1)
Out[5]:
2019-01-01   -0.543970
2019-01-02    0.026791
2019-01-03    0.450197
2019-01-04   -0.546697
2019-01-05    0.350805
2019-01-06    0.640708
2019-01-07   -0.331409
2019-01-08   -0.298794
Freq: D, dtype: float64
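To make the axis argument concrete, here is a small sketch with hand-picked values (an assumption, chosen so the arithmetic is easy to check). axis=0 reduces down each column, axis=1 reduces across each row, and the string aliases "index" and "columns" mean the same things.

```python
import pandas as pd

# Tiny illustrative frame, not taken from the notebook
df = pd.DataFrame([[1.0, 2.0], [3.0, 5.0]], columns=["P", "Q"], index=["r1", "r2"])

col_means = df.mean(axis=0)                # one value per column: P -> 2.0, Q -> 3.5
row_means = df.mean(axis=1)                # one value per row: r1 -> 1.5, r2 -> 4.0
row_means_named = df.mean(axis="columns")  # same as axis=1, spelled out

print(col_means)
print(row_means)
```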

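Operating with objects that have different dimensionality and need alignment :
pandas broadcasts the Series along the specified axis and matches rows by index label.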

In [6]:
s = pd.Series([1, 4, np.nan, 6, 8]).shift(2)
In [7]:
s
Out[7]:
0    NaN
1    NaN
2    1.0
3    4.0
4    NaN
dtype: float64
In [8]:
df.sub(s, axis='index')
Out[8]:
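                       P    Q    R    S
2019-01-01 00:00:00  NaN  NaN  NaN  NaN
2019-01-02 00:00:00  NaN  NaN  NaN  NaN
2019-01-03 00:00:00  NaN  NaN  NaN  NaN
2019-01-04 00:00:00  NaN  NaN  NaN  NaN
2019-01-05 00:00:00  NaN  NaN  NaN  NaN
2019-01-06 00:00:00  NaN  NaN  NaN  NaN
2019-01-07 00:00:00  NaN  NaN  NaN  NaN
2019-01-08 00:00:00  NaN  NaN  NaN  NaN
0                    NaN  NaN  NaN  NaN
1                    NaN  NaN  NaN  NaN
2                    NaN  NaN  NaN  NaN
3                    NaN  NaN  NaN  NaN
4                    NaN  NaN  NaN  NaN

Every value is NaN because s keeps its default integer index (0 to 4) while df is indexed by dates. No labels match, so the result spans the union of both indexes and nothing is actually subtracted. Building s with index=dates (and one value per date) makes the rows line up.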
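For comparison, here is a sketch where the Series is built on the same DatetimeIndex as the frame, so the subtraction lines up. Both the frame (deterministic values from np.arange instead of random numbers) and the eight Series values are assumptions made so the result can be checked by hand.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20190101", periods=8)

# Deterministic stand-in for the random frame: rows hold 0..3, 4..7, ..., 28..31
df = pd.DataFrame(np.arange(32, dtype=float).reshape(8, 4), index=dates, columns=list("PQRS"))

# Illustrative Series sharing df's index; shift(2) pushes values down two dates
s = pd.Series([1, 3, 5, np.nan, 6, 8, 9, 10], index=dates).shift(2)

# Subtract s from every column, matching rows by date label
result = df.sub(s, axis="index")
print(result)
```

Rows where the shifted Series is NaN (the first two dates and 2019-01-06) stay NaN. Every other row has that date's value subtracted from all four columns.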

Apply some functions to the data :

In [9]:
df.apply(np.cumsum)
Out[9]:
                   P         Q         R         S
2019-01-01  1.174363 -0.683185  0.097085 -2.764142
2019-01-02 -0.123822  0.289842  0.652141 -2.886877
2019-01-03 -0.392006 -0.140020  1.865457 -1.601357
2019-01-04 -1.066158 -0.671864  0.487976 -1.204668
2019-01-05 -0.706718  0.535789  0.807102 -1.687665
2019-01-06  0.248406  1.429797  1.427579 -1.594445
2019-01-07  0.776624  0.521628  0.037445 -1.149995
2019-01-08 -0.789694  0.997574  0.211714 -1.429070
In [10]:
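apply() calls the function once per column, so df.apply(np.cumsum) yields running totals down each column. Here is a short sketch on a hand-made frame (illustrative values) that checks the built-in DataFrame.cumsum() gives the same result.

```python
import numpy as np
import pandas as pd

# Illustrative values, not from the notebook
df = pd.DataFrame({"P": [1.0, 2.0, 3.0], "Q": [-1.0, 0.5, 4.0]})

# apply() hands each column (a Series) to np.cumsum in turn
via_apply = df.apply(np.cumsum)

# The built-in method computes the same running totals directly
via_method = df.cumsum()

print(via_method)
```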
In [10]:
df.apply(lambda x: x.max() - x.min())
Out[10]:
P    2.740680
Q    2.115822
R    2.603450
S    4.049662
dtype: float64
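The lambda above computes each column's range (maximum minus minimum). Here is a sketch, on illustrative values, of the same function applied per row with axis=1. It also shows the vectorized equivalent that avoids apply() for the column-wise case. The helper name value_range is hypothetical.

```python
import pandas as pd

# Illustrative values, not from the notebook
df = pd.DataFrame({"P": [1.0, 5.0, 2.0], "Q": [4.0, -1.0, 0.0]})

def value_range(x):
    """Hypothetical helper: spread between the largest and smallest value."""
    return x.max() - x.min()

col_ranges = df.apply(value_range)          # per column: P -> 4.0, Q -> 5.0
row_ranges = df.apply(value_range, axis=1)  # per row: 3.0, 6.0, 2.0
vectorized = df.max() - df.min()            # same as col_ranges without apply()

print(col_ranges)
print(row_ranges)
```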

Histogramming :

In [11]:
s = pd.Series(np.random.randint(0, 5, size=8))
In [12]:
s
Out[12]:
0    4
1    3
2    1
3    1
4    1
5    2
6    1
7    2
dtype: int32
In [13]:
s.value_counts()
Out[13]:
1    4
2    2
4    1
3    1
dtype: int64
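value_counts() returns a count per distinct value, sorted from most to least frequent. The sketch below fixes the eight values shown in Out[12] rather than drawing them at random. It shows two common variations: relative frequencies via normalize=True, and reordering by value with sort_index() so the result reads like histogram bins.

```python
import pandas as pd

# Same eight values as Out[12], fixed instead of random
s = pd.Series([4, 3, 1, 1, 1, 2, 1, 2])

counts = s.value_counts()                # occurrences per distinct value, most frequent first
shares = s.value_counts(normalize=True)  # relative frequencies that sum to 1
by_value = counts.sort_index()           # ordered by the value itself, like histogram bins

print(by_value)
print(shares)
```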

String Methods :

A Series is equipped with a set of string processing methods in its str attribute that make it easy to operate on each element of the array; missing values such as NaN are passed through unchanged. Here is an example:

In [14]:
s = pd.Series(['C', 'D', 'Baca', np.nan, 'CABA', 'dog', 'boy'])
In [15]:
s.str.lower()
Out[15]:
0       c
1       d
2    baca
3     NaN
4    caba
5     dog
6     boy
dtype: object
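Here are a few more methods from the same str accessor, applied to the Series above. As with lower(), the NaN entry propagates, except in contains(), where the na=False argument explicitly maps it to False. This is a sketch: the pattern "a" and the case-insensitive match are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['C', 'D', 'Baca', np.nan, 'CABA', 'dog', 'boy'])

upper = s.str.upper()                                # NaN stays NaN
lengths = s.str.len()                                # character count; NaN for the missing entry
has_a = s.str.contains("a", case=False, na=False)    # missing entry treated as "no match"

print(upper)
print(lengths)
print(has_a)
```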