Matching / broadcasting behavior

DataFrame has the methods add(), sub(), mul(), div() and the related functions radd(), rsub(), … for carrying
out binary operations. For broadcasting behavior, Series input is of primary interest. Using these functions,
you can match on either the index or the columns via the axis keyword:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series(np.random.randn(2), index=['a', 'b']),
    'two': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
    'three': pd.Series(np.random.randn(4), index=['b', 'c', 'd', 'f'])})

df

row = df.iloc[1]

column = df['two']

df.sub(row, axis='columns')

df.sub(row, axis=1)

df.sub(column, axis='index')

df.sub(column, axis=0)
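Because the frame above is built from random numbers, the effect of the axis keyword can be easier to see on fixed values. The following is a minimal sketch; the names frame, by_columns, and by_index are illustrative and not part of the original example.

```python
import pandas as pd

# A small frame with known values so the arithmetic is easy to check by hand.
frame = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

row = frame.loc["b"]   # Series indexed by column labels: x=2, y=20
column = frame["x"]    # Series indexed by row labels: a=1, b=2, c=3

# axis='columns' matches the Series index against the column labels,
# so the same row is subtracted from every row of the frame.
by_columns = frame.sub(row, axis="columns")

# axis='index' matches the Series index against the row labels,
# so the same column is subtracted from every column of the frame.
by_index = frame.sub(column, axis="index")

print(by_columns)
print(by_index)
```

Row b of by_columns and column x of by_index are all zeros, since each is the Series subtracted from itself.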

Furthermore, you can align a level of a MultiIndexed DataFrame with a Series.

dfmi = df.copy()

dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'),
                                        (1, 'c'), (2, 'a'),
                                        (2, 'f')],
                                       names=['first', 'second'])

dfmi.sub(column, axis=0, level='second')
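Level alignment can be checked on fixed values as well. This sketch assumes a small two-level index and a hypothetical offsets Series keyed by the labels of the 'second' level; each offset is applied to every row whose 'second' label matches.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, "a"), (1, "b"), (2, "a"), (2, "b")],
    names=["first", "second"],
)
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=index)

# One offset per label of the 'second' level.
offsets = pd.Series({"a": 0.5, "b": 1.0})

# Rows labelled 'a' on the second level lose 0.5, rows labelled 'b' lose 1.0,
# regardless of their label on the first level.
shifted = frame.sub(offsets, axis=0, level="second")

print(shifted)
```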

Series and Index also support the divmod() builtin. This function performs floor division and the modulo operation at
the same time, returning a two-tuple of the same type as the left-hand side. For example:

s = pd.Series(np.arange(10))

s

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

div, rem = divmod(s, 3)

div

0    0
1    0
2    0
3    1
4    1
5    1
6    2
7    2
8    2
9    3
dtype: int32

rem

0    0
1    1
2    2
3    0
4    1
5    2
6    0
7    1
8    2
9    0
dtype: int32

idx = pd.Index(np.arange(8))

idx

Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

div, rem = divmod(idx, 3)

div

Int64Index([0, 0, 0, 1, 1, 1, 2, 2], dtype='int64')

rem

Int64Index([0, 1, 2, 0, 1, 2, 0, 1], dtype='int64')

We can also do elementwise divmod():

div, rem = divmod(s, [1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

div

0    0
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
9    1
dtype: int32

rem

0    0
1    0
2    0
3    1
4    1
5    2
6    2
7    3
8    3
9    4
dtype: int32
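As a sanity check on the elementwise case, the sketch below confirms that divmod() agrees with the // and % operators and that the quotient and remainder reconstruct the original values. The divisors are held in a NumPy array here purely for convenience; the names divisors and reconstructed are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10))
divisors = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

div, rem = divmod(s, divisors)

# divmod() gives the same results as applying the two operators separately.
same_as_operators = div.equals(s // divisors) and rem.equals(s % divisors)

# quotient * divisor + remainder recovers the original values.
reconstructed = div * divisors + rem

print(same_as_operators)
print((reconstructed == s).all())
```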

Missing data / operations with fill values

In Series and DataFrame, the arithmetic functions accept a fill_value, namely a value to
substitute when at most one of the values at a location is missing. For example, when adding two DataFrame objects,
you may wish to treat NaN as 0 unless both DataFrames are missing that value, in which case the result will
be NaN (you can later replace NaN with some other value using fillna if you wish).

df

df2 = pd.DataFrame(np.random.randint(low=8, high=10, size=(5, 5)),
                   columns=['a', 'b', 'c', 'd', 'f'])

df2

df = pd.DataFrame(np.random.randint(low=6, high=8, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'f'])

df

df + df2

df.add(df2, fill_value=0)
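The integer frames above contain no missing values, so fill_value has nothing to substitute there. The following sketch uses two small hypothetical frames, left and right, that do contain NaN, to show the three cases: both values present, one missing, and both missing.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan, np.nan]})
right = pd.DataFrame({"a": [10.0, 20.0, np.nan]})

# Plain addition: the result is NaN wherever either operand is missing.
plain = left + right

# With fill_value=0, a single missing operand is treated as 0;
# the result is NaN only where both operands are missing.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```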

Flexible comparisons

Series and DataFrame have the binary comparison methods eq, ne, lt, gt, le, and ge whose behavior is analogous
to the binary arithmetic operations described above:

df.gt(df2)

df2.ne(df)

These operations produce a pandas object of the same type as the left-hand-side input that is of dtype bool.
These boolean objects can be used in indexing operations.
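To illustrate using a comparison result for indexing, here is a minimal sketch with a hypothetical scores frame. A boolean Series selects rows, while a boolean DataFrame of the same shape keeps the shape and replaces non-matching entries with NaN.

```python
import pandas as pd

scores = pd.DataFrame({"math": [55, 80, 92], "art": [70, 65, 88]})

# Boolean DataFrame with the same shape as scores.
passed = scores.gt(60)

# A boolean Series used as a row filter: keep rows where math > 60.
high_math = scores[scores["math"].gt(60)]

# A boolean DataFrame used as a mask: entries that are False become NaN.
masked = scores[passed]

print(high_math)
print(masked)
```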

Boolean reductions

You can apply the reductions empty, any(), all(), and bool() to summarize a boolean result.

(df > 0).all()

a    True
b    True
c    True
d    True
f    True
dtype: bool

(df > 0).any()

a    True
b    True
c    True
d    True
f    True
dtype: bool

You can reduce to a final boolean value.

(df > 0).any().any()

True
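Chaining any() twice is one way to collapse both axes. As a sketch of two alternatives that should give the same answer, the example below passes axis=None to reduce over all axes at once, and also reduces the underlying NumPy array. The frame and variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, -2], "b": [-3, -4]})
mask = frame > 0

chained = mask.any().any()         # reduce columns, then reduce the result
all_axes = mask.any(axis=None)     # reduce over both axes in one call
via_numpy = mask.to_numpy().any()  # reduce the underlying array

print(chained, all_axes, via_numpy)
```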

You can test if a pandas object is empty, via the empty property.

df.empty

False

pd.DataFrame(columns=list('ABC')).empty

True

To evaluate single-element pandas objects in a boolean context, use the method bool():

pd.Series([True]).bool()

True

pd.Series([False]).bool()

False

pd.DataFrame([[True]]).bool()

True

pd.DataFrame([[False]]).bool()

False
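The reason a dedicated method is needed is that a pandas object used directly in a boolean context raises a ValueError, since its truth value is ambiguous. The sketch below shows that error and one alternative for single-element objects, the item() method of Series. Newer pandas releases are understood to deprecate bool(), so treat its availability as an assumption to check against your installed version.

```python
import pandas as pd

flags = pd.Series([True, False])

# Using a Series directly in a boolean context is ambiguous and raises.
try:
    if flags:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# For a single-element Series, item() returns the lone value as a scalar.
single = pd.Series([True])
value = single.item()

print(ambiguous, value)
```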

Comparing if objects are equivalent

Often you may find that there is more than one way to compute the same result. As a simple example, consider df + df and df * 2. To test that these two computations produce the same result, given the tools shown above, you might imagine using (df + df == df * 2).all(). But for a DataFrame that contains NaN values, this expression is in fact False:

df + df == df * 2

(df + df == df * 2).all()

a    True
b    True
c    True
d    True
f    True
dtype: bool

When df contains NaN values, the boolean DataFrame df + df == df * 2 contains some False values (the integer-valued
df used here has none, so every column above is True). This is because NaNs do not compare as equals:

np.nan == np.nan

False

So, NDFrames (such as Series and DataFrames) have an equals() method for testing equality, with NaNs in corresponding
locations treated as equal.

(df + df).equals(df * 2)

True

Note that the Series or DataFrame index needs to be in the same order for equality to be True:

df1 = pd.DataFrame({'col': ['boo', 0, np.nan]})

df2 = pd.DataFrame({'col': [np.nan, 0, 'boo']}, index=[2, 1, 0])

df1.equals(df2)

False

df1.equals(df2.sort_index())

True
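The difference between elementwise comparison and equals() shows up as soon as a frame contains NaN. This is a minimal sketch on a hypothetical two-row frame with one missing value.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# Elementwise comparison: the NaN position compares as False.
elementwise = frame + frame == frame * 2
all_equal_elementwise = elementwise.all().all()

# equals() treats NaNs in corresponding locations as equal.
all_equal = (frame + frame).equals(frame * 2)

print(elementwise)
print(all_equal_elementwise, all_equal)
```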

Comparing array-like objects

You can conveniently perform element-wise comparisons when comparing a pandas data structure with a scalar value:

pd.Series(['boo', 'far', 'baz']) == 'boo'

0     True
1    False
2    False
dtype: bool

pd.Index(['boo', 'far', 'baz']) == 'boo'

array([ True, False, False])

Pandas also handles element-wise comparisons between different array-like objects of the same length:

pd.Series(['boo', 'far', 'aaz']) == pd.Index(['boo', 'far', 'qux'])

0     True
1     True
2    False
dtype: bool

pd.Series(['boo', 'far', 'aaz']) == np.array(['boo', 'far', 'qux'])

0     True
1     True
2    False
dtype: bool

Trying to compare Index or Series objects of different lengths will raise a ValueError:

pd.Series(['boo', 'far', 'aaz']) == pd.Series(['boo', 'far'])
ValueError: Series lengths must match to compare

pd.Series(['boo', 'far', 'aaz']) == pd.Series(['boo'])
ValueError: Series lengths must match to compare

Note that this is different from the NumPy behavior where a comparison can be broadcast:

np.array([1, 2, 3, 4]) == np.array([3])
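The contrast can be sketched side by side. NumPy broadcasts the one-element array against the longer one, whereas comparing two Series of different lengths raises a ValueError (the exact message varies between pandas versions, so only the exception type is checked here). Comparing against a scalar is the pandas way to get the broadcast result.

```python
import numpy as np
import pandas as pd

# NumPy broadcasts the length-1 array across the length-4 array.
broadcast = np.array([1, 2, 3, 4]) == np.array([3])

# pandas refuses to compare Series of different lengths.
try:
    pd.Series([1, 2, 3, 4]) == pd.Series([3])
    raised = False
except ValueError:
    raised = True

# Comparing with a scalar gives the elementwise result in pandas.
against_scalar = pd.Series([1, 2, 3, 4]) == 3

print(broadcast)
print(raised)
print(against_scalar)
```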

Combining overlapping data sets

A problem occasionally arising is the combination of two similar data sets where values in one are preferred
over the other. An example would be two data series representing a particular economic indicator where
one is considered to be of “higher quality”. However, the lower quality series might extend further back in history
or have more complete data coverage. As such, we would like to combine two DataFrame objects where missing values
in one DataFrame are conditionally filled with like-labeled values from the other DataFrame. The function implementing
this operation is combine_first(), which we illustrate:

df1 = pd.DataFrame({'A': [1., np.nan, 4., np.nan],
                    'B': [np.nan, 2., 3., 6.]})

df2 = pd.DataFrame({'A': [1., 2., 4., np.nan, 3.],
                    'B': [np.nan, 3., 4., 8., 5.]})

df1

df2

df1.combine_first(df2)
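The same method exists on Series, which matches the economic-indicator scenario described above. In this sketch, preferred stands for the higher-quality series with gaps and fallback for the longer, lower-quality one; both names and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Higher-quality series: shorter history and some gaps.
preferred = pd.Series([np.nan, 2.0, np.nan], index=[2001, 2002, 2003])

# Lower-quality series: reaches further back and has no gaps.
fallback = pd.Series([1.0, 9.0, 3.0, 4.0], index=[2000, 2001, 2002, 2003])

# Keep preferred values where present; fill the rest from fallback.
combined = preferred.combine_first(fallback)

print(combined)
```

Only 2002 keeps the preferred value; the other years are taken from fallback because preferred is missing or has no entry there.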

General DataFrame combine

The combine_first() method above calls the more general DataFrame.combine(). This method takes another
DataFrame and a combiner function, aligns the input DataFrame and then passes the combiner function pairs of
Series (i.e., columns whose names are the same).

So, for instance, to reproduce combine_first() as above:

def combiner(a, b):
    # Take the value from b wherever a is missing, otherwise keep a.
    return np.where(pd.isna(a), b, a)

df1.combine(df2, combiner)
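Since the combiner receives whole columns, it can also choose between them rather than mixing values elementwise. The sketch below uses a hypothetical take_smaller function that keeps, for each column name, whichever column has the smaller sum; the frames first and second are illustrative.

```python
import pandas as pd

first = pd.DataFrame({"A": [0, 0], "B": [4, 4]})
second = pd.DataFrame({"A": [1, 1], "B": [3, 3]})

def take_smaller(s1, s2):
    # Called once per column name with the two aligned columns.
    return s1 if s1.sum() < s2.sum() else s2

result = first.combine(second, take_smaller)

print(result)
```

Column A comes from first (sum 0 versus 2) and column B from second (sum 6 versus 8).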

Output of the DataFrame expressions shown earlier in this section:

df (the original frame with columns one, two, three)

        one       two     three
a  1.218453 -0.350691       NaN
b -0.542001 -0.419797 -0.201188
c       NaN -0.285277 -0.299671
d       NaN       NaN -0.909407
f       NaN       NaN  0.118755

df.sub(row, axis='columns')

        one       two     three
a  1.760454  0.069106       NaN
b  0.000000  0.000000  0.000000
c       NaN  0.134520 -0.098483
d       NaN       NaN -0.708219
f       NaN       NaN  0.319943

df.sub(column, axis='index')

        one  two     three
a  1.569144  0.0       NaN
b -0.122204  0.0  0.218609
c       NaN  0.0 -0.014394
d       NaN  NaN       NaN
f       NaN  NaN       NaN

dfmi.sub(column, axis=0, level='second')

                   one  two     three
first second
1     a       1.569144  0.0       NaN
      b      -0.122204  0.0  0.218609
      c            NaN  0.0 -0.014394
2     a            NaN  NaN -0.558716
      f            NaN  NaN       NaN

df + df2 (the integer-valued frames)

    a   b   c   d   f
0  15  15  16  15  14
1  15  15  14  15  15
2  15  14  16  15  14
3  15  15  14  15  16
4  14  14  16  15  15

df.add(df2, fill_value=0)

    a   b   c   d   f
0  15  15  16  15  14
1  15  15  14  15  15
2  15  14  16  15  14
3  15  15  14  15  16
4  14  14  16  15  15

df.gt(df2)

       a      b      c      d      f
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False  False

df2.ne(df)

      a     b     c     d     f
0  True  True  True  True  True
1  True  True  True  True  True
2  True  True  True  True  True
3  True  True  True  True  True
4  True  True  True  True  True

df1 (from the combine_first() example)

     A    B
0  1.0  NaN
1  NaN  2.0
2  4.0  3.0
3  NaN  6.0

df2 (from the combine_first() example)

     A    B
0  1.0  NaN
1  2.0  3.0
2  4.0  4.0
3  NaN  8.0
4  3.0  5.0

df1.combine_first(df2)

     A    B
0  1.0  NaN
1  2.0  2.0
2  4.0  3.0
3  NaN  6.0
4  3.0  5.0