Examples

DataFrame.reindex supports two calling conventions:

  • (index=index_labels, columns=column_labels, ...)
  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.
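
As a small sketch of the two conventions side by side, the example below reindexes a hypothetical two-row dataframe both ways and checks that the results match. The names df_small, by_keyword and by_axis are ours and are used for illustration only.

```python
import pandas as pd

# Hypothetical two-row frame, used only for this illustration.
df_small = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Convention 1: the keyword itself names the axis.
by_keyword = df_small.reindex(index=["y", "x", "z"], columns=["b", "a"])

# Convention 2: labels plus an explicit axis, one axis per call.
by_axis = (
    df_small.reindex(["y", "x", "z"], axis="index")
            .reindex(["b", "a"], axis="columns")
)

# Compare the two results; the new row "z" has no source data in either.
print(by_keyword.equals(by_axis))
print(by_keyword)
```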

Create a dataframe with some fictional data.

In [1]:
import numpy as np
import pandas as pd
In [2]:
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
In [3]:
df = pd.DataFrame({
    'http_status': [200, 200, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 1.0]},
    index=index)
df
Out[3]:
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
Konqueror          301           1.00

Create a new index and reindex the dataframe. By default, labels in the new index that have no
corresponding record in the dataframe are assigned NaN.

In [4]:
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'Chrome']
In [5]:
df.reindex(new_index)
Out[5]:
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the fill_value keyword. Because the index
is not monotonically increasing or decreasing, we cannot use the method keyword to fill
the NaN values.

In [6]:
df.reindex(new_index, fill_value=0)
Out[6]:
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
Chrome                 200           0.02
In [7]:
df.reindex(new_index, fill_value='missing')
Out[7]:
               http_status  response_time
Safari                 404           0.07
Iceweasel          missing        missing
Comodo Dragon      missing        missing
Chrome                 200           0.02
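
To see the restriction on the method keyword in practice, the following sketch rebuilds the browser dataframe and requests forward filling on its non-monotonic index. In the pandas versions we are aware of, this raises a ValueError; the exact message may differ between versions, so the code only checks the exception type. The flag name raised is ours.

```python
import pandas as pd

# Rebuild the example data so this block runs on its own.
index = ["Firefox", "Chrome", "Safari", "Konqueror"]
df = pd.DataFrame(
    {"http_status": [200, 200, 404, 301],
     "response_time": [0.04, 0.02, 0.07, 1.0]},
    index=index,
)
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "Chrome"]

# The original index is neither increasing nor decreasing, so a
# propagation method cannot be applied while reindexing.
try:
    df.reindex(new_index, method="ffill")
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

When the index cannot be sorted meaningfully, fill_value remains the option to use.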

We can also reindex the columns.

In [8]:
df.reindex(columns=['http_status', 'user_agent'])
Out[8]:
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
Konqueror          301         NaN

Or we can use “axis-style” keyword arguments.

In [9]:
df.reindex(['http_status', 'user_agent'], axis="columns")
Out[9]:
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
Konqueror          301         NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically
increasing index (for example, a sequence of dates).

In [10]:
date_index = pd.date_range('1/1/2019', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
df2
Out[10]:
            prices
2019-01-01   102.0
2019-01-02   106.0
2019-01-03     NaN
2019-01-04   100.0
2019-01-05    90.0
2019-01-06    88.0

Suppose we decide to expand the dataframe to cover a wider date range.

In [11]:
date_index2 = pd.date_range('12/29/2018', periods=10, freq='D')
df2.reindex(date_index2)
Out[11]:
            prices
2018-12-29     NaN
2018-12-30     NaN
2018-12-31     NaN
2019-01-01   102.0
2019-01-02   106.0
2019-01-03     NaN
2019-01-04   100.0
2019-01-05    90.0
2019-01-06    88.0
2019-01-07     NaN

The index entries that did not have a value in the original dataframe (for example, ‘2018-12-29’) are
by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to propagate the next valid value backward to fill the NaN values, pass bfill
as an argument to the method keyword.

In [12]:
df2.reindex(date_index2, method='bfill')
Out[12]:
            prices
2018-12-29   102.0
2018-12-30   102.0
2018-12-31   102.0
2019-01-01   102.0
2019-01-02   106.0
2019-01-03     NaN
2019-01-04   100.0
2019-01-05    90.0
2019-01-06    88.0
2019-01-07     NaN
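
For comparison, here is a sketch of the forward-filling counterpart, method='ffill', on the same data. The dataframe is rebuilt so the block runs on its own; the name forward is ours.

```python
import numpy as np
import pandas as pd

# Rebuild the dated example data so this block runs on its own.
date_index = pd.date_range("1/1/2019", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
date_index2 = pd.date_range("12/29/2018", periods=10, freq="D")

# Forward fill: each new label takes the value of the closest earlier
# label in the original index, if there is one.
forward = df2.reindex(date_index2, method="ffill")
print(forward)
```

With forward filling, the three December dates stay NaN because no earlier observation exists, while 2019-01-07 takes the value recorded for 2019-01-06.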

Please note that the NaN value present in the original dataframe (at index value 2019-01-03) will not
be filled by any of the value propagation schemes. This is because filling while reindexing does not look
at dataframe values, but only compares the original and desired indexes. If you do want to fill
in the NaN values present in the original dataframe, use the fillna() method.
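
A minimal sketch of that last step follows, again rebuilding the data so the block is self-contained. The constant 0 is an arbitrary placeholder chosen for illustration, and the call to bfill() is our own addition: it is one way to propagate values over NaN entries that were already in the data, not something the text above prescribes.

```python
import numpy as np
import pandas as pd

# Rebuild the dated example data so this block runs on its own.
date_index = pd.date_range("1/1/2019", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
date_index2 = pd.date_range("12/29/2018", periods=10, freq="D")

# Filling during reindexing only compares index labels, so the NaN
# stored at 2019-01-03 is carried over unchanged.
reindexed = df2.reindex(date_index2, method="bfill")

# fillna() works on the values themselves (0 is a placeholder).
filled_const = reindexed.fillna(0)

# Assumption for illustration: value-based backward propagation.
filled_bfill = reindexed.bfill()

print(filled_const)
print(filled_bfill)
```

Note that value-based backward filling still leaves 2019-01-07 as NaN, since no later observation is available to copy from.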