w3resource

Pandas Series: reindex() function

Conform series in Pandas

The reindex() function conforms a Series to a new index with optional filling logic, placing NA/NaN in locations that had no value in the previous index.

A new object is produced unless the new index is equivalent to the current one and copy=False.
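As a minimal sketch of this behaviour on a Series itself (the variable names and values are illustrative, not part of the pandas API), labels that are missing from the original index come back as NaN, and the original Series is left unchanged:

```python
import pandas as pd

# Illustrative data: HTTP status codes keyed by browser name
s = pd.Series([200, 200, 404, 301],
              index=['Firefox', 'Chrome', 'Safari', 'Konqueror'],
              name='http_status')

# 'Iceweasel' is not in the original index, so it is filled with NaN
reindexed = s.reindex(['Safari', 'Iceweasel', 'Chrome'])
print(reindexed)

# The original Series still carries its four original labels
print(s.index.tolist())
```

Because NaN is a floating-point value, the integer data is upcast to float64 in the reindexed result.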

Syntax:

Series.reindex(self, index=None, **kwargs)

Parameters:

Name: index
Description: New labels / index to conform to; should be specified using keywords. Preferably an Index object, to avoid duplicating data.
Type: array-like
Required / Optional: Optional

Name: method
Description: Method to use for filling holes in the reindexed Series. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.
  • None (default): don't fill gaps
  • pad / ffill: propagate the last valid observation forward to the next valid one
  • backfill / bfill: use the next valid observation to fill the gap
  • nearest: use the nearest valid observations to fill the gap
Type: {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Default Value: None
Required / Optional: Optional

Name: copy
Description: Return a new object, even if the passed indexes are the same.
Type: bool
Default Value: True
Required / Optional: Optional

Name: level
Description: Broadcast across a level, matching Index values on the passed MultiIndex level.
Type: int or name
Required / Optional: Optional

Name: fill_value
Description: Value to use for missing values. Defaults to NaN, but can be any "compatible" value.
Type: scalar
Default Value: NaN (np.nan)
Required / Optional: Optional

Name: limit
Description: Maximum number of consecutive elements to forward or backward fill.
Type: int
Default Value: None
Required / Optional: Optional

Name: tolerance
Description: Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies a variable tolerance per element. List-like includes list, tuple, array, and Series; it must be the same size as the index, and its dtype must exactly match the index's type.
Required / Optional: Optional

Returns: Series with changed index.
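The interaction of method with limit and tolerance is easier to see on a small example. The following is a minimal sketch, assuming a Series with a monotonically increasing integer index; the data and variable names are illustrative only:

```python
import pandas as pd

# Illustrative Series with a monotonically increasing integer index
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
new_index = list(range(11))  # labels 0, 1, ..., 10

# Forward fill, but fill at most one new label after each original label
limited = s.reindex(new_index, method='ffill', limit=1)

# Use the nearest original label, but only if it is at most 1 away
near = s.reindex(new_index, method='nearest', tolerance=1)

# Show both results side by side
print(pd.DataFrame({'ffill_limit_1': limited, 'nearest_tol_1': near}))
```

Labels that are too far from any original label (for example 2 and 3) stay NaN in both results.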

The examples below use a DataFrame. DataFrame.reindex supports two calling conventions:

  • (index=index_labels, columns=column_labels, ...)
  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Example - Create a dataframe with some fictional data:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
df = pd.DataFrame({
      'http_status': [200,200,404,301],
      'response_time': [0.04, 0.02, 0.07, 1.0]},
       index=index)
df

Output:

           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
Konqueror          301           1.00

Example - Create a new index and reindex the dataframe. By default, labels in the new index that have no corresponding record in the dataframe are assigned NaN:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
df = pd.DataFrame({
      'http_status': [200,200,404,301],
      'response_time': [0.04, 0.02, 0.07, 1.0]},
       index=index)
new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'Chrome']
df.reindex(new_index)

Output:

               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use the method keyword to fill the NaN values.
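The restriction on method can be checked directly. The sketch below uses an illustrative Series with the same browser labels; on a non-monotonic index, pandas is expected to raise a ValueError when method-based filling is requested for labels that are not exact matches, while a constant fill_value works regardless of index order:

```python
import pandas as pd

# Illustrative Series whose string index is neither increasing nor decreasing
s = pd.Series([200, 200, 404, 301],
              index=['Firefox', 'Chrome', 'Safari', 'Konqueror'])
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'Chrome']

# Both flags are False for this index
print(s.index.is_monotonic_increasing, s.index.is_monotonic_decreasing)

try:
    s.reindex(new_index, method='ffill')
    raised = False
except ValueError as err:
    # Method-based filling needs a monotonic index to search in
    raised = True
    print('ValueError:', err)

# A constant fill_value does not depend on index order
filled = s.reindex(new_index, fill_value=0)
print(filled)
```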

Example - Fill the missing values with 0 by passing fill_value:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
df = pd.DataFrame({
      'http_status': [200,200,404,301],
      'response_time': [0.04, 0.02, 0.07, 1.0]},
       index=index)
new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'Chrome']
df.reindex(new_index, fill_value=0)

Output:

               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
Chrome                 200           0.02

Example - The fill value can also be a string, such as 'missing':

Python-Pandas Code:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
df = pd.DataFrame({
      'http_status': [200,200,404,301],
      'response_time': [0.04, 0.02, 0.07, 1.0]},
       index=index)
new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'Chrome']
df.reindex(new_index, fill_value='missing')

Output:

              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
Chrome                200          0.02

Example - We can also reindex the columns:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
df = pd.DataFrame({
      'http_status': [200,200,404,301],
      'response_time': [0.04, 0.02, 0.07, 1.0]},
       index=index)
# Keep 'http_status' and add a new 'user_agent' column (all NaN)
df.reindex(columns=['http_status', 'user_agent'])

Output:

           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
Konqueror          301         NaN

Example - Or we can use “axis-style” keyword arguments:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = ['Firefox', 'Chrome', 'Safari', 'Konqueror']
df = pd.DataFrame({
      'http_status': [200,200,404,301],
      'response_time': [0.04, 0.02, 0.07, 1.0]},
       index=index)
# Same result as columns=[...], written with the axis-style convention
df.reindex(['http_status', 'user_agent'], axis="columns")

Output:

           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
Konqueror          301         NaN

Example - To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates):

Python-Pandas Code:

import numpy as np
import pandas as pd
# Build a daily, monotonically increasing DatetimeIndex
date_index = pd.date_range('1/1/2019', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
df2

Output:

            prices
2019-01-01	102.0
2019-01-02	106.0
2019-01-03	NaN
2019-01-04	100.0
2019-01-05	90.0
2019-01-06	88.0

Example - Suppose we decide to expand the dataframe to cover a wider date range:

Python-Pandas Code:

import numpy as np
import pandas as pd
# Original six-day index
date_index = pd.date_range('1/1/2019', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
date_index2 = pd.date_range('12/29/2018', periods=10, freq='D')
df2.reindex(date_index2)

Output:

           prices
2018-12-29	NaN
2018-12-30	NaN
2018-12-31	NaN
2019-01-01	102.0
2019-01-02	106.0
2019-01-03	NaN
2019-01-04	100.0
2019-01-05	90.0
2019-01-06	88.0
2019-01-07	NaN

The index entries that did not have a value in the original dataframe (for example, '2018-12-29') are filled with NaN by default. If desired, we can fill in the missing values using one of several options.

Example - To fill the NaN values with the next valid value (back-propagation), pass 'bfill' to the method keyword:

Python-Pandas Code:

import numpy as np
import pandas as pd
# Original six-day index
date_index = pd.date_range('1/1/2019', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
date_index2 = pd.date_range('12/29/2018', periods=10, freq='D')
df2.reindex(date_index2, method='bfill')

Output:

           prices
2018-12-29	102.0
2018-12-30	102.0
2018-12-31	102.0
2019-01-01	102.0
2019-01-02	106.0
2019-01-03	NaN
2019-01-04	100.0
2019-01-05	90.0
2019-01-06	88.0
2019-01-07	NaN

Please note that the NaN value present in the original dataframe (at index value 2019-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. The NaN at 2019-01-07 remains for a different reason: there is no later observation to back-fill from. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.
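A minimal sketch of that two-step approach follows, reusing the price data from the examples above. The replacement value 0 is an arbitrary placeholder chosen for illustration:

```python
import numpy as np
import pandas as pd

date_index = pd.date_range('1/1/2019', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [102, 106, np.nan, 100, 90, 88]},
                   index=date_index)
date_index2 = pd.date_range('12/29/2018', periods=10, freq='D')

# Step 1: reindex with back-fill; NaN remains at 2019-01-03 and 2019-01-07
reindexed = df2.reindex(date_index2, method='bfill')

# Step 2: fill the remaining NaN values (0 is an arbitrary placeholder)
filled = reindexed.fillna(0)
print(filled)
```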
