w3resource

Pandas Series: str.extract() function

Series-str.extract() function

The str.extract() function is used to extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.
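
Example - Only the first match in each string is used. The following is a minimal sketch with made-up sample strings (not from the original examples); it also calls Series.str.extractall(), the companion method that returns every match, purely for contrast:

```python
import pandas as pd

# Hypothetical sample data: two of the strings contain two possible matches.
s = pd.Series(['a1b2', 'c3', 'b7a8'])

# extract() keeps only the first match per string; 'c3' has no match -> NaN.
first = s.str.extract(r'[ab](\d)', expand=False)
print(first)

# extractall() returns one row per match, indexed by (original index, match number).
every = s.str.extractall(r'[ab](\d)')
print(every)
```

Here 'a1b2' yields only 1 from extract(), while extractall() yields both 1 and 2.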

Syntax:

Series.str.extract(pat, flags=0, expand=True)

Parameters:

Name | Description | Type / Default Value | Required / Optional
pat | Regular expression pattern with capturing groups. | str | Required
flags | Flags from the re module, e.g. re.IGNORECASE, that modify regular expression matching for things like case, spaces, etc. | int, default 0 (no flags) | Optional
expand | If True, return DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or DataFrame if there are multiple capture groups. | bool, default True | Optional
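
Example - Passing a flag from the re module. This is a small sketch, assuming mixed-case sample strings that are not part of the original examples, showing how re.IGNORECASE changes which rows match:

```python
import re
import pandas as pd

# Hypothetical sample data with mixed-case letters.
s = pd.Series(['A3', 'b4', 'C5'])

# Default flags=0: matching is case-sensitive, so only 'b4' matches.
default = s.str.extract(r'([ab])(\d)')
print(default)

# With re.IGNORECASE, 'A3' matches as well; 'C5' still does not.
ignore_case = s.str.extract(r'([ab])(\d)', flags=re.IGNORECASE)
print(ignore_case)
```

Several flags can be combined with the bitwise OR operator, as with the re module itself.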

Returns: DataFrame or Series or Index
A DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
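
Example - How the return type depends on expand and on the number of capture groups. The sketch below simply checks the type of each result; the variable names are illustrative only:

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
idx = pd.Index(['a3', 'b4', 'c5'])

# One group, expand=True (the default): DataFrame with a single column.
one_group_frame = s.str.extract(r'[ab](\d)')

# One group, expand=False: Series for a Series subject, Index for an Index subject.
one_group_series = s.str.extract(r'[ab](\d)', expand=False)
one_group_index = idx.str.extract(r'[ab](\d)', expand=False)

# Several groups: a DataFrame is returned even with expand=False.
two_groups = s.str.extract(r'([ab])(\d)', expand=False)

for result in (one_group_frame, one_group_series, one_group_index, two_groups):
    print(type(result).__name__)
```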

Example - A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN:

Python-Pandas Code:

import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
# Two capture groups: a letter (a or b) followed by a digit
s.str.extract(r'([ab])(\d)')

Output:

  0	1
0	a	3
1	b	4
2	NaN	NaN

Example - A pattern may contain optional groups:

Python-Pandas Code:

import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
# The letter group is optional, so 'c5' still matches on the digit
s.str.extract(r'([ab])?(\d)')

Output:

  0	1
0	a	3
1	b	4
2	NaN	5

Example - Named groups will become column names in the result:

Python-Pandas Code:

import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
# (?P<name>...) names a group; the names become the column labels
s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')

Output:

  letter	digit
0	a	3
1	b	4
2	NaN	NaN

Example - A pattern with one group will return a DataFrame with one column if expand=True:

Python-Pandas Code:

import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
# Only the digit is captured; the letter is matched but not returned
s.str.extract(r'[ab](\d)', expand=True)

Output:

  0
0	3
1	4
2	NaN

Example - A pattern with one group will return a Series if expand=False:

Python-Pandas Code:

import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
# Same pattern, but expand=False returns a Series instead of a DataFrame
s.str.extract(r'[ab](\d)', expand=False)

Output:

0      3
1      4
2    NaN
dtype: object
