Pandas Series: str.extract() function
Series-str.extract() function
The str.extract() function extracts the capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from the first match of regular expression pat.
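The sketch below shows that only the first match in each string is used. The sample strings are made up for illustration. Each of the first two strings contains two letter-digit pairs, but only the first pair in each is extracted.

```python
import pandas as pd

# Hypothetical sample data: each of the first two strings holds two letter-digit pairs.
s = pd.Series(['a1b2', 'b3a4', 'zz'])

# extract() keeps only the FIRST match of the pattern in each string;
# 'zz' has no match, so its row is all NaN.
result = s.str.extract(r'([ab])(\d)')
print(result)
```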
Syntax:
Series.str.extract(pat, flags=0, expand=True)
Parameters:
Name | Description | Type/Default Value | Required / Optional |
---|---|---|---|
pat | Regular expression pattern with capturing groups. | str | Required |
flags | Flags from the re module, e.g. re.IGNORECASE, that modify regular expression matching for things like case, spaces, etc. | int, default 0 (no flags) | Optional |
expand | If True, return a DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or a DataFrame if there are multiple capture groups. | bool, default True | Optional |
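The flags parameter accepts the standard re module flags. The sketch below uses made-up data to compare the default case-sensitive match with re.IGNORECASE.

```python
import re
import pandas as pd

# Hypothetical sample data with mixed-case letters.
s = pd.Series(['A3', 'b4', 'C5'])

# Default flags=0: the class [ab] is case-sensitive, so 'A3' does not match.
default = s.str.extract(r'([ab])(\d)')

# re.IGNORECASE lets [ab] also match 'A' ('C' is still outside the class).
ignore_case = s.str.extract(r'([ab])(\d)', flags=re.IGNORECASE)
print(default)
print(ignore_case)
```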
Returns: DataFrame or Series or Index
A DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
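The sketch below illustrates two of the return-type rules above. With expand=False, a pattern with several groups still gives a DataFrame. When the subject is an Index and the pattern has a single group, the result is an Index. The sample values are illustrative.

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
# expand=False with TWO capture groups still returns a DataFrame.
two_groups = s.str.extract(r'([ab])(\d)', expand=False)

idx = pd.Index(['a3', 'b4', 'c5'])
# expand=False with ONE capture group on an Index returns an Index.
from_index = idx.str.extract(r'[ab](\d)', expand=False)

print(type(two_groups).__name__)
print(from_index)
```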
Example - A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
s.str.extract(r'([ab])(\d)')
Output:
     0    1
0    a    3
1    b    4
2  NaN  NaN
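Non-matching rows come back entirely NaN, so you can flag or drop them afterwards. The sketch below shows one way to do this with standard DataFrame methods. The names unmatched and matched_only are only illustrative.

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
result = s.str.extract(r'([ab])(\d)')

# A row where every group is NaN is a row where the pattern did not match.
unmatched = result.isna().all(axis=1)

# dropna() (default how='any') keeps only rows where all groups matched.
matched_only = result.dropna()
print(unmatched)
print(matched_only)
```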
Example - A pattern may contain optional groups:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
s.str.extract(r'([ab])?(\d)')
Output:
     0  1
0    a  3
1    b  4
2  NaN  5
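When an optional group is absent, only that column is NaN and the rest of the row still matches. The sketch below fills the missing optional group with a placeholder. The string 'none' is an arbitrary choice made for illustration.

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
result = s.str.extract(r'([ab])?(\d)')

# 'c5' matches via the digit alone, so only the optional letter group is NaN.
letters = result[0].fillna('none')  # 'none' is a hypothetical placeholder value
digits = result[1]
print(letters.tolist(), digits.tolist())
```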
Example - Named groups become column names in the result:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
Output:
  letter digit
0      a     3
1      b     4
2    NaN   NaN
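Named columns can be selected by name. Because extracted values are strings, numeric groups usually need converting. The sketch below converts the digit column with pd.to_numeric, which turns NaN entries into float NaN.

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
parts = s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')

# Group names are the column labels, so 'digit' can be selected directly.
# The extracted strings are converted to numbers (float, since NaN is present).
digits = pd.to_numeric(parts['digit'])
print(digits)
```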
Example - A pattern with one group will return a DataFrame with one column if expand=True:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
s.str.extract(r'[ab](\d)', expand=True)
Output:
     0
0    3
1    4
2  NaN
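The checks below confirm the shape of the expand=True result for a single group. It is a one-column DataFrame whose only column label is the group number 0.

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
frame = s.str.extract(r'[ab](\d)', expand=True)

# Even with a single capture group, expand=True yields a DataFrame with one column.
print(type(frame).__name__, frame.shape, list(frame.columns))
```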
Example - A pattern with one group will return a Series if expand=False:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(['a3', 'b4', 'c5'])
s.str.extract(r'[ab](\d)', expand=False)
Output:
0      3
1      4
2    NaN
dtype: object
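With expand=False and one group, the result is a plain Series of extracted strings. The sketch below converts it to floats with astype(float). This is one reasonable choice for digit-only groups, and NaN is preserved for non-matches.

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5'])
digits = s.str.extract(r'[ab](\d)', expand=False)

# The extracted values are strings; NaN marks the non-matching row.
as_float = digits.astype(float)
print(as_float)
```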