Series.str.extract(self, pat, flags=0, expand=True) [source]
Extract capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from the first match of regular expression pat.
| Parameters: |
|
|---|---|
| Returns: |
|
See also
extractall
A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.
>>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
0 1
0 a 1
1 b 2
2 NaN NaN
A pattern may contain optional groups.
>>> s.str.extract(r'([ab])?(\d)')
0 1
0 a 1
1 b 2
2 NaN 3
Named groups will become column names in the result.
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)') letter digit 0 a 1 1 b 2 2 NaN NaN
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True)
0
0 1
1 2
2 NaN
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False) 0 1 1 2 2 NaN dtype: object
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.Series.str.extract.html