Pandas Series: str.split() function
Series-str.split() function
The str.split() function is used to split strings around a given separator/delimiter.
The function splits each string in the Series/Index from the beginning, at the specified delimiter string. Equivalent to Python's built-in str.split().
Syntax:
Series.str.split(self, pat=None, n=-1, expand=False)
Parameters:
Name | Description | Type/Default Value | Required / Optional |
---|---|---|---|
pat | String or regular expression to split on. If not specified, split on whitespace. | str | Optional |
n | Limit number of splits in output. None, 0 and -1 will be interpreted as return all splits. | int Default Value: -1 (all) | Optional |
expand | Expand the split strings into separate columns. | bool Default Value: False | Optional |
Returns: Series, Index, DataFrame or MultiIndex
Type matches caller unless expand=True
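The return types listed above can be checked with a short sketch. The sample strings and variable names here are illustrative assumptions, not part of the original examples:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
s = pd.Series(["a b c", "d e"])
idx = pd.Index(["a b", "c d"])

as_lists = s.str.split()                 # Series in, Series of lists out
as_frame = s.str.split(expand=True)      # Series in, DataFrame out
idx_lists = idx.str.split()              # Index in, Index of lists out
idx_multi = idx.str.split(expand=True)   # Index in, MultiIndex out

for result in (as_lists, as_frame, idx_lists, idx_multi):
    print(type(result).__name__)
```

With expand=False the result keeps the caller's type; with expand=True the dimensionality grows by one level (Series to DataFrame, Index to MultiIndex).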
Example - In the default setting, the string is split by whitespace:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(["this is my new pen",
"https://www.w3resource.com/pandas/index.php",
np.nan])
s.str.split()
Output:
0                         [this, is, my, new, pen]
1    [https://www.w3resource.com/pandas/index.php]
2                                              NaN
dtype: object
Example - The n parameter can be used to limit the number of splits on the delimiter:
Without the n parameter, the outputs of rsplit and split are identical. With n, the outputs differ, because split counts splits from the left and rsplit counts them from the right.
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(["this is my new pen",
"https://www.w3resource.com/pandas/index.php",
np.nan])
s.str.split(n=2)
Output:
0                           [this, is, my new pen]
1    [https://www.w3resource.com/pandas/index.php]
2                                              NaN
dtype: object
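To make the difference between split and rsplit concrete, here is a minimal sketch that runs both on the first sample string with n=2. The variable names are illustrative:

```python
import pandas as pd

s = pd.Series(["this is my new pen"])

# Without n, both methods split on every run of whitespace
same = s.str.split()[0] == s.str.rsplit()[0]

# With n=2, split makes its two cuts from the left ...
left = s.str.split(n=2)[0]
# ... while rsplit makes its two cuts from the right
right = s.str.rsplit(n=2)[0]

print(same)   # True
print(left)   # ['this', 'is', 'my new pen']
print(right)  # ['this is my', 'new', 'pen']
```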
Example - The pat parameter can be used to split by other characters:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(["this is my new pen",
"https://www.w3resource.com/pandas/index.php",
np.nan])
s.str.split(pat = "/")
Output:
0                                 [this is my new pen]
1    [https:, , www.w3resource.com, pandas, index.php]
2                                                  NaN
dtype: object
Example - When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(["this is my new pen",
"https://www.w3resource.com/pandas/index.php",
np.nan])
s.str.split(expand=True)
Output:
                                             0     1     2     3     4
0                                         this    is    my   new   pen
1  https://www.w3resource.com/pandas/index.php  None  None  None  None
2                                          NaN   NaN   NaN   NaN   NaN
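Combining n with expand=True is one way to get a predictable number of columns. The sketch below reuses the sample Series and assumes you want only the first word separated from the rest; the variable name frame is illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series(["this is my new pen",
               "https://www.w3resource.com/pandas/index.php",
               np.nan])

# At most one split per string, so the result has at most two columns
frame = s.str.split(n=1, expand=True)

print(frame.shape)       # (3, 2)
print(frame.iloc[0, 1])  # is my new pen
```

Rows with fewer pieces than the widest row are padded with missing values, and a NaN input row stays missing in every column.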
Example - Remember to escape special characters when explicitly using regular expressions:
Python-Pandas Code:
import pandas as pd
s = pd.Series(["1+1=2"])
s.str.split(r"\+|=", expand=True)
Output:
   0  1  2
0  1  1  2
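When the delimiter is a literal string rather than a pattern, escaping it by hand is error-prone. One option is to let Python's re.escape() do the escaping. This is a sketch; the "||" delimiter and the sample string are assumptions chosen for illustration:

```python
import re

import pandas as pd

s = pd.Series(["a||b||c"])

# "||" contains regex metacharacters, so escape it before passing it as pat
delimiter = "||"
parts = s.str.split(re.escape(delimiter))

print(parts[0])  # ['a', 'b', 'c']
```

Recent pandas versions also accept a regex argument on str.split() to force literal or pattern matching; check the documentation for the version you have installed before relying on it.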