w3resource

Pandas: Extract only words from a given column of a given DataFrame

Pandas: String and Regular Expression Exercise-37 with Solution

Write a Pandas program to extract only words from a given column of a given DataFrame.

Sample Solution:

Python Code:

import pandas as pd
import re
df = pd.DataFrame({
    'company_code': ['Abcd','EFGF', 'zefsalf', 'sdfslew', 'zekfsdf'],
    'date_of_sale': ['12/05/2002','16/02/1999','05/09/1998','12/02/2022','15/09/1997'],
    'address': ['9910 Surrey Ave.','92 N. Bishop Ave.','9910 Golden Star Ave.', '102 Dunbar St.', '17 West Livingston Court']
})
print("Original DataFrame:")
print(df)

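def search_words(text):
    # Keep whole tokens made only of non-digit word characters (letters and underscores).
    result = re.findall(r'\b[^\d\W]+\b', text)
    # Join the surviving tokens back into one space-separated string.
    return " ".join(result)

# Pass the function directly; wrapping it in a lambda is unnecessary.
df['only_words'] = df['address'].apply(search_words)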
print("\nOnly words:")
print(df)

Sample Output:

Original DataFrame:
  company_code date_of_sale                   address
0         Abcd   12/05/2002          9910 Surrey Ave.
1         EFGF   16/02/1999         92 N. Bishop Ave.
2      zefsalf   05/09/1998     9910 Golden Star Ave.
3      sdfslew   12/02/2022            102 Dunbar St.
4      zekfsdf   15/09/1997  17 West Livingston Court

Only words:
  company_code          ...                       only_words
0         Abcd          ...                       Surrey Ave
1         EFGF          ...                     N Bishop Ave
2      zefsalf          ...                  Golden Star Ave
3      sdfslew          ...                        Dunbar St
4      zekfsdf          ...            West Livingston Court

[5 rows x 4 columns]
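
The pattern used in search_words can be probed on its own to see which tokens it keeps. The following is a minimal sketch that reuses the same regular expression with only the standard re module; the name WORD_PATTERN and the last two sample strings are hypothetical additions for illustration, not part of the exercise data.

```python
import re

# Same pattern as in the solution above, compiled once for reuse.
# WORD_PATTERN is a name introduced here for illustration.
WORD_PATTERN = re.compile(r'\b[^\d\W]+\b')

def search_words(text):
    """Return the digit-free tokens of text, joined by single spaces."""
    return " ".join(WORD_PATTERN.findall(text))

# The first string comes from the exercise; the others are hypothetical edge cases.
samples = ['92 N. Bishop Ave.', 'Flat 4B Main_Road', '221B Baker St.']
for s in samples:
    print(repr(s), '->', repr(search_words(s)))
# '92 N. Bishop Ave.' -> 'N Bishop Ave'
# 'Flat 4B Main_Road' -> 'Flat Main_Road'
# '221B Baker St.' -> 'Baker St'
```

Tokens that mix letters and digits, such as 4B or 221B, are dropped entirely rather than trimmed, because \b requires a word boundary on both sides of the match. Underscores count as word characters, so Main_Road survives as a single token.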


Python: Tips of the Day

Python: Cache results with decorators

Decorators offer a convenient way to cache function results in Python. Caching saves time and resources when an expensive function is called repeatedly with the same arguments.

Implementation is easy: import lru_cache from the functools module and decorate your function with @lru_cache.

from functools import lru_cache

@lru_cache(maxsize=None)
def fibo(a):
    if a <= 1:
        return a
    else:
        return fibo(a-1) + fibo(a-2)

for i in range(20):
    print(fibo(i), end="|")

print("\n\n", fibo.cache_info())

Output:

0|1|1|2|3|5|8|13|21|34|55|89|144|233|377|610|987|1597|2584|4181|

 CacheInfo(hits=36, misses=20, maxsize=None, currsize=20)
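
To make the saving concrete, the sketch below counts how many times the function body runs with and without the cache. The names fibo_plain, fibo_cached, and the calls dictionary are hypothetical helpers introduced for this illustration; only lru_cache itself comes from the tip above.

```python
from functools import lru_cache

# Hypothetical counters tracking how often each function body executes.
calls = {"plain": 0, "cached": 0}

def fibo_plain(n):
    # Uncached version: recomputes the same subproblems many times.
    calls["plain"] += 1
    return n if n <= 1 else fibo_plain(n - 1) + fibo_plain(n - 2)

@lru_cache(maxsize=None)
def fibo_cached(n):
    # Cached version: the body runs once per distinct argument.
    calls["cached"] += 1
    return n if n <= 1 else fibo_cached(n - 1) + fibo_cached(n - 2)

print(fibo_plain(20), calls["plain"])    # 6765 21891
print(fibo_cached(20), calls["cached"])  # 6765 21
```

Both versions return the same value, but the cached one executes its body only once for each argument from 0 to 20, while the plain recursion repeats work exponentially.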