w3resource

Pandas: Extract the unique sentences from a given column of a given DataFrame

Pandas: String and Regular Expression Exercise-39 with Solution

Write a Pandas program to extract the unique sentences from a given column of a given DataFrame.

Sample Solution:

Python Code:

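import pandas as pd
import re

df = pd.DataFrame({
    'company_code': ['Abcd', 'EFGF', 'zefsalf', 'sdfslew', 'zekfsdf'],
    'date_of_sale': ['12/05/2002', '16/02/1999', '05/09/1998', '12/02/2022', '15/09/1997'],
    'address': ['9910 Surrey Avenue\n9910 Surrey Avenue', '92 N. Bishop Avenue', '9910 Golden Star Avenue', '102 Dunbar St.\n102 Dunbar St.', '17 West Livingston Court']
})

print("Original DataFrame:")
print(df)

def find_unique_sentence(str1):
    # (?s): '.' also matches newlines, so the lookahead can scan the rest of the string.
    # (?m): '^' and '$' match at the start and end of every line.
    # Capture a whole non-empty line, then reject it if an identical line
    # appears later (negative lookahead with the backreference \1).
    # Each distinct line is therefore kept once, at its last occurrence.
    result = re.findall(r'(?sm)(^[^\r\n]+$)(?!.*^\1$)', str1)
    return result

# Apply the function to every address; each cell becomes a list of unique lines.
df['unique_sentence'] = df['address'].apply(find_unique_sentence)
print("\nExtract unique sentences :")
print(df)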

Sample Output:

Original DataFrame:
  company_code                   ...                                                   address
0         Abcd                   ...                    9910 Surrey Avenue\n9910 Surrey Avenue
1         EFGF                   ...                                       92 N. Bishop Avenue
2      zefsalf                   ...                                   9910 Golden Star Avenue
3      sdfslew                   ...                            102 Dunbar St.\n102 Dunbar St.
4      zekfsdf                   ...                                  17 West Livingston Court

[5 rows x 3 columns]

Extract unique sentences :
  company_code             ...                         unique_sentence
0         Abcd             ...                    [9910 Surrey Avenue]
1         EFGF             ...                   [92 N. Bishop Avenue]
2      zefsalf             ...               [9910 Golden Star Avenue]
3      sdfslew             ...                        [102 Dunbar St.]
4      zekfsdf             ...              [17 West Livingston Court]

[5 rows x 4 columns]
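The pattern in find_unique_sentence keeps each distinct line once, at its last occurrence, because the negative lookahead only rejects a line when an identical line follows it. The sketch below checks that on a small string and compares it with a hypothetical split-based helper, unique_lines_split, which is not part of the original solution and keeps the first occurrence instead. The two agree on addresses shaped like the sample data, but they return different orders when a line repeats non-adjacently, and they also differ on Windows-style "\r\n" line endings, which str.splitlines() handles and the pattern does not.

```python
import re

# Same pattern as in find_unique_sentence.
PATTERN = r'(?sm)(^[^\r\n]+$)(?!.*^\1$)'


def unique_lines_regex(text):
    # Each distinct line is kept once, at its LAST occurrence.
    return re.findall(PATTERN, text)


def unique_lines_split(text):
    # Hypothetical alternative (not from the original solution): split into
    # lines, skip empty ones, and de-duplicate with dict.fromkeys, which
    # preserves FIRST-occurrence order.
    return list(dict.fromkeys(line for line in text.splitlines() if line))


sample = "a\nb\na"
print(unique_lines_regex(sample))  # ['b', 'a']
print(unique_lines_split(sample))  # ['a', 'b']

# On addresses shaped like the exercise data, both helpers agree.
addresses = ['9910 Surrey Avenue\n9910 Surrey Avenue', '102 Dunbar St.\n102 Dunbar St.']
print([unique_lines_regex(a) for a in addresses])
print([unique_lines_split(a) for a in addresses])
```

Either helper can be passed to df['address'].apply(...) in place of find_unique_sentence; the split-based one is the simpler choice when first-occurrence order matters.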

Python: Tips of the Day

Returns True if there are duplicate values in a flat list, False otherwise

Example:

def tips_duplicates(lst):
    return len(lst) != len(set(lst))

x = [2, 4, 6, 8, 4, 2]
y = [1, 3, 5, 7, 9]
print(tips_duplicates(x))
print(tips_duplicates(y))

Output:

True
False
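tips_duplicates always converts the whole list to a set before comparing lengths. A minimal sketch of an early-exit variant, using the hypothetical name has_duplicates, returns as soon as it meets a repeated value; like the original, it assumes the list elements are hashable (a list of lists would raise TypeError in both).

```python
def has_duplicates(lst):
    # Assumes hashable elements, like tips_duplicates.
    # Returns True at the first repeated value instead of building the full set.
    seen = set()
    for item in lst:
        if item in seen:
            return True
        seen.add(item)
    return False


print(has_duplicates([2, 4, 6, 8, 4, 2]))  # True
print(has_duplicates([1, 3, 5, 7, 9]))     # False
```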