Pandas: Extract the unique sentences from a given column of a given DataFrame
Pandas: String and Regular Expression Exercise-39 with Solution
Write a Pandas program to extract the unique sentences from a given column of a given DataFrame.
Sample Solution:
Python Code:
import pandas as pd
import re

df = pd.DataFrame({
    'company_code': ['Abcd','EFGF', 'zefsalf', 'sdfslew', 'zekfsdf'],
    'date_of_sale': ['12/05/2002','16/02/1999','05/09/1998','12/02/2022','15/09/1997'],
    'address': ['9910 Surrey Avenue\n9910 Surrey Avenue','92 N. Bishop Avenue','9910 Golden Star Avenue', '102 Dunbar St.\n102 Dunbar St.', '17 West Livingston Court']
})
print("Original DataFrame:")
print(df)

def find_unique_sentence(str1):
    # Match each whole line that is not followed later by an identical line,
    # so every distinct line is returned once (at its last occurrence).
    result = re.findall(r'(?sm)(^[^\r\n]+$)(?!.*^\1$)', str1)
    return result

# Apply the helper to every address; each cell becomes a list of unique lines.
df['unique_sentence'] = df['address'].apply(find_unique_sentence)
print("\nExtract unique sentences :")
print(df)
Sample Output:
Original DataFrame:
  company_code  ...                                 address
0         Abcd  ...  9910 Surrey Avenue\n9910 Surrey Avenue
1         EFGF  ...                     92 N. Bishop Avenue
2      zefsalf  ...                 9910 Golden Star Avenue
3      sdfslew  ...          102 Dunbar St.\n102 Dunbar St.
4      zekfsdf  ...                17 West Livingston Court

[5 rows x 3 columns]

Extract unique sentences :
  company_code  ...             unique_sentence
0         Abcd  ...        [9910 Surrey Avenue]
1         EFGF  ...       [92 N. Bishop Avenue]
2      zefsalf  ...   [9910 Golden Star Avenue]
3      sdfslew  ...            [102 Dunbar St.]
4      zekfsdf  ...  [17 West Livingston Court]

[5 rows x 4 columns]
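Alternative sketch (not part of the original solution): the same result for this sample data can be obtained without a regular expression by splitting each cell into lines and dropping repeats. The helper name unique_lines and the extra 'A\nB\nA' row are hypothetical, added only for illustration.

```python
import pandas as pd

def unique_lines(text):
    # Hypothetical helper: split the cell into lines, skip empty ones,
    # and keep the first occurrence of each line in its original order.
    lines = [line for line in text.splitlines() if line]
    return list(dict.fromkeys(lines))

df = pd.DataFrame({
    'address': ['9910 Surrey Avenue\n9910 Surrey Avenue',
                '102 Dunbar St.\n102 Dunbar St.',
                'A\nB\nA']  # illustrative row, not in the original data
})
df['unique_sentence'] = df['address'].apply(unique_lines)
print(df['unique_sentence'].tolist())
```

One difference to be aware of: this version keeps the first occurrence of a repeated line, whereas the regular expression above keeps the last one, so for 'A\nB\nA' the ordering of the two results differs. For the sample addresses both approaches give the same lists.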