w3resource

Pandas: Remove the HTML tags within the specified column of a given DataFrame

Pandas: String and Regular Expression Exercise-41 with Solution

Write a Pandas program to remove the HTML tags within a specified column of a given DataFrame.

Sample Solution:

Python Code:

import re

import pandas as pd

df = pd.DataFrame({
    'company_code': ['Abcd','EFGF', 'zefsalf', 'sdfslew', 'zekfsdf'],
    'date_of_sale': ['12/05/2002','16/02/1999','05/09/1998','12/02/2022','15/09/1997'],
    'address': ['9910 Surrey <b>Avenue</b>','92 N. Bishop Avenue','9910 <br>Golden Star Avenue', '102 Dunbar <i></i>St.', '17 West Livingston Court']
})
print("Original DataFrame:")
print(df)

def remove_tags(string):
    # Non-greedy match: remove everything from each '<' up to the nearest '>'
    return re.sub(r'<.*?>', '', string)

# Apply remove_tags to every value in the 'address' column
df['with_out_tags'] = df['address'].apply(remove_tags)
print("\nSentences without tags:")
print(df)

Sample Output:

Original DataFrame:
  company_code  ...                      address
0         Abcd  ...    9910 Surrey <b>Avenue</b>
1         EFGF  ...          92 N. Bishop Avenue
2      zefsalf  ...  9910 <br>Golden Star Avenue
3      sdfslew  ...        102 Dunbar <i></i>St.
4      zekfsdf  ...     17 West Livingston Court

[5 rows x 3 columns]

Sentences without tags:
  company_code  ...             with_out_tags
0         Abcd  ...        9910 Surrey Avenue
1         EFGF  ...       92 N. Bishop Avenue
2      zefsalf  ...   9910 Golden Star Avenue
3      sdfslew  ...            102 Dunbar St.
4      zekfsdf  ...  17 West Livingston Court

[5 rows x 4 columns]
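
Alternative approach (a sketch, not part of the original solution): the same substitution can probably be written without a helper function by using the pandas string accessor `Series.str.replace`. The example below rebuilds only the 'address' column from the exercise and assumes a pandas version in which `str.replace` accepts the `regex` argument.

```python
import pandas as pd

# Same sample addresses as in the exercise
df = pd.DataFrame({
    'address': ['9910 Surrey <b>Avenue</b>', '92 N. Bishop Avenue',
                '9910 <br>Golden Star Avenue', '102 Dunbar <i></i>St.',
                '17 West Livingston Court']
})

# regex=True is passed explicitly so the pattern is treated as a
# regular expression rather than a literal string
df['with_out_tags'] = df['address'].str.replace(r'<.*?>', '', regex=True)

print(df['with_out_tags'].tolist())
```

Both versions rely on the same simple pattern, which is adequate for short, well-formed tags like the ones in this exercise. For arbitrary HTML (comments, scripts, a stray '<' in ordinary text), a real HTML parser is generally the safer choice.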



Python: Tips of the Day

Returns all the elements of a list except the last one

Example:

def tips_initial(lst):
  return lst[0:-1]

print(tips_initial([1, 2, 3, 4]))

Output:

[1, 2, 3]