w3resource

Pandas: Remove repetitive characters from the specified column of a given DataFrame

Pandas: String and Regular Expression Exercise-32 with Solution

Write a Pandas program to remove repetitive characters from the specified column of a given DataFrame.

Sample Solution:

Python Code:

import pandas as pd
import re

pd.set_option('display.max_columns', 10)
df = pd.DataFrame({
    'text_code': ['t0001.', 't0002', 't0003', 't0004'],
    'text_lang': ['She livedd a long life.', 'How oold is your father?', 'What is tthe problem?', 'TThhis desk is used by Tom.']
    })
print("Original DataFrame:")
print(df)

def rep_char(match):
    # match.group(0) is a run of two or more identical word characters, e.g. 'dd'
    run = match.group(0)
    # Keep only the first character of the run (change this slice to keep more)
    return run[0:1]

def unique_char(rep, sent_text):
    # (\w)\1+ matches a word character followed by one or more copies of itself;
    # each such run is replaced by whatever rep() returns
    return re.sub(r'(\w)\1+', rep, sent_text)

df['normal_text'] = df['text_lang'].apply(lambda x: unique_char(rep_char, x))
print("\nRemove repetitive characters:")
print(df)

Sample Output:

Original DataFrame:
  text_code                    text_lang
0    t0001.      She livedd a long life.
1     t0002     How oold is your father?
2     t0003        What is tthe problem?
3     t0004  TThhis desk is used by Tom.

Remove repetitive characters:
  text_code                    text_lang                normal_text
0    t0001.      She livedd a long life.     She lived a long life.
1     t0002     How oold is your father?    How old is your father?
2     t0003        What is tthe problem?       What is the problem?
3     t0004  TThhis desk is used by Tom.  This desk is used by Tom.
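The solution above calls a Python function for every match. A minimal alternative sketch, assuming a pandas version whose Series.str.replace accepts the regex keyword (0.23 or later), passes the backreference r'\1' as the replacement string, so each run of a repeated word character is reduced to a single copy and the normal_text column from the sample output is reproduced. Note that the pattern cannot tell a typo from a correct spelling: legitimate double letters, such as those in "book", are collapsed as well, which the last line demonstrates.

```python
import pandas as pd

# Same sample data as the exercise above
df = pd.DataFrame({
    'text_code': ['t0001.', 't0002', 't0003', 't0004'],
    'text_lang': ['She livedd a long life.', 'How oold is your father?',
                  'What is tthe problem?', 'TThhis desk is used by Tom.']
})

# (\w)\1+ matches a word character repeated two or more times in a row;
# the replacement r'\1' keeps one copy of the captured character.
df['normal_text'] = df['text_lang'].str.replace(r'(\w)\1+', r'\1', regex=True)
print(df['normal_text'].tolist())

# Caveat: correctly spelled double letters are collapsed too.
print(pd.Series(['The book is good']).str.replace(r'(\w)\1+', r'\1', regex=True)[0])
# prints: The bok is god
```

Vectorized string methods avoid the per-row lambda, but they share the same limitation; a dictionary-based spelling check would be needed to keep words like "book" intact.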


Python: Tips of the Day

Python: Cache results with decorators

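Python's standard library can cache a function's results with a decorator. Caching saves time and resources when an expensive function is called repeatedly with the same arguments: the first call computes the result, and later calls with the same arguments return the stored value.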

To use it, import lru_cache from the functools module and decorate the function with @lru_cache. The function's arguments must be hashable, because they are used as cache keys.

from functools import lru_cache

# maxsize=None lets the cache grow without limit, so each fibo(n) is computed only once
@lru_cache(maxsize=None)
def fibo(a):
    if a <= 1:
        return a
    else:
        return fibo(a-1) + fibo(a-2)

for i in range(20):
    print(fibo(i), end="|")

print("\n\n", fibo.cache_info())

Output:

0|1|1|2|3|5|8|13|21|34|55|89|144|233|377|610|987|1597|2584|4181|

 CacheInfo(hits=36, misses=20, maxsize=None, currsize=20)
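With maxsize=None the cache never evicts anything. As a sketch of the bounded case, the hypothetical function square below uses maxsize=2, so once two results are stored, the least recently used one is discarded to make room. cache_info() and cache_clear() are methods that lru_cache attaches to the decorated function; the print inside square is only there to show when the body actually runs.

```python
from functools import lru_cache

# Hypothetical example function; maxsize=2 keeps at most two cached results
@lru_cache(maxsize=2)
def square(n):
    print(f"computing {n}")
    return n * n

square(1)   # miss: computed and cached
square(2)   # miss: computed and cached
square(1)   # hit: served from the cache, 1 becomes most recently used
square(3)   # miss: cache is full, so 2 (least recently used) is evicted
square(2)   # miss: 2 was evicted, so it is computed again

stats = square.cache_info()
print(stats)  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)

square.cache_clear()  # empties the cache and resets the statistics
print(square.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=2, currsize=0)
```

A bounded cache trades some recomputation for predictable memory use, which matters when the function is called with many distinct arguments.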