w3resource

NLTK Tokenize: Remove username handles from a twitter text

NLTK Tokenize: Exercise-7 with Solution

Write a Python NLTK program to remove Twitter username handles from a given twitter text.

Sample Solution:

Python Code :

from nltk.tokenize import TweetTokenizer
tknzr = TweetTokenizer(strip_handles=True)
tweet_text = "@abcd @pqrs NoSQL introduction - w3resource http://bit.ly/1ngHC5F  #nosql #database #webdev"
print("\nOriginal Tweet:")
print(tweet_text)
result = tknzr.tokenize(tweet_text)
print("\nTokenize a twitter text:")
print(result)

Sample Output:

Original Tweet:
@abcd @pqrs NoSQL introduction - w3resource http://bit.ly/1ngHC5F  #nosql #database #webdev

Tokenize a twitter text:
['NoSQL', 'introduction', '-', 'w3resource', 'http://bit.ly/1ngHC5F', '#nosql', '#database', '#webdev']

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Python NLTK program to tokenize a twitter text.
Next: Write a Python NLTK program that will read a given text through each line and look for sentences. Print each sentence and divide two sentences with “==============”.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Share this Tutorial / Exercise on : Facebook and Twitter

Python: Tips of the Day

Get the Key Whose Value Is Maximal in a Dictionary:

>>> model_scores = {'model_a': 100, 'model_z': 198, 'model_t': 150}
>>> # workaround
>>> keys, values = list(model_scores.keys()), list(model_scores.values())
>>> keys[values.index(max(values))]
'model_z'
>>> # one-line
>>> max(model_scores, key=model_scores.get)
'model_z'