w3resource

NLTK Tokenize: Split all punctuation into separate tokens

NLTK Tokenize: Exercise-4 with Solution

Write a Python NLTK program to split all punctuation into separate tokens.

Sample Solution:

Python Code:

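from nltk.tokenize import WordPunctTokenizer

text = "Reset your password if you just can't remember your old one."
print("\nOriginal string:")
print(text)

# WordPunctTokenizer separates runs of word characters from runs of punctuation,
# so the apostrophe in "can't" and the final period become tokens of their own.
result = WordPunctTokenizer().tokenize(text)
print("\nSplit all punctuation into separate tokens:")
print(result)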

Sample Output:

Original string:
Reset your password if you just can't remember your old one.

Split all punctuation into separate tokens:
['Reset', 'your', 'password', 'if', 'you', 'just', 'can', "'", 't', 'remember', 'your', 'old', 'one', '.']
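
The output above shows "can't" split into 'can', "'" and 't'. To see why, it can help to reproduce the behavior without NLTK. The sketch below assumes that WordPunctTokenizer is a regular-expression tokenizer built on the pattern \w+|[^\w\s]+, as the NLTK documentation describes; the names WORD_PUNCT_PATTERN and split_word_punct are hypothetical helpers introduced here for illustration and are not part of NLTK.

```python
import re

# Assumed pattern: a run of word characters, or a run of characters that are
# neither word characters nor whitespace (i.e. punctuation and symbols).
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def split_word_punct(text):
    """Return word runs and punctuation runs as separate tokens."""
    return WORD_PUNCT_PATTERN.findall(text)


text = "Reset your password if you just can't remember your old one."
print(split_word_punct(text))

# Consecutive punctuation characters stay together in a single token.
print(split_word_punct("Wait... what?!"))
```

Under this assumption, each token is a run of characters rather than a single character: adjacent punctuation marks such as "..." or "?!" come out as one token each, not one token per mark.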

