w3resource

NLTK Tokenize: Read a given text through each line and look for sentences

NLTK Tokenize: Exercise-8 with Solution

Write a Python NLTK program that reads a given multi-line text and splits it into sentences. Print each sentence, separating consecutive sentences with “==============”.

Sample Solution:

Python Code-1:

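import nltk.data

text = '''
Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.
'''
print("\nOriginal Text:")
print(text)
# Load the pre-trained English Punkt model. The 'punkt' resource must be
# installed first, e.g. with nltk.download('punkt'). Recent NLTK releases
# may not accept this pickle path; nltk.sent_tokenize(text) is an alternative.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# Split the text into sentences and print them with a separator line between.
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))

Sample Output:

Original Text: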

Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.

Mr. Smith waited for the train.
==============
The train was late.
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station.
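
In the output above, the last sentence keeps the line break it had in the input, because the tokenizer returns each sentence exactly as it appears in the text. If every sentence should print on a single line, the whitespace can be collapsed afterwards. The following is a minimal sketch using only the standard library; the `sentences` list is copied from the sample output, and the helper name `flatten_sentence` is hypothetical, not part of NLTK.

```python
# Sentences exactly as they appear in the sample output above.
sentences = [
    "Mr. Smith waited for the train.",
    "The train was late.",
    "Mary and Samantha took the bus.",
    "I looked for Mary and\nSamantha at the bus station.",
]


def flatten_sentence(sentence):
    """Collapse line breaks and repeated spaces into single spaces."""
    # str.split() with no argument splits on any run of whitespace.
    return " ".join(sentence.split())


separator = "\n==============\n"
flattened = [flatten_sentence(s) for s in sentences]
print(separator.join(flattened))
```

The same helper could be applied to the list returned by `sent_detector.tokenize(...)` before joining it with the separator.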

By default, closing punctuation that follows a sentence, such as a closing parenthesis or quotation mark, is kept as part of that sentence.

Example:

Python Code-2:

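import nltk.data

text = '''
Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].
'''
print("\nOriginal Text:")
print(text)
# Load the pre-trained English Punkt model. The 'punkt' resource must be
# installed first, e.g. with nltk.download('punkt'). Recent NLTK releases
# may not accept this pickle path; nltk.sent_tokenize(text) is an alternative.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# The closing parenthesis after "late." stays with its own sentence.
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))

Sample Output:

Original Text: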

Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].

Mr. Smith waited for the train.
==============
(The train was late.)
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station [Sector-1].
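
The behaviour shown above, where the closing parenthesis stays with "(The train was late.)", can be switched off with the `realign_boundaries` argument of `tokenize()`. The sketch below compares both settings. As an assumption made only to keep it independent of the downloaded model, it builds an untrained `PunktSentenceTokenizer()`, which knows no abbreviations, so the sample text avoids "Mr."; the same keyword argument can be passed to the `sent_detector` loaded in the solution.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Assumption: an untrained tokenizer, so no downloaded Punkt model is needed.
# It has no abbreviation list, hence the abbreviation-free sample text.
tokenizer = PunktSentenceTokenizer()

sample = "(The train was late.) Mary and Samantha took the bus."

# Default: punctuation that closes a sentence is realigned onto that sentence.
realigned = tokenizer.tokenize(sample)

# Realignment disabled: the split happens directly after the period.
raw = tokenizer.tokenize(sample, realign_boundaries=False)

for label, sentences in (("default", realigned),
                         ("realign_boundaries=False", raw)):
    print(label)
    print("\n==============\n".join(sentences))
    print()
```

With realignment disabled, the closing parenthesis is left at the start of the following sentence, so the default is usually the better choice for readable output.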

