NLTK Tokenize: Read a given text through each line and look for sentences

Last update on August 19 2022 21:50:46 (UTC/GMT +8 hours)

NLTK Tokenize: Exercise-8 with Solution

Write a Python NLTK program that will read a given text through each line and look for sentences. Print each sentence and divide two sentences with “==============”.

Sample Solution:

Python Code-1:

import nltk.data
text = '''
Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.
'''
print("\nOriginal Tweet:")
print(text)
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))

Sample Output:

Original Tweet:

Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.

Mr. Smith waited for the train.
==============
The train was late.
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station.

Punctuation following sentences is also included by default.

Example:

Python Code-2:

import nltk.data
text = '''
Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].
'''
print("\nOriginal Tweet:")
print(text)
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))

Output:

Original Tweet:

Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].

Mr. Smith waited for the train.
==============
(The train was late.)
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station [Sector-1].

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Python NLTK program to remove Twitter username handles from a given twitter text.
Next: Write a Python NLTK program to find parenthesized expressions in a given string and divides the string into a sequence of substrings.