NLTK Tokenize: Read a given text through each line and look for sentences
NLTK Tokenize: Exercise-8 with Solution
Write a Python NLTK program that reads a given text line by line and finds the sentences in it. Print each sentence, separating consecutive sentences with “==============”.
Sample Solution:
Python Code-1:
import nltk.data

# Sample text; the last sentence wraps across two lines.
text = '''
Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.
'''
print("\nOriginal Text:")
print(text)
# Load the pre-trained English Punkt sentence tokenizer
# (the Punkt data must already be downloaded).
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# Split the text into sentences and print them with a divider line in between.
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))
Sample Output:
Original Text:

Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.

Mr. Smith waited for the train.
==============
The train was late.
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station.
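Punctuation that follows a sentence, such as a closing parenthesis, is also included in that sentence by default. It can be excluded with the realign_boundaries flag of tokenize().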
Example:
Python Code-2:
import nltk.data

# Sample text with a parenthesized sentence and a bracketed label.
text = '''
Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].
'''
print("\nOriginal Text:")
print(text)
# Load the pre-trained English Punkt sentence tokenizer
# (the Punkt data must already be downloaded).
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# The closing parenthesis stays attached to the sentence it follows.
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))
Output:
Original Text:

Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].

Mr. Smith waited for the train.
==============
(The train was late.)
==============
Mary and Samantha took the bus.
==============
I looked for Mary and
Samantha at the bus station [Sector-1].
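Because the sample text is read as it is laid out line by line, a sentence that wraps across two input lines comes back from the tokenizer with the line break still inside it. The following is a minimal sketch, assuming you would rather print one sentence per output line. format_sentences is a hypothetical helper written for this illustration, not part of NLTK, and the sentence list is hard-coded here so the snippet runs without the Punkt data.

```python
def format_sentences(sentences, separator="=============="):
    """Join sentences with a separator line, one sentence per output line.

    Hypothetical helper (not part of NLTK): collapses any run of whitespace
    inside a sentence, including line breaks, into a single space.
    """
    cleaned = [" ".join(sentence.split()) for sentence in sentences]
    return ("\n" + separator + "\n").join(cleaned)


# Sentences for the first sample text, hard-coded for illustration;
# in practice this list would come from sent_detector.tokenize(text.strip()).
sentences = [
    "Mr. Smith waited for the train.",
    "The train was late.",
    "Mary and Samantha took the bus.",
    "I looked for Mary and\nSamantha at the bus station.",
]

print(format_sentences(sentences))
```

Collapsing whitespace this way changes only how each sentence is displayed. The sentence boundaries are still whatever the tokenizer decided.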