w3resource

NLTK Tokenize: Exercises with Solutions

Python NLTK Tokenize [9 exercises with solutions]

What is Tokenization?

Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.
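
As a rough illustration of "demarcating and possibly classifying" sections of a string, the sketch below splits text into tokens and labels each one. It uses only Python's standard re module rather than NLTK, and the token classes WORD, NUMBER, and PUNCT are hypothetical names chosen for this example, not categories defined by any library.

```python
import re

# Hypothetical token classes for illustration; each alternative is a named group.
TOKEN_PATTERN = re.compile(
    r"(?P<WORD>[A-Za-z]+)|(?P<NUMBER>\d+)|(?P<PUNCT>[^\w\s])"
)

def tokenize(text):
    """Return a list of (kind, value) pairs, one per token found in text."""
    # lastgroup is the name of the alternative that matched this token
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]

tokens = tokenize("NLTK has 9 exercises.")
print(tokens)
```

The resulting list of pairs is the kind of output that would then be handed to a later processing step, such as a parser.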

1. Write a Python NLTK program to split a sentence or paragraph into a list of words.

2. Write a Python NLTK program to tokenize sentences in languages other than English.

3. Write a Python NLTK program to create a list of words from a given string.

4. Write a Python NLTK program to split all punctuation into separate tokens.

5. Write a Python NLTK program to tokenize words sentence by sentence.

6. Write a Python NLTK program to tokenize a Twitter text.

7. Write a Python NLTK program to remove Twitter username handles from a given Twitter text.

8. Write a Python NLTK program that reads a given text line by line and looks for sentences. Print each sentence, separating consecutive sentences with "==============".

9. Write a Python NLTK program to find parenthesized expressions in a given string and divide the string into a sequence of substrings.

 


Python: Tips of the Day

Kwargs:

**kwargs and *args are function arguments that can be very useful.

They are underused and often misunderstood.

Let's try to explain what kwargs are and how to use them.

  • While *args passes an unknown number of positional arguments to a function, **kwargs does the same with named (keyword) arguments.
  • So, if *args is a tuple of positional arguments, you can think of **kwargs as a dictionary of keyword arguments passed to the function.
  • You can combine these as you wish as long as you follow the correct order, which is: arg1, arg2, *args, **kwargs. It's fine to use only some of them, but you can't change the order. For instance, function(**kwargs, arg1) is a SyntaxError in Python.
  • Another example: function(*args, **kwargs) is valid because it follows the correct order.
  • Here is an example. Suppose satellites are given with their names and weights in metric tons. The code prints each name along with its weight in kilograms.
def payloads(**kwargs):
    # kwargs is a dict mapping each keyword name to its value
    for key, value in kwargs.items():
        # 1 metric ton = 1000 kg
        print(key + " |||", float(value) * 1000)

payloads(NavSat1='2.5', BaysatG2='4')

Output:

NavSat1 ||| 2500.0
BaysatG2 ||| 4000.0

Because the function above works for any number of keyword arguments, **kwargs is a better fit than a fixed set of parameters. An existing dictionary can also be unpacked into keyword arguments with ** at the call site:

def payloads(**kwargs):
    for key, value in kwargs.items():
        # 1 metric ton = 1000 kg
        print(key + " |||", float(value) * 1000)

# Unpack the dictionary so each key becomes a keyword argument
sats = {"Tx211": "3", "V1": "0.50"}
payloads(**sats)

Output:

Tx211 ||| 3000.0
V1 ||| 500.0
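
The ordering rule from the list above (arg1, arg2, *args, **kwargs) can be sketched as follows. The function name describe and the sample values are hypothetical, picked only to show where each argument ends up.

```python
def describe(arg1, arg2, *args, **kwargs):
    """Return the arguments grouped by how Python bound them."""
    # args collects extra positional arguments as a tuple;
    # kwargs collects extra keyword arguments as a dict.
    return {"arg1": arg1, "arg2": arg2, "args": args, "kwargs": kwargs}

# Two required positionals, two extra positionals, two keyword arguments
result = describe("a", "b", "c", "d", unit="kg", scale=1000)
print(result)
```

Here "c" and "d" land in args, while unit and scale land in kwargs. Declaring **kwargs before a positional parameter would not run at all, since Python rejects it when the function is defined.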