NLTK corpus: Find the number of male and female names in the names corpus

NLTK corpus: Exercise-11 with Solution

Write a Python NLTK program to find the number of male and female names in the names corpus. Print the first 10 male and female names.

Note: The names corpus contains around 2943 male names (male.txt) and 5001 female names (female.txt). It was compiled by Mark Kantrowitz, with additions by Bill Ross.

Sample Solution:

Python Code:

from nltk.corpus import names

# Requires the corpus data: run nltk.download('names') once if it is missing.
# Load each word list once and reuse it for both the count and the slice.
male_names = names.words('male.txt')
female_names = names.words('female.txt')

print("\nNumber of male names:")
print(len(male_names))
print("\nNumber of female names:")
print(len(female_names))

# Slice [:10] so the output matches the "first 10" requirement.
print("\nFirst 10 male names:")
print(male_names[:10])
print("\nFirst 10 female names:")
print(female_names[:10])

Sample Output:

Number of male names:
2943

Number of female names:
5001

First 10 male names:
['Aamir', 'Aaron', 'Abbey', 'Abbie', 'Abbot', 'Abbott', 'Abby', 'Abdel', 'Abdul', 'Abdulkarim']

First 10 female names:
['Abagael', 'Abagail', 'Abbe', 'Abbey', 'Abbi', 'Abbie', 'Abby', 'Abigael', 'Abigail', 'Abigale']
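
Alternative sketch: the counting and slicing steps above can also be wrapped in a small helper, which makes it easy to change how many names are shown. This is only one possible arrangement, not part of the original solution. The function names summarize and load_names_corpus are hypothetical, and the sample list is illustrative data rather than corpus output. The loader assumes NLTK is installed and tries nltk.download('names') if the corpus data is missing.

```python
def summarize(words, n=10):
    """Return (total count, first n items) for any sequence of names."""
    return len(words), list(words[:n])


def load_names_corpus():
    """Return a dict mapping each file id in the names corpus to its word list.

    Hypothetical helper: NLTK is imported lazily so that summarize() can be
    tried without NLTK installed.
    """
    import nltk
    from nltk.corpus import names

    try:
        names.words('male.txt')   # raises LookupError if the data is absent
    except LookupError:
        nltk.download('names')    # one-time download of the corpus data
    return {fileid: names.words(fileid) for fileid in names.fileids()}


# Small in-memory demonstration (illustrative list, not the full corpus).
sample = ['Aamir', 'Aaron', 'Abbey', 'Abbie', 'Abbot']
count, first = summarize(sample, n=3)
print(count)
print(first)
```

With the corpus available, calling summarize(load_names_corpus()['male.txt']) should give the male count together with the first 10 male names, and the same call with 'female.txt' covers the female list.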

