w3resource

NumPy: Split a given text into lines and split the single line into array values


22. Split Text into Lines and Tokens

Write a NumPy program to split a given text into lines and then split each line into array values.

Sample text:
01 V Debby Pramod
02 V Artemiy Ellie
03 V Baptist Kamal
04 V Lavanya Davide
05 V Fulton Antwan
06 V Euanthe Sandeep
07 V Endzela Sanda
08 V Victoire Waman
09 V Briar Nur
10 V Rose Lykos

Sample Solution:

Python Code:

# Importing necessary library
import numpy as np 

# The string containing student information
student = """01	V	Debby Pramod
02	V	Artemiy Ellie
03	V	Baptist Kamal
04	V	Lavanya Davide
05	V	Fulton Antwan
06	V	Euanthe Sandeep
07	V	Endzela Sanda
08	V	Victoire Waman
09	V	Briar Nur
10	V	Rose Lykos"""

# Displaying the original text
print("Original text:") 
print(student)

# Splitting the text into lines and then further splitting by tab ('\t')
text_lines = student.splitlines()
text_lines = [r.split('\t') for r in text_lines]

# Creating a NumPy array from the split text
# (the alias np.str was removed in NumPy 1.24, so the built-in str is used)
result = np.array(text_lines, dtype=str)
print("\nArray from the said text:")
print(result) 

Sample Output:

Original text:
01	V	Debby Pramod
02	V	Artemiy Ellie
03	V	Baptist Kamal
04	V	Lavanya Davide
05	V	Fulton Antwan
06	V	Euanthe Sandeep
07	V	Endzela Sanda
08	V	Victoire Waman
09	V	Briar Nur
10	V	Rose Lykos

Array from the said text:
[['01' 'V' 'Debby Pramod']
 ['02' 'V' 'Artemiy Ellie']
 ['03' 'V' 'Baptist Kamal']
 ['04' 'V' 'Lavanya Davide']
 ['05' 'V' 'Fulton Antwan']
 ['06' 'V' 'Euanthe Sandeep']
 ['07' 'V' 'Endzela Sanda']
 ['08' 'V' 'Victoire Waman']
 ['09' 'V' 'Briar Nur']
 ['10' 'V' 'Rose Lykos']]

Explanation:

In the above exercise –

student = """01\tV\tDebby Pramod\n02\tV\tArtemiy Ellie\n03\tV\tBaptist Kamal\n04\tV\tLavanya Davide\n05\tV\tFulton Antwan\n06\tV\tEuanthe Sandeep\n07\tV\tEndzela Sanda\n08\tV\tVictoire Waman\n09\tV\tBriar Nur\n10\tV\tRose Lykos""": This creates a string variable student that holds the records of 10 students. Each line in the string represents one student, and the three fields on a line (ID, class, name) are separated by tab characters, written here as \t.

text_lines = student.splitlines(): This line splits the student string into individual lines.
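To see this step in isolation, here is a minimal sketch on a shortened two-line sample. The variable names sample, lines and with_ends are illustrative and not part of the original solution.

```python
# A shortened, two-record version of the exercise text (illustrative only)
sample = "01\tV\tDebby Pramod\n02\tV\tArtemiy Ellie"

# splitlines() breaks the string at line boundaries and drops the line breaks
lines = sample.splitlines()
print(lines)  # ['01\tV\tDebby Pramod', '02\tV\tArtemiy Ellie']

# keepends=True keeps the line-break characters on each piece
with_ends = sample.splitlines(keepends=True)
print(with_ends)  # ['01\tV\tDebby Pramod\n', '02\tV\tArtemiy Ellie']
```

Note that the tabs are still inside each line at this point; they are only removed by the next step.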

text_lines = [r.split('\t') for r in text_lines]: This line splits each line in text_lines into fields separated by tabs, creating a list of lists.
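The choice of '\t' as the separator matters because the name field itself contains a space. The short sketch below, using one record from the sample text and illustrative variable names, compares splitting on tabs with the default whitespace split.

```python
# One record from the sample text, with tab-separated fields
line = "01\tV\tDebby Pramod"

# Splitting on tabs keeps the full name together as one field
by_tab = line.split('\t')
print(by_tab)    # ['01', 'V', 'Debby Pramod']

# Splitting on any whitespace also breaks the name into two tokens
by_space = line.split()
print(by_space)  # ['01', 'V', 'Debby', 'Pramod']
```

With the tab split every row has exactly three fields, which is what allows the list of lists to become a regular 2-D array.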

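result = np.array(text_lines, dtype=str): This line converts the list of lists into a 2-D NumPy array, where each row represents a student and each column represents a field (e.g. ID, class, name). The dtype argument sets the data type of the array to string. Pass the built-in str here: the alias np.str was deprecated in NumPy 1.20 and removed in NumPy 1.24, so dtype=np.str raises an AttributeError on current releases.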
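Once the data is in a 2-D array, whole columns can be selected by index. The following sketch uses the first three records only, and the names rows, ids and names are illustrative rather than taken from the solution.

```python
import numpy as np

# First three records from the sample text, already split into fields
rows = [['01', 'V', 'Debby Pramod'],
        ['02', 'V', 'Artemiy Ellie'],
        ['03', 'V', 'Baptist Kamal']]

result = np.array(rows, dtype=str)
print(result.shape)       # (3, 3): three students, three fields each
print(result.dtype.kind)  # 'U', a fixed-width Unicode string dtype

# Column 0 holds the IDs as strings; astype(int) converts them to integers
ids = result[:, 0].astype(int)
print(ids)                # [1 2 3]

# Column 2 holds the full names
names = result[:, 2]
print(names)
```

Because every element is stored as a string, numeric fields such as the ID need an explicit conversion like the one above before arithmetic is possible.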


For more practice, solve these related problems:

  • Create a function that splits a multiline text into an array of lines using np.char.splitlines, then splits each line into tokens.
  • Implement a solution that reads a large text block, splits it by newline characters, and then by whitespace to form a 2D array.
  • Test the function on text with inconsistent spacing and line breaks to ensure robust splitting.
  • Combine the splitting with a transformation that capitalizes the first word of each line and outputs the modified array.
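One possible starting point for the first three problems is sketched below. The function name text_to_array, its maxsplit parameter and the messy test string are assumptions made for illustration; they do not come from the exercise. The sketch wraps the text in a one-element array so that np.char.splitlines can be applied, skips blank lines, and limits the whitespace split so that the name stays in one field.

```python
import numpy as np

def text_to_array(text, maxsplit=2):
    """Split text into lines, then each line into at most maxsplit + 1 tokens.

    Hypothetical helper for illustration; not part of the original solution.
    """
    # np.char.splitlines works element-wise, so wrap the text in a 1-element array;
    # the result is an object array whose single element is a list of lines
    lines = np.char.splitlines(np.array([text]))[0]
    # split(None, maxsplit) splits on runs of whitespace (spaces or tabs);
    # blank lines are skipped so every row has the same number of fields
    rows = [line.split(None, maxsplit) for line in lines if line.strip()]
    return np.array(rows, dtype=str)

# Test text with inconsistent spacing, a blank line and mixed line endings
messy = "01   V  Debby Pramod\n\n02\tV   Artemiy Ellie\r\n03 V Baptist Kamal\n"
tokens = text_to_array(messy)
print(tokens.shape)  # (3, 3)
print(tokens)
```

Limiting the number of splits is a design choice that suits this data, where only the last field may contain spaces. Text whose rows end up with different token counts would not form a regular 2-D array and needs separate handling.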
