Python Scikit-learn: K Nearest Neighbors - Convert the Species column of the iris dataframe into a numerical column
Python Machine learning K Nearest Neighbors: Exercise-3 with Solution
Write a Python program using Scikit-learn to convert the Species column of the iris dataframe into a numerical column. To encode the data, map each value to a number, e.g. Iris-setosa: 0, Iris-versicolor: 1, and Iris-virginica: 2. Then split the iris dataset into 80% train data and 20% test data. Out of the total 150 records, the training set will contain 120 records and the test set the remaining 30. Print both datasets.
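Before looking at the Scikit-learn solution, it may help to see the encoding idea on its own. The following is a minimal sketch in plain Python, assuming a small hand-written list of species names (the `species` list is hypothetical sample data, not read from iris.csv). It numbers the distinct labels in sorted order, which gives the same codes as the example mapping above.

```python
# Hypothetical sample of the Species column (not read from iris.csv).
species = [
    "Iris-virginica",
    "Iris-setosa",
    "Iris-versicolor",
    "Iris-setosa",
    "Iris-virginica",
]

# Sort the distinct labels and number them in that order.
classes = sorted(set(species))
label_to_code = {label: code for code, label in enumerate(classes)}

# Replace every label with its numeric code.
encoded = [label_to_code[label] for label in species]

print(label_to_code)
print(encoded)
```

Because the codes follow the sorted order of the labels, Iris-setosa gets 0, Iris-versicolor gets 1, and Iris-virginica gets 2.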
Sample Solution:
Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the iris dataset from a CSV file
iris = pd.read_csv("iris.csv")
# Create the label encoder
le = LabelEncoder()
# Convert the string labels in the Species column into numbers
iris["Species"] = le.fit_transform(iris["Species"])
# Drop the Id column
iris = iris.drop("Id", axis=1)
# Features: every column except the last; target: the last column (Species)
X = iris.iloc[:, :-1].values
y = iris.iloc[:, -1].values
# Split the arrays into random train (80%) and test (20%) subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
print("\n80% train data:")
print(X_train)
print(y_train)
print("\n20% test data:")
print(X_test)
print(y_test)
Sample Output:
80% train data:
[[5.7 2.9 4.2 1.3] [7.1 3. 5.9 2.1] [5. 3.3 1.4 0.2] [4.6 3.4 1.4 0.3] [5.4 3.4 1.5 0.4] [5.7 3.8 1.7 0.3] [5.7 3. 4.2 1.2] [6.4 2.8 5.6 2.1] [5.8 2.7 3.9 1.2] [4.5 2.3 1.3 0.3] [4.9 2.5 4.5 1.7] [6.1 3. 4.6 1.4] [6.2 3.4 5.4 2.3] [4.9 2.4 3.3 1. ] [5.4 3. 4.5 1.5] [7.4 2.8 6.1 1.9] [6.4 3.1 5.5 1.8] [5.3 3.7 1.5 0.2] [5.6 2.8 4.9 2. ] [6.7 3.3 5.7 2.1] [6.3 2.5 5. 1.9] [4.9 3. 1.4 0.2] [5.2 2.7 3.9 1.4] [5.8 2.8 5.1 2.4] [5. 3. 1.6 0.2] [6. 3. 4.8 1.8] [5.9 3. 4.2 1.5] [4.8 3. 1.4 0.1] [4.8 3.1 1.6 0.2] [5.1 3.3 1.7 0.5] [7.9 3.8 6.4 2. ] [5.7 2.8 4.5 1.3] [6.6 3. 4.4 1.4] [6.2 2.2 4.5 1.5] [6.5 3. 5.8 2.2] [4.3 3. 1.1 0.1] [6. 2.2 4. 1. ] [5.1 3.8 1.9 0.4] [6.1 3. 4.9 1.8] [5.1 3.8 1.5 0.3] [5. 3.6 1.4 0.2] [6. 3.4 4.5 1.6] [5. 3.5 1.3 0.3] [5.7 2.8 4.1 1.3] [5.7 2.6 3.5 1. ] [6.4 2.8 5.6 2.2] [4.7 3.2 1.3 0.2] [5.1 3.8 1.6 0.2] [6.3 2.7 4.9 1.8] [6.9 3.1 5.1 2.3] [5. 2. 3.5 1. ] [5.4 3.4 1.7 0.2] [5.9 3.2 4.8 1.8] [6.5 3. 5.2 2. ] [6.3 2.3 4.4 1.3] [5.1 3.5 1.4 0.3] [6.7 3. 5. 1.7] [5.4 3.7 1.5 0.2] [5.8 2.7 5.1 1.9] [5.7 2.5 5. 2. ] [5.2 4.1 1.5 0.1] [6.9 3.1 5.4 2.1] [5.8 2.7 4.1 1. ] [6.4 3.2 5.3 2.3] [4.6 3.2 1.4 0.2] [5.1 2.5 3. 1.1] [6.7 3.1 5.6 2.4] [5.6 2.9 3.6 1.3] [6.3 3.4 5.6 2.4] [5.8 2.6 4. 1.2] [6.3 2.9 5.6 1.8] [5.2 3.5 1.5 0.2] [6.1 2.8 4.7 1.2] [6.9 3.2 5.7 2.3] [5. 3.4 1.5 0.2] [5.5 2.4 3.7 1. ] [6. 2.9 4.5 1.5] [4.9 3.1 1.5 0.1] [5.5 2.3 4. 1.3] [6.9 3.1 4.9 1.5] [7.7 2.6 6.9 2.3] [5.8 2.7 5.1 1.9] [6.7 3. 5.2 2.3] [5. 3.2 1.2 0.2] [6.7 3.1 4.7 1.5] [5.1 3.7 1.5 0.4] [7.2 3.6 6.1 2.5] [5.6 3. 4.1 1.3] [7.7 3.8 6.7 2.2] [5.5 2.5 4. 1.3] [4.7 3.2 1.6 0.2] [6. 2.7 5.1 1.6] [5.6 3. 4.5 1.5] [5.5 2.6 4.4 1.2] [6. 2.2 5. 1.5] [6.2 2.9 4.3 1.3] [5.4 3.9 1.3 0.4] [6.4 3.2 4.5 1.5] [6.5 3. 5.5 1.8] [5.5 3.5 1.3 0.2] [6.7 3.3 5.7 2.5] [6.7 3.1 4.4 1.4] [6.3 2.5 4.9 1.5] [5. 3.5 1.6 0.6] [6.3 2.8 5.1 1.5] [5. 2.3 3.3 1. ] [4.9 3.1 1.5 0.1] [7.2 3. 5.8 1.6] [7.2 3.2 6. 1.8] [6.4 2.9 4.3 1.3] [5.6 2.7 4.2 1.3] [5.2 3.4 1.4 0.2] [6.5 3.2 5.1 2. ] [6.6 2.9 4.6 1.3] [7. 3.2 4.7 1.4] [5.9 3. 5.1 1.8] [4.6 3.1 1.5 0.2] [4.6 3.6 1. 0.2] [5.4 3.9 1.7 0.4] [4.4 2.9 1.4 0.2]]
[1 2 0 0 0 0 1 2 1 0 2 1 2 1 1 2 2 0 2 2 2 0 1 2 0 2 1 0 0 0 2 1 1 1 2 0 1 0 2 0 0 1 0 1 1 2 0 0 2 2 1 0 1 2 1 0 1 0 2 2 0 2 1 2 0 1 2 1 2 1 2 0 1 2 0 1 1 0 1 1 2 2 2 0 1 0 2 1 2 1 0 1 1 1 2 1 0 1 2 0 2 1 1 0 2 1 0 2 2 1 1 0 2 1 1 2 0 0 0 0]
20% test data:
[[4.4 3. 1.3 0.2] [6.2 2.8 4.8 1.8] [6.1 2.6 5.6 1.4] [5. 3.4 1.6 0.4] [6.8 3.2 5.9 2.3] [4.8 3. 1.4 0.3] [7.7 3. 6.1 2.3] [6.8 2.8 4.8 1.4] [6.3 3.3 6. 2.5] [6.1 2.9 4.7 1.4] [6.7 2.5 5.8 1.8] [7.6 3. 6.6 2.1] [6.1 2.8 4. 1.3] [5.1 3.4 1.5 0.2] [6.8 3. 5.5 2.1] [6.4 2.7 5.3 1.9] [5.7 4.4 1.5 0.4] [6.3 3.3 4.7 1.6] [4.8 3.4 1.9 0.2] [4.9 3.1 1.5 0.1] [4.4 3.2 1.3 0.2] [5.1 3.5 1.4 0.2] [5.6 2.5 3.9 1.1] [4.8 3.4 1.6 0.2] [7.7 2.8 6.7 2. ] [5.5 2.4 3.8 1.1] [6.5 2.8 4.6 1.5] [7.3 2.9 6.3 1.8] [5.8 4. 1.2 0.2] [5.5 4.2 1.4 0.2]]
[0 2 2 0 2 0 2 1 2 1 2 2 1 0 2 2 0 1 0 0 0 0 1 0 2 1 1 2 0 0]
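The solution shuffles the rows without a fixed seed, so every run prints a different selection and the class counts in the two subsets can be uneven. The sketch below shows one possible way to make the split repeatable and balanced. It assumes a synthetic stand-in for the iris data (random features, 50 rows per encoded class) instead of iris.csv, and the seed value 42 and the use of `stratify` are illustrative choices that are not part of the original exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the iris data (an assumption for illustration):
# 150 rows, 4 features, and 50 rows for each encoded class 0, 1, 2.
rng = np.random.default_rng(0)
X = rng.random((150, 4))
y = np.repeat([0, 1, 2], 50)

# random_state fixes the shuffle so repeated runs give the same split;
# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```

With 150 rows and `test_size=0.20`, the training set has 120 rows and the test set 30, and stratifying on three equally sized classes places 40 rows of each class in the training set and 10 in the test set.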