2. K Nearest Neighbor – As a Supervised Classification Approach
• KNN is a non-parametric supervised learning technique in which a query instance is assigned to a class with the help of the training set.
• Non-parametric means it makes no assumptions about the underlying data distribution.
• A prediction for a new instance (x) is made by searching the entire training set for the K most similar cases (neighbors) and summarizing the output variable of those K cases.
• In simple words, it stores the information of all training cases and classifies new cases based on similarity.
6. K NEAREST NEIGHBORS (KNN): ALGORITHM STEPS
1. Select a value for k (e.g. 1, 2, 3, 10, ...)
2. Calculate the Euclidean distance between the point to be classified and every other point in the training dataset
3. Pick the k closest data points (the points with the k smallest distances)
4. Run a majority vote among the selected data points: the dominant class is the winner, and the point is classified into that class
5. Repeat if required!
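The steps above can be sketched in a few lines of plain Python; the toy points and labels below are made up for illustration:

```python
from collections import Counter
import math

def knn_classify(query, points, labels, k=3):
    # Step 2: Euclidean distance from the query to every training point
    distances = [math.dist(query, p) for p in points]
    # Step 3: indices of the k closest training points
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # Step 4: majority vote among the k neighbours
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (2, 3), (9, 6), (11, 7)]   # illustrative data
labels = ['A', 'A', 'B', 'B']
print(knn_classify((10, 7), points, labels, k=3))  # -> B
```

Note that there is no training phase: the function simply keeps the data and does all the work at query time.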
8. Solution with K=3

Feature 1   Feature 2   Class   Euclidean Distance   Rank
1           1           A       10.81665383          8
2           3           A       8.94427191           7
2           4           A       8.544003745          6
5           3           A       6.403124237          5
8           6           B       2.236067977          4
8           8           B       2.236067977          3
9           6           B       1.414213562          2
11          7           B       1                    1

Predict for (10, 7):  Predicted Class = B
1. Calculate the Euclidean distance between the point to be classified and every other point in the training dataset
2. Pick the k = 3 closest data points (the points with the 3 smallest distances)
3. Run a majority vote among the selected data points: the dominant class is the winner, and the point is classified into that class (here B).
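The worked example can be checked in code; the eight training points and the query (10, 7) are taken directly from the table:

```python
import math
from collections import Counter

train = [((1, 1), 'A'), ((2, 3), 'A'), ((2, 4), 'A'), ((5, 3), 'A'),
         ((8, 6), 'B'), ((8, 8), 'B'), ((9, 6), 'B'), ((11, 7), 'B')]
query = (10, 7)

# Step 1: rank every training point by Euclidean distance to the query
ranked = sorted(train, key=lambda t: math.dist(query, t[0]))
# Step 2: keep the k = 3 closest points
top3 = ranked[:3]
# Step 3: majority vote over the neighbours' classes
prediction = Counter(c for _, c in top3).most_common(1)[0][0]
print(prediction)  # -> B
```

The three nearest points are (11, 7), (9, 6), and one of the two points at distance 2.236, all of class B, so the vote is unanimous.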
9. How to decide the number of neighbors in KNN?
What are its effects on the classification Accuracy?
The number of neighbors (K) in KNN is a hyperparameter that must be chosen at model-building time.
• K controls the classification accuracy of the model: a small K gives a jagged decision boundary that is sensitive to noise (overfitting), while a large K gives a smoother boundary that can blur class borders (underfitting).
• Generally, K is chosen as an odd number if the number of classes is even, so that a majority vote cannot end in a tie.
• Otherwise, the value of K depends on the nature of the dataset to which KNN is applied (it is domain dependent).
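A common way to choose K in practice is cross-validation: try a range of values and keep the one with the best validated accuracy. A minimal sketch using scikit-learn and its bundled iris dataset (the dataset and the range of K values are my choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate (odd) K
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 16, 2)}

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Only odd values of K are tried here, following the tie-avoidance rule above.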
10. KNN Implementation in Python
• First, import KNeighborsClassifier from the sklearn.neighbors module and create a KNN classifier object by passing the number of neighbors as an argument to KNeighborsClassifier().

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors = 5,
                                  metric = 'minkowski', p = 2)
# KNN model with 5 neighbours and Euclidean distance as the similarity
# metric (Minkowski distance with p = 2 is the Euclidean distance)

• Then, fit the model on the training set using fit() and make predictions on the test set using predict().

# Fitting K-NN to the Training set
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
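An end-to-end runnable version of the snippet above; the iris dataset and the train/test split are my additions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a labelled dataset and hold out 30% of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# KNN with 5 neighbours and Euclidean distance (Minkowski, p = 2)
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(accuracy_score(y_test, y_pred))
```

Note that fit() only stores the training data; the distance computations happen inside predict().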
11. QUICK CHECK
*Which of the following statements is true for k-NN classifiers?
A) The classification accuracy is better with larger values of k
B) The decision boundary is smoother with smaller values of k
C) The decision boundary is linear
D) k-NN does not require an explicit training step
*The k-NN algorithm does more computation at test time than at training time.
A) TRUE
B) FALSE
*Which of the following statements are true about the k-NN algorithm?
1. k-NN performs much better if all of the data have the same scale
2. k-NN works well with a small number of input variables (p), but struggles when the
number of inputs is very large
3. k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above