Deep insight of K-Nearest Neighbour(KNN) | The Lazy algorithm | Mathematics | Machine Learning

Photo by Nina Strehl on Unsplash


K-Nearest Neighbour (KNN) is an instance-based, supervised machine learning algorithm. To make a prediction for a new value, it looks at the instances that lie nearest to it. It can be used for both regression and classification.

KNN Classification:

Have a look at the following diagram and focus on the heart-shaped marker (💜). Try to judge visually which class it belongs to: the red class or the blue class.

The Graph

If we consider the 5 nearest neighbours around the heart, you can see visually that it is more likely to belong to the blue class: three of the heart's nearest neighbours come from the blue class and only two from the red class. See the graph.

The Graph


So, this is the essence of K-Nearest Neighbour classification.
K-Nearest Neighbour is a simple supervised machine learning algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).

Step-by-Step Algorithm:

1. Choose the value of "K", i.e. how many nearest neighbours to consider. This step should be done carefully, because the number of neighbours you consider directly affects the quality of the model's predictions.

2. Calculate the distance from the given point to every training point, then keep only the K nearest neighbours, using a distance formula:

Distance Formula
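The distance formula shown above is typically the Euclidean distance, d(p, q) = √(Σᵢ (pᵢ − qᵢ)²). A minimal sketch of computing it in plain Python:

```python
import math

def euclidean_distance(p, q):
    # Square root of the summed squared differences between coordinates
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0 (a 3-4-5 right triangle)
```

Other distance functions (Manhattan, Minkowski) can be swapped in the same way.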


3. Count how many of the K neighbours belong to each category; the majority class is the result. In just three steps we can classify which category the given point belongs to.
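The three steps above can be sketched in plain Python (a minimal illustration; the point coordinates and labels are made-up example data):

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, query, k=5):
    # Step 2: distance from the query to every training point
    distances = [
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Keep only the K nearest neighbours (Step 1 chose k)
    nearest = sorted(distances)[:k]
    # Step 3: majority vote among the K neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7)]
labels = ["blue", "blue", "blue", "red", "red"]
print(knn_classify(points, labels, (2, 2), k=3))  # blue
```

Note that no "training" happens before the query arrives; all work is done at prediction time, which is why KNN is called a lazy algorithm.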

Finding the K value:

  • To find the K value you can use the hit-and-trial method: try different values repeatedly and check your model's accuracy each time. However, this does not scale well to large datasets.
  • A larger value of K reduces the effect of noise in the results, which can improve accuracy; but if K grows too large, the class boundaries become blurred, so there is a trade-off.
  • Last but not least, you can use cross-validation: take a small part of your dataset, use it to find the best value of K, and then apply that K to the whole dataset.
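The hold-out idea in the last bullet can be sketched as follows (a toy illustration with made-up data; `knn_predict` is a hypothetical helper implementing the three-step algorithm described earlier):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    # Sort training points by distance to the query, vote among the k nearest
    nearest = sorted(
        (math.dist(p, query), label) for p, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hold out a small validation split to tune K
train_X = [(1, 1), (1, 2), (2, 2), (8, 8), (8, 9), (9, 8)]
train_y = ["blue", "blue", "blue", "red", "red", "red"]
val_X = [(2, 1), (9, 9)]
val_y = ["blue", "red"]

def accuracy(k):
    preds = [knn_predict(train_X, train_y, q, k) for q in val_X]
    return sum(p == t for p, t in zip(preds, val_y)) / len(val_y)

best_k = max(range(1, 6), key=accuracy)  # try K = 1..5, keep the best
print(best_k, accuracy(best_k))
```

In practice, libraries such as scikit-learn provide full k-fold cross-validation utilities that average accuracy over several splits rather than a single hold-out set.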

If you like my work, you can connect with me on LinkedIn and GitHub.
