K Means Clustering Mathematics & Elbow Method to find optimal value of K | Data Science | Machine Learning | Explanation
K-means clustering (KMC) is an unsupervised machine learning algorithm, which means there are no labels to supervise the model's learning. In KMC, the whole grouping process is done by the model itself: it looks at the features of the data points and places points with similar features into groups called clusters. Basically, it builds multiple clusters of related data by identifying patterns in the dataset.
Each cluster contains data points of a similar type.
For example, suppose you have some apples, oranges and bananas and you want to classify them. If you feed them to KMC, it will group together fruits that look alike: bananas are long and yellow, oranges are spherical and orange-coloured, and apples are red and spherical. KMC forms clusters of them, say A, B and C. If you then test it with new data, it checks which group the test point most resembles and outputs the cluster it belongs to.
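The fruit example can be sketched with scikit-learn's KMeans. The feature vectors below (length, roundness, redness) are invented values, not real measurements, chosen purely to mimic the three fruits:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per fruit: [length_cm, roundness (0-1), redness (0-1)].
# These numbers are made up for illustration only.
fruits = np.array([
    [18.0, 0.20, 0.05],  # banana-like: long, not round, yellow
    [19.5, 0.25, 0.05],
    [7.0,  0.95, 0.10],  # orange-like: spherical, orange
    [7.5,  0.90, 0.10],
    [8.0,  0.90, 1.00],  # apple-like: spherical, red
    [8.5,  0.85, 1.00],
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(fruits)
print(km.labels_)  # each row is assigned one of three cluster ids
```

Note that the cluster ids are arbitrary: KMC only groups similar points together, it never learns the names "apple", "orange" or "banana".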
How does it work?
1. Choose the best value of K.
Here K is the number of clusters, and hence the number of centroids. Put simply, you choose how many clusters your data should be divided into: 2, 3, 4, and so on.
A common challenge with the K-means clustering algorithm is finding the best value of K. You can do this with the elbow method, which we discuss later in this post.
[GIF: explanation of step 2; credit: analyticsindiamag.com]
2. Initialize the centroids.
You can initialize the centroids anywhere in the space, at random. In this step the initial cluster assignments are therefore chosen at random as well: each randomly placed centroid defines the starting point of a cluster. The later steps will make these positions more precise and move them towards the centres of the clusters. You can refer to the gif image above from analyticsindiamag.com.
Original dataset → randomly assigned clusters → initial centroids (K=3)
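Step 2 can be sketched in a few lines of NumPy; a common way to initialize is to pick K distinct data points at random as the starting centroids (the dataset here is synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))  # toy 2-D dataset

K = 3
# Step 2: pick K distinct data points at random as the initial centroids.
init_idx = rng.choice(len(X), size=K, replace=False)
centroids = X[init_idx]
print(centroids.shape)  # (3, 2): one 2-D centroid per cluster
```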
3. Assign points to clusters and recompute the means
Now we calculate the mean of the observations in each cluster we initially chose. These means let us move each centroid closer to the centre of its cluster of observations, which directly improves the clusters' locations in the space. As the algorithm runs, the clusters obtained improve at every step, and it does not stop until consecutive iterations give the same result. When the clusters no longer change, we say a local optimum has been reached.
Initial centroids → first reallocation → further reallocations
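One reallocation round from step 3 can be sketched as follows (synthetic data again): assign every point to its nearest centroid, then move each centroid to the mean of its assigned points.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                          # toy 2-D dataset
centroids = X[rng.choice(len(X), 3, replace=False)]   # random initial centroids

# Assign each point to its nearest centroid (Euclidean distance).
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# Move each centroid to the mean of the points assigned to it.
new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(3)])
```

Each such round can only lower the total squared distance of points to their centroids, which is why repeating it converges to a local optimum.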
The clusters formed at the end depend entirely on the initial random assignment of the observations in the previous step. The algorithm therefore needs to be run multiple times from different random initial configurations.
These are the only three steps of K-means clustering (KMC); after them we have a partition of the observations into clusters, whose exact shape depends on the initial configuration.
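Putting the three steps together, here is a minimal from-scratch sketch of the whole loop (the blob data at the bottom is synthetic; the loop stops when the centroids stop moving, i.e. at a local optimum):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-means: random init, then repeat assign/update until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 3a: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3b: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):
            break  # clusters no longer change: local optimum reached
        centroids = new
    return centroids, labels

# Usage on three synthetic, well-separated blobs:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(20, 2))
               for c in ([0, 0], [4, 0], [0, 4])])
centroids, labels = kmeans(X, K=3)
```

Because the result depends on the random start, practical implementations rerun this with several seeds and keep the run with the lowest total squared distance.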
Select the optimal value for K
Elbow Method:
This method helps us find the optimum value of K. We cannot just pick a random value, because the number of clusters has a strong impact on the results obtained. So, in the elbow method, we look for the most interpretable solution.
To figure out the best number of clusters (the value of K), we can calculate the within-cluster sum of squares (WCSS). WCSS is the sum, over all clusters, of the squared distances from each data point to its respective centroid.
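In symbols, WCSS = Σ_k Σ_{x∈C_k} ‖x − μ_k‖², where μ_k is the centroid of cluster C_k. A small sketch with a hand-checkable example:

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-cluster sum of squares: squared distance of every point
    to the centroid of its own cluster, summed over all clusters."""
    return sum(((X[labels == k] - c) ** 2).sum()
               for k, c in enumerate(centroids))

# Two obvious clusters on a line; each point is 0.5 from its centroid.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.5], [10.5]])
print(wcss(X, labels, centroids))  # 4 * 0.5**2 = 1.0
```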
We should take a K whose WCSS value is small. This does not mean taking K = n, where WCSS is 0: WCSS becomes zero only when the number of clusters equals the number of data points, which defeats the purpose. Instead, there is a threshold value of K that we have to find, and for this purpose we use the elbow method.
We can find the optimum value of K using the elbow method with the help of an elbow graph. We compute the WCSS value for each K over a particular range, and then plot WCSS against K (the number of clusters), as in the image below:
[Elbow graph; credit: oreilly.com]
The optimal value of K depends entirely on the rate of decrease of the WCSS value. The point at which this rate suddenly drops to a minimal rate of change is called the elbow point. In the graph, the WCSS value falls quickly at first, but at the elbow point (K = 3) the decrease becomes minimal. So, the optimal value of K is the value at the elbow point.
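The elbow procedure can be sketched with scikit-learn, whose KMeans exposes the WCSS as the `inertia_` attribute. The blob data below is synthetic, built so the elbow should land near K = 3:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Three well-separated blobs, so the elbow should appear around K = 3.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

wcss_values = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_values.append(km.inertia_)  # inertia_ is exactly the WCSS

# WCSS always shrinks as K grows; the elbow is where the drop flattens out.
```

Plotting `wcss_values` against `range(1, 8)` with matplotlib reproduces the elbow graph above: a steep fall up to K = 3, then an almost flat tail.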