Day 12 - How to find the best K value in K-Means Algorithm - Elbow Curve

          By Jerin Lalichan 


How to do it?

Clustering algorithms like K-Means require the user to specify the number of clusters, K, before running. So we first need to find the optimum number of clusters to generate. A commonly used method for this is the Elbow Curve.


Elbow Curve / Knee Curve

    K-Means works by minimizing the Within-Cluster Sum of Squares (WCSS). In the Elbow method, we run K-Means for a range of K values, here from 1 to 10.
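K-Means is usually fitted with Lloyd's algorithm. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. Neither step can increase WCSS. Below is a minimal NumPy sketch for illustration only, not the author's code. The function name `kmeans_lloyd`, the random initialization and the synthetic three-blob data are assumptions. It records WCSS after every assignment step, so you can check that the value never goes up.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=20, seed=0):
    """Minimal Lloyd's-algorithm K-Means (illustrative sketch, hypothetical name).

    Returns the labels from the last assignment step, the centroids and the
    WCSS measured after every assignment step.
    """
    rng = np.random.default_rng(seed)
    # Initialization (assumption): k distinct points chosen at random from X
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    history = []
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid,
        # then each point joins its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # WCSS for the current assignment: sum of each point's squared distance
        history.append(float(d2[np.arange(len(X)), labels].sum()))
        # Update step: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids, history


# Usage on synthetic data (an assumption): three blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
labels, centroids, history = kmeans_lloyd(X, k=3)
print([round(w, 2) for w in history[:5]])
```

The printed values should be non-increasing. Once the assignments stop changing, WCSS stays constant.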

    For each value of K, the WCSS is calculated. WCSS is simply the sum of the squared distances between each data point and the centroid of the cluster it belongs to.
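Written as a formula, WCSS = sum over clusters j of sum over points x in C_j of ||x - c_j||^2. Here C_j is the set of points assigned to cluster j and c_j is its centroid. The following is a minimal NumPy sketch of that formula. The function name `wcss` and the four-point dataset are hypothetical, chosen so the result can be checked by hand.

```python
import numpy as np


def wcss(X, labels, centroids):
    """Within-cluster sum of squares: for every point, the squared Euclidean
    distance to the centroid of its own cluster, summed over all points."""
    diffs = X - centroids[labels]  # each point minus its own cluster's centroid
    return float((diffs ** 2).sum())


# Tiny hand-checkable example: two clusters of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(wcss(X, labels, centroids))  # each point is 1 unit from its centroid -> 4.0
```

For a fitted model, scikit-learn's `KMeans` reports this same quantity as its `inertia_` attribute.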




    We start with K=1, which gives the highest WCSS. As K increases, WCSS decreases. In the graph above, the WCSS curve drops steeply at first and then bends sharply at a certain point (here K=5). After that point it flattens out and becomes almost parallel to the X axis, so adding more clusters reduces WCSS only slightly. This point is called the Elbow point, and it is taken as the optimum value of K.
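The procedure above can be sketched with scikit-learn, whose `KMeans` estimator exposes the fitted WCSS as `inertia_`. The synthetic three-blob data is an assumption for the demo. In the original method, the elbow is read off the plot by eye. As one possible automated stand-in, the code also applies a common heuristic, which is an assumption and not part of the original method. It scales both axes to [0, 1] and picks the K whose point lies farthest from the straight line joining the first and last points of the curve.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for the demo): three well-separated 2-D blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in [(0, 0), (5, 5), (0, 5)]])

# Fit K-Means for K = 1..10 and record WCSS (scikit-learn calls it inertia_)
ks = np.arange(1, 11)
wcss_values = np.array(
    [KMeans(n_clusters=int(k), n_init=10, random_state=0).fit(X).inertia_ for k in ks]
)

for k, w in zip(ks, wcss_values):
    print(f"K={int(k):2d}  WCSS={float(w):10.2f}")

# Heuristic elbow pick (assumption, not the original method): after scaling both
# axes to [0, 1] the curve runs from (0, 1) to (1, 0), and the elbow is taken as
# the point farthest from the straight line x + y = 1 joining those endpoints.
x = (ks - ks[0]) / (ks[-1] - ks[0])
y = (wcss_values - wcss_values[-1]) / (wcss_values[0] - wcss_values[-1])
distance = np.abs(x + y - 1) / np.sqrt(2)
elbow_k = int(ks[np.argmax(distance)])
print("Elbow at K =", elbow_k)
```

To draw the Elbow Curve itself, plot `ks` against `wcss_values`, for example with matplotlib's `plt.plot(ks, wcss_values, marker="o")`. For this synthetic data, the bend should appear at K=3, matching the three generated blobs. On real data the elbow is often less sharp, and the heuristic may disagree with a visual reading.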
    







   
I am doing a challenge, #66DaysofData, in which I will be learning something new from the Data Science field for 66 days, and I will be posting daily topics on my LinkedIn, on my GitHub repository, and on my blog as well.


Stay Curious!  




