Day 11 - K-Means Clustering Algorithm

         By Jerin Lalichan 


Clustering


    Clustering is a technique used to group together objects with similar characteristics. K-Means is an iterative, unsupervised machine learning algorithm for doing this.
In their simplest form, clusters are sets of data points that share similar attributes, and clustering algorithms are the methods that group these data points into different clusters based on their similarities.

    The purpose of clustering and classification algorithms is to make sense of and extract value from large sets of structured and unstructured data. 
    
If you’re working with huge volumes of unstructured data, it makes sense to partition the data into logical groupings before attempting to analyze it.

K-Means Clustering 






    One of the most popular clustering algorithms is k-means. K-means uses distance as a measure of similarity: the closer two observations are, the more similar they are considered to be. Let’s say we have a data set with two columns, and we need to cluster it based on the similarities observed in those two columns using the k-means algorithm. 
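The post does not name a specific distance; the usual choice for k-means is Euclidean (straight-line) distance. A minimal sketch, assuming each observation is a tuple holding its two column values (the function name and the sample points are hypothetical):

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two hypothetical observations from a two-column data set
point_a = (1.0, 2.0)
point_b = (4.0, 6.0)
print(euclidean_distance(point_a, point_b))  # 5.0
```

On Python 3.8 and later, the standard library's `math.dist(p, q)` computes the same value.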
    
K-means requires us to specify the number of clusters, K, in advance. For example, we can take K = 2 as a first value. Once we have set this hyperparameter, here’s how the algorithm works:

  • The algorithm randomly picks 2 data points as the initial cluster centers (centroids).

  • It then computes the distance between each observation and each cluster center. 

  • Next, it assigns each observation to the cluster whose center is closest to it. 

  • Finally, it recomputes each cluster center as the mean of the observations assigned to it, and repeats the assignment and update steps until the clusters no longer change.
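The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not an optimized or library implementation: it assumes Euclidean distance, random initialization from the data points themselves, and hypothetical helper names (`k_means`, `mean_point`). It stops when the assignments stop changing or after `max_iter` rounds, and an empty cluster simply keeps its previous center, which is only one of several possible choices.

```python
import math
import random

def euclidean_distance(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    """Column-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(data, k, max_iter=100, seed=0):
    """Basic k-means following the steps above. Returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points at random as the initial centers
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: compute distances and assign each point to its nearest center
        new_labels = [
            min(range(k), key=lambda j: euclidean_distance(x, centers[j]))
            for x in data
        ]
        # Stop once the cluster assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of the points assigned to it
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = mean_point(members)
    return centers, labels

# Hypothetical two-column data with two obvious groups
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = k_means(data, k=2)
print(labels)
print(centers)
```

On this data the first three points end up in one cluster and the last three in the other; which group receives label 0 depends on the random seed. In practice, a library class such as scikit-learn's `KMeans` is usually used instead of a hand-written loop.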

Each data point is allocated to exactly one cluster, and every iteration reduces (or at least does not increase) the within-cluster sum of squares (WCSS). 
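WCSS adds up, over all K clusters, the squared distance from each point to the center of its own cluster: WCSS = Σ_j Σ_{x in C_j} ||x − μ_j||², where C_j is cluster j and μ_j is its mean. A minimal sketch, assuming `labels[i]` is the index in `centers` of the cluster that point `i` belongs to (the function name and sample values are hypothetical):

```python
def wcss(data, centers, labels):
    """Within-cluster sum of squares: squared distance of each point to its own center, summed."""
    total = 0.0
    for x, lab in zip(data, labels):
        center = centers[lab]
        total += sum((a - b) ** 2 for a, b in zip(x, center))
    return total

# Hypothetical points, centers, and cluster assignments
data = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 0.0), (10.0, 11.0)]
print(wcss(data, centers, [0, 0, 1, 1]))  # 4.0
print(wcss(data, centers, [0, 1, 1, 0]))  # 412.0, a much worse assignment
```

A lower WCSS means the clusters are tighter around their centers, which is why assigning each point to its nearest center and moving centers to cluster means both push this value down.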



   
  I am doing a challenge, #66DaysofData, in which I will learn something new from the Data Science field for 66 days and post daily topics on my LinkedIn, my GitHub repository, and my blog.


Stay Curious!  




