Day 11 - K-Means Clustering Algorithm
Clustering
It is a technique used to group together objects with similar characteristics. K-Means is an iterative type unsupervised machine learning algorithm.
In their simplest form, clusters are sets of data points that share similar attributes, and clustering algorithms are the methods that group these data points into different clusters based on their similarities. The purpose of clustering and classification algorithms is to make sense of and extract value from large sets of structured and unstructured data.
If you’re working with huge volumes of unstructured data, it only makes sense to try to partition the data into some sort of logical groupings before attempting to analyze it.
K-Means Clustering
One of the popular algorithms used for clustering is the
k-means algorithm. K-means uses distance as a measure of similarity.
Let’s say we have a data set with two columns.
Now, we need to cluster this data set based on similarities observed in
the two columns. And we would like to use the k-means algorithm to create the
clusters.
K-means needs you to specify the number of clusters we want. For example, we can take 2 as the first value of K. Once
we have set this hyperparameter, here’s how the algorithm would then work:
- The algorithm will randomly assign any 2 points as the cluster centers.
- Then, it will compute the distance between each observation and the cluster
centers.
- After this, it will assign each observation to a cluster that is closest to it in
value.
- It will then calculate the mean of the new clusters and keep doing this until there is no change in the clusters.
Comments
Post a Comment