What I Learnt About K-Means Clustering
For my buzzword topic, I decided to explore a common algorithm used in unsupervised machine learning, K-Means clustering.
K-means clustering groups unlabelled data into K clusters, where K is the number of clusters we want. It starts by initialising K points, called centroids, at random places throughout a plot containing our data set. The algorithm then alternates between two steps: the cluster assignment step and the move centroid step. The cluster assignment step labels each data point on the plot according to the centroid it is closest to, and data points with the same label are considered a cluster. The move centroid step then moves each centroid to the average position of the points in its cluster. The two steps are repeated until the assignments stop changing. It should be noted that K-means clustering does not always reach the optimal result, because the outcome depends on where the centroids start. To address this, K-means can be run multiple times from different random starting points, with each run scored by the average distance between each data point and its assigned centroid. The clustering with the lowest average distance is considered the best one to use.
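To make the two steps and the repeated runs concrete, here is a minimal sketch in plain Python. It is only an illustration under a few assumptions of mine: the function names are hypothetical, the starting centroids are picked from the data points themselves rather than from arbitrary places on the plot, a centroid that ends up with no points simply stays where it is, and a run stops once the labels no longer change.

```python
import math
import random


def assign_clusters(points, centroids):
    """Cluster assignment step: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def move_centroids(points, labels, centroids):
    """Move centroid step: move each centroid to the average of its assigned points."""
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            # Assumption: a centroid with no assigned points keeps its old position.
            moved.append(old)
    return moved


def k_means(points, k, rng, max_iters=100):
    """One run of K-means from a random start."""
    # Assumption: the starting centroids are k data points chosen at random.
    centroids = rng.sample(points, k)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iters):
        centroids = move_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:  # nothing changed, so the run has settled
            break
        labels = new_labels
    return centroids, labels


def average_distance(points, centroids, labels):
    """Score a clustering: mean distance from each point to its assigned centroid."""
    total = sum(math.dist(p, centroids[label]) for p, label in zip(points, labels))
    return total / len(points)


def best_of_runs(points, k, n_runs=10, seed=0):
    """Run K-means several times and keep the run with the lowest average distance."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        centroids, labels = k_means(points, k, rng)
        score = average_distance(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best


# Example: two small, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
score, centroids, labels = best_of_runs(points, k=2)
print(labels, round(score, 3))
```

Which group ends up labelled 0 and which ends up labelled 1 depends on the random start, so only the grouping itself is meaningful, not the label numbers.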
K-Means clustering has a wide array of applications, including market segmentation, mapping an individual’s social networks and identifying crime localities.