INTERACTIVE TOOL FOR K-MEANS CLUSTERING
Levon R. Hayrapetyan
Houston Christian University
ABSTRACT
k-Means clustering is one of the most widely used clustering methods in data mining. It starts
from k “centroids” (k distinct, randomly chosen initial points in the data space) and assigns every
data point to its nearest centroid. After every point has been assigned, each centroid is moved to
the average of all the points assigned to it. The process then repeats: every point is reassigned to
its nearest new centroid, and each centroid is moved to the average of the points now assigned to
it. The clustering process is complete when no point changes its assigned centroid.
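The assign-then-update loop described above can be sketched in Python as follows. This is a minimal illustration for two-variable data, not the tool's own implementation; the function names `assign`, `update`, and `kmeans`, the Forgy-style initialization (k data points drawn at random), and the convention that a centroid with no assigned points keeps its old position are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(p: Point, q: Point) -> float:
    # Squared Euclidean distance is enough to decide which centroid is nearest.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    # Index of the nearest centroid for each point; ties go to the lower index.
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[Point]:
    # Move each centroid to the mean of the points assigned to it.
    # Assumed convention: a centroid with no points keeps its position.
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            n = len(members)
            new_centroids.append((sum(x for x, _ in members) / n,
                                  sum(y for _, y in members) / n))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points: Sequence[Point], k: int,
           init: Optional[Sequence[Point]] = None,
           seed: Optional[int] = None, max_iter: int = 100):
    # Initial centroids: given explicitly, or (assumption) k data points
    # drawn at random without replacement.
    if init is None:
        centroids = random.Random(seed).sample(list(points), k)
    else:
        centroids = list(init)
    labels = assign(points, centroids)
    for iteration in range(1, max_iter + 1):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no point changed its centroid: done
            return centroids, labels, iteration
        labels = new_labels
    return centroids, labels, max_iter
```

For example, with the six points `(0, 0)`, `(0, 1)`, `(1, 0)`, `(10, 10)`, `(10, 11)`, `(11, 10)` and initial centroids `(0, 0)` and `(10, 10)`, the first update moves the centroids to `(1/3, 1/3)` and `(31/3, 31/3)`; no point changes its centroid afterwards, so the loop stops after one iteration.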
In this study, an interactive tool is developed that allows users to create random data sets with
two variables, visualize the assignment of data points to the current centroids, and visualize the
positions of the newly calculated centroids. The tool is intended to improve students’
understanding of the internal logic of k-means clustering and will be used in data
analytics-related courses.
Keywords: k-Means clustering, data mining, interactive tools, cluster centroid
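Because the tool shows point assignments and centroid moves as separate stages, the algorithm can be exposed as a sequence of snapshots for a front end to draw one at a time. The sketch below is hypothetical: the names `random_dataset` and `kmeans_steps`, the uniform [0, 100] range for both variables, the random choice of k data points as initial centroids, and the `(stage, centroids, labels)` snapshot format are assumptions, and the drawing code is omitted.

```python
import random
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, float]
Snapshot = Tuple[str, List[Point], List[int]]


def random_dataset(n: int, seed: int = 0,
                   low: float = 0.0, high: float = 100.0) -> List[Point]:
    # n random points with two variables, each drawn uniformly from [low, high].
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


def kmeans_steps(points: List[Point], k: int, seed: int = 0,
                 max_iter: int = 100) -> Iterator[Snapshot]:
    # Yields ("assign", centroids, labels) after points are (re)assigned and
    # ("update", centroids, labels) after centroids move to their cluster means.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            return  # no point changed its centroid: clustering is done
        labels = new_labels
        yield ("assign", list(centroids), list(labels))
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its centroid (assumption)
                n = len(members)
                centroids[j] = (sum(x for x, _ in members) / n,
                                sum(y for _, y in members) / n)
        yield ("update", list(centroids), list(labels))
```

A front end could iterate over `kmeans_steps(random_dataset(50), 3)` and redraw the plot each time the user advances, colouring points by `labels` and marking `centroids`. Yielding both stages separately is what lets a student see the assignment step and the centroid move as distinct events.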