The K-Means algorithm uses the following procedure to determine the line of discrimination between the data sets:

First, set the number of centroids (default = 4) and the number of iterations (default = 10) from the Edit->Settings menu. For this exercise, we will use the default values.
Initially, all data that has been entered is pooled together into one large cluster. After the initial cluster is created, N random centroids are generated (N is the number of centroids set in the first step). Once the cluster and centroids have been generated, classification begins. Classification iterates over every point in the cluster and assigns it to the centroid it is closest to, so that each centroid collects a new cluster of nearby points.
After the new clusters are generated, the old centroids are replaced with new ones. Each new centroid is simply the mean of the points in its new cluster.
The classification and centroid-replacement steps above are repeated M times (M is the number of iterations), with the intent that the line of discrimination between the data sets will eventually converge.
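
The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tutorial program's actual code: the function name `k_means`, the use of 2-D points, drawing the random centroids from the bounding box of the data, and keeping an old centroid when its cluster ends up empty are all assumptions made for the example.

```python
import random


def k_means(points, n_centroids=4, n_iterations=10, seed=None):
    """Cluster 2-D points (x, y) with a fixed number of iterations.

    Hypothetical helper for illustration; the defaults mirror the
    tutorial's settings (4 centroids, 10 iterations).
    """
    rng = random.Random(seed)

    # Assumption: random centroids are drawn inside the data's bounding box.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    centroids = [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in range(n_centroids)
    ]

    clusters = []
    for _ in range(n_iterations):
        # Classification: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Replace each centroid with the mean of its new cluster.
        # Assumption: a centroid with an empty cluster is left where it is.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c
            else old
            for c, old in zip(clusters, centroids)
        ]

    return centroids, clusters


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    final_centroids, final_clusters = k_means(data, n_centroids=2, seed=0)
    print(final_centroids)
    print(final_clusters)
```

Because the starting centroids are random, different seeds can give different final clusters, which is one reason the number of iterations and the number of centroids are exposed as settings.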
