The K-Means algorithm uses the following procedure to determine
the line of discrimination between the data sets:
- First, set the number of centroids (default = 4) and the number of
iterations (default = 10) from the Edit->Settings menu. For
this exercise, we will use the default values.
- Initially, all of the data that is entered is pooled together into
one large cluster. After the initial cluster is created, N random
centroids are generated (N is the number of centroids chosen in the
first step). Once the cluster and centroids have been generated,
classification begins. Classification iterates over all points in the
cluster and creates new clusters by assigning each point to its
nearest centroid.
- After the new clusters are generated, the old centroids are replaced
with new ones. Each new centroid is simply the mean of the points in
its new cluster.
- The whole process above is repeated M times (M is the number of
iterations) with the intent that eventually the line of
discrimination between the data sets will converge.
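
The steps above can be sketched in a few lines of Python. This is only a
minimal illustration, not the code used by the tutorial software: the
function name k_means, the choice to draw the random centroids from the
bounding box of the data, the use of squared Euclidean distance on 2-D
points, and the handling of empty clusters are all assumptions made for
the example.

```python
import random


def k_means(points, n_centroids=4, n_iterations=10, seed=0):
    """Cluster 2-D points; returns (centroids, clusters).

    Hypothetical helper for illustration. Defaults mirror the
    tutorial's settings (4 centroids, 10 iterations).
    """
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # Assumption: the N random centroids are drawn uniformly from the
    # bounding box of the pooled data.
    centroids = [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in range(n_centroids)
    ]
    clusters = [[] for _ in centroids]

    for _ in range(n_iterations):
        # Classification: put every point in the cluster of its
        # nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update: replace each centroid with the mean of its cluster.
        # Assumption: a centroid with no points keeps its old position.
        centroids = [
            (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c))
            if c
            else old
            for c, old in zip(clusters, centroids)
        ]

    return centroids, clusters


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0),
            (9.0, 9.0), (10.0, 9.5), (9.5, 10.0)]
    final_centroids, final_clusters = k_means(data, n_centroids=2)
    for centroid, members in zip(final_centroids, final_clusters):
        print(centroid, members)
```

Note that the result depends on where the random centroids start, so
different seeds can produce different clusters. The sketch always runs
the full M iterations rather than stopping early once the centroids
stop moving.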