Investigating K-means and Kernel K-means Algorithms with Internal Validity Indices for Cluster Identification
Alissar Nasser *
Faculty of Economic Sciences and Business Administration, Lebanese University, Hadath, Lebanon.
*Author to whom correspondence should be addressed.
Abstract
Clustering is an unsupervised learning task in which the number of clusters is usually not known in advance. The outcome of a clustering algorithm therefore depends on the number of clusters supplied by the user. Consequently, it is important to evaluate clustering results across candidate numbers of clusters and to choose the number that optimizes a given criterion. In this paper, we present several clustering validity indices used in the literature. Using several synthetic and real datasets, we then compare these indices on the clustering results produced by the well-known k-means algorithm and its non-linear version, the kernel k-means algorithm. The results show that none of the validity indices is superior to the others; on the other hand, kernel k-means failed to improve clustering accuracy on these datasets from the number-of-clusters perspective.
Keywords: Data mining, clustering algorithms, internal indices, validity indices, k-means, kernel k-means
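The selection procedure summarized in the abstract (run the clustering algorithm for several candidate numbers of clusters, score each result with an internal validity index, and keep the best-scoring number) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data, the deterministic farthest-first initialization, the choice of the silhouette index as the validity index, and the helper names kmeans and silhouette. Kernel k-means is not shown.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with farthest-first initialization (illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from its nearest already-chosen center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: three compact 3x3 grids of points, far apart.
offsets = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

# Score each candidate number of clusters and keep the best one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

For these three well-separated groups the silhouette score peaks at k = 3. On real data, different indices may disagree about the best number of clusters, which is the situation the comparison in this paper addresses.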