Investigating K-means and Kernel K-means Algorithms with Internal Validity Indices for Cluster Identification

Alissar Nasser *

Faculty of Economic Sciences and Business Administration, Lebanese University, Hadath, Lebanon.

*Author to whom correspondence should be addressed.


Abstract

Clustering is an unsupervised method in which the number of clusters is not known in advance, so the outcome of a clustering algorithm depends on the number of clusters the user specifies as input. It is therefore important to evaluate clustering results across different numbers of clusters and to choose the number that optimizes a given criterion. In this paper we present several clustering validity indices used in the literature. Using several synthetic and real datasets, we compare these indices on the clustering results produced by the well-known k-means algorithm and its non-linear version, the kernel k-means algorithm. The results show that none of the validity indices is superior to the others; moreover, kernel k-means failed to improve clustering accuracy on the datasets with respect to identifying the number of clusters.

Keywords: Data mining, clustering algorithms, internal indices, validity indices, k-means, kernel k-means


How to Cite

Nasser, Alissar. 2019. “Investigating K-Means and Kernel K-Means Algorithms With Internal Validity Indices for Cluster Identification”. Journal of Advances in Mathematics and Computer Science 30 (2):1-12. https://doi.org/10.9734/JAMCS/2019/45837.
