Clustering groups data into sets of related objects. K-Means is among the most frequently used and efficient clustering techniques. Its efficiency can be improved considerably when the initial centroids are computed rather than chosen at random. Randomly generated starting points often cause K-Means to converge to a local optimum. Better clustering results can be obtained by running K-Means more than once; however, it is difficult to decide how many runs are needed to obtain a better result. In this paper, a new approach is proposed for computing the initial centroids for K-Means. The proposed method consists of two steps, namely Spectral Biclustering and Semi-Unsupervised Gene Selection. The Semi-Unsupervised Gene Selection method, which is based on the cosine measure, is used to compute the initial centroids for the K-Means algorithm. The proposed approach is tested on a microarray gene database and performs better than the previous method. It takes similar or slightly more clustering time, but its clustering accuracy is very high. The proposed approach is therefore well suited to gene clustering.
Data mining and knowledge engineering
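To make the idea of computed (rather than random) initial centroids concrete, the following is a minimal sketch in Python with NumPy. It is not the Spectral Biclustering and Semi-Unsupervised Gene Selection procedure proposed in the paper, whose details are not given in the abstract. Instead it substitutes a simple greedy rule based on cosine similarity, and the function names `cosine_seeds` and `kmeans` are hypothetical names chosen for illustration.

```python
import numpy as np


def cosine_seeds(X, k):
    """Pick k rows of X as initial centroids using cosine similarity.

    Illustrative stand-in only (an assumption, not the paper's method):
    the first seed is the row most aligned with the mean direction; each
    further seed is the row least similar to the seeds chosen so far.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)   # unit-length rows
    chosen = [int(np.argmax(U @ U.mean(axis=0)))]
    max_sim = U @ U[chosen[0]]                 # similarity to nearest seed
    for _ in range(1, k):
        nxt = int(np.argmin(max_sim))          # least similar to any seed
        chosen.append(nxt)
        max_sim = np.maximum(max_sim, U @ U[nxt])
    return X[chosen].astype(float)


def kmeans(X, centroids, n_iter=100):
    """Plain Lloyd's K-Means started from the supplied centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

A call such as `labels, centroids = kmeans(X, cosine_seeds(X, k))` replaces the random start with a deterministic one, so repeated runs on the same data give the same clustering. Whether such a start improves accuracy on real microarray data depends on the seeding rule and is not something this sketch demonstrates.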