Data clustering is one of the popular tasks recently used in the educational data mining arena for grouping similar students by several aspects such as study performance, behavior, skill, etc. Many well-known clustering algorithms such as k-means, expectation-maximization, spectral clustering, etc. were employed in the related works for educational data clustering. None of them has taken into consideration the incompleteness of the educational data gathered in an academic credit system. If just a few records have missing values, we might ignore them in the mining task. However, as there are a large number of missing values, ignoring them may lead to the data insufficiency and ineffectiveness of the mining task. On the other hand, many research works have recently proposed several general-purpose solutions to incomplete data clustering. The main difficulties with these solutions for reuse are how to appropriately determine a pre-specified number of the clusters for each data set in a particular application domain and the non-arbitrary shapes of the resulting clusters. Hence, we focus on two parts: the first one that resolves the incomplete data clustering task in the education domain and the second one that proposes a robust effective approach to the aforementioned clustering task. Our resulting solution to clustering incomplete educational data is a mean shift-based clustering algorithm named iMS_nps using the nearest prototype strategy. iMS_nps can also overcome the aforementioned difficulties. In addition, experimental results have shown that the clusters from our proposed algorithm have better cluster quality as compared to some existing approaches.
Vo Thi Ngoc Chau, Phan Huu Loc, Vo Thi Ngoc Tran
2015 International Conference on Advanced Computing and Applications (ACOMP)