Yuting Ma, Yuejing Ding, Tian Zheng
Feb 1, 2018
Statistical Analysis and Data Mining: The ASA Data Science Journal
High‐dimensional data provide vast amounts of information for scientific research and learning. In most cases, however, that information is buried in noise from noninformative features. Learning informative feature subspaces has therefore become a necessary step in high‐dimensional supervised learning, both to improve accuracy and to aid interpretability. Learning methods should also account for possible interactions among features, which may carry significant signals and reveal important scientific findings. In this paper, we develop a nonparametric measure of association between the class label and continuous‐valued feature subspaces using local point process (LPP) patterns. A backward elimination algorithm based on random subspaces is used to identify informative feature subspaces according to this measure. In simulations and real data applications, the proposed method is effective at identifying patterns that are informative about the class difference, not only marginally but also through higher‐order interactions among features. As a result, the proposed method outperforms other popular feature selection methods, showing better generalizability and robustness.
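The backward elimination over random subspaces described above can be sketched roughly as follows. This is a minimal illustration and not the authors' algorithm: the abstract does not define the LPP measure, so the sketch swaps in a simple k-nearest-neighbor label-agreement score as a stand-in, and every name and parameter here (`knn_label_agreement`, `backward_eliminate`, `subspace_size`, `n_draws`) is hypothetical. The toy data use an XOR-style interaction, so neither informative feature carries any marginal signal.

```python
import numpy as np


def knn_label_agreement(X, y, features, k=5):
    """Stand-in association score (an assumption, NOT the paper's LPP measure):
    the average fraction of each point's k nearest neighbors, measured within
    the chosen feature subspace, that share the point's class label."""
    Z = X[:, features]
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude the point itself
    nn = np.argpartition(d2, k, axis=1)[:, :k]        # indices of k nearest
    return float(np.mean(y[nn] == y[:, None]))


def backward_eliminate(X, y, n_keep, subspace_size=2, n_draws=150, seed=0):
    """Repeatedly score random subspaces of the active features and drop the
    feature whose subspaces score lowest on average, until n_keep remain."""
    rng = np.random.default_rng(seed)
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        size = min(subspace_size, len(active))
        total = np.zeros(len(active))
        count = np.zeros(len(active))
        for _ in range(n_draws):
            idx = rng.choice(len(active), size=size, replace=False)
            score = knn_label_agreement(X, y, [active[i] for i in idx])
            total[idx] += score  # credit every feature in this subspace
            count[idx] += 1
        # A feature that was never drawn gets mean 0 and is dropped first.
        mean_score = total / np.maximum(count, 1)
        active.pop(int(np.argmin(mean_score)))
    return active


# Toy data: the label depends only on the interaction of features 0 and 1.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(400, 6))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

selected = backward_eliminate(X, y, n_keep=2)
print(sorted(selected))
print(knn_label_agreement(X, y, [0]), knn_label_agreement(X, y, [0, 1]))
```

Scoring subspaces rather than single features is what lets a sketch like this keep features 0 and 1: each looks like noise on its own, and the signal only shows up when both fall in the same random subspace.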