Hybrid approach using machine learning algorithms for customers' churn prediction in the telecommunications industry
Published Sep 19, 2021 · Y. Beeharry, Ristin Tsokizep Fokone
Concurrency and Computation: Practice and Experience
Citations: 15 · Influential Citations: 1
Abstract
Ensemble learning combines the predictions of several individual classifiers, which may result in better performance than a single classifier. This article proposes a two-layer flexible voting ensemble model to predict customer churn in the telecommunications industry. The datasets used in this study are from IBM Sample Data Sets and Duke University. Following the pre-processing stage, each dataset was organized into an imbalanced set and a balanced set. The balanced set comprises an equal number of instances for both classes (‘churn’ and ‘not churn’). Extensive investigations were also carried out to determine the circumstances under which the model provides the best performance. With the IBM imbalanced dataset, the hybrid algorithm gives an accuracy of 82.30% and an F1-score of 63%. With the IBM balanced dataset, it gives an accuracy of 76.20% and an F1-score of 77.06%. With the imbalanced dataset from Duke University, an accuracy of 71.33% and an F1-score of 14.3% are obtained. With the corresponding balanced dataset, the proposed model provides an accuracy of 60.41% and an F1-score of 64.13%. Test results indicate that the adopted approach significantly increases the F1-score of the classification when a balanced dataset is considered, in both cases. Additionally, p-values of less than 0.05 indicate that the results obtained with the IBM imbalanced dataset and with both the balanced and imbalanced datasets from Duke University are statistically significant.
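The abstract does not describe the internal structure of the two-layer ensemble or how the balanced set was built, so the following is only a minimal sketch under stated assumptions: hard majority voting at both layers, random undersampling of the larger class to balance the data, and 'churn' encoded as 1. All function names (`majority_vote`, `two_layer_vote`, `undersample_balance`, `accuracy`, `f1_score`) are hypothetical and are not taken from the article. It also illustrates why accuracy and F1-score can diverge on an imbalanced set.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def two_layer_vote(groups):
    """Assumed structure: each inner list holds the labels predicted by one
    group of classifiers. Layer 1 votes within each group, layer 2 votes
    across the group-level results."""
    first_layer = [majority_vote(group) for group in groups]
    return majority_vote(first_layer)


def undersample_balance(rows, labels, seed=0):
    """Assumed balancing step: randomly keep, for every class, as many rows
    as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n = min(len(members) for members in by_class.values())
    balanced = [
        (row, label)
        for label, members in by_class.items()
        for row in rng.sample(members, n)
    ]
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy imbalanced labels: 2 churners out of 10 customers.
    y_true = [1, 1] + [0] * 8
    # A predictor that finds only one of the two churners.
    y_pred = [1, 0] + [0] * 8
    print("accuracy:", accuracy(y_true, y_pred))
    print("F1:", f1_score(y_true, y_pred))
    print("final vote:", two_layer_vote([[1, 1, 0], [0, 0, 1], [1, 0, 1]]))
```

In the toy data, missing one of only two churners barely affects accuracy but sharply lowers the F1-score, which is consistent with the pattern the abstract reports for the imbalanced sets. Undersampling is just one way to obtain equal class counts; the article may have used a different method.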