Prediction of customer churn using resampling and ensemble classification algorithms on telecom dataset
Published Apr 28, 2023 · A. Raj, D. Vetrithangam
2023 11th International Conference on Emerging Trends in Engineering & Technology - Signal and Information Processing (ICETET - SIP)
3 Citations · 0 Influential Citations
Abstract
This study’s goal is to offer a descriptive evaluation of how machine learning algorithms perform in a successful customer churn prediction (CCP) approach. Customer retention and acquisition is a major challenge for many sectors, and it is particularly severe in organisations experiencing rapid expansion and strong competition. Data-mining analyses have been carried out to anticipate customer turnover from large datasets of clients with specified features in a telecommunications firm, in order to retain clients who are likely to leave. The most widely used algorithms in earlier research are decision trees, logistic regression, K-Nearest Neighbours, Naive Bayes, and support vector machines. Only a small number of studies in recent years have demonstrated the effectiveness of sophisticated ensemble learning models, including XGBoost and CatBoost, on hybrid resampled datasets for predicting telecom customer turnover, achieving high prediction performance in classification challenges. The goal of this study is to make a distinctive contribution to the field of predicting client attrition. The experiments are carried out on a real-world dataset acquired from Kaggle (the IBM telco dataset) using five data-mining algorithms: Decision Tree, Support Vector Machine, and the ensemble learning schemes Random Forest, XGBoost, and CatBoost. These are validated on three datasets, one imbalanced and two resampled with the SMOTE and SMOTE-ENN techniques, and the models’ prediction performance is compared with conventional methodologies. The Recall, Precision, F1, Accuracy, and ROC-AUC scores for all three datasets were compared, and it was found that XGBoost and CatBoost outperform Decision Tree (DT) and Support Vector Machine (SVM), producing results highly comparable to those of the Random Forest classifier scheme.
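The resampling step the abstract refers to, SMOTE, rebalances an imbalanced churn dataset by interpolating synthetic minority-class (churner) samples between nearest minority neighbours. The following is a minimal illustrative sketch in plain NumPy, not the authors' pipeline (which applied SMOTE and SMOTE-ENN, typically available via the imbalanced-learn library, before training the classifiers); the function name and parameters here are hypothetical.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=None):
    """SMOTE-style oversampling sketch: for each synthetic point,
    pick a random minority sample, find its k nearest minority
    neighbours, and interpolate toward one of them.
    X_min : array of minority-class feature rows.
    n_new : number of synthetic samples to generate.
    """
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to all minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```

SMOTE-ENN additionally applies Edited Nearest Neighbours cleaning after oversampling, removing samples whose neighbourhood disagrees with their label; that hybrid step is omitted from this sketch.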