Customer Churn Prediction: Study Notes on Related Papers

RaymondLove~

Article link: https://blog.csdn.net/Emma_Love/article/details/107093390

Table of Contents

1. 《Churn prediction in telecommunication using ML》

2. 《Handling imbalanced data in churn prediction using ADASYN and Back-propagation algorithm》

3. 《Customer churn prediction for retail business》=> Useless

4. 《A comparative study of customer churn prediction in telecom industry using ensemble based classifiers》 - Useless

5. 《Customer churn prediction in an internet service provider》

6. 《A review and analysis of churn prediction methods for customer retention in telecom industries》

7. 《Using deep learning to predict customer churn in a mobile telecommunication network》

8. 《Churn analysis and plan recommendation for telecom operators》

9. 《A data mining process framework for churn management in mobile telecommunication industry》

10. 《Using deep learning to predict customer churn in a mobile telecommunication network》

1. 《Churn prediction in telecommunication using ML》
Abstract
- Setbacks (difficulties):
  - enormous database;
  - large feature space;
  - imbalanced class distribution: number of churners << number of non-churners
- Solutions:
  - Data imbalance: SMOTE (synthetic minority over-sampling technique)
    - SMOTE implementation: http://shataowei.com/2017/12/01/python%E5%BC%80%E5%8F%91%EF%BC%9A%E7%89%B9%E5%BE%81%E5%B7%A5%E7%A8%8B%E4%BB%A3%E7%A0%81%E6%A8%A1%E7%89%88-%E4%B8%80/
  - Feature reduction methods: correlation-based feature selection, gain ratio, information gain, and OneR feature evaluation
  - Models: CART (classification and regression trees), bagged CART, and PART (partial decision trees)
  - Evaluations: AUC, sensitivity and specificity
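To make the SMOTE idea in the abstract concrete, here is a minimal sketch of its core step: a synthetic minority point is placed at a random position on the segment between a minority point and one of its nearest minority neighbours. The function name `smote_sample`, its parameters, and the toy data are assumptions for illustration; this is not the paper's implementation, and a real project would more likely use an existing library.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (illustrative only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        # (squared Euclidean distance is enough for ranking)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical 2-D churner feature vectors
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote_sample(minority)
```

Because each new point is a convex combination of two existing churners, it stays inside the region the minority class already occupies rather than duplicating a record exactly.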
Introduction
- Common feature selection methods: PCA, gain ratio, information gain, OneR, and correlation-based techniques
- Common Sampling methods:
  - Random Oversampling (ROS): random instances from minority class are simply replicated --> prone to overfitting
  - Random undersampling (RUS): random instances from the majority class are discarded --> may discard some useful instances
- Models:
  - Tree and rule-based: CART, PART
  - Ensemble of trees: C5.0, bagged CART, RF, XGBoost
  - Linear: LR, linear discriminant analysis
  - Non-linear: Neural network, SVM, KNN, Naïve Bayes
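The two sampling baselines listed above can be sketched in a few lines. This is only an illustration under assumed names (`random_oversample`, `random_undersample`) and toy customer ids; it balances the classes to 1:1, which is one possible target and not something the paper prescribes.

```python
import random

def random_oversample(majority, minority, seed=0):
    """ROS: replicate random minority instances until both classes match in size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

def random_undersample(majority, minority, seed=0):
    """RUS: keep only a random subset of the majority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority

# Hypothetical ids with a 1:3 churner to non-churner ratio
majority = list(range(100, 112))   # 12 non-churners
minority = [1, 2, 3, 4]            # 4 churners

ros_majority, ros_minority = random_oversample(majority, minority)
rus_majority, rus_minority = random_undersample(majority, minority)
```

ROS adds exact copies, which is why it tends to overfit; RUS throws away 8 of the 12 non-churners here, which is why it may lose useful instances.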
Related techs
- Models: KNN, RF, Rotation Forest, AdaBoost
- Model fusion: ordered weighted average, vote
- Feature extraction: PCA, F-score, Fisher's ratio, Minimum Redundancy Maximum Relevance
Methods and materials
- Data set: 1:3 class ratio, imbalanced
- Data preprocessing: remove useless features, resample with SMOTE, then select features using correlation, gain ratio, information gain, and OneR
- Intro to SMOTE: https://www.cnblogs.com/Determined22/p/5772538.html SMOTE creates new, similar instances for the minority class instead of duplicating existing ones. This softens the decision boundary, so the classifier generalizes better and is less prone to overfitting.
- Correlation feature selection: Pearson's correlation coefficient and Spearman's correlation coefficient
  - Information gain attribute selection: based on entropy
  - Gain ratio attribute selection: overcomes a limitation of IG, its bias toward attributes with many distinct values, by dividing IG by the split information (used to select split attributes when building a decision tree)
  - OneR attribute selection: short for "One Rule". It generates one rule for each predictor in the data, then selects the rule with the smallest total error as its "one rule"
  - Decision tree based classification
  - PART (partial tree based classification): builds partial decision trees that are pruned as part of the algorithm itself
  - Bagged tree classification: bootstrap aggregation (bagging)
  - Boosted classification trees:
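As a small worked example of the entropy-based selectors above, the sketch below computes information gain and gain ratio for a categorical attribute. The helper names and the toy attributes (`plan`, `customer_id`) are hypothetical. It uses the standard definitions (gain = label entropy minus conditional entropy, ratio = gain divided by the attribute's own entropy) and is not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain_and_ratio(feature, labels):
    """Return (information gain, gain ratio) of a categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    # Expected label entropy after splitting on the feature
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the feature itself
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

labels = [1, 1, 0, 0, 0, 0, 0, 0]                    # 1 = churner
plan = ["a", "a", "b", "b", "b", "b", "b", "b"]      # hypothetical attribute
customer_id = list(range(8))                         # unique per row

gain_plan, ratio_plan = info_gain_and_ratio(plan, labels)
gain_id, ratio_id = info_gain_and_ratio(customer_id, labels)
```

Both attributes separate the labels perfectly, so their information gain is identical, but the gain ratio of the unique `customer_id` column is much lower. That is the bias of IG that gain ratio is meant to correct.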
Conclusions
- Adequate preprocessing and data balancing on imbalanced datasets improve the classification performance of the classifiers used.
- SMOTE-based classifiers ---> improved classification performance
- Ensemble approaches can achieve better performance
- Correlation-based feature selection performed better than the other selection methods in this case
2. 《Handling imbalanced data in churn prediction using ADASYN and Back-propagation algorithm》
Abstract
- Churn prediction difficulty: imbalanced data
- Method in the paper: ADASYN (adaptive synthetic sampling), an oversampling algorithm that addresses the class imbalance problem. ADASYN is an improvement on SMOTE (synthetic minority over-sampling technique)
- Classification method: backpropagation algorithm
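What distinguishes ADASYN from SMOTE is how it decides the number of synthetic samples per minority point: points whose neighbourhood is dominated by the majority class receive more. The sketch below covers only that allocation step, following the published ADASYN description. The function name, the toy coordinates, and the choice to return zeros when no minority point has a majority neighbour are assumptions.

```python
def adasyn_allocation(minority, majority, k=3, beta=1.0):
    """Number of synthetic samples to create for each minority point.

    beta is the desired balance level (1.0 means fully balanced)."""
    total = round((len(majority) - len(minority)) * beta)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    ratios = []
    for x in minority:
        # k nearest neighbours of x among all other points
        others = [(p, y) for p, y in labelled if p is not x]
        others.sort(key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
        n_majority = sum(1 for _, y in others[:k] if y == 0)
        ratios.append(n_majority / k)   # share of majority neighbours
    norm = sum(ratios)
    if norm == 0:                       # assumption: nothing to allocate
        return [0] * len(minority)
    # Normalise the ratios into a density distribution and scale by total
    return [round(r / norm * total) for r in ratios]

# Hypothetical 2-D data: two "easy" churners and one surrounded by non-churners
minority = [(0, 0), (0, 1), (5, 5)]
majority = [(5, 6), (6, 5), (6, 6), (4, 5), (9, 9), (8, 9), (9, 8)]
allocation = adasyn_allocation(minority, majority)   # [1, 1, 2]
```

The churner at `(5, 5)` sits among non-churners, so it gets the largest share of the four synthetic samples.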
Introduction
- DATA SIZE: tens of columns (attributes) & thousands of rows of data
- Model: boosting, random forest and its modification
Methodology
- Input data: 1 year of telecom data, 55 features and 200,387 rows; imbalanced data: churn : non-churn = 0.04 : 0.96
- Feature selection: Pearson correlation
- Resampling using ADASYN. Difference between ADASYN and SMOTE: ADASYN uses a density distribution, together with the balance-level parameter β, as the reference for determining the number of synthetic samples to generate
- Constructing churn prediction with backpropagation method
  - Forward propagation of operating signal:
  - Back propagation of error signal:
- Performance measurement
  - F1-Score & accuracy
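    - Precision = TP/(TP+FP): reflects the model's ability to discriminate negative samples; the higher the precision, the better the model keeps negative samples from being labeled positive
    - Recall (sensitivity) = TP/(TP+FN): reflects the model's ability to identify positive samples; the higher the recall, the better the model finds positive samples
    - F1-score = 2TP/(2TP+FN+FP) = 2*precision*recall/(precision + recall): combines the two; the higher the F1-score, the more robust the classifier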
  - Confusion Matrix
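The formulas above can be checked directly from the four cells of a confusion matrix. The counts in this sketch are made up for illustration (1,000 customers, 4% churners); they are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts: 40 churners (positives) out of 1000 customers
precision, recall, f1, accuracy = classification_metrics(tp=30, fp=20, fn=10, tn=940)
# precision = 0.60, recall = 0.75, f1 ≈ 0.667, accuracy = 0.97
```

With such a skewed class ratio, accuracy stays high (0.97) even though 40% of the flagged customers are false alarms, which is one reason to report F1 alongside accuracy.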
Results & analysis
- Data source: PT Telkom Indonesia, recorded from 2014.10 to 2015.9; 200,387 rows with 55 features; only a small share of the data are churners (churn : non-churn = 0.04 : 0.96, as noted in the methodology). After feature selection, 38 features are used
3. 《Customer churn prediction for retail business》=> Useless
Abstract
- Dataset: UCI machine learning repository; 2010.1 - 2011.1 transactions records
- Method: preprocessing to remove NAs, validating numerical values, removing erroneous data points; perform aggregations on the data to generate invoice + customer data sets; ML algorithms: SVM, RF, Extreme gradient boosting
Introduction
- Customer churn = customer attrition = customer defection
- Objective of this project:
  - Predict churn value for all the customers of the company for a given period of time
  - Compute the overall churn rate for the given time
  - Provide deeper insight into the sales by analyzing customers' buying pattern
  - Detect customers who are about to drop out from the business in order to take necessary steps
  - Provide clear visualizations of the churn predictions to help business come up with better strategies
  - Help business know the real value of a potential churn customer and retain him/her as a loyal customer by establishing priorities, optimizing resources, putting efficient business efforts and maximizing the value of the portfolio of the customer
  - Help business come up with personalized customer retention plans to reduce the churn rate
- Deliverables
  - A system that can predict whether a customer will churn for a retail business
  - A system which can compute churn rate of the retail business
  - A system which can run multiple algorithms and compare performance among them. Algorithms include RF, SVM, Gradient boosting to predict the customer churn for a given period of time
Literature survey
- Sequential patterns
  - DEML:
  - Genetic modelling:
  - Neural Networks:
  - Logistic regression and random forests
  - Game theory
Design
- Architecture design:
- Sequence diagram
- Data flow diagram
Implementation
- Data size: 541909 rows
- Preprocess: cleaning data + aggregation
- Training:
- ML model: RF; SVM; Gradient boosting
4. 《A comparative study of customer churn prediction in telecom industry using ensemble based classifiers》 - Useless
Abstract
- Ensemble-based classifiers are compared with well-known classifiers, namely decision tree, naïve Bayes classifier, and SVM
Introduction
Literature survey
Working methodology
- Decision tree: C4.5 ----> accuracy is high, but it is not robust to noise
- Naïve Bayes --->
- SVM ---> not suited for data with noise
- Bagging (bootstrap aggregation):
  - Draw k subsets from the dataset by sampling with replacement
  - Train a model on each subset and combine their predictions (e.g., by majority vote)
- Boosting
  - Maintain a weight for each training tuple
- Random forest:
  - Disadvantage: random forest does not handle imbalanced datasets well
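Bootstrap aggregation, as listed in the methodology above, can be sketched with a deliberately simple base learner. The decision stump, the function names, and the toy data below are all assumptions for illustration. Standard bagging trains each model on a bootstrap sample (drawn with replacement, same size as the data) and combines the models by majority vote.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Pick the threshold on one numeric feature that minimises training error.
    The stump predicts 1 (churn) when x <= threshold."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in samples}):
        err = sum((1 if x <= t else 0) != y for x, y in samples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(samples, n_models=15, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(samples) for _ in samples]  # sampling with replacement
        models.append(fit_stump(boot))
    return models

def bagging_predict(models, x):
    """Majority vote over all stumps."""
    votes = Counter(1 if x <= t else 0 for t in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (monthly usage hours, churned?)
data = [(1, 1), (2, 1), (3, 1), (4, 1), (10, 0), (11, 0), (12, 0), (13, 0), (14, 0)]
models = bagging_fit(data)
low_usage = bagging_predict(models, 1)
high_usage = bagging_predict(models, 14)
```

Each stump sees a slightly different sample, so individual thresholds vary; the vote smooths that variation out.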
5. 《Customer churn prediction in an internet service provider》
Abstract
- Methodology:
- Feature engineering:
  - SMOTE oversampling method: reduces the imbalance between the number of churners and non-churners
  - Machine learning models: AdaBoost, Extra-Trees, KNN, neural network, XGBoost
- Experimental Results
  - XGBoost is the best: precision = 45.71%; recall = 42.06%
- Dataset characteristics:
  - Churn : non-churn = 2 : 98
Introduction
- Problem statement: predict whether customers will renew their service (monthly subscriptions). The service expires at the end of each month, and the valid period for renewal runs from the expiry time through the following 16 days.
  - Churn customers: customers who do not renew their subscription within the 16-day renewal window, or who terminate the current service before it ends
- Note: the status of a customer, churn or non-churn, is determined at the end of each month, regardless of the previous status
- After this period, users who do not renew the service are identified as churn customers -----> for our own case, we can define the task as predicting whether a customer will renew the subscription in the next half of the renewal term.
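The labeling rule described above (a customer is a churner if no renewal arrives within 16 days of expiry) can be written as a small function. The function name and the edge-case choices are assumptions: the deadline day itself counts as renewed, and an early renewal before expiry also counts as renewed.

```python
from datetime import date, timedelta

RENEWAL_WINDOW_DAYS = 16  # from the paper's problem statement

def is_churn(expiry, renewal):
    """True if the customer did not renew by the end of the renewal window.

    expiry:  date the current service expires
    renewal: date the customer renewed, or None if there was no renewal
    """
    if renewal is None:
        return True
    deadline = expiry + timedelta(days=RENEWAL_WINDOW_DAYS)
    return renewal > deadline  # assumption: the deadline day still counts

expiry = date(2020, 6, 30)
on_time = is_churn(expiry, date(2020, 7, 10))   # False
too_late = is_churn(expiry, date(2020, 7, 17))  # True
never = is_churn(expiry, None)                  # True
```

Because the status is recomputed at the end of every month, this function would be applied per customer per month, independent of earlier labels.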
Related work
- Note: metrics in churn prediction:
  - For finding the customers most likely to churn === precision is the more effective measure
  - For the purpose of retaining the most customers ===== the recall of the model needs to be improved
- Reducing imbalanced data:
  - SMOTE
- Model:
  - KNN
  - AdaBoost: sensitive to noisy data and outliers
  - Extra-Trees:
  - Neural Network:
Data and feature engineering
- Features were separated into 3 main groups ---> 1. customer information; 2. usage data; 3. service data
  - Customer info: registration date, termination date, location, service type, cable type, bandwidth, payment history, promotion, and so on
  - Customer usage data: the initial date time of connection, disconnection date time, reason for rejection, type of modem, user's daily usage such as amount of data downloaded & uploaded
  - Customer service data: customer's inbound and outbound call phone history; customer satisfaction surveys
6. 《A review and analysis of churn prediction methods for customer retention in telecom industries》
Abstract
- Focusing on analyzing the churn prediction techniques to identify the churn behavior and validate the reasons for customer churn
  - Summarizes the churn prediction techniques --> deeper understanding of customer churn
  - Shows the most accurate churn prediction --> hybrid models rather than single algorithms
Analysis of customer churn prediction methodologies
- Preprocessing: the imbalance problem and sampling in churn prediction
- Ensemble methods:
  - Reference: http://scikit-learn.org/stable/modules/ensemble.html
  - Goal: combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.
  - Two families of ensemble methods:
    - Averaging methods: the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any single base estimator because its variance is reduced
      - Examples: bagging methods, forests of randomized trees
        Bagging meta-estimator
        Bagging methods form a class of algorithms which build several instances of a black-box estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction.
        Forests of randomized trees
        RF: each tree in the ensemble is built from a bootstrap sample (drawn with replacement) of the training set
        Extra-Trees
    - Boosting methods: base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble
      - Examples: AdaBoost, gradient tree boosting
- Churn prediction from big tree
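To illustrate what "built sequentially" means for the boosting family above, here is one round of the classic AdaBoost weight update for labels in {-1, +1}. The function name and the toy predictions are assumptions. The update rule itself is the standard one: a learner weight of `0.5 * ln((1 - err) / err)` followed by an exponential reweighting of the samples.

```python
from math import exp, log

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round: returns (alpha, new_weights).

    Requires a weighted error strictly between 0 and 1."""
    err = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h)
    alpha = 0.5 * log((1 - err) / err)          # weight of this weak learner
    # Misclassified samples (y * h == -1) are scaled up, correct ones down
    updated = [w * exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1, -1]        # +1 = churner
y_pred = [1, -1, -1, -1, -1]       # the weak learner misses the second churner
weights = [0.2] * 5                # uniform initial weights

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
# new_weights == [0.125, 0.5, 0.125, 0.125, 0.125] up to rounding
```

After the update the single misclassified churner carries half of the total weight, so the next weak learner is pushed to get it right. This is how boosting reduces bias step by step.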
7. 《Using deep learning to predict customer churn in a mobile telecommunication network》
Abstract
- Auto-encoders -----> deep belief networks ----> multi-layer feedforward networks
- Framework: four-layer feedforward architecture
Introduction
- Motivation: use deep learning to avoid time-consuming feature engineering effort and ideally to increase the predictive performance of previous models.
- Dataset intro:
  - Historical data from a telecommunication company with nearly 1.2 million customers, spanning sixteen months
  - Challenging Characteristics:
    - Churn rate is very high and all customers are prepaid users
Churn prediction in prepaid mobile telecommunication network
- Goal: to infer when this lack of activity may happen in the future for each active customer
- State definition:
Deep learning models for churn prediction
- Input data & preparation:
8. 《Churn analysis and plan recommendation for telecom operators》
Abstract
- In this paper, we design a hybrid ML classifier to predict if a customer will churn based on the CDR parameters, and we also propose a rule engine to suggest the best plans
9. 《A data mining process framework for churn management in mobile telecommunication industry》
Introduction
- Aims: By using a combination of expert systems and machine learning techniques, the process framework handles churn prediction from 3 perspectives:
  - Prediction of which subscriber may churn
  - Determination of reasons why subscriber may churn
  - Recommendations of appropriate strategy for customer retention
- Data
  - A rich chunk of telecom subscribers' demographic data
  - Subscribers' transactions information
  - Subscribers' complaints information
Process framework
Experiment
- 1. Collect Raw Dataset
  - Subscriber Data: caller number, called number, incoming route, outgoing route, amount before call, amount after call, international mobile subscriber identity (IMSI), exchange id, record type, event type, date of subscription, type of service subscribed, and subscriber number
  - Complaint Data: request_complained_id, date of complaint, time of complaint, type of complaint, status (open and close), imputer (internal staff initiator), handle_by_person
- 2. Data prediction model
  - Raw Data ---> features
    - 20 features
  - Churn prediction with artificial neural network
  - Generating churn reasons and intervention strategy
    - Results obtained from churn prediction using ANN ---> decision support expert system (DSES) to generate probable reasons for churn & recommendations for customer retention.
      - DSES: a set of if-then rules that enable the generation of recommendations of appropriate incentives based on the credit rating of a subscriber.
        Classify subscribers ---> high-valued, medium-valued, and low-valued, so that low-valued subscribers can be ignored and more effort put into high-valued and medium-valued ones.
        Generate churn reasons
        Churn reasons are determined from the rules.
        The paper includes a sample of Jess rules from the DSES (not reproduced in these notes)
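The Jess rules themselves are not part of these notes. As a rough stand-in, the sketch below shows how if-then rules of this kind could look in Python. Everything in it is hypothetical: the field names (`credit_rating`, `predicted_churn`, `open_complaints`), the thresholds 80 and 50, and the recommendation texts are placeholders, not values from the paper.

```python
def recommend(subscriber):
    """Hypothetical if-then rules; thresholds and incentives are placeholders."""
    rating = subscriber["credit_rating"]
    if rating >= 80:                      # assumed cut-off
        segment = "high-valued"
    elif rating >= 50:                    # assumed cut-off
        segment = "medium-valued"
    else:
        segment = "low-valued"

    # Low-valued subscribers are ignored; non-churners need no action
    if not subscriber["predicted_churn"] or segment == "low-valued":
        return segment, None
    # A possible churn reason drives the recommendation
    if subscriber["open_complaints"] > 0:
        return segment, "resolve open complaints first"
    return segment, "offer retention incentive"

result = recommend({"credit_rating": 90, "predicted_churn": True, "open_complaints": 0})
# ("high-valued", "offer retention incentive")
```

In the paper's setup the `predicted_churn` flag would come from the ANN, and the rule base would then attach reasons and incentives to the flagged subscribers.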

10. 《Using deep learning to predict customer churn in a mobile telecommunication network》

Understanding and calculating churn
- High level
  - Measure of how many customers leave over a set time period
  - Measure of how much revenue you lose through customer cancellations
- How churn can impact the bottom line
  - Calculate Lifetime value (LTV) - to understand the value a CSM has
    - Basic LTV
    - Cost of customer acquisition (COCA)
    - Cost of Goods sold (COGS)
  - Calculate churn
    - Customer churn
    - Revenue churn
- Analyzing churn
  - Reasons
    - Find churn reasons to focus and prioritize
    - Know whether actions to retain customers are working
  - Methods
    - Cohort reports
      - Type 1
    - Churn by customer age - grouping your customer by age
    - Churn by customer behavior
      - Need to look at customers who use a certain feature or complete a certain action and determine its impact on churn
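The notes list customer churn, revenue churn, and basic LTV without giving formulas. The sketch below uses commonly quoted definitions, stated here as assumptions: churn rate is what was lost in the period divided by what existed at its start, and basic LTV is average revenue per customer divided by the churn rate. All numbers are invented for illustration.

```python
def customer_churn_rate(customers_at_start, customers_lost):
    """Share of customers lost during the period."""
    return customers_lost / customers_at_start

def revenue_churn_rate(revenue_at_start, revenue_lost):
    """Share of recurring revenue lost through cancellations."""
    return revenue_lost / revenue_at_start

def basic_ltv(avg_revenue_per_customer, churn_rate):
    """Assumed formula: expected customer lifetime is 1 / churn_rate periods."""
    return avg_revenue_per_customer / churn_rate

# Hypothetical month: 500 customers, 25 cancel; 20,000 recurring revenue, 1,500 lost
churn = customer_churn_rate(500, 25)              # 0.05
revenue_churn = revenue_churn_rate(20_000, 1_500) # 0.075
ltv = basic_ltv(40, churn)                        # about 800
```

Revenue churn is higher than customer churn in this example, which would mean the customers who left were paying more than average. That is the kind of signal a cohort breakdown helps confirm.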