A Preliminary Survey of E-commerce Customer Churn Prediction, 2014 to 2021

I. Public datasets:

1. Telco customer churn (11.1.3+)

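Data background
  • Name:

    Telco customer churn (telecom customer churn data)

  • Source:
    Provided by the Samples Team of the IBM Business Analytics Community

  • Date:
    Publicly released in 2019 at 08:15 AM (the URL path below is dated 2019/07/11)

  • URL:
    https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

  • Note:
    The data comes as a single training set with no separate test set, so prediction accuracy has to be measured using the training set alone.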
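
Because there is no separate test set, one common workaround is to hold out part of the labeled rows for evaluation. The sketch below is a minimal stratified split in plain Python. The `stratified_split` helper, the split ratio, and the toy labels are assumptions made for illustration and are not part of the IBM sample.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx) with a similar label ratio in both parts."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Keep at least one row of every class in the held-out part.
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 8 retained customers ("No") and 2 churned ("Yes").
labels = ["No"] * 8 + ["Yes"] * 2
train_idx, test_idx = stratified_split(labels, test_ratio=0.5)
```

Splitting per class matters for churn data because churners are usually the minority, and a purely random split could leave the held-out part with very few of them.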

Data contents

Inventory of Telco Assets

A variety of objects have been updated/created that work together to tell a comprehensive story:

  • Telco churn: This sample dashboard tracks a fictional telco company’s customer churn based on a variety of factors. The Churn Label column indicates whether or not the customer left within the last month. Other columns include location, monthly charges, services, and customer lifetime value. Location: Team content > Samples > Dashboards.

  • Quarterly churn update: This sample story shows quarterly changes of customer churn in a fictional telco company, and which contract and location has the highest churn in order to decide the goals for the next quarter. The churn label column indicates whether or not the customer left within the last quarter. Location: Team content > Samples > Stories.

  • Customer churn information by zip code: This sample report is the drill-through target report for sample dashboard ‘Telco churn’ and sample story ‘Quarterly churn update’. Location: Team content > Samples > Reports.

  • Telco churn relationships: This sample exploration tracks a fictional telco company’s customer churn based on a variety of factors. The Churn Label column indicates whether or not the customer left within the last month. Other columns include location, monthly charges, services, and customer lifetime value. Location: Team content > Samples > Explorations.

  • Telco customer churn: This sample data module tracks a fictional telco company’s customer churn based on a variety of possible factors. The churn column indicates whether or not the customer left within the last month. Other columns include gender, dependents,
    monthly charges, and many with information about the types of services each customer has. Source: IBM. Location: Team content > Samples > Data. The Telco customer churn data module is composed of 5 uploaded files:

    a. Telco_customer_churn_demographics.xlsx
    b. Telco_customer_churn_location.xlsx
    c. Telco_customer_churn_population.xlsx
    d. Telco_customer_churn_services.xlsx
    e. Telco_customer_churn_status.xlsx

2. WA_Fn-UseC_-Telco-Customer-Churn.csv

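Data background
  • Source:

    Provided by the Samples Team of the IBM Business Analytics Community

  • Date:
    Publicly released on February 24, 2018, at 02:20 AM

  • URL:
    https://www.kaggle.com/blastchar/telco-customer-churn

  • Size:
    954.59 KB (7044 rows × 21 columns; the row count likely includes the header line)

  • Note:
    Each row represents one customer, and each column holds a customer attribute described in the column metadata.
    Number of fields: 20. Number of rows: 7044. Contains missing values.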

Data contents

Each row represents a customer, each column contains customer’s attributes described on the column Metadata.

The data set includes information about:

  • Customers who left within the last month – the column is called Churn
  • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
  • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
  • Demographic info about customers – gender, age range, and if they
    have partners and dependents
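
Since this file is noted as containing missing values, a quick per-column check is a reasonable first step. The sketch below uses only the standard `csv` module. Treating empty or whitespace-only cells as missing is an assumption about how the gaps are encoded, and the three sample rows are made up to mirror a few of the file's columns.

```python
import csv
import io
from collections import Counter


def count_missing(csv_file):
    """Count empty or whitespace-only cells per column in an open CSV file."""
    missing = Counter()
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column is None:
                continue  # surplus cells beyond the header; ignored here
            if value is None or value.strip() == "":
                missing[column] += 1
    return missing


# Made-up sample rows. For the real file, pass
# open("WA_Fn-UseC_-Telco-Customer-Churn.csv", newline="") instead.
sample = io.StringIO(
    "customerID,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "0001-AAAA,0,52.55, ,No\n"
    "0002-BBBB,34,56.95,1889.5,No\n"
    "0003-CCCC,2,53.85,108.15,Yes\n"
)
missing = count_missing(sample)
print(dict(missing))
```

Columns that never appear in the result have no blank cells under this definition of missing.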

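3. Telecom customer churn dataset

Data background
  • Source:
    Provided by an individual user on the DataFountain platform

  • Date:
    2020.12.31 15:01

  • URL:
    https://www.datafountain.cn/datasets/5667

  • Size:
    172 KB

  • Note:
    Focused customer retention programs: predict behavior to retain customers. All relevant customer data can be analyzed to develop targeted customer retention programs.

Data description

Each row represents one customer, and each column holds a customer attribute described in the column metadata.

The dataset includes information about:

  • Customers who left within the last month: this column is called "Churn"
  • Services each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
  • Customer account information: how long they have been a customer, contract, payment method, paperless billing, monthly charges, and total charges
  • Demographic information about customers: gender, age range, and whether they have partners and dependents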

4. Telecom customer churn data

Data background
  • Source:
    Provided by an individual user on the DataFountain platform

  • Date:
    2018.12.26 20:02

  • URL:
    https://www.datafountain.cn/datasets/35

  • Note:
    Focused customer retention programs

Data description

No.  Field name  Data type  Description

  • 1 customerID Integer Customer ID
  • 2 gender String Gender
  • 3 SeniorCitizen Integer Senior citizen
  • 4 Partner String Partner (spouse)
  • 5 Dependents String Dependents
  • 6 tenure Integer Tenure (months as a customer)
  • 7 PhoneService String Phone service
  • 8 MultipleLines String Multiple lines
  • 9 InternetService String Internet service
  • 10 OnlineSecurity String Online security
  • 11 OnlineBackup String Online backup
  • 12 DeviceProtection String Device protection
  • 13 TechSupport String Tech support
  • 14 StreamingTV String Streaming TV
  • 15 Contract String Contract
  • 16 PaperlessBilling String Paperless billing
  • 17 PaymentMethod String Payment method
  • 18 MonthlyCharges Integer Monthly charges
  • 19 TotalCharges Integer Total charges
  • 20 Churn String Churn

II. Research literature

1_A Customer Churn Prediction Model in Telecom Industry Using Boosting

  • SCI Zone 1, 2014. Journal: IEEE Transactions on Industrial Informatics

  • Link:
    https://pan.baidu.com/s/1YH0-ZY_pkNtYB0WqzR3b7Q
    Extraction code: l8yk

  • Abstract

Abstract—With the rapid growth of digital systems and associated information technologies, there is an emerging trend in the global economy to build digital customer relationship management (CRM) systems. This trend is more obvious in the telecommunications industry, where companies become increasingly digitalized. Customer churn prediction is a main feature of modern telecommunication CRM systems. This research conducts a real-world study on customer churn prediction and proposes the use of boosting to enhance a customer churn prediction model. Unlike most research that uses boosting as a method to boost the accuracy of a given basis learner, this paper tries to separate customers into two clusters based on the weight assigned by the boosting algorithm. As a result, a higher risk customer cluster has been identified. Logistic regression is used in this research as a basis learner, and a churn prediction model is built on each cluster, respectively. The result is compared with a single logistic regression model. Experimental evaluation reveals that boosting also provides a good separation of churn data; thus, boosting is suggested for churn prediction analysis.
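
The idea of separating customers by the weight that boosting assigns to them can be illustrated with a small sketch. This is not the paper's implementation. It assumes plain AdaBoost with one-feature decision stumps (the paper uses logistic regression as the base learner), a made-up six-customer sample, and a hypothetical rule that flags customers whose final weight exceeds the uniform weight 1/n.

```python
import math


def best_stump(xs, ys, weights):
    """Find the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if best is None or err < best[0]:
                best = (err, preds)
    return best


def adaboost_weights(xs, ys, rounds=2):
    """Run AdaBoost and return the final normalized sample weights."""
    n = len(xs)
    weights = [1.0 / n] * n
    for _ in range(rounds):
        err, preds = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified samples (y * p == -1) gain weight, the rest lose weight.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Made-up data: one feature per customer, label +1 = churned, -1 = stayed.
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, 1, -1, 1, 1]
weights = adaboost_weights(xs, ys, rounds=2)

# Hypothetical split rule: heavier than the uniform weight means hard to classify.
high_risk = [i for i, w in enumerate(weights) if w > 1.0 / len(weights)]
```

Customers that keep being misclassified accumulate weight, so the heavy group collects the cases a single model handles poorly. A separate model could then be fitted to each group, which is the general direction the abstract describes.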

2_A Big Data Clustering Algorithm for Mitigating the Risk of Customer Churn

  • SCI Zone 1, 2016. Journal: IEEE Transactions on Industrial Informatics

  • Link: https://pan.baidu.com/s/1eYkF2h3JbXH5c0XRWd8YRQ
    Extraction code: q6gj

  • Abstract

Abstract—As market competition intensifies, customer churn management is increasingly becoming an important means of competitive advantage for companies. However, when dealing with big data in the industry, existing churn prediction models cannot work very well. In addition, decision makers are always faced with imprecise operations management. In response to these difficulties, a new clustering algorithm called semantic-driven subtractive clustering method (SDSCM) is proposed. Experimental results indicate that SDSCM has stronger clustering semantic strength than subtractive clustering method (SCM) and fuzzy c-means (FCM). Then, a parallel SDSCM algorithm is implemented through a Hadoop MapReduce framework. In the case study, the proposed parallel SDSCM algorithm enjoys a fast running speed when compared with the other methods. Furthermore, we provide some marketing strategies in accordance with the clustering results and a simplified marketing activity is simulated to ensure profit maximization.

3_Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront

  • SCI Zone 4, 2018. Journal: Journal of Intelligence and Information Systems

  • Link: https://pan.baidu.com/s/1sk_YpMMX5xHywAuxsUtk6g
    Extraction code: v5ly

  • Abstract

Deep learning has been getting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the Convolution Neural Network (CNN). A CNN divides the input image into small sections, recognizes partial features, and combines them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited to image recognition and natural language processing.
The use of deep learning techniques for business problems is still at an early research stage. If their performance is proved, they can be applied to traditional business problems such as future marketing response prediction, fraud transaction detection, bankruptcy prediction, and so on. So, it is a very meaningful experiment to diagnose the possibility of solving business problems using deep learning technologies based on the case of online shopping companies, which have big data, make customer behavior relatively easy to identify, and have high utilization value. In particular, the competitive environment of online shopping companies is rapidly changing and becoming more intense. Therefore, analysis of customer behavior for maximizing profit is becoming more and more important for online shopping companies.
In this study, we propose a ‘CNN model of Heterogeneous Information Integration’ as a way to improve the predictive power of customer behavior in online shopping enterprises. To arrive at a model that optimizes performance, namely one that learns from a convolution neural network with a multi-layer perceptron structure by combining structured and unstructured information, we consider ‘heterogeneous information integration’, ‘unstructured information vector conversion’, and ‘multi-layer perceptron design’, evaluate the performance of each architecture, and confirm the proposed model based on the results. In addition, the target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, high discount shopper.
In order to verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC data from a specific online shopping company in Korea. Data extraction criteria are defined for 47,947 customers who registered at least one VOC in January 2011 (1 month). The customer profiles of these customers, as well as a total of 19 months of trading data from September 2010 to March 2012, and VOCs posted for a month are used. The experiment of this study is divided into two stages. In the first step, we evaluate three architectures that affect the performance of the proposed model and select optimal parameters. We evaluate the performance with the proposed model.
Experimental results show that the proposed model, which combines both structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (Support vector machine), and ANN (Artificial neural network). Therefore, it is significant that the use of unstructured information contributes to predicting customer behavior, and that CNN can be applied to solve business problems as well as image recognition and natural language processing problems. It can be confirmed through experiments that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. It is also significant that empirical research based on the actual data of the e-commerce company can extract very meaningful information for predicting customer behavior from VOC data written in text format directly by the customer. Finally, through various experiments, it is possible to say that the proposed model provides useful information for future research related to parameter selection and its performance.

4_Study on the Prediction of Imbalanced Bank Customer Churn Based on Generative Adversarial Network

  • 2020. Journal: Journal of Physics: Conference Series. The journal is highly cited, but its SCI zone could not be found.

  • Link: https://pan.baidu.com/s/1lV_Xpeb2znNIrzybqrGQUQ
    Extraction code: iql8

  • Abstract

Abstract. The imbalanced commercial bank customer data will lead to the unpredictability of the minority class. Therefore, this paper proposes an imbalanced data method based on generative adversarial network to deal with the problem of poor prediction performance of traditional classifiers for the minority class. The method uses the generative adversarial network to generate minority class samples and so rebalance the data. Finally, the classifier is trained on the balanced data to improve the prediction performance for the minority class. In this experiment, the data of a commercial bank customer were measured with indicators such as F1 and Precision, and compared with traditional data sampling methods such as SMOTE and BSSMOTE. The experimental results show that this method is feasible and applicable to the classification of imbalanced bank data, and has better application value.
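
The abstract above evaluates with F1 and Precision rather than plain accuracy. The sketch below shows why that matters on imbalanced data. It computes precision, recall, and F1 for the minority (churn) class from the standard definitions; the label vectors are made up for illustration and are unrelated to the paper's bank data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class marked as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up labels: 1 = churned (minority class), 0 = stayed.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_model = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]   # finds two of the three churners
y_never = [0] * len(y_true)                # baseline that never predicts churn

model_scores = precision_recall_f1(y_true, y_model)
never_scores = precision_recall_f1(y_true, y_never)
```

On this toy sample the never-churn baseline still reaches 0.7 accuracy while its F1 for the churn class is 0, which is the kind of failure that minority-class metrics expose.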

5_Incorporating textual information in customer churn prediction models based on a convolutional neural network

  • SCI Zone 3, 2020. Journal: International Journal of Forecasting

  • Link: https://pan.baidu.com/s/1asn4-wqHRKu_V8qE_WJD7Q
    Extraction code: o0pe

  • Abstract

This study investigates the value added by incorporating textual data into customer churn prediction (CCP) models. It extends the previous literature by benchmarking convolutional neural networks (CNNs) against current best practices for analyzing textual data in CCP, and, using real life data from a European financial services provider, validates a framework that explains how textual data can be incorporated in a predictive model. First, the results confirm previous research showing that the inclusion of textual data in a CCP model improves its predictive performance. Second, CNNs outperform current best practices for text mining in CCP. Third, textual data are an important source of data for CCP, but unstructured textual data alone cannot create churn prediction models that are competitive with models that use traditional structured data. A calculation of the additional profit obtained from a customer retention campaign through the inclusion of textual information can be used by practitioners directly to help them make more informed decisions on whether to invest in text mining.

6_Evaluation of customer behavior with temporal centrality metrics for churn prediction of prepaid contracts

  • SCI Zone 1, 2020. Journal: Expert Systems with Applications

  • Link: https://pan.baidu.com/s/1Wwzi9NQiSURGslMaGXBubA
    Extraction code: mi1u

  • Abstract

The telecommunication industry is a saturated market where a proper implementation of a retention campaign is critical to be competitive, since retaining a customer is cheaper than attracting a new one. Hence, it is crucial to detect customer behavioral patterns and define accurate approaches to predict potential churners. Multiple researchers have used binary classification methods to predict churn of customers. Some of them verify that customers’ social relationships influence the decision of changing the operator. We propose a novel method to extract the dynamic relevance of each customer using social network analysis techniques with a binary classification method called similarity forests. The dynamic importance of each customer is determined by applying various centrality metrics over temporal graphs, to represent the relationships between customers and to extract behavioral patterns of churners and non-churners. These relationships are established in a temporal graph using the call detail records (CDR) of telco’s customers. In this paper, we compare the performance of different centrality metrics applied over two types of temporal graphs: Time-Order Graph and Aggregated Static Graph.
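
To make the contrast between a static and a time-aware view of call records concrete, the sketch below computes normalized degree centrality once over all calls and once per time window. It is a simplification: the windowed snapshots are not the paper's Time-Order Graph, degree is only one of many centrality metrics, and the call records, the `(caller, callee, day)` layout, and the 7-day window are assumptions for illustration.

```python
from collections import defaultdict


def degree_centrality(calls):
    """Normalized degree: distinct contacts / (number of customers - 1)."""
    neighbors = defaultdict(set)
    for caller, callee, _day in calls:
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def centrality_by_window(calls, window_days=7):
    """Degree centrality computed separately for each window of days."""
    windows = defaultdict(list)
    for call in calls:
        windows[call[2] // window_days].append(call)
    return {w: degree_centrality(batch) for w, batch in sorted(windows.items())}


# Hypothetical call detail records: (caller, callee, day index).
cdr = [
    ("A", "B", 0), ("A", "C", 1), ("A", "D", 2),
    ("A", "B", 8), ("B", "C", 9),
    ("B", "C", 15),
]
static = degree_centrality(cdr)        # all calls merged into one graph
temporal = centrality_by_window(cdr)   # one graph per 7-day window
```

In the merged graph customer A looks like the most connected node, while the windowed view shows A's contacts shrinking and then disappearing. That kind of declining pattern is only visible once time is taken into account.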

7_Lifelog Data-Based Prediction Model of Digital Health Care App Customer Churn: Retrospective Observational Study

  • SCI Zone 1, 2021. Journal: Journal of Medical Internet Research. Accepted in January 2021. SCI zones: Zone 1 in medical informatics, Zone 2 in medicine. Impact factor: 4.752

  • Link: https://pan.baidu.com/s/198MNlU9M6W3CAxVp0QvUmw
    Extraction code: q0f8

  • Abstract

Background: Customer churn is the rate at which customers stop doing business with an entity. In the field of digital health care, user churn prediction is important not only in terms of company revenue but also for improving the health of users. Churn prediction has been previously studied, but most studies applied time-invariant model structures and used structured data. However, additional unstructured data have become available; therefore, it has become essential to process daily time-series log data for churn predictions.
Objective: We aimed to apply a recurrent neural network structure to accept time-series patterns using lifelog data and text message data to predict the churn of digital health care users.
Methods: This study was based on the use data of a digital health care app that provides interactive messages with human coaches regarding food, exercise, and weight logs. Among the users in Korea who enrolled between January 1, 2017 and January 1, 2019, we defined churn users according to the following criteria: users who received a refund before the paid program ended and users who received a refund 7 days after the trial period. We used long short-term memory with a masking layer to receive sequence data with different lengths. We also performed topic modeling to vectorize text messages. To interpret the contributions of each variable to model predictions, we used integrated gradients, which is an attribution method.
Results: A total of 1868 eligible users were included in this study. The final performance of churn prediction was an F1 score of 0.89; that score decreased by 0.12 when the data of the final week were excluded (F1 score 0.77). Additionally, when text data were included, the mean predicted performance increased by approximately 0.085 at every time point. Steps per day had the largest contribution (0.1085). Among the topic variables, poor habits (eg, drinking alcohol, overeating, and late-night eating) showed the largest contribution (0.0875).
Conclusions: The model with a recurrent neural network architecture that used log data and message data demonstrated high performance for churn classification. Additionally, the analysis of the contribution of the variables is expected to help identify signs of user churn in advance and improve the adherence in digital health care.

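III. Summary

  1. The dataset most commonly used is Telco customer churn, provided by the IBM team.
  2. Few research-oriented papers combine Telco customer churn with artificial intelligence; most of the work is application-oriented.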