Telecom Industry Customer Churn Prediction with k-Nearest Neighbors

weixin_26704853

Tags: python

Original link: https://medium.com/@rajasubhare/telecom-industry-customer-churn-prediction-with-k-nearest-neighbor-1d5784952c45

Problem Description

This blog aims to predict which customers are likely to churn, based on the company's data from the previous month, so that those customers can be offered better services. This is a supervised learning problem. The first task is to load the Telecom Customer Churn dataset from IBM's Watson Community. The dataset contains many categorical variables and a few numerical ones. Since this is a supervised classification problem, we can apply popular classification algorithms such as Decision Tree, Logistic Regression, SVM, and Random Forest. We have to preprocess the categorical data, run it through several algorithms, make predictions, and note accuracy, sensitivity, specificity, and other measures. We also use the k-Nearest Neighbors (k-NN) algorithm to perform classification, varying the distance parameter and the variable selection.

However, in this article we concentrate only on the k-NN algorithm.
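To make the k-NN idea and the evaluation measures named above concrete, here is a minimal sketch in plain Python. It is not the article's own code: the function names (`knn_predict`, `evaluate`), the two scaled features, and the toy data points are all assumptions made up for illustration. It classifies a query by majority vote among its k nearest training points (Euclidean distance) and then computes accuracy, sensitivity, and specificity from the confusion-matrix counts.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="Yes"):
    """Accuracy, sensitivity (recall on churners), specificity (recall on non-churners)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical toy data: (scaled tenure, scaled monthly charge) -> churn label.
train_X = [(0.05, 0.90), (0.10, 0.80), (0.15, 0.85),
           (0.80, 0.30), (0.90, 0.20), (0.70, 0.40)]
train_y = ["Yes", "Yes", "Yes", "No", "No", "No"]

test_X = [(0.08, 0.88), (0.85, 0.25), (0.20, 0.75), (0.75, 0.35)]
test_y = ["Yes", "No", "Yes", "No"]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = evaluate(test_y, predictions)
print(predictions)
print(metrics)
```

Because k-NN relies on distances, features on very different scales would let one feature dominate the vote; the toy features here are assumed to be already scaled to a comparable range. An odd k is used so that a two-class vote cannot tie.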

Introduction to the Dataset

The dataset file is named WA_Fn-UseC_-Telco-Customer-Churn.csv. It comes from the IBM Watson Telecom customer churn dataset (https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/). It contains 7043 rows and 21 columns. The dataset appears to be imbalanced with respect to Churn (Yes/No): a higher percentage of rows are labeled "No". The input data are customer attributes and contract details, such as whether the customer is male or female, which services he/she gets from the company, how and how often he/she pays the bills, whether he/she is a senior citizen, and so on. The output is a column of "Yes" and "No" values that indicates whether a customer keeps using and paying for the company's services or decides to leave. Because the output variable, Churn, is discrete and binary, customer churn prediction is a classification problem in supervised machine learning.
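As a rough illustration of the first steps described above (loading the data, checking the Churn class balance, and turning a categorical column into numbers), the sketch below uses only the Python standard library. The five inline rows, the customer IDs, and the subset of column names are made-up placeholders, not rows from the real file, and the helper names (`load_rows`, `class_balance`, `one_hot`) are assumptions for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the same general shape as the churn CSV.
# With the real file, you would open it instead, for example:
#   open("WA_Fn-UseC_-Telco-Customer-Churn.csv", newline="")
SAMPLE_CSV = """customerID,gender,SeniorCitizen,Contract,MonthlyCharges,Churn
0001-AAAA,Female,0,Month-to-month,70.35,Yes
0002-BBBB,Male,0,One year,56.95,No
0003-CCCC,Male,1,Month-to-month,99.65,Yes
0004-DDDD,Female,0,Two year,42.30,No
0005-EEEE,Male,0,One year,29.85,No
"""


def load_rows(handle):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def class_balance(rows, target="Churn"):
    """Return the fraction of rows carrying each label of the target column."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def one_hot(rows, column):
    """One-hot encode a categorical column; returns (vectors, category order)."""
    categories = sorted({row[column] for row in rows})
    vectors = [[1 if row[column] == c else 0 for c in categories] for row in rows]
    return vectors, categories


rows = load_rows(io.StringIO(SAMPLE_CSV))
balance = class_balance(rows)
encoded, categories = one_hot(rows, "Contract")

print(len(rows), "rows,", len(rows[0]), "columns")
print(balance)
print(categories)
print(encoded[0])
```

One-hot encoding is only one possible choice here; it is used because a distance-based method such as k-NN needs numeric inputs and the contract types have no natural order.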

Since our data has many variables, we need to make a prudent, informed decision about which ones to use, based on different tools and analyses such as histograms and box plots. This process aims to identify the significant variables, apply the algorithms to get a suggestion on significant variables, and calculate accuracy, misclassification error, sen
