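Background
To prevent customer churn, the marketing department of a telecom company has collected customer data that is already labeled with churn status. The goal is to analyze the churned customers and identify which customers are likely to churn.
Understanding the data
Data collection
This dataset describes whether telecom customers churned, along with related information. It contains 7,043 records and 21 fields, described below:
- customerID: customer ID.
- gender: gender (Female or Male)
- SeniorCitizen: whether the customer is a senior citizen (1 = yes, 0 = no)
- Partner: whether the customer has a partner (Yes or No)
- Dependents: whether the customer has dependents (Yes or No)
- tenure: number of months the customer has been with the company (0-72 months)
- PhoneService: whether phone service is enabled (Yes or No)
- MultipleLines: whether multiple lines are enabled (Yes, No, or No phone service)
- InternetService: type of internet service (No, DSL, or Fiber optic)
- OnlineSecurity: whether the online security service is enabled (Yes, No, or No internet service)
- OnlineBackup: whether the online backup service is enabled (Yes, No, or No internet service)
- DeviceProtection: whether the device protection service is enabled (Yes, No, or No internet service)
- TechSupport: whether the tech support service is enabled (Yes, No, or No internet service)
- StreamingTV: whether streaming TV is enabled (Yes, No, or No internet service)
- StreamingMovies: whether streaming movies is enabled (Yes, No, or No internet service)
- Contract: contract term (month-to-month, one year, or two years)
- PaperlessBilling: whether paperless billing is enabled (Yes or No)
- PaymentMethod: payment method (bank transfer, credit card, electronic check, mailed check)
- MonthlyCharges: monthly charges
- TotalCharges: total charges
- Churn: whether the customer churned (Yes or No)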
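The documented value sets above can be checked against the data before any analysis. The following is a minimal sketch of that check; the three-row `sample` frame and the `expected` mapping are hypothetical stand-ins built for illustration, not rows from the real CSV.

```python
import pandas as pd

# Hypothetical three-row sample that mimics a few of the documented columns;
# it is not taken from the real dataset.
sample = pd.DataFrame({
    "customerID": ["A-001", "A-002", "A-003"],
    "MultipleLines": ["No phone service", "No", "Yes"],
    "InternetService": ["DSL", "Fiber optic", "No"],
    "Churn": ["No", "Yes", "No"],
})

# Allowed categories, copied from the field descriptions.
expected = {
    "MultipleLines": {"Yes", "No", "No phone service"},
    "InternetService": {"DSL", "Fiber optic", "No"},
    "Churn": {"Yes", "No"},
}

# For each column, collect any values that fall outside the documented set.
unexpected = {
    col: set(sample[col].unique()) - allowed
    for col, allowed in expected.items()
}
print(unexpected)
```

An empty set for every column means the data agrees with the field descriptions; any non-empty set points to a spelling variant or an undocumented category worth investigating.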
Importing the data
import pandas as pd
df = pd.read_csv(r"D:\PycharmProjects\ku_pandas\WA_Fn-UseC_-Telco-Customer-Churn.csv")
df.head(5)  # show the first 5 rows; with no argument, df.head() also returns the first 5 rows
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
5 rows × 21 columns
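The preview above is limited to five rows and hides one column behind "...". The sketch below illustrates both behaviors on a hypothetical frame named `wide` (12 rows, 21 columns called c0 to c20, not the real data): how `head()` behaves with and without an argument, and how the pandas display options can be relaxed temporarily so that no column is elided.

```python
import pandas as pd

# Hypothetical wide frame: 12 rows and 21 columns named c0..c20 (not the real data).
wide = pd.DataFrame({f"c{i}": range(12) for i in range(21)})

default_rows = wide.head()    # no argument: first 5 rows
first_three = wide.head(3)    # explicit n: first 3 rows
print(len(default_rows), len(first_three))

# Temporarily lift the column limit so the printed preview is not shortened with "...".
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full_preview = repr(default_rows)
print(full_preview)
```

Using `pd.option_context` keeps the change local to the `with` block, so later output returns to the default display settings.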
Inspecting the data
df.shape  # dimensions of the DataFrame as (number of rows, number of columns)
(7043, 21)
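Since `shape` is an ordinary tuple, it can be unpacked into a row count and a column count. A minimal sketch, using a hypothetical three-row frame named `demo` in place of the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame.
demo = pd.DataFrame({"tenure": [1, 34, 2], "Churn": ["No", "No", "Yes"]})

# shape is a (rows, columns) tuple, so it unpacks directly.
n_rows, n_cols = demo.shape
print(f"{n_rows} rows x {n_cols} columns")
```

On the real dataset the same unpacking would yield the 7043 and 21 reported above.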
df.dtypes  # data type of each column
customerID object
gender object
SeniorCitizen int64
Partner object
Dependents object
tenure int64
PhoneService object
MultipleLines object
InternetService object
OnlineSecurity object
OnlineBackup object
DeviceProtection object
TechSupport object
StreamingTV object
StreamingMovies object
Contract object
PaperlessBilling object
PaymentMethod object
MonthlyCharges float64
TotalCharges object
Churn object
dtype: object
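The listing shows TotalCharges as object even though the field holds amounts. One possible cause, assumed here and not confirmed by the text, is that some entries are not valid numbers (for example, blank strings). The sketch below shows how such a column might be inspected and converted; the `charges` frame is hypothetical.

```python
import pandas as pd

# Hypothetical column with one blank entry, mimicking a numeric field stored as text.
charges = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " "]})
print(charges["TotalCharges"].dtype)

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
numeric = pd.to_numeric(charges["TotalCharges"], errors="coerce")
n_missing = int(numeric.isna().sum())
print(numeric.dtype, n_missing)
```

Counting the NaN values after conversion shows how many entries failed to parse, which helps decide whether to drop or fill them.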
df.columns  # all column names, returned as a pandas Index
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
dtype='object')
df.columns.tolist()  # convert the Index to a plain Python list with tolist()
['customerID',
'gender',
'SeniorCitizen',
'Partner',
'Dependents',
'tenure',
'PhoneService',
'MultipleLines',
'InternetService',
'OnlineSecurity',
'OnlineBackup',
'DeviceProtection',
'TechSupport',
'StreamingTV',
'StreamingMovies',
'Contract',
'PaperlessBilling',
'PaymentMethod',
'MonthlyCharges',
'TotalCharges',
'Churn']
type(df.columns.tolist())
list
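A plain list of column names is convenient for ordinary Python operations such as filtering. A minimal sketch, assuming a hypothetical empty frame named `demo` with three of the columns; dropping customerID to build `feature_names` is only an illustration, not a step the text prescribes.

```python
import pandas as pd

# Hypothetical frame with only column labels (no rows), for illustration.
demo = pd.DataFrame(columns=["customerID", "tenure", "Churn"])

names = demo.columns.tolist()      # Index -> list
same_names = list(demo.columns)    # the built-in list() gives the same result

# With a plain list, a list comprehension can filter out unwanted names.
feature_names = [c for c in names if c != "customerID"]
print(names, feature_names)
```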