Data-Driven Analysis, Practice 4: Customer Retention Analysis

Magic Ktwc37

Category: Financial Models. Tags: python, cluster analysis, logistic regression, classification model, retention analysis

Article link: https://blog.csdn.net/weixin_43171270/article/details/122760395

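Using the Telco dataset, this article builds a customer retention prediction model through exploratory data analysis, feature engineering, logistic regression, and XGBoost. It finds that factors such as gender, internet service type, contract length, tech support, and payment method affect customer churn. Cluster analysis is used during feature engineering, and the model reaches 78% accuracy on the test set.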

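In the first three articles of this series, we defined our metrics, segmented our customers, and built a machine learning model to predict customer LTV. Since segmentation and LTV prediction tell us who our best customers are, we should work hard to keep them. That makes customer retention rate an important metric, and one worth studying in depth.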

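Retention rate is an indicator of product-market fit (PMF). If PMF has not been reached, customers churn quickly. A powerful tool for improving retention is churn prediction (churn analysis), which identifies the customers most likely to leave within a given period.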
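As a rough illustration of the metric itself, the sketch below computes retention and churn rates for a single period from customer counts. The function name `retention_rate` and the counts are hypothetical, chosen only for illustration; they do not come from the Telco dataset.

```python
def retention_rate(customers_at_start: int, customers_retained: int) -> float:
    """Share of the customers active at the start of a period who are still active at its end."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_retained / customers_at_start

# Hypothetical counts for one month
start, retained = 1000, 920
retention = retention_rate(start, retained)
churn = 1 - retention  # churn rate is the complement of retention rate
print(f"retention={retention:.1%}, churn={churn:.1%}")
```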

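In this article, we use the Telco dataset to develop a churn prediction model. The main steps are:

Exploratory data analysis (EDA)
Feature engineering
Using logistic regression to investigate how each feature affects retention
Building a classification model with XGBoost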

Exploratory Data Analysis

import pandas as pd
%matplotlib inline
from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import xgboost as xgb

df_data = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df_data.head()

df_data.info()

The features in this dataset fall into two groups:

Categorical features: gender, StreamingTV, PaymentMethod, etc.
Numeric features: tenure, MonthlyCharges, TotalCharges, etc.
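One way to make this split programmatically is by column dtype. The sketch below uses a small hand-made frame (the rows are illustrative, not taken from the dataset) whose column names follow the Telco file. It also assumes, as is commonly the case with this CSV, that `TotalCharges` is read as text because some rows are blank, so the column is converted with `pd.to_numeric` first.

```python
import pandas as pd

# Illustrative rows only; column names follow the Telco file
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Credit card (automatic)"],
    "tenure": [1, 34, 0],
    "MonthlyCharges": [29.85, 56.95, 52.55],
    "TotalCharges": ["29.85", "1889.5", " "],  # assumed: blank entries make this column text
})

# Coerce text to numbers; unparseable entries such as " " become NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```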

We first examine how well the following categorical features predict customer churn.

df_data['Churn'] = np.where(df_data['Churn']=='No',0,1)
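Note that `np.where(... == 'No', 0, 1)` maps every value other than 'No' to 1, including missing or misspelled labels. As an alternative sketch (not the article's code), an explicit mapping leaves unexpected values as NaN so they can be spotted. The toy series below is illustrative.

```python
import numpy as np
import pandas as pd

labels = pd.Series(["No", "Yes", "no", None])  # illustrative values, two of them malformed

# The approach used above: anything that is not exactly 'No' becomes 1
via_where = np.where(labels == "No", 0, 1)

# Explicit mapping: unknown labels become NaN instead of silently counting as churn
via_map = labels.map({"No": 0, "Yes": 1})

print(via_where.tolist())
print(int(via_map.isna().sum()), "unrecognized labels")
```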

Gender

import chart_studio.plotly as py
import plotly.offline as pyoff
import plotly.graph_objs as go
pyoff.init_notebook_mode()

def featureEDA(feature):
    # Mean of the 0/1 Churn flag per category = churn rate of that category
    df_plot = df_data.groupby(feature).Churn.mean().reset_index()
    colors = ['green', 'blue', 'orange', 'red']
    plot_data = [
        go.Bar(
            x=df_plot[feature],
            y=df_plot['Churn'],
            width=0.5,  # scalar width applies to every bar, however many categories there are
            marker=dict(color=colors[:len(df_plot)])
        )
    ]

    plot_layout = go.Layout(
        xaxis={"type": "category"},
        yaxis={"title": "Churn Rate"},
        title=feature,
        plot_bgcolor='rgb(243,243,243)',
        paper_bgcolor='rgb(243,243,243)',
    )
    fig = go.Figure(data=plot_data, layout=plot_layout)
    pyoff.iplot(fig)

featureEDA('gender')

df_data.groupby('gender').Churn.mean()

gender
Female 0.269209
Male 0.261603
Name: Churn, dtype: float64

Female customers are slightly more likely to churn than male customers, but the difference is very small.
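To judge whether a gap this small is more than noise, a two-proportion z-test is one option. The sketch below is not part of the original analysis. It plugs in the churn rates printed above, while the group sizes (3,500 each) are assumed for illustration because the per-gender counts are not shown in this section.

```python
import math

def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int):
    """Return (z, two-sided p-value) for H0: both groups share one churn rate."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, written via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Rates from the output above; group sizes are assumed, not taken from the data
z, p_value = two_proportion_z_test(0.269209, 3500, 0.261603, 3500)
print(f"z={z:.2f}, p={p_value:.2f}")
```

Under these assumed group sizes the p-value is far above 0.05, which is consistent with treating the gender difference as negligible.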

Next, we examine the remaining categorical features in the same way.
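The same per-category churn rate can also be tabulated for several columns in one pass, without plotting. A minimal sketch on a hand-made frame follows; the rows are illustrative rather than taken from the dataset, and `churn_rate_by` is a hypothetical helper name.

```python
import pandas as pd

toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "TechSupport": ["No", "No", "Yes", "Yes"],
    "Churn": [1, 0, 0, 0],  # already encoded as 0/1
})

def churn_rate_by(df: pd.DataFrame, feature: str) -> pd.Series:
    """Mean of the 0/1 Churn flag within each category of `feature`."""
    return df.groupby(feature)["Churn"].mean()

# One churn-rate table per categorical column
rates = {col: churn_rate_by(toy, col) for col in ["Contract", "TechSupport"]}
for col, series in rates.items():
    print(series, end="\n\n")
```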

Internet Service

featureEDA('InternetService')

The chart shows that customers with fiber-optic internet service have a higher churn rate.

Contract

featureEDA('Contract')

As expected, customers on shorter contracts are more likely to churn.

Tech Support

featureEDA('TechSupport')

Customers who do not use the tech support service are more likely to churn, with a gap of about 25 percentage points.

Payment Method

featureEDA('PaymentMethod')

Customers on automatic payment methods are easier to retain.
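To quantify this, the payment methods can be collapsed into an automatic/manual flag before computing churn rates. The sketch below assumes the automatic options carry the word "automatic" in `PaymentMethod`, as in "Bank transfer (automatic)"; the rows and the `AutoPay` column name are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "PaymentMethod": [
        "Electronic check", "Mailed check",
        "Bank transfer (automatic)", "Credit card (automatic)",
    ],
    "Churn": [1, 0, 0, 0],  # illustrative 0/1 labels
})

# regex=False treats "automatic" as a literal substring
toy["AutoPay"] = toy["PaymentMethod"].str.contains("automatic", regex=False)

# Churn rate for manual (False) vs automatic (True) payment
auto_rates = toy.groupby("AutoPay")["Churn"].mean()
print(auto_rates)
```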

Other Categorical Features

from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=2, start_cell="bottom-left",
                    subplot_titles=("PhoneService", "DeviceProtection","StreamingTV","PaperlessBilling"))
df_plot1 = df_data.groupby('PhoneService').Churn.mean().reset_index()
df_plot2 = df_data.groupby('DeviceProtection').Churn.mean().reset_index()
df_plot3 = df_data.groupby('StreamingTV').Churn.mean().reset_index()
df_plot4 = df_data.groupby('PaperlessBilling').Churn.mean().reset_index()
fig.add_trace(
         go.Bar(
        x=df_plot1['PhoneService'],
        y=df_plot1['Churn'],
        width = [0.5, 0.5],
        name="Phone Service",
        marker=dict(
        color=['green', 'blue','orange','red'])
        ),
    row=1,col=1
)
fig.add_trace(
         go.Bar(
        x=df_plot2['DeviceProtection'],
        y=df_plot2['Churn'],
        width=0.5,  # scalar width, since DeviceProtection has three categories
        name="Device Protection",
        marker=dict(
        color=['green', 'blue','orange','red']