Segmentation Thinking (5): A Special Kind of Segmentation, Cohort Analysis

HsuHeinrich

Category: Data Analysis. Tags: python, data analysis

Article link: https://blog.csdn.net/weixin_39293132/article/details/129583529

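Xiao P: Xiao H, user retention has dropped and growth has slowed. What is causing it? Is something wrong with new users, or are existing users dissatisfied?

Xiao H: Try a cohort analysis to see how new and existing users differ.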

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Initial setup
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly

If you need the data used below, follow the WeChat official account HsuHeinrich and reply 【分群思维05】 to receive it automatically.

df = pd.read_csv('paid.csv', encoding="gbk")
df.head()

	日期	付费金额	uid
0	2021/4/30 9:50	300	9734668
1	2021/4/30 9:42	150	9947799
2	2021/4/30 9:41	680	9058431
3	2021/4/30 9:32	30	9947799
4	2021/4/30 9:25	150	2412798

The columns are 日期 (purchase timestamp), 付费金额 (payment amount), and uid (user ID).

cohort

# Build per-user monthly data: one row per (uid, purchase month)
df['购买月份'] = pd.to_datetime(df['日期']).dt.to_period("M")
order = df.groupby(["uid", "购买月份"], as_index=False).agg(
    月付费总额=("付费金额", "sum"),
    月付费次数=("uid", "count"),
)

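# Cohort group: the month of each user's first purchase
order["同期群分组"] = order.groupby("uid")['购买月份'].transform("min")
# Cohort month: months elapsed since the first purchase, as a label such as "+0月"
# Note: these string labels sort lexicographically, so with 10 or more months "+10月" sorts before "+1月"
order["cohort月"] = (order.购买月份-order.同期群分组).apply(lambda x:f"+{x.n}月")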
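
The cohort-month label is a month difference turned into a string. The sketch below is not the author's code: `month_offset` is a hypothetical helper and the dates are made up. It shows the same difference computed as an integer, and why string labels can misorder pivot columns once a cohort spans ten months or more.

```python
import pandas as pd


def month_offset(later: pd.Period, earlier: pd.Period) -> int:
    """Whole months from `earlier` to `later` (both monthly Periods)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Made-up first-purchase month and three later purchase months
first = pd.Period("2021-01", freq="M")
months = [pd.Period(m, freq="M") for m in ("2021-01", "2021-02", "2021-11")]

offsets = [month_offset(m, first) for m in months]
labels = [f"+{n}月" for n in offsets]

# Strings are compared character by character, so "+10月" lands before "+1月".
lexicographic = sorted(labels)
# Sorting on the integer inside the label restores chronological order.
chronological = sorted(labels, key=lambda s: int(s[1:-1]))

print(lexicographic)
print(chronological)
```

If the data covers fewer than ten months per cohort, the string labels already sort correctly and no reordering is needed.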

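There are two common ways to present a cohort table. One uses the elapsed time since the first purchase as columns, which fills the upper-left triangle of the table. The other uses the actual calendar month as columns, which fills the upper-right triangle.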

# Retention cohort, layout 1: columns are months since first purchase
order.pivot_table(index="同期群分组", columns="cohort月",
                          values="uid", aggfunc="count",
                          fill_value=0).rename_axis(columns="留存用户")

# Retention cohort, layout 2: columns are calendar months
order.pivot_table(index="同期群分组", columns="购买月份",
                          values="uid", aggfunc="count",
                          fill_value=0).rename_axis(columns="留存用户")
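
To make the two layouts concrete, here is a small sketch on made-up data (three hypothetical users, English column names, and a `ym` helper that are all assumptions, not part of the original dataset). It builds both pivot tables so the triangle shapes can be compared directly.

```python
import pandas as pd

# Hypothetical activity log: one row per (uid, active month)
toy = pd.DataFrame({
    "uid":   [1, 1, 1, 2, 2, 3],
    "month": ["2021-01", "2021-02", "2021-03", "2021-02", "2021-03", "2021-03"],
})

# Cohort = first active month of each user
toy["cohort"] = toy.groupby("uid")["month"].transform("min")


def ym(s: str) -> int:
    """Convert 'YYYY-MM' to a month count so differences are month offsets."""
    return int(s[:4]) * 12 + int(s[5:7])


toy["offset"] = toy["month"].map(ym) - toy["cohort"].map(ym)

# Layout 1: columns are offsets, non-zero cells form the upper-left triangle
by_offset = toy.pivot_table(index="cohort", columns="offset", values="uid",
                            aggfunc="count", fill_value=0).astype(int)

# Layout 2: columns are calendar months, non-zero cells form the upper-right triangle
by_month = toy.pivot_table(index="cohort", columns="month", values="uid",
                           aggfunc="count", fill_value=0).astype(int)

print(by_offset)
print(by_month)
```

Layout 1 makes cohorts comparable at the same age, while layout 2 lines cohorts up on the calendar, which is the shape a stacked area chart needs.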

# Compute each metric using cohort layout 1

# Build the retained-user cohort
user_retention_cohort = order.pivot_table(index="同期群分组", columns="cohort月",
                             values="uid", aggfunc="count",
                             fill_value=0).rename_axis(columns="留存用户")

# Build the retention-rate cohort: each row divided by the cohort's size in month +0
user_retention_rate_cohort = user_retention_cohort.divide(user_retention_cohort.iloc[:, 0], axis=0)
user_retention_rate_cohort.rename_axis(['留存率'], axis=1, inplace=True)

# Build the average-payment-per-user cohort: total paid divided by the cohort's size in month +0
user_paid_total_cohort = order.pivot_table(index="同期群分组", columns="cohort月",
                                  values="月付费总额", aggfunc="sum",
                                  fill_value=0).rename_axis(columns="付费总额")
user_paid_per_amount_cohort = user_paid_total_cohort.divide(user_retention_cohort.iloc[:, 0], axis=0)
user_paid_per_amount_cohort.rename_axis(['人均付费金额'], axis=1, inplace=True)

# Build the average-purchase-count-per-user cohort: total purchases divided by the cohort's size in month +0
user_paid_cnt_total_cohort = order.pivot_table(index="同期群分组", columns="cohort月",
                                  values="月付费次数", aggfunc="sum",
                                  fill_value=0).rename_axis(columns="付费总次数")
user_paid_per_cnt_cohort = user_paid_cnt_total_cohort.divide(user_retention_cohort.iloc[:, 0], axis=0)
user_paid_per_cnt_cohort.rename_axis(['人均付费次数'], axis=1, inplace=True)
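
All three metrics use the same pattern: divide every row by that cohort's size in month +0. The sketch below repeats the pattern on a made-up count table. It also adds one optional step that is not in the original code: masking cells for months that have not happened yet, so that a 0 produced by `fill_value=0` is not read as zero retention. The mask assumes a square table of consecutive monthly cohorts.

```python
import numpy as np
import pandas as pd

# Made-up retained-user counts: rows are cohorts, columns are months since first purchase
counts = pd.DataFrame(
    [[100, 40, 20],
     [80, 40, 0],
     [50, 0, 0]],
    index=["2021-01", "2021-02", "2021-03"],
    columns=["+0月", "+1月", "+2月"],
)

# Row-wise division: axis=0 aligns the divisor with the row index (one size per cohort)
cohort_size = counts.iloc[:, 0]
rate = counts.divide(cohort_size, axis=0)

# Assumption: consecutive monthly cohorts, so cohort i has only n-1-i later months observed
n = len(counts)
observed = np.arange(n)[None, :] <= (n - 1 - np.arange(n))[:, None]
rate_masked = rate.where(observed)  # unobserved cells become NaN

print(rate_masked)
```

With the mask applied, the styled table and any averages taken across cohorts ignore the future cells instead of counting them as 0%.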

# Polish the table display, using the user retention rate as an example
(user_retention_rate_cohort.style
    .format("{:.2%}")
    .bar(subset="+0月", color="green")
    .background_gradient("Reds",subset=user_retention_rate_cohort.columns[1:],axis=None)
 )

Stacked retention chart

This dataset is also available from the WeChat official account HsuHeinrich: reply 【分群思维05】 to receive it automatically.

# Simulated data: some cells are larger than the preceding periods, which can be ignored
df_sp2 = pd.read_excel('留存cohort.xlsx', index_col=0)
df_sp2.head()

# Draw the stacked retention chart
df_sp2.T.plot.area(stacked=True, figsize=(12,8))
plt.title('留存堆积图')
plt.show()

[Figure: stacked retention area chart (output_11_0)]

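A stacked retention chart shows retention and growth at the same time: when new users clearly outnumber the users lost from earlier cohorts, the active user base grows. In the chart above, the user base grew slowly from January to June 2019, a period when monthly churn was also low. It then grew quickly through August, driven by noticeably larger numbers of new users. From then until January 2022, new-user growth and churn were both moderate, so the base again grew slowly. From February 2022 it started to decline.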
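
The reading of the chart rests on one piece of arithmetic: the change in active users each month equals new users minus the users lost from earlier cohorts. The sketch below checks that on a made-up cohort matrix in the calendar-month layout. The numbers and variable names are hypothetical, and "lost" here is a net figure, since returning users would offset some churn.

```python
import numpy as np
import pandas as pd

months = ["2021-01", "2021-02", "2021-03", "2021-04"]

# Hypothetical active users: rows are cohorts (first month), columns are calendar months
active = pd.DataFrame(
    [[100, 60, 50, 40],
     [0, 120, 70, 50],
     [0, 0, 30, 20],
     [0, 0, 0, 10]],
    index=months, columns=months,
)

# Height of the stacked chart in each month
total = active.sum(axis=0)

# New users sit on the diagonal: a cohort's count in its own first month
new_users = pd.Series(np.diag(active.values), index=months)

# Users from earlier cohorts who are still active this month
retained = total - new_users

# Net loss from earlier cohorts: last month's total minus those still active
lost = total.shift(1) - retained

# Growth happens only when new users exceed the net loss
net_change = new_users - lost

print(pd.DataFrame({"total": total, "new": new_users, "lost": lost, "net": net_change}))
```

Calling `active.T.plot.area(stacked=True)` on the same matrix would draw the corresponding stacked chart, one layer per cohort.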

Let's keep learning together.