Understanding Users Through Bicycle Sales Data (Part 2)

Latest recommended article published on 2021-03-24 09:05:54

简雨儿

Categories: python学习笔记 (Python study notes), case学习笔记 (case study notes). Tags: mysql, python

Article link: https://blog.csdn.net/m0_49915249/article/details/114675194

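Preface

This case study is written up in two posts:
Part 1: analysis mind map and analysis report
Part 2: data processing in Python + charts in Power BI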

import pandas as pd
import numpy as np

columns=['user_id','order_dt','order_products','order_amount']
df=pd.read_table('bicycle_master.txt',names=columns, sep=r'\s+')

I. Descriptive statistics

1. Data preview

user_id: user ID
order_dt: purchase date
order_products: number of products purchased
order_amount: purchase amount

df.head()

	user_id	order_dt	order_products	order_amount
0	1	19970101	1	11.77
1	2	19970112	1	12.00
2	2	19970112	5	77.00
3	3	19970102	2	20.76
4	3	19970330	2	20.76

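2. Descriptive statistics

Orders contain 2.4 products on average. Most orders include only a few items (the median is 2 and the 75th percentile is 3), while the maximum of 99 shows that extreme values pull the mean up.
Order amounts are fairly stable: the mean is about 35.89 yuan and the median 25.98 yuan. The median is below the mean, so the distribution is right-skewed, again with some extreme values.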

df.describe()

	user_id	order_dt	order_products	order_amount
count	69659.000000	6.965900e+04	69659.000000	69659.000000
mean	11470.854592	1.997228e+07	2.410040	35.893648
std	6819.904848	3.837735e+03	2.333924	36.281942
min	1.000000	1.997010e+07	1.000000	0.000000
25%	5506.000000	1.997022e+07	1.000000	14.490000
50%	11410.000000	1.997042e+07	2.000000	25.980000
75%	17273.000000	1.997111e+07	3.000000	43.700000
max	23570.000000	1.998063e+07	99.000000	1286.010000
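The right-skew reading above (median below mean) can be checked directly. The following is a minimal sketch on made-up amounts, not the real order data; the values in `amounts` are purely illustrative.

```python
import pandas as pd

# Hypothetical order amounts with one extreme value
amounts = pd.Series([10, 12, 14, 15, 20, 25, 30, 200], name="order_amount")

mean = amounts.mean()      # pulled up by the extreme value
median = amounts.median()  # robust to the extreme value
skew = amounts.skew()      # positive for a right-skewed sample

print(f"mean={mean}, median={median}, skew={skew:.2f}")
```

A mean well above the median together with positive skewness is the same pattern that df.describe() shows for order_amount.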

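II. Monthly consumption trends

1. Data preprocessing

Convert order_dt to a datetime type, then use astype to truncate each date to its year and month and store the result in a new month column.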

df['order_dt']=pd.to_datetime(df.order_dt,format='%Y%m%d')
df['month']=df.order_dt.values.astype('datetime64[M]')

df.head()

	user_id	order_dt	order_products	order_amount	month
0	1	1997-01-01	1	11.77	1997-01-01
1	2	1997-01-12	1	12.00	1997-01-01
2	2	1997-01-12	5	77.00	1997-01-01
3	3	1997-01-02	2	20.76	1997-01-01
4	3	1997-03-30	2	20.76	1997-03-01
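As an alternative to casting the underlying NumPy array to datetime64[M], the month can be derived with the pandas period accessor. This is a sketch on three made-up dates in the same YYYYMMDD integer format; the frame name `orders` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical order dates stored as YYYYMMDD integers
orders = pd.DataFrame({"order_dt": [19970101, 19970112, 19970330]})
orders["order_dt"] = pd.to_datetime(orders["order_dt"].astype(str), format="%Y%m%d")

# Truncate each date to the first day of its month
orders["month"] = orders["order_dt"].dt.to_period("M").dt.to_timestamp()
print(orders)
```

Both approaches give the first day of the month; the period-based version stays within the pandas API and keeps the column as a regular datetime column.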

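2. Monthly total amount, order count, products purchased, and buyer count

Monthly total amount (消费总额), order count (消费次数), and products purchased (产品购买量)

df_month=df.groupby('month')
month_info=df_month[['order_amount','user_id','order_products']].agg({'order_amount':'sum','user_id':'count','order_products':'sum'})
month_info=month_info.rename(columns={'order_amount':'消费总额','user_id':'消费次数','order_products':'产品购买量'})

Buyer count: the number of distinct users per month (stored in the column 消费人次)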

month_info['消费人次']=df_month.user_id.unique().map(len)
month_info

	消费总额	消费次数	产品购买量	消费人次
month
1997-01-01	299060.17	8928	19416	7846
1997-02-01	379590.03	11272	24921	9633
1997-03-01	393155.27	11598	26159	9524
1997-04-01	142824.49	3781	9729	2822
1997-05-01	107933.30	2895	7275	2214
1997-06-01	108395.87	3054	7301	2339
1997-07-01	122078.88	2942	8131	2180
1997-08-01	88367.69	2320	5851	1772
1997-09-01	81948.80	2296	5729	1739
1997-10-01	89780.77	2562	6203	1839
1997-11-01	115448.64	2750	7812	2028
1997-12-01	95577.35	2504	6418	1864
1998-01-01	76756.78	2032	5278	1537
1998-02-01	77096.96	2026	5340	1551
1998-03-01	108970.15	2793	7431	2060
1998-04-01	66231.52	1878	4697	1437
1998-05-01	70989.66	1985	4903	1488
1998-06-01	76109.30	2043	5287	1506
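The four monthly metrics can also be produced in a single call with named aggregation, using nunique for the buyer count. This is a sketch on a small made-up frame; the English column names (total_amount, order_count, product_count, buyer_count) are assumptions chosen for illustration, not names used in the post.

```python
import pandas as pd

# Hypothetical orders: four in January 1997, one in March 1997
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3],
    "month": pd.to_datetime(["1997-01-01"] * 4 + ["1997-03-01"]),
    "order_products": [1, 1, 5, 2, 2],
    "order_amount": [11.77, 12.00, 77.00, 20.76, 20.76],
})

monthly = toy.groupby("month").agg(
    total_amount=("order_amount", "sum"),     # money spent in the month
    order_count=("user_id", "count"),         # number of order rows
    product_count=("order_products", "sum"),  # products purchased
    buyer_count=("user_id", "nunique"),       # distinct users who bought
)
print(monthly)
```

Order count and buyer count differ whenever a user orders more than once in a month, which is the gap the repeat purchase rate later measures.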

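3. Plotting in Power BI

Adjust the data types and save the table to Excel for import into Power BI.
The month column is converted to strings for Power BI.
Reset the index first.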

month_info=month_info.reset_index()
month_info.head()

	month	消费总额	消费次数	产品购买量	消费人次
0	1997-01-01	299060.17	8928	19416	7846
1	1997-02-01	379590.03	11272	24921	9633
2	1997-03-01	393155.27	11598	26159	9524
3	1997-04-01	142824.49	3781	9729	2822
4	1997-05-01	107933.30	2895	7275	2214

month_info['month']=month_info['month'].astype(str)

month_info.to_excel(r'.\月销售额、销售次数、产品购买量、消费人数.xlsx')

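[Figure: Power BI chart, image not available]

Total amount peaks in the first three months, then stays fairly stable with a slight downward trend.
Products purchased also peak in the first three months, then stay fairly stable with a slight downward trend.
Order count peaks at roughly 9,000 to 11,600 per month in the first three months, then stabilizes with a slight downward trend.
Buyer count peaks at roughly 8,000 to 10,000 per month in the first three months, then stabilizes with a slight downward trend.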

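III. Individual user consumption analysis

Scatter plot of total amount vs. products purchased per user

# Build the per-user totals with a pivot table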
user_info=df.pivot_table(index='user_id',
              values=['order_products','order_amount'],
              aggfunc={'order_products':'sum',
                      'order_amount':'sum'})

user_info.to_excel(r'.\用户个体消费行为分析.xlsx')

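[Figure: Power BI chart, image not available]

Ignoring outliers, the more products a user buys, the more the user spends. This positive correlation matches prior expectations.

Distribution of total amount per user (bar chart)
Bin the per-user total order_amount into intervals, then draw a bar chart.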

user_info.order_amount

Name: order_amount, Length: 23570, dtype: float64

amount=user_info.order_amount
amount_1st=[i for i in range(0,int(amount.max())+50,50)]
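# Bin edges in steps of 50
amount_1st
# Cut into bins, labelling each bin by its upper edge
# (this line is missing from the post; reconstructed to match the output shown here and products_cut)
amount_cut=pd.cut(amount,bins=amount_1st,labels=amount_1st[1:])
amount_cut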

Name: order_amount, Length: 23570, dtype: category
Categories (280, int64): [50 < 100 < 150 < 200 ... 13850 < 13900 < 13950 < 14000]

amount_cut.to_excel(r'.\消费金额分布直方图.xlsx')

[Figure: Power BI chart, image not available]

The histogram shows that most users spend relatively little in total.
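The binning step can be sketched on a handful of made-up totals. One detail worth noting: with edges starting at 0, pd.cut uses right-closed intervals by default, so a total of exactly 0 would fall outside the first bin unless include_lowest=True is passed. The values in `totals` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical per-user totals, including one user with a total of 0
totals = pd.Series([0.0, 12.5, 49.9, 50.0, 120.0, 260.0], name="order_amount")

# Edges 0, 50, ..., 300; each bin is labelled by its upper edge
edges = list(range(0, int(totals.max()) + 50, 50))
binned = pd.cut(totals, bins=edges, labels=edges[1:], include_lowest=True)

# Users per bin, in bin order (empty bins are kept with a count of 0)
counts = binned.value_counts(sort=False)
print(counts)
```

The resulting counts per bin are what a bar chart of the distribution plots; doing the count in pandas keeps the exported file small.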

Distribution of products purchased per user
As above, bin the per-user product totals into intervals.

products=user_info.order_products
# Bin edges in steps of 5
products_1st=[i for i in range(0,int(products.max())+5,5)]
products_cut=pd.cut(products,bins=products_1st,labels=products_1st[1:])
products_cut.to_excel(r'.\消费次数分布直方图.xlsx')

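[Figure: Power BI chart, image not available]

After discounting outliers, most users purchase infrequently.

Cumulative share of spending by user
Plot the cumulative distribution to find the core users.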

amount=user_info.order_amount
amount=amount.to_frame()
type(amount)
amount=amount.sort_values('order_amount')  # sort ascending

user_cumsum=amount.apply(lambda x:x.cumsum()/x.sum())

user_cumsum.reset_index().to_excel(r'.\累积销售.xlsx')

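[Figure: Power BI chart, image not available]

Midline: the bottom 50% of users account for only about 10% of total spending.
Top 20% of users: 23570 × 20% = 4714 users, starting at position 18856 on the x-axis, where the cumulative share is about 0.3, so their share is 1 - 0.3 = 70%.
In other words, 20% of users account for roughly 70% of total spending, close to the 80/20 rule, so these 20% are the users to look after.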
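The top-20% share can be read off the cumulative curve programmatically rather than from the chart. This is a sketch on ten made-up users; `spend` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical total spending for ten users
spend = pd.Series([5, 5, 10, 10, 10, 20, 20, 20, 100, 300], name="order_amount")

# Cumulative share of spending, users sorted from smallest to largest
cum_share = spend.sort_values().cumsum() / spend.sum()

n_top = int(len(spend) * 0.2)                           # size of the top 20%
bottom_share = cum_share.iloc[len(spend) - n_top - 1]   # share held by the other 80%
top_share = 1 - bottom_share

print(f"top {n_top} users hold {top_share:.0%} of spending")
```

On the real data the same lookup at position 18856 gives the roughly 0.3 cumulative share quoted above.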

IV. User purchase behavior analysis

1. First purchase: new users

grouped_user=df.groupby('user_id')

grouped_user.min().order_dt.value_counts()
# Number of users whose first purchase falls on each date

Name: order_dt, Length: 84, dtype: int64

# Name the date column explicitly so the result does not depend on the pandas version
grouped_user_min=grouped_user.order_dt.min().value_counts().rename_axis('first_date').reset_index(name='user_count')

grouped_user_min['first_date'] =grouped_user_min['first_date'].astype(str)
grouped_user_min.to_excel(r'.\用户首购.xlsx')

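[Figure: Power BI chart, image not available]

By month, all first purchases fall in January, February, and March. The number of first-time buyers peaks in February, when new users are most active.
[Figure: Power BI chart, image not available]

By day, the number of new users making a first purchase rises through January.

[Figure: image failed to load in the source]

February shows large fluctuations.

[Figure: Power BI chart, image not available]

In March, first purchases trend downward and new user growth stops.

2. Last purchase: user churn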

# Name the date column explicitly so the result does not depend on the pandas version
grouped_user_max=grouped_user.order_dt.max().value_counts().rename_axis('last_date').reset_index(name='user_count')
grouped_user_max['last_date'] =grouped_user_max['last_date'].astype(str)
grouped_user_max.to_excel(r'.\用户最后一次购买.xlsx')

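[Figure: Power BI chart, image not available]

By quarter, users drop off sharply in 1997, while the 1998 figures trend upward.

3. User lifecycle

3.1 One-time buyers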

user_life=grouped_user.order_dt.agg(['min','max'])
(user_life['min']==user_life['max']).value_counts()

True     12054
False    11516
dtype: int64

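About half of the users (12,054 of 23,570, roughly 51%) bought on a single day only.

3.2 New and returning customers per month

# Group by user_id and month, then take the first and last purchase date within the month; if they are equal the user is counted as new (strictly, this only shows the user bought on a single day that month)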
user_um=df.groupby(['user_id','month']).order_dt.agg(['min','max'])
user_um.head(10)

		min	max
user_id	month
1	1997-01-01	1997-01-01	1997-01-01
2	1997-01-01	1997-01-12	1997-01-12
3	1997-01-01	1997-01-02	1997-01-02
	1997-03-01	1997-03-30	1997-03-30
	1997-04-01	1997-04-02	1997-04-02
	1997-11-01	1997-11-15	1997-11-25
	1998-05-01	1998-05-28	1998-05-28
4	1997-01-01	1997-01-01	1997-01-18
	1997-08-01	1997-08-02	1997-08-02
	1997-12-01	1997-12-12	1997-12-12

TF=(user_um['min']==user_um['max'])
Result=TF.groupby('month').value_counts()
Result

month            
1997-01-01  True     7093
            False     753
1997-02-01  True     8571
            False    1062
1997-03-01  True     8154
            False    1370
1997-04-01  True     2228
            False     594
1997-05-01  True     1801
            False     413
1997-06-01  True     1912
            False     427
1997-07-01  True     1743
            False     437
1997-08-01  True     1450
            False     322
1997-09-01  True     1410
            False     329
1997-10-01  True     1489
            False     350
1997-11-01  True     1654
            False     374
1997-12-01  True     1493
            False     371
1998-01-01  True     1242
            False     295
1998-02-01  True     1264
            False     287
1998-03-01  True     1627
            False     433
1998-04-01  True     1175
            False     262
1998-05-01  True     1209
            False     279
1998-06-01  True     1212
            False     294
dtype: int64
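The min == max test above separates users who bought on one day in a month from those who bought on several. A different reading of "new vs. returning" compares each purchase month with the user's first purchase month. The following sketch shows that alternative on made-up data; it is not the post's method, and the names `toy` and `customer_type` are assumptions.

```python
import pandas as pd

# Hypothetical purchases: user 2 comes back in February, user 3 twice in March
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "month": pd.to_datetime([
        "1997-01-01", "1997-01-01", "1997-02-01",
        "1997-01-01", "1997-03-01", "1997-03-01",
    ]),
})

# A user is "new" in the month of their first purchase, "returning" afterwards
first_month = toy.groupby("user_id")["month"].transform("min")
toy["customer_type"] = (toy["month"] == first_month).map({True: "new", False: "returning"})

# Count each user once per month
per_month = (
    toy.drop_duplicates(["user_id", "month"])
       .groupby(["month", "customer_type"])
       .size()
       .unstack(fill_value=0)
)
print(per_month)
```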

3.3 Purchase interval

Grouped by user_id, the interval between each pair of consecutive purchases

3.3.1 Statistics

grouped_user

# Interval between consecutive orders of the same user
orderdt_diff=grouped_user.apply(lambda x:x.order_dt-x.order_dt.shift())
orderdt_diff.head()

user_id   
1        0       NaT
2        1       NaT
         2    0 days
3        3       NaT
         4   87 days
Name: order_dt, dtype: timedelta64[ns]

# Descriptive statistics
orderdt_diff.describe()

count                      46089
mean     68 days 23:22:13.567662
std      91 days 00:47:33.924168
min              0 days 00:00:00
25%             10 days 00:00:00
50%             31 days 00:00:00
75%             89 days 00:00:00
max            533 days 00:00:00
Name: order_dt, dtype: object

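The mean interval between purchases is about 69 days (68 days 23 hours), and the median is 31 days.
A minimum interval of 0 days means a user placed two orders on the same day. A user's first order has no previous order, so its interval is NaT and is excluded from these statistics (46089 = 69659 orders - 23570 users).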
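How NaT and 0-day intervals arise can be seen on a few rows. This sketch uses the grouped diff method, which is equivalent to subtracting the shifted column within each group; the dates mirror the first rows of the preview table, and the name `toy` is an assumption.

```python
import pandas as pd

# Hypothetical orders: user 1 buys once, user 2 twice on one day, user 3 twice
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3],
    "order_dt": pd.to_datetime([
        "1997-01-01", "1997-01-12", "1997-01-12", "1997-01-02", "1997-03-30",
    ]),
})
toy = toy.sort_values(["user_id", "order_dt"])

# Interval to the previous order of the same user; NaT for each first order
gap = toy.groupby("user_id")["order_dt"].diff()
gap_days = gap.dt.days

print(gap)
print("intervals counted:", gap.count())
```

Only two intervals are counted here: the 0-day gap for user 2 and the 87-day gap for user 3. User 1, who bought once, contributes nothing.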

3.3.2 Plotting in Power BI

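# Convert orderdt_diff to a plain number of days (drop the 'days' unit)
diff_info=orderdt_diff/np.timedelta64(1,'D')
# Histogram bin edges, 10 days wide; +10 so the last edge covers diff_info.max()
diff_info_bins=[i for i in range(0,int(diff_info.max())+10,10)]
diff_info_bins  # the labels skip the leading 0

# Bin diff_info, labelling each bin by its upper edge; include_lowest keeps 0-day intervals in the first bin
diff_info_hist=pd.cut(diff_info,bins=diff_info_bins,labels=diff_info_bins[1:],include_lowest=True)

# Missing values are first purchases (no previous order); assign them to the first bin, labelled 10
diff_info_hist1=diff_info_hist.fillna(10)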
diff_info_hist1.to_excel(r'.\用户购买周期时间差频率直方图.xlsx')

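[Figure: Power BI chart, image not available]

Most purchase intervals are under 100 days, and the mean is about 69 days. To improve retention, campaigns can be run 60 to 90 days after a purchase to increase user stickiness.
The 0-10 bin is very large because first purchases were assigned to it during data processing. As the earlier analysis showed, one-time buyers are numerous, which distorts the whole interval distribution, so the distribution is plotted again without them for further analysis.

# diff_info_hist is the same data as diff_info_hist1 but without fillna(10)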
diff_info_hist.to_excel(r'.\用户购买周期时间差频率直方图(排除1次购买).xlsx')

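[Figure: Power BI chart, image not available]

The shape of the distribution does not change much, but the bins are more even.

3.4 User lifecycle

The interval between each user's first and last purchase, using user_life from section IV, 3.1.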

(user_life['max']-user_life['min']).describe()

count                       23570
mean     134 days 20:55:36.987696
std      180 days 13:46:43.039788
min               0 days 00:00:00
25%               0 days 00:00:00
50%               0 days 00:00:00
75%             294 days 00:00:00
max             544 days 00:00:00
dtype: object

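The mean user lifecycle is about 134 days, so users are at high risk of churning by around day 134.
The median is 0 days: at least half of the users bought on a single day only, so one-time buyers are excluded when plotting the lifecycle distribution.

user_life_info = ((user_life['max']-user_life['min'])/np.timedelta64(1,"D"))
# +10 so the last edge covers the maximum; lifecycles of 0 days fall outside (0, 10] and become NaN, which drops one-time buyers
user_life_bins = [i for i in range(0,int(user_life_info.max())+10,10)]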

user_life_info_hist = pd.cut(user_life_info,bins=user_life_bins,labels=user_life_bins[1:])

user_life_info_hist.to_excel(r'.\用户生命周期(忽略一次购买).xlsx')

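[Figure: Power BI chart, image not available]

Ignoring one-time buyers, the user lifecycle shows a two-humped (bimodal) distribution. The gaps between purchases are long, which also reflects the nature of bicycles as a product.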
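Excluding one-time buyers changes the summary statistics considerably, which a small example makes concrete. This sketch builds a made-up user_life table with the same min and max columns; the four users and their dates are illustrative assumptions.

```python
import pandas as pd

# Hypothetical first (min) and last (max) purchase dates for four users
user_life = pd.DataFrame(
    {
        "min": pd.to_datetime(["1997-01-01", "1997-01-12", "1997-01-02", "1997-01-01"]),
        "max": pd.to_datetime(["1997-01-01", "1997-01-12", "1998-05-28", "1997-12-12"]),
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# Lifecycle in whole days; 0 means all purchases fell on one day
life_days = (user_life["max"] - user_life["min"]).dt.days

# Keep only users whose purchases span more than one day
repeat_life = life_days[life_days > 0]

print("mean over all users:", life_days.mean())
print("mean over repeat buyers:", repeat_life.mean())
```

With half of the toy users at 0 days, the mean doubles once they are removed, which is why the post plots the distribution without them.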

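V. User segmentation

1. R, F, M

R: purchase recency; F: purchase amount; M: purchase frequency
Note that this swaps the usual RFM convention, in which F is frequency and M is monetary value. The definitions used here:
R: days between the user's last purchase and the latest purchase date in the data; F: total amount spent; M: total products purchased

# Per-user table: last purchase date, total amount, total products
# (this step is missing from the post; reconstructed from the columns shown in section 1.4)
rfm=df.pivot_table(index='user_id',
                   values=['order_products','order_amount','order_dt'],
                   aggfunc={'order_dt':'max','order_amount':'sum','order_products':'sum'})

# Turn the date into R: days relative to the latest date (0 or negative; closer to 0 is more recent)
rfm['R']=(rfm.order_dt-rfm.order_dt.max())/np.timedelta64(1,'D')

# Rename amount to F and products to M
rfm=rfm.rename(columns={'order_products':'M','order_amount':'F'})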

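1.1 Building RFM flags to segment users

Criterion 1, flags:

R - R.mean() > 0: the last purchase is more recent than average, so the user is active; the R flag is 1
F - F.mean() > 0: total amount is above average; the F flag is 1
M - M.mean() > 0: purchase frequency is above average; the M flag is 1
Criterion 2, segments (flags concatenated in the order R, F, M, matching the code in section 1.2):
'111': 重要价值客户 (important, high value)
'011': 重要保持客户 (important, keep)
'110': 重要发展客户 (important, develop)
'010': 重要挽留客户 (important, win back)
'101': 一般价值客户 (general, high value)
'001': 一般保持客户 (general, keep)
'100': 一般发展客户 (general, develop)
'000': 一般挽留客户 (general, win back)
Criterion 3 (for charting):
'111': 重要价值客户 (important, high value)
anything else: 非重要价值客户 (not important, high value)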
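The flag construction in criterion 1 can be sketched without a row-wise function by comparing each column with its mean. The four users below are made up, and the column name `code` is an assumption; R is negative days relative to the latest date, as defined above.

```python
import pandas as pd

# Hypothetical R (days, 0 or negative), F (total amount), M (total products)
toy = pd.DataFrame(
    {
        "R": [-500.0, -10.0, -300.0, -5.0],
        "F": [20.0, 400.0, 30.0, 50.0],
        "M": [1, 30, 2, 3],
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# 1 where the value is above the column mean, else 0, as strings
flags = (toy > toy.mean()).astype(int).astype(str)

# Concatenate in the order R, F, M to get codes such as '111'
toy["code"] = flags["R"] + flags["F"] + flags["M"]
print(toy)
```

The resulting code can then be mapped to a segment name with a dictionary, as the list above describes.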

1.2 Segmenting each user by importance

# RFM segmentation function: builds the flag string and looks up its Chinese label in a dict
def rfm_func(x):
    level=x.apply(lambda x:'1' if x>0 else '0')
    label=level.R+level.F+level.M
    d={
        '111':'重要价值客户',
        '011':'重要保持客户',
        '110':'重要发展客户',
        '010':'重要挽留客户',
        '101':'一般价值客户',
        '001':'一般保持客户',
        '100':'一般发展客户',
        '000':'一般挽留客户',
    }
    return d[label]  # look up the Chinese label

# Apply row by row to label every user
rfm['label']=rfm[['R','F','M']].apply(lambda x:x-x.mean()).apply(rfm_func,axis=1)

1.3 Segmentation chart in Power BI

rfm.loc[rfm.label=='重要价值客户','color']='重要价值客户'
rfm.loc[~(rfm.label=='重要价值客户'),'color']='非重要价值客户'

rfm

rfm.to_excel(r'.\RFM模型.xlsx')

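[Figure: Power BI chart, image not available]

1.4 Summary by segment

# Sum only the numeric columns; order_dt and color cannot be summed
rfm.groupby('label')[['R','F','M']].sum()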

rfm.groupby('label').count()

	F	order_dt	M	R	color
label
一般价值客户	206	206	206	206	206
一般保持客户	77	77	77	77	77
一般发展客户	3300	3300	3300	3300	3300
一般挽留客户	14074	14074	14074	14074	14074
重要价值客户	4554	4554	4554	4554	4554
重要保持客户	787	787	787	787	787
重要发展客户	331	331	331	331	331
重要挽留客户	241	241	241	241	241

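By segment count, most users are general rather than important customers. The 一般挽留客户 (general, win back) segment is the largest, with 14,074 of 23,570 users (about 60%). Retention work for this segment should be tied to the specifics of the business, with an eye on churn.

2. Segmenting into new, active, returning, and churned users

2.1 Segmentation

# Count purchases per user and month, filling missing values with 0
pivoted_counts=df.pivot_table(index='user_id',columns='month',values='order_dt',aggfunc='count').fillna(0)

# Convert the counts to a 0/1 purchase flag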
df_purchase=pivoted_counts.applymap(lambda x:1 if x>0 else 0)
df_purchase.head()

month	1997-01-01 00:00:00	1997-02-01 00:00:00	1997-03-01 00:00:00	1997-04-01 00:00:00	1997-05-01 00:00:00	1997-06-01 00:00:00	1997-07-01 00:00:00	1997-08-01 00:00:00	1997-09-01 00:00:00	1997-10-01 00:00:00	1997-11-01 00:00:00	1997-12-01 00:00:00	1998-01-01 00:00:00	1998-02-01 00:00:00	1998-03-01 00:00:00	1998-04-01 00:00:00	1998-05-01 00:00:00	1998-06-01 00:00:00
user_id
1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
3	1	0	1	1	0	0	0	0	0	0	1	0	0	0	0	0	1	0
4	1	0	0	0	0	0	0	1	0	0	0	1	0	0	0	0	0	0
5	1	1	0	1	1	1	1	0	1	0	0	1	1	0	0	0	0	0

df_purchase.shape

(23570, 18)

# Classify each month as new, active, return, unactive (lapsed), or unreg (no purchase yet)
def status(data):
    # Work on a plain list so that integer positions are unambiguous
    values=list(data)
    # Status per month
    status=[]
    # One row is one user_id; walk through its monthly 0/1 flags
    for i in range(len(values)):
        if values[i]==0:
            if len(status)>0:
                if status[i-1]=='unreg':
                    status.append('unreg')
                else:
                    status.append('unactive')
            else:
                status.append('unreg')
        else:
            if len(status)==0:
                status.append('new')
            else:
                if status[i-1]=='unactive':
                    status.append('return')
                elif status[i-1]=='unreg':
                    status.append('new')
                else:
                    status.append('active')
    return status

# Apply status to each row of df_purchase
# pd.Series(list, index=list) keeps the month labels
lambda x:pd.Series(status(x),index=df_purchase.columns)

<function __main__.<lambda>(x)>

purchase_stats=df_purchase.apply(lambda x: pd.Series(status(x),index=df_purchase.columns),axis=1)
purchase_stats.head()

month	1997-01-01 00:00:00	1997-02-01 00:00:00	1997-03-01 00:00:00	1997-04-01 00:00:00	1997-05-01 00:00:00	1997-06-01 00:00:00	1997-07-01 00:00:00	1997-08-01 00:00:00	1997-09-01 00:00:00	1997-10-01 00:00:00	1997-11-01 00:00:00	1997-12-01 00:00:00	1998-01-01 00:00:00	1998-02-01 00:00:00	1998-03-01 00:00:00	1998-04-01 00:00:00	1998-05-01 00:00:00	1998-06-01 00:00:00
user_id
1	new	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive
2	new	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive	unactive
3	new	unactive	return	active	unactive	unactive	unactive	unactive	unactive	unactive	return	unactive	unactive	unactive	unactive	unactive	return	unactive
4	new	unactive	unactive	unactive	unactive	unactive	unactive	return	unactive	unactive	unactive	return	unactive	unactive	unactive	unactive	unactive	unactive
5	new	active	unactive	return	active	active	active	unactive	return	unactive	unactive	return	active	unactive	unactive	unactive	unactive	unactive

# Count active, new, returning, inactive, and unregistered users per month
fenceng2=purchase_stats.apply(lambda x:x.value_counts()).T.fillna(0)
fenceng2

	active	new	return	unactive	unreg
month
1997-01-01	0.0	7846.0	0.0	0.0	15724.0
1997-02-01	1157.0	8476.0	0.0	6689.0	7248.0
1997-03-01	1681.0	7248.0	595.0	14046.0	0.0
1997-04-01	1773.0	0.0	1049.0	20748.0	0.0
1997-05-01	852.0	0.0	1362.0	21356.0	0.0
1997-06-01	747.0	0.0	1592.0	21231.0	0.0
1997-07-01	746.0	0.0	1434.0	21390.0	0.0
1997-08-01	604.0	0.0	1168.0	21798.0	0.0
1997-09-01	528.0	0.0	1211.0	21831.0	0.0
1997-10-01	532.0	0.0	1307.0	21731.0	0.0
1997-11-01	624.0	0.0	1404.0	21542.0	0.0
1997-12-01	632.0	0.0	1232.0	21706.0	0.0
1998-01-01	512.0	0.0	1025.0	22033.0	0.0
1998-02-01	472.0	0.0	1079.0	22019.0	0.0
1998-03-01	571.0	0.0	1489.0	21510.0	0.0
1998-04-01	518.0	0.0	919.0	22133.0	0.0
1998-05-01	459.0	0.0	1029.0	22082.0	0.0
1998-06-01	446.0	0.0	1060.0	22064.0	0.0

fenceng2.index=fenceng2.index.astype(str)
fenceng2.to_excel(r'.\用户分层-新、活跃、流失、回流.xlsx')

[Figure: Power BI chart, image not available]

Drill down to quarterly data

2.2 Summary

fenceng2_rate=fenceng2.apply(lambda x:x/x.sum(),axis=1)
fenceng2_rate

	active	new	return	unactive	unreg
month
1997-01-01	0.000000	0.332881	0.000000	0.000000	0.667119
1997-02-01	0.049088	0.359610	0.000000	0.283793	0.307510
1997-03-01	0.071319	0.307510	0.025244	0.595927	0.000000
1997-04-01	0.075223	0.000000	0.044506	0.880272	0.000000
1997-05-01	0.036148	0.000000	0.057785	0.906067	0.000000
1997-06-01	0.031693	0.000000	0.067543	0.900764	0.000000
1997-07-01	0.031650	0.000000	0.060840	0.907510	0.000000
1997-08-01	0.025626	0.000000	0.049555	0.924820	0.000000
1997-09-01	0.022401	0.000000	0.051379	0.926220	0.000000
1997-10-01	0.022571	0.000000	0.055452	0.921977	0.000000
1997-11-01	0.026474	0.000000	0.059567	0.913958	0.000000
1997-12-01	0.026814	0.000000	0.052270	0.920916	0.000000
1998-01-01	0.021723	0.000000	0.043487	0.934790	0.000000
1998-02-01	0.020025	0.000000	0.045779	0.934196	0.000000
1998-03-01	0.024226	0.000000	0.063174	0.912601	0.000000
1998-04-01	0.021977	0.000000	0.038990	0.939033	0.000000
1998-05-01	0.019474	0.000000	0.043657	0.936869	0.000000
1998-06-01	0.018922	0.000000	0.044972	0.936105	0.000000

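The share of active users keeps falling.
New users appear only in the first three months; user acquisition needs improvement.
Returning users hold at a low, stable level (about 4% to 7% per month), so retention has not been handled well.
Comparing unregistered and new users: both are 0 outside the first three months, so campaigns are needed to attract users and reach more potential customers.

VI. Repeat purchase rate and repurchase rate

Repeat purchase rate (复购率): within a calendar month, the share of buyers who purchase more than once
Repurchase rate (回购率): across time windows, the share of users who bought in one window and buy again in the next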
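The two definitions can be made concrete on a tiny user-by-month count matrix. This is a vectorized sketch on made-up counts, not the row-wise approach the post uses; the frame `counts` and its month labels are assumptions.

```python
import pandas as pd

# Hypothetical purchase counts: rows are users, columns are months
counts = pd.DataFrame(
    [[2, 0, 1], [1, 1, 0], [1, 0, 0], [0, 3, 1]],
    columns=["1997-01", "1997-02", "1997-03"],
)
bought = counts > 0

# Repeat purchase rate: buyers with more than one order / all buyers, per month
repeat_rate = (counts > 1).sum() / bought.sum()

# Repurchase rate: bought this month and next month / bought this month
b = bought.to_numpy()
both = (b[:, :-1] & b[:, 1:]).sum(axis=0)
repurchase_rate = pd.Series(both / b[:, :-1].sum(axis=0), index=counts.columns[:-1])

print(repeat_rate)
print(repurchase_rate)
```

The last month has no repurchase rate because there is no following month to compare with.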

1. Repeat purchase rate

pivoted_counts  # purchase counts per user and month

# 1 = more than one purchase, 0 = exactly one purchase, NaN = no purchase
pivoted_counts=pivoted_counts.applymap(lambda x:1 if x>1 else np.nan if x==0 else 0)

pivoted_counts.sum()

pivoted_counts.count()

reshop=pd.DataFrame(pivoted_counts.sum()/pivoted_counts.count())
reshop=reshop.reset_index()
reshop['month']=reshop['month'].astype(str)
reshop

reshop.to_excel(r'.\复购人数与总消费人数比例.xlsx')

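[Figure: Power BI chart, image not available]

A large wave of new users arrives in January; later on, the repeat purchase rate holds at about 20%.

2. Repurchase rate

df_purchase  # purchase flags: 1 = bought this month, 0 = did not buy

Repurchase function, with a one-month window
- Bought this month and next month: a repurchasing customer, status 1
- Bought this month but not next month: not a repurchasing customer, status 0
- Did not buy this month: status is empty (NaN)
- Iterate over the 18 columns (one row per user_id, 18 months of data); the last month has no following month, so its status is empty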

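def back(data):
    # Work on a plain list so that integer positions are unambiguous
    values=list(data)
    status=[]
    for i in range(len(values)-1):
        if values[i]==1:
            # Bought this month: 1 if also bought next month, else 0
            status.append(1 if values[i+1]==1 else 0)
        else:
            # Did not buy this month
            status.append(np.nan)
    # The last month has no following month
    status.append(np.nan)
    return status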

index_col=df_purchase.columns.astype('str')
index_col

Index(['1997-01-01', '1997-02-01', '1997-03-01', '1997-04-01', '1997-05-01',
       '1997-06-01', '1997-07-01', '1997-08-01', '1997-09-01', '1997-10-01',
       '1997-11-01', '1997-12-01', '1998-01-01', '1998-02-01', '1998-03-01',
       '1998-04-01', '1998-05-01', '1998-06-01'],
      dtype='object', name='month')

back_rate=df_purchase.apply(lambda x:pd.Series(back(x),index=index_col),axis=1)

back_rate.head()

back=pd.DataFrame(back_rate.sum()/back_rate.count())
back=back.reset_index()
#back['month']=back['month'].astype(str)
back.head()

back.to_excel(r'.\回购率.xlsx')

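[Figure: Power BI chart, image not available]

The repurchase rate averages about 30%.

Afterword: finally done. The order of the code differs slightly from the order of the mind map. Off to eat a chicken drumstick.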
