Analysis of User CD Purchase Behavior

%matplotlib inline
# Display Chinese characters correctly
from pylab import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
# Display the minus sign correctly
matplotlib.rcParams['axes.unicode_minus']=False
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# Read the data
columns = ['user_id', 'order_dt', 'order_products', 'order_amount']
date = pd.read_table('CD.txt', names=columns, sep=r'\s+', engine='python')
print(date.head())

# Parse the order date strings into datetimes
date['order_dt'] = pd.to_datetime(date['order_dt'], format='%Y%m%d')
# Truncate each date to its month
date['month'] = date['order_dt'].values.astype('datetime64[M]')
   user_id  order_dt  order_products  order_amount
0        1  19970101               1         11.77
1        2  19970112               1         12.00
2        2  19970112               5         77.00
3        3  19970102               2         20.76
4        3  19970330               2         20.76
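The month column above is built by casting the raw NumPy values to `datetime64[M]`. A minimal sketch of an equivalent approach that stays within the pandas `.dt` accessor is shown here; the tiny `orders` frame is a made-up stand-in for `CD.txt`, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of CD.txt
orders = pd.DataFrame({
    'user_id': [1, 2, 2],
    'order_dt': ['19970101', '19970112', '19970330'],
})
orders['order_dt'] = pd.to_datetime(orders['order_dt'], format='%Y%m%d')

# Truncate each date to the first day of its month
orders['month'] = orders['order_dt'].dt.to_period('M').dt.to_timestamp()
print(orders)
```

Either form gives one timestamp per month, which is what the later `groupby('month')` calls rely on.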
# Analyze user spending trends (by month)
group_month = date.groupby('month')
# 1. Total order amount per month
m_money = group_month['order_amount'].sum()
print(m_money.head())
plt.figure(figsize=(10,8), dpi=80)
plt.plot(m_money.index, m_money.values)
# m_money.plot()
plt.xticks(rotation=45)
plt.style.use('ggplot')
plt.show()
month
1997-01-01    299060.17
1997-02-01    379590.03
1997-03-01    393155.27
1997-04-01    142824.49
1997-05-01    107933.30
Name: order_amount, dtype: float64

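[Figure: total order amount per month]

The chart shows that spending was high in the first three months of 1997, then dropped sharply and stayed low with only minor fluctuations through 1998.
# 2. Number of orders per month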
num_consume = group_month.count()['user_id']
print(num_consume.head())
plt.figure(figsize=(10,8), dpi=80)
plt.plot(num_consume.index, num_consume.values)
# m_money.plot()
plt.xticks(rotation=45)
plt.style.use('ggplot')
plt.show()
month
1997-01-01     8928
1997-02-01    11272
1997-03-01    11598
1997-04-01     3781
1997-05-01     2895
Name: user_id, dtype: int64

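[Figure: number of orders per month]

The chart shows that the number of orders was high in the first three months of 1997 and peaked in March, then fell and fluctuated within a narrow range through 1998.
# 3. Number of products purchased per month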
num_product = group_month['order_products'].sum()
print(num_product.head())
plt.figure(figsize=(10,8), dpi=80)
plt.plot(num_product.index, num_product.values)
# m_money.plot()
plt.xticks(rotation=45)
plt.style.use('ggplot')
plt.show()
month
1997-01-01    19416
1997-02-01    24921
1997-03-01    26159
1997-04-01     9729
1997-05-01     7275
Name: order_products, dtype: int64

[Figure: number of products purchased per month]

# 4. Number of buyers per month
num_person = date.groupby(['month', 'user_id']).count()
num_person = num_person.reset_index('user_id').groupby('month').count()
print(num_person.head())
plt.figure(figsize=(10,8), dpi=80)
plt.plot(num_person.index, num_person.values)
# m_money.plot()
plt.xticks(rotation=45)
plt.style.use('ggplot')
plt.show()
            user_id  order_dt  order_products  order_amount
month                                                      
1997-01-01     7846      7846            7846          7846
1997-02-01     9633      9633            9633          9633
1997-03-01     9524      9524            9524          9524
1997-04-01     2822      2822            2822          2822
1997-05-01     2214      2214            2214          2214

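[Figure: number of distinct buyers per month]

The number of buyers each month is lower than the number of orders, but the gap is modest. In the first three months there were roughly 8,000 to 10,000 buyers per month; afterwards the figure fluctuates around 2,000.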
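The buyer count above comes from a two-step `groupby(...).count()` that leaves the same number in every column. A minimal sketch of a more direct route uses `nunique()`, which returns a single Series and can be divided into the order count to get orders per buyer. The `orders` frame is illustrative data, not the CD dataset.

```python
import pandas as pd

# Hypothetical orders: one row per order
orders = pd.DataFrame({
    'user_id': [1, 2, 2, 3, 3],
    'month': pd.to_datetime(['1997-01-01', '1997-01-01', '1997-01-01',
                             '1997-01-01', '1997-03-01']),
})

grouped = orders.groupby('month')['user_id']
monthly_orders = grouped.count()     # number of order rows per month
monthly_buyers = grouped.nunique()   # number of distinct users per month
orders_per_buyer = monthly_orders / monthly_buyers
print(orders_per_buyer)
```

Because both results are Series indexed by month, the division aligns on the index and no loop is needed.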
date.pivot_table(index=['month'], values=['order_products', 'order_amount', 'user_id'],
                aggfunc={'order_products':'sum', 'order_amount':'sum', 'user_id':'count'}
                ).head()
            order_amount  order_products  user_id
month
1997-01-01     299060.17           19416     8928
1997-02-01     379590.03           24921    11272
1997-03-01     393155.27           26159    11598
1997-04-01     142824.49            9729     3781
1997-05-01     107933.30            7275     2895
avg_consume = date.pivot_table(index=['month'], values=['order_amount'],
                aggfunc='mean'
                )
# 5. Monthly trend in the average order amount (mean of order_amount over all orders in the month)
# count_consume = date.groupby('month')['order_amount'].count()
# all_consume = date.groupby('month')['order_amount'].sum()
# avg_consume = all_consume / count_consume

print(avg_consume.head())
plt.figure(figsize=(10,8), dpi=80)
plt.plot(avg_consume.index, avg_consume.order_amount)
# m_money.plot()
plt.xticks(rotation=45)
plt.style.use('ggplot')
plt.show()
            order_amount
month                   
1997-01-01     33.496883
1997-02-01     33.675482
1997-03-01     33.898540
1997-04-01     37.774263
1997-05-01     37.282660

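[Figure: mean order amount per month]

The chart shows that the average order amount fluctuates from month to month, which may indicate that the product is bought intermittently.
# 6. Monthly trend in the average number of orders per buyer
# Every column of num_person holds the same count, so divide by a single column
avg_count_consume = num_consume / num_person['user_id']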

plt.figure(figsize=(10,8), dpi=80)
plt.plot(num_consume.index, avg_count_consume)
# m_money.plot()
plt.xticks(rotation=45)
plt.style.use('ggplot')
plt.show()

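[Figure: average number of orders per buyer per month]

The chart shows that the average number of orders per buyer rose quickly from 1997/01 to 1997/03 and then fluctuated around 1.3.

User-level spending analysis

  • Descriptive statistics of spending amount and purchase quantity per user
  • Scatter plot of spending amount against purchase quantity per user
  • Distribution of spending amount per user
  • Distribution of purchase quantity per user
  • Cumulative share of spending (what percentage of users account for what percentage of spend)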
group_user = date.groupby('user_id')
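group_user[['order_products', 'order_amount']].sum().describe()
       order_products  order_amount
count    23570.000000  23570.000000
mean         7.122656    106.080426
std         16.983531    240.925195
min          1.000000      0.000000
25%          1.000000     19.970000
50%          3.000000     43.395000
75%          7.000000    106.475000
max       1033.000000  13990.930000
The statistics above show that per-user purchase quantity and spending vary widely: both standard deviations are more than twice the mean. The data is right-skewed, with the mean (7.12 products, 106.08 spent) well above the median (3 products, 43.40 spent), so most users buy little and spend little.
user_data = group_user[['order_products', 'order_amount']].sum()
user_data.head()
         order_products  order_amount
user_id
1                     1         11.77
2                     6         89.00
3                    16        156.46
4                     7        100.50
5                    29        385.61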
user_data.plot.scatter(x='order_products', y='order_amount')
<matplotlib.axes._subplots.AxesSubplot at 0xaadf780>

[Figure: scatter plot of order_amount against order_products per user]

user_data.query('order_amount < 4000').plot.scatter(x='order_products', y='order_amount')
<matplotlib.axes._subplots.AxesSubplot at 0xa846898>

[Figure: the same scatter plot restricted to users with order_amount < 4000]

Plot a histogram of spending amount per user

user_data['order_amount'].plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0xc7c92e8>

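[Figure: histogram of order_amount per user, 50 bins]

The chart shows that the vast majority of users are concentrated at the low end, with only a handful of extreme outliers, so the outliers can be filtered out.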
user_data.query('order_products < 150')['order_amount'].plot.hist(bins=40)
<matplotlib.axes._subplots.AxesSubplot at 0xc8abb00>

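[Figure: histogram of order_amount for users with order_products < 150]

Use Chebyshev's inequality to filter out outliers: at least 1 - 1/5² = 96% of the data lies within 5 standard deviations of the mean. For order_products the mean is 7.122656 and the standard deviation is 16.983531.
user_data.query('(order_products < 7.122656 + 16.983531*5) & (order_products > 7.122656 - 16.983531*5)')['order_amount'].plot.hist(bins=40)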
<matplotlib.axes._subplots.AxesSubplot at 0xc8db6a0>

[Figure: histogram of order_amount after the 5-standard-deviation filter on order_products]
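The filter above hard-codes the standard deviation taken from `describe()`. A minimal sketch that computes the mean and standard deviation from the data and keeps values within k standard deviations of the mean could look as follows. The helper name `chebyshev_filter` and the sample data are assumptions made for illustration.

```python
import pandas as pd

def chebyshev_filter(df, column, k=5):
    """Keep rows whose `column` lies within k standard deviations of its mean.

    Chebyshev's inequality guarantees that at least 1 - 1/k**2 of any
    distribution satisfies this condition (96% for k = 5).
    """
    mean = df[column].mean()
    std = df[column].std()
    return df[(df[column] - mean).abs() < k * std]

# Hypothetical per-user totals: 29 ordinary users plus one extreme buyer
sample = pd.DataFrame({'order_products': [1, 2, 3] * 9 + [2, 3, 1000]})
kept = chebyshev_filter(sample, 'order_products', k=5)
print(len(sample), len(kept))
```

Since the bound is recomputed from whatever frame is passed in, the same helper can be reused on `order_amount` or on a filtered subset.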

Cumulative share of spending per user (what percentage of users account for what percentage of spend; requires sorting first)

user_cumsum = user_data.sort_values('order_amount')[['order_amount']].cumsum()
# user_data[['order_amount']].sum()
user_cumsum = user_cumsum.apply(lambda x:x/user_data['order_amount'].sum())
user_cumsum.head()
         order_amount
user_id
10175             0.0
4559              0.0
1948              0.0
925               0.0
10798             0.0
user_cumsum.reset_index()['order_amount'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0xab57978>
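![Cumulative share of total spend, users sorted by ascending spend](https://img-blog.csdnimg.cn/20200712152347221.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDE0MTAwMA==,size_16,color_FFFFFF,t_70)
The chart shows that the lowest-spending 50% of users contribute only about 15% of total spend, and the first 20,000 users in ascending order (of 23,570) contribute only about 40%. The remaining roughly 3,500 top spenders therefore account for about 60% of spend.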
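Reading percentages off the curve can be made exact. The sketch below reuses the sort-then-cumsum idea on a small invented Series and looks up the share contributed by the lowest-spending fraction of users. `share_of_bottom` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical total spend per user
spend = pd.Series([10, 20, 30, 40, 900], index=[1, 2, 3, 4, 5], name='order_amount')

# Sort ascending, then divide the running total by the grand total
cum_share = spend.sort_values().cumsum() / spend.sum()

def share_of_bottom(cum_share, fraction):
    """Share of total spend contributed by the lowest-spending `fraction` of users."""
    n = int(len(cum_share) * fraction)
    return 0.0 if n == 0 else float(cum_share.iloc[n - 1])

print(share_of_bottom(cum_share, 0.8))
```

In this toy data the bottom 80% of users contribute 10% of spend, the same kind of concentration the chart shows for the CD data.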

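3. User purchase behavior

  • Distribution of first purchase dates
  • Distribution of last purchase dates
  • New versus returning customers
    • How many users purchased only once?
    • Share of new customers per month?
  • User segmentation
    • RFM
    • New, active, returning, churned/inactive
  • Purchase interval (by order)
    • Descriptive statistics of the purchase interval
    • Distribution of the purchase interval
  • User lifecycle (from first to last purchase)
    • Descriptive statistics of the lifecycle
    • Distribution of the lifecycle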
group_user['order_dt'].min().value_counts().plot()
<matplotlib.axes._subplots.AxesSubplot at 0xcabc7f0>

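[Figure: number of users by first purchase date]

First purchases are concentrated in the first three months, with a pronounced fluctuation around the second month.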
group_user['order_dt'].max().value_counts().plot()
<matplotlib.axes._subplots.AxesSubplot at 0xca506a0>

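[Figure: number of users by last purchase date]

The chart shows that most users never purchased again after their first order, and that after March users keep churning steadily.

How many users purchased only once?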

shop_time = group_user['order_dt'].agg(['min', 'max'])
shop_time.head()
               min        max
user_id
1       1997-01-01 1997-01-01
2       1997-01-12 1997-01-12
3       1997-01-02 1998-05-28
4       1997-01-01 1997-12-12
5       1997-01-01 1998-01-03
count = shop_time[shop_time['min']==shop_time['max']]
count.count()
min    12054
max    12054
dtype: int64
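Comparing the first and last purchase dates counts users who only ever bought on a single day, which is not quite the same as users with a single order (user 2 above placed two orders on 1997-01-12). A small sketch on invented rows shows both counts side by side; the frame and variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical orders; user 2 places two orders on the same day
orders = pd.DataFrame({
    'user_id': [1, 2, 2, 3, 3],
    'order_dt': pd.to_datetime(['1997-01-01', '1997-01-12', '1997-01-12',
                                '1997-01-02', '1997-03-30']),
})

per_user = orders.groupby('user_id')['order_dt'].agg(['min', 'max', 'count'])
single_day_users = int((per_user['min'] == per_user['max']).sum())
single_order_users = int((per_user['count'] == 1).sum())
print(single_day_users, single_order_users)
```

Which definition to use depends on whether several orders on one day should count as one visit.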
# Share of new customers per month (new users divided by the cumulative number of orders to date)
month_user_sum = date.groupby('month')[['order_dt']].count()
month_user_sum = month_user_sum.cumsum()
user_class = date.groupby('user_id')
user_min = user_class.month.agg('min').reset_index()
user_min = user_min.groupby('month').count()
user_min
            user_id
month
1997-01-01     7846
1997-02-01     8476
1997-03-01     7248
new_user = month_user_sum.join(user_min).fillna(0)
new_user
            order_dt  user_id
month
1997-01-01      8928   7846.0
1997-02-01     20200   8476.0
1997-03-01     31798   7248.0
1997-04-01     35579      0.0
1997-05-01     38474      0.0
1997-06-01     41528      0.0
1997-07-01     44470      0.0
1997-08-01     46790      0.0
1997-09-01     49086      0.0
1997-10-01     51648      0.0
1997-11-01     54398      0.0
1997-12-01     56902      0.0
1998-01-01     58934      0.0
1998-02-01     60960      0.0
1998-03-01     63753      0.0
1998-04-01     65631      0.0
1998-05-01     67616      0.0
1998-06-01     69659      0.0
1998-06-01696590.0
# New users in the month divided by the cumulative order count
new_user['proportion'] = new_user['user_id'] / new_user['order_dt']
new_user
            order_dt  user_id  proportion
month
1997-01-01      8928   7846.0    0.878808
1997-02-01     20200   8476.0    0.419604
1997-03-01     31798   7248.0    0.227939
1997-04-01     35579      0.0    0.000000
1997-05-01     38474      0.0    0.000000
1997-06-01     41528      0.0    0.000000
1997-07-01     44470      0.0    0.000000
1997-08-01     46790      0.0    0.000000
1997-09-01     49086      0.0    0.000000
1997-10-01     51648      0.0    0.000000
1997-11-01     54398      0.0    0.000000
1997-12-01     56902      0.0    0.000000
1998-01-01     58934      0.0    0.000000
1998-02-01     60960      0.0    0.000000
1998-03-01     63753      0.0    0.000000
1998-04-01     65631      0.0    0.000000
1998-05-01     67616      0.0    0.000000
1998-06-01     69659      0.0    0.000000
Plot the monthly share of new customers
plt.plot(new_user.index, new_user.proportion)
plt.xticks(rotation=45)
plt.show()

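[Figure: share of new customers per month]

The chart shows that all new customers arrive in the first three months, that the new-customer share falls month by month, and that no new customers appear after the third month.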
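The proportion above divides new users by the cumulative number of orders. Another common definition, sketched here as an alternative and not as the author's method, divides new buyers by the distinct buyers active in the same month. The `orders` data is made up for the example.

```python
import pandas as pd

# Hypothetical orders with a precomputed month column
orders = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 1],
    'month': pd.to_datetime(['1997-01-01', '1997-01-01', '1997-02-01',
                             '1997-02-01', '1997-03-01']),
})

# Month of each user's first purchase, counted per month
first_month = orders.groupby('user_id')['month'].min()
new_buyers = first_month.value_counts().sort_index()

# Distinct buyers per month; months without new buyers get 0
buyers = orders.groupby('month')['user_id'].nunique()
new_share = new_buyers.reindex(buyers.index, fill_value=0) / buyers
print(new_share)
```

With this definition the share is bounded between 0 and 1 and reads directly as "what fraction of this month's buyers are first-time buyers".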
rfm = date.pivot_table(index='user_id', values=['order_products','order_amount','order_dt'],
                 aggfunc={'order_dt':'max','order_products':'sum','order_amount':'sum'})
rfm.head()
         order_amount   order_dt  order_products
user_id
1               11.77 1997-01-01               1
2               89.00 1997-01-12               6
3              156.46 1998-05-28              16
4              100.50 1997-12-12               7
5              385.61 1998-01-03              29
# Dividing by np.timedelta64(1, 'D') converts the timedelta to a plain number of days (no "days" unit)
rfm['R'] = -(rfm.order_dt - rfm.order_dt.max()) / np.timedelta64(1, 'D') 
rfm.rename(columns={'order_products':'F','order_amount':'M'}, inplace=True)
rfm.head()
              M   order_dt   F      R
user_id
1         11.77 1997-01-01   1  545.0
2         89.00 1997-01-12   6  534.0
3        156.46 1998-05-28  16   33.0
4        100.50 1997-12-12   7  200.0
5        385.61 1998-01-03  29  178.0
def rfm_func(x):
    level = x.apply(lambda x:'1' if x>=0 else '0')
    label = level.R + level.F + level.M
    d = {
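        '111':'重要客户',      # important customer
        '011':'重要保持客户',  # important, keep
        '101':'重要挽留客户',  # important, win back
        '001':'重要发展客户',  # important, develop
        '110':'一般客户',      # general customer
        '010':'一般保持客户',  # general, keep
        '100':'一般挽留客户',  # general, win back
        '000':'一般发展客户'   # general, develop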
    }
    result = d[label]
    return result
rfm['label'] = rfm[['R','F','M']].apply(lambda x:x-x.mean()).apply(rfm_func, axis=1)
rfm.head()
              M   order_dt   F      R   label
user_id
1         11.77 1997-01-01   1  545.0   一般挽留客户
2         89.00 1997-01-12   6  534.0   一般挽留客户
3        156.46 1998-05-28  16   33.0   重要保持客户
4        100.50 1997-12-12   7  200.0   一般发展客户
5        385.61 1998-01-03  29  178.0   重要保持客户
rfm.groupby('label')[['M', 'F', 'R']].sum()
                       M       F          R
label
一般保持客户     19937.45    1712    29448.0
一般发展客户    196971.23   13977   591108.0
一般客户          7181.28     650    36295.0
一般挽留客户    438291.81   29346  6951815.0
重要保持客户   1592039.62  107789   517267.0
重要发展客户     45785.01    2023    56636.0
重要客户        167080.83   11121   358363.0
重要挽留客户     33028.40    1263   114482.0
# Assign a plot color to each group
rfm.loc[rfm.label == '重要客户', 'color'] = 'g'
rfm.loc[~(rfm.label == '重要客户'), 'color'] = 'r'
rfm.plot.scatter('F','R',c=rfm.color)
<matplotlib.axes._subplots.AxesSubplot at 0xca0f908>

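[Figure: scatter plot of R against F; 重要客户 in green, all other segments in red]

By the RFM segmentation, most of the total (summed M and F in the table above) falls into the 重要保持客户 (important, keep) segment. This is an effect of extreme values pulling the mean-based thresholds, so the RFM cut-offs should be set according to business requirements.

  • Aim for a small share of users that covers most of the revenue
  • Do not define segments just to make the numbers look good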
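As a hedged illustration of setting cut-offs from business rules instead of the mean, the sketch below builds the same three-character R/F/M code from explicit thresholds. The threshold values and the `rfm_code` helper are placeholders invented for this example; real values would come from the business.

```python
import pandas as pd

# A few per-user R (days since last order), F (products), M (spend) values
rfm_demo = pd.DataFrame(
    {'R': [545.0, 33.0, 200.0], 'F': [1, 16, 7], 'M': [11.77, 156.46, 100.50]},
    index=pd.Index([1, 3, 4], name='user_id'),
)

# Assumed business thresholds (placeholders, not taken from the post)
thresholds = {'R': 180.0, 'F': 5, 'M': 100.0}

def rfm_code(df, thresholds):
    """Return a 3-character code per user: '1' where the value is >= its threshold."""
    bits = {c: (df[c] >= thresholds[c]).astype(int).astype(str) for c in 'RFM'}
    return bits['R'] + bits['F'] + bits['M']

rfm_demo['code'] = rfm_code(rfm_demo, thresholds)
print(rfm_demo)
```

The resulting codes can be mapped to segment names with the same dictionary used in `rfm_func`, and the comparison is vectorized, so no row-wise `apply` is needed.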
# New, active, returning, churned/inactive
pivot_counts = date.pivot_table(index='user_id', columns='month', values='order_dt', aggfunc='count').fillna(0)
pivot_counts.head()
month    1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
user_id
1            1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
2            2.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
3            1.0     0.0     1.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     2.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0
4            2.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0
5            2.0     1.0     0.0     1.0     1.0     1.0     1.0     0.0     1.0     0.0     0.0     2.0     1.0     0.0     0.0     0.0     0.0     0.0
purchase_user = pivot_counts.applymap(lambda x:1 if x>0 else 0)
purchase_user.tail()
month    1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
user_id
23566          0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
23567          0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
23568          0       0       1       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0
23569          0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
23570          0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
def active_status(data):
    status = []
    for i in range(18):
        # No purchase this month
        if data.iloc[i] == 0:
            if len(status) == 0:
                status.append('unregister')
            else:
                if status[i-1] == 'unregister':
                    status.append('unregister')
                else:
                    status.append('unactive')
        # Purchased this month
        else:
            if len(status) == 0:
                status.append('new')
            else:
                if status[i-1] == 'unregister':
                    status.append('new')
                elif status[i-1] == 'unactive':
                    status.append('return')
                else:
                    status.append('active')
    # Return a Series so that apply(axis=1) keeps one column per month
    return pd.Series(status, index=data.index)
purchase_user = purchase_user.apply(active_status, axis=1)
purchase_user.tail()
month       1997-01    1997-02    1997-03    1997-04    1997-05    1997-06    1997-07    1997-08    1997-09    1997-10    1997-11    1997-12    1998-01    1998-02    1998-03    1998-04    1998-05    1998-06
user_id
23566    unregister unregister        new   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive
23567    unregister unregister        new   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive
23568    unregister unregister        new     active   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive
23569    unregister unregister        new   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive
23570    unregister unregister        new   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive   unactive
purchase_user_status = purchase_user.replace('unregister', np.nan).apply(lambda x: x.value_counts())
purchase_user_status
month     1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
active        NaN  1157.0    1681  1773.0   852.0   747.0   746.0   604.0   528.0   532.0   624.0   632.0   512.0   472.0   571.0   518.0   459.0   446.0
new        7846.0  8476.0    7248     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
return        NaN     NaN     595  1049.0  1362.0  1592.0  1434.0  1168.0  1211.0  1307.0  1404.0  1232.0  1025.0  1079.0  1489.0   919.0  1029.0  1060.0
unactive      NaN  6689.0   14046 20748.0 21356.0 21231.0 21390.0 21798.0 21831.0 21731.0 21542.0 21706.0 22033.0 22019.0 21510.0 22133.0 22082.0 22064.0
purchase_user_status.fillna(0).T
            active     new  return  unactive
month
1997-01-01     0.0  7846.0     0.0       0.0
1997-02-01  1157.0  8476.0     0.0    6689.0
1997-03-01  1681.0  7248.0   595.0   14046.0
1997-04-01  1773.0     0.0  1049.0   20748.0
1997-05-01   852.0     0.0  1362.0   21356.0
1997-06-01   747.0     0.0  1592.0   21231.0
1997-07-01   746.0     0.0  1434.0   21390.0
1997-08-01   604.0     0.0  1168.0   21798.0
1997-09-01   528.0     0.0  1211.0   21831.0
1997-10-01   532.0     0.0  1307.0   21731.0
1997-11-01   624.0     0.0  1404.0   21542.0
1997-12-01   632.0     0.0  1232.0   21706.0
1998-01-01   512.0     0.0  1025.0   22033.0
1998-02-01   472.0     0.0  1079.0   22019.0
1998-03-01   571.0     0.0  1489.0   21510.0
1998-04-01   518.0     0.0   919.0   22133.0
1998-05-01   459.0     0.0  1029.0   22082.0
1998-06-01   446.0     0.0  1060.0   22064.0
purchase_user_status.fillna(0).T.plot.area()
<matplotlib.axes._subplots.AxesSubplot at 0xc9fc0b8>

[Figure: stacked area chart of user counts by status (active, new, return, unactive) per month]

purchase_user_status.fillna(0).T.apply(lambda x:x/x.sum(), axis=1)
              active       new    return  unactive
month
1997-01-01  0.000000  1.000000  0.000000  0.000000
1997-02-01  0.070886  0.519299  0.000000  0.409815
1997-03-01  0.071319  0.307510  0.025244  0.595927
1997-04-01  0.075223  0.000000  0.044506  0.880272
1997-05-01  0.036148  0.000000  0.057785  0.906067
1997-06-01  0.031693  0.000000  0.067543  0.900764
1997-07-01  0.031650  0.000000  0.060840  0.907510
1997-08-01  0.025626  0.000000  0.049555  0.924820
1997-09-01  0.022401  0.000000  0.051379  0.926220
1997-10-01  0.022571  0.000000  0.055452  0.921977
1997-11-01  0.026474  0.000000  0.059567  0.913958
1997-12-01  0.026814  0.000000  0.052270  0.920916
1998-01-01  0.021723  0.000000  0.043487  0.934790
1998-02-01  0.020025  0.000000  0.045779  0.934196
1998-03-01  0.024226  0.000000  0.063174  0.912601
1998-04-01  0.021977  0.000000  0.038990  0.939033
1998-05-01  0.019474  0.000000  0.043657  0.936869
1998-06-01  0.018922  0.000000  0.044972  0.936105

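The table above shows how the mix of user statuses changes month by month:

  • Returning users: inactive in the previous month but purchased this month (a measure of win-back operations)
  • Active users: purchased in consecutive months (a measure of ongoing operations quality)
  • Inactive users: churned users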
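The status rules can be traced on a single user. The sketch below restates the same state machine as `active_status` for a plain list of 0/1 purchase flags; the function name `classify_months` and the sample input are invented for illustration.

```python
def classify_months(purchased):
    """Label each month given a list of 0/1 purchase flags for one user."""
    status = []
    for flag in purchased:
        previous = status[-1] if status else 'unregister'
        if flag == 0:
            # No purchase: still unregistered, or inactive once registered
            status.append('unregister' if previous == 'unregister' else 'unactive')
        elif previous == 'unregister':
            status.append('new')       # first ever purchase
        elif previous == 'unactive':
            status.append('return')    # purchased again after a gap
        else:
            status.append('active')    # purchased in consecutive months
    return status

print(classify_months([0, 1, 1, 0, 1]))
```

A user who first buys in month 2, buys again in month 3, skips month 4 and comes back in month 5 passes through all five labels in order.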
# Purchase interval per user (by order); shift() moves the data down by one row
order_diff = group_user.apply(lambda x:(x.order_dt - x.order_dt.shift()) / np.timedelta64(1, 'D'))
order_diff.hist(bins=20)
<matplotlib.axes._subplots.AxesSubplot at 0x1299db00>

[Figure: histogram of days between consecutive orders of the same user, 20 bins]
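The interval between consecutive orders can also be computed without a per-group `apply`. A minimal sketch, assuming the orders are sorted by user and date first, uses `groupby(...).diff()`; the `orders` frame is illustrative data.

```python
import pandas as pd

# Hypothetical orders: user 1 has one order, user 2 has three
orders = pd.DataFrame({
    'user_id': [1, 2, 2, 2],
    'order_dt': pd.to_datetime(['1997-01-01', '1997-01-12',
                                '1997-01-22', '1997-03-01']),
})

# Sort so that diff() compares each order with the same user's previous order
orders = orders.sort_values(['user_id', 'order_dt'])
gap_days = orders.groupby('user_id')['order_dt'].diff().dt.days
print(gap_days)
```

Each user's first order has no predecessor and yields NaN, which histogram plotting ignores.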

# User lifecycle (days between first and last purchase)
user_period = group_user.order_dt.agg(['max', 'min'])
user_period = user_period.apply(lambda x:(x['max'] - x['min']) / np.timedelta64(1, 'D'), axis=1)
user_period.describe()
count    23570.000000
mean       134.871956
std        180.574109
min          0.000000
25%          0.000000
50%          0.000000
75%        294.000000
max        544.000000
dtype: float64
user_period = pd.DataFrame(user_period, columns=['diff'])
user_period.head()
          diff
user_id
1          0.0
2          0.0
3        511.0
4        345.0
5        367.0
user_period[user_period['diff']!=0].hist(bins=40)   # exclude users whose lifecycle is 0
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000000B3A2DD8>]], dtype=object)

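[Figure: histogram of user lifecycle in days, excluding users with a lifecycle of 0]

Repeat-purchase rate and repurchase rate analysis

  • Repeat-purchase rate
    • Share of buyers who purchase more than once within a calendar month
  • Repurchase rate
    • Share of users who purchased before and purchase again within a given later period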
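Both rates can be computed directly from a users-by-months matrix of order counts. The following is a sketch under the assumption that "a given later period" means the following month, which is how the notebook applies it; the `counts` matrix is invented.

```python
import pandas as pd

# Hypothetical orders-per-month matrix: rows are users, columns are months
counts = pd.DataFrame(
    [[2, 0, 1],
     [1, 1, 0],
     [1, 0, 0],
     [0, 3, 1]],
    index=pd.Index([1, 2, 3, 4], name='user_id'),
    columns=['m1', 'm2', 'm3'],
)

bought = counts > 0

# Repeat-purchase rate: buyers with more than one order / all buyers, per month
repeat_rate = (counts > 1).sum() / bought.sum()

# Repurchase rate: buyers who also buy in the next month / buyers this month.
# The last month has no following month, so it is left out.
this_month = bought.iloc[:, :-1].to_numpy()
next_month = bought.iloc[:, 1:].to_numpy()
repurchase_rate = pd.Series(
    (this_month & next_month).sum(axis=0) / this_month.sum(axis=0),
    index=counts.columns[:-1],
)

print(repeat_rate)
print(repurchase_rate)
```

The boolean masks play the same role as the 1/0/NaN encoding used in the notebook: users who did not buy in a month drop out of that month's denominator.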
pivot_counts.head(10)
month    1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
user_id
1            1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
2            2.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
3            1.0     0.0     1.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     2.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0
4            2.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0
5            2.0     1.0     0.0     1.0     1.0     1.0     1.0     0.0     1.0     0.0     0.0     2.0     1.0     0.0     0.0     0.0     0.0     0.0
6            1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
7            1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0
8            1.0     1.0     0.0     0.0     0.0     1.0     1.0     0.0     0.0     0.0     2.0     1.0     0.0     0.0     1.0     0.0     0.0     0.0
9            1.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0
10           1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
# 1 = more than one order in the month, 0 = exactly one order, NaN = no order
re_purchase = pivot_counts.applymap(lambda x: 1 if x > 1 else np.nan if x == 0 else 0)
re_purchase.head(10)
month    1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
user_id
1            0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2            1.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
3            0.0     NaN     0.0     0.0     NaN     NaN     NaN     NaN     NaN     NaN     1.0     NaN     NaN     NaN     NaN     NaN     0.0     NaN
4            1.0     NaN     NaN     NaN     NaN     NaN     NaN     0.0     NaN     NaN     NaN     0.0     NaN     NaN     NaN     NaN     NaN     NaN
5            1.0     0.0     NaN     0.0     0.0     0.0     0.0     NaN     0.0     NaN     NaN     1.0     0.0     NaN     NaN     NaN     NaN     NaN
6            0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
7            0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     0.0     NaN     NaN     NaN     NaN     0.0     NaN     NaN     NaN
8            0.0     0.0     NaN     NaN     NaN     0.0     0.0     NaN     NaN     NaN     1.0     0.0     NaN     NaN     0.0     NaN     NaN     NaN
9            0.0     NaN     NaN     NaN     0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     0.0
10           0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
re_purchase.apply(lambda x:x.sum()/x.count()).plot(figsize=(10, 4))
<matplotlib.axes._subplots.AxesSubplot at 0x28a1390>

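[Figure: repeat-purchase rate per month]

The repeat-purchase rate rises quickly over the first three months. It starts low, probably because the large base of new users mostly bought only once, and then stays at around 20%.
# Repurchase rate: of the users who purchased this month, the share who also purchase next month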
repurchase = pivot_counts.applymap(lambda x:1 if x>0 else  0)
repurchase.head(10)
month    1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
user_id
1              1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
2              1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
3              1       0       1       1       0       0       0       0       0       0       1       0       0       0       0       0       1       0
4              1       0       0       0       0       0       0       1       0       0       0       1       0       0       0       0       0       0
5              1       1       0       1       1       1       1       0       1       0       0       1       1       0       0       0       0       0
6              1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
7              1       0       0       0       0       0       0       0       0       1       0       0       0       0       1       0       0       0
8              1       1       0       0       0       1       1       0       0       0       1       1       0       0       1       0       0       0
9              1       0       0       0       1       0       0       0       0       0       0       0       0       0       0       0       0       1
10             1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
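def Repurchase(data):
    is_purchase = []
    for i in range(0, 17):
        # Purchased this month
        if data.iloc[i] > 0:
            # Also purchased next month
            if data.iloc[i+1] > 0:
                is_purchase.append(1)
            # Did not purchase next month
            else:
                is_purchase.append(0)
        # Did not purchase this month
        else:
            # Excluded from the rate regardless of next month
            is_purchase.append(np.nan)

    # The last month has no following month to compare against
    is_purchase.append(np.nan)
    # Return a Series so that apply(axis=1) keeps one column per month
    return pd.Series(is_purchase, index=data.index)
repurchase = repurchase.apply(Repurchase, axis=1)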
repurchase.head(5)
month    1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06
user_id
1            0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2            0.0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
3            0.0     NaN     1.0     0.0     NaN     NaN     NaN     NaN     NaN     NaN     0.0     NaN     NaN     NaN     NaN     NaN     0.0     NaN
4            0.0     NaN     NaN     NaN     NaN     NaN     NaN     0.0     NaN     NaN     NaN     0.0     NaN     NaN     NaN     NaN     NaN     NaN
5            1.0     0.0     NaN     1.0     1.0     1.0     0.0     NaN     0.0     NaN     NaN     1.0     0.0     NaN     NaN     NaN     NaN     NaN
repurchase.apply(lambda x:x.sum()/x.count()).plot(figsize=(10,4))
<matplotlib.axes._subplots.AxesSubplot at 0xaad40b8>

[Figure: repurchase rate per month]
