七周成为数据分析师(案例python脚本)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv("CDNOW_MASTER.CSV")
df.head()
Out[1]:
 IDDATENUMTOTAL
0119970101111.77
1219970112112.00
2219970112577.00
3319970102220.76
4319970330220.76
  • ID:购买用户id
  • DATE:用户购买日期
  • NUM:用户购买数量
  • TOTAL:用户购买总额
In [2]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
ID       69659 non-null int64
DATE     69659 non-null int64
NUM      69659 non-null int64
TOTAL    69659 non-null float64
dtypes: float64(1), int64(3)
memory usage: 2.1 MB
In [3]:
df.describe()
Out[3]:
 IDDATENUMTOTAL
count69659.0000006.965900e+0469659.00000069659.000000
mean11470.8545921.997228e+072.41004035.893648
std6819.9048483.837735e+032.33392436.281942
min1.0000001.997010e+071.0000000.000000
25%5506.0000001.997022e+071.00000014.490000
50%11410.0000001.997042e+072.00000025.980000
75%17273.0000001.997111e+073.00000043.700000
max23570.0000001.998063e+0799.0000001286.010000
  • 大部分订单只有1-3的销量,平均值为2.4,且存在一定极值干扰
  • 用户消费金额比较稳定,平均数是36,中位数是25,存在一定极值干扰
In [4]:
df['DATE']=pd.to_datetime(df['DATE'],format='%Y%m%d')
df['month']=df.DATE.values.astype('datetime64[M]')
df
Out[4]:
 IDDATENUMTOTALmonth
011997-01-01111.771997-01-01
121997-01-12112.001997-01-01
221997-01-12577.001997-01-01
331997-01-02220.761997-01-01
431997-03-30220.761997-03-01
531997-04-02219.541997-04-01
631997-11-15557.451997-11-01
731997-11-25420.961997-11-01
831998-05-28116.991998-05-01
941997-01-01229.331997-01-01
1041997-01-18229.731997-01-01
1141997-08-02114.961997-08-01
1241997-12-12226.481997-12-01
1351997-01-01229.331997-01-01
1451997-01-14113.971997-01-01
1551997-02-04338.901997-02-01
1651997-04-11345.551997-04-01
1751997-05-31338.711997-05-01
1851997-06-16226.141997-06-01
1951997-07-22228.141997-07-01
2051997-09-15340.471997-09-01
2151997-12-08446.461997-12-01
2251997-12-12340.471997-12-01
2351998-01-03337.471998-01-01
2461997-01-01120.991997-01-01
2571997-01-01228.741997-01-01
2671997-10-11797.431997-10-01
2771998-03-229138.501998-03-01
2881997-01-0119.771997-01-01
2981997-02-13113.971997-02-01
..................
69629235561997-09-27331.471997-09-01
69630235561998-01-03228.981998-01-01
69631235561998-06-07228.981998-06-01
69632235571997-03-25114.371997-03-01
69633235581997-03-25228.131997-03-01
69634235581997-05-18345.511997-05-01
69635235581997-06-24223.741997-06-01
69636235581998-02-25448.221998-02-01
69637235591997-03-25223.541997-03-01
69638235591997-05-18335.311997-05-01
69639235591997-06-27352.801997-06-01
69640235601997-03-25118.361997-03-01
69641235611997-03-25230.921997-03-01
69642235611998-01-28115.491998-01-01
69643235611998-05-29337.051998-05-01
69644235621997-03-25229.331997-03-01
69645235631997-03-25110.771997-03-01
69646235631997-10-04247.981997-10-01
69647235641997-03-25111.771997-03-01
69648235641997-05-21111.771997-05-01
69649235641997-11-30346.471997-11-01
69650235651997-03-25111.771997-03-01
69651235661997-03-25236.001997-03-01
69652235671997-03-25120.971997-03-01
69653235681997-03-25122.971997-03-01
69654235681997-04-05483.741997-04-01
69655235681997-04-22114.991997-04-01
69656235691997-03-25225.741997-03-01
69657235701997-03-25351.121997-03-01
69658235701997-03-26242.961997-03-01

69659 rows × 5 columns

1.进行用户消费趋势分析(按月)

  • 每月的消费总金额
  • 每月的消费次数
  • 每月的产品购买量
  • 每月的消费人数
In [5]:
grouped_month=df.groupby(by='month')
order_month_amount=grouped_month.TOTAL.sum()
order_month_amount.head()
Out[5]:
month
1997-01-01    299060.17
1997-02-01    379590.03
1997-03-01    393155.27
1997-04-01    142824.49
1997-05-01    107933.30
Name: TOTAL, dtype: float64
In [6]:
#可视化显示在页面
%matplotlib inline
#更改设计风格
plt.style.use('ggplot')
order_month_amount.plot()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a8a8208>

由上图可知,消费金额在前三个月达到最高峰,后续消费较为稳定,有轻微下降趋势

In [7]:
grouped_month.ID.count().plot()
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a449860>

由上图可知,消费次数在前三个月达到最高峰,后续消费次数较为稳定,有轻微下降趋势

In [8]:
grouped_month.NUM.sum().plot()
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a4ec5c0>

由上图可知,每月购买产品量在前三个月达到最高峰,后续购买量较为稳定,有轻微下降趋势

In [9]:
grouped_month.ID.apply(lambda x :len(set(x))).plot()
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a906438>

由上图可知,每月消费人数在前三个月达到最高峰,每月消费人数较为稳定,有轻微下降趋势

2.用户个体消费分析

  • 用户消费金额、消费次数的描述统计
  • 用户消费金额和消费次数的散点图
  • 用户消费金额的分布图
  • 用户消费次数的分布图
  • 用户累计消费金额占比
In [10]:
grouped_user=df.groupby('ID')
In [11]:
grouped_user.sum().describe()
Out[11]:
 NUMTOTAL
count23570.00000023570.000000
mean7.122656106.080426
std16.983531240.925195
min1.0000000.000000
25%1.00000019.970000
50%3.00000043.395000
75%7.000000106.475000
max1033.00000013990.930000
  • 用户平均购买了7张CD,但中位值只有3,说明小部分用户购买大多数的CD
  • 用户平均消费106元,中位值有43,判断同上,有极值干扰
In [12]:
grouped_user.sum().query('TOTAL<4000').plot.scatter(x='TOTAL',y='NUM')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6abad7b8>

用户消费总金额集中在1500元以下,购买总CD数量集中在100张以下

In [13]:
grouped_user.sum().query('NUM<100').TOTAL.plot.hist(bins=20)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6abcc128>

大多数用户消费金额在250元以下

In [14]:
grouped_user.sum().query('NUM<100').NUM.plot.hist(bins=20)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bbd75c0>

大多数用户购买CD数量在20张以下

In [15]:
user_cumsum=grouped_user.sum().sort_values('TOTAL').apply(lambda x :x.cumsum()/x.sum())
user_cumsum.reset_index().TOTAL.plot()
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bd3d198>

按照用户消费金额进行升序排列,50%的用户仅贡献了15%的消费额度,消费总额排名前5000的贡献了60%的消费额度

3.用户消费行为

  • 用户第一次消费(首购)
  • 用户最后一次消费
  • 新老客消费比
    • 多数用户仅消费一次?
    • 每月新老客占比?
  • 用户分层
    • RFM
    • 新、老、活跃、回流、流失
  • 用户购买周期(按订单)
    • 用户消费周期描述
    • 用户消费周期分布
  • 用户生命周期(按照第一次&最后一次消费)
    • 用户生命周期描述
    • 用户生命周期分布
In [16]:
grouped_user.min().DATE.value_counts().plot()
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bd0c128>

用户第一次购买分布,集中在前三个月。并且在2-11至2-25有一次剧烈波动,具体再分析

In [17]:
grouped_user.max().DATE.value_counts().plot()
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bddde48>

用户最后一次购买比第一次购买分布广,大部分最后一次购买集中在前三个月,说明很多用户第一次购买后不再进行购买,随着时间,最后一次购买数在增加,消费呈流失上升的状况。

In [18]:
user_life=grouped_user.DATE.agg(['min','max']).reset_index()
user_life.head()
Out[18]:
 IDminmax
011997-01-011997-01-01
121997-01-121997-01-12
231997-01-021998-05-28
341997-01-011997-12-12
451997-01-011998-01-03
In [19]:
(user_life['min']==user_life['max']).value_counts()
Out[19]:
True     12054
False    11516
dtype: int64

有一半用户, 就 消费了一次

In [20]:
user_life['regDate']=user_life['min'].values.astype('datetime64[M]')
user_life
Out[20]:
 IDminmaxregDate
011997-01-011997-01-011997-01-01
121997-01-121997-01-121997-01-01
231997-01-021998-05-281997-01-01
341997-01-011997-12-121997-01-01
451997-01-011998-01-031997-01-01
561997-01-011997-01-011997-01-01
671997-01-011998-03-221997-01-01
781997-01-011998-03-291997-01-01
891997-01-011998-06-081997-01-01
9101997-01-211997-01-211997-01-01
10111997-01-011998-02-201997-01-01
11121997-01-011997-01-011997-01-01
12131997-01-011997-01-011997-01-01
13141997-01-011997-01-011997-01-01
14151997-01-011997-01-011997-01-01
15161997-01-011997-09-101997-01-01
16171997-01-011997-01-011997-01-01
17181997-01-041997-01-041997-01-01
18191997-01-011997-06-101997-01-01
19201997-01-011997-01-181997-01-01
20211997-01-011997-01-131997-01-01
21221997-01-011997-01-011997-01-01
22231997-01-011997-01-011997-01-01
23241997-01-011998-01-201997-01-01
24251997-01-011998-06-081997-01-01
25261997-01-011997-01-261997-01-01
26271997-01-071997-01-121997-01-01
27281997-01-011997-03-081997-01-01
28291997-01-011998-04-261997-01-01
29301997-01-011997-02-141997-01-01
...............
23540235411997-03-251997-04-021997-03-01
23541235421997-03-251997-03-251997-03-01
23542235431997-03-251997-03-251997-03-01
23543235441997-03-251998-01-241997-03-01
23544235451997-03-251997-03-251997-03-01
23545235461997-03-251997-03-251997-03-01
23546235471997-03-251997-04-071997-03-01
23547235481997-03-251997-03-251997-03-01
23548235491997-03-251997-03-251997-03-01
23549235501997-03-251997-03-251997-03-01
23550235511997-03-251997-09-111997-03-01
23551235521997-03-251997-04-031997-03-01
23552235531997-03-251997-03-281997-03-01
23553235541997-03-251998-02-011997-03-01
23554235551997-03-251998-06-101997-03-01
23555235561997-03-251998-06-071997-03-01
23556235571997-03-251997-03-251997-03-01
23557235581997-03-251998-02-251997-03-01
23558235591997-03-251997-06-271997-03-01
23559235601997-03-251997-03-251997-03-01
23560235611997-03-251998-05-291997-03-01
23561235621997-03-251997-03-251997-03-01
23562235631997-03-251997-10-041997-03-01
23563235641997-03-251997-11-301997-03-01
23564235651997-03-251997-03-251997-03-01
23565235661997-03-251997-03-251997-03-01
23566235671997-03-251997-03-251997-03-01
23567235681997-03-251997-04-221997-03-01
23568235691997-03-251997-03-251997-03-01
23569235701997-03-251997-03-261997-03-01

23570 rows × 4 columns

In [21]:
user_life_month=df.groupby(['month','ID']).count().reset_index()
In [22]:
user_life_month=user_life_month.merge(user_life,on='ID')
In [23]:
user_life_month
Out[23]:
 monthIDDATENUMTOTALminmaxregDate
01997-01-0111111997-01-011997-01-011997-01-01
11997-01-0122221997-01-121997-01-121997-01-01
21997-01-0131111997-01-021998-05-281997-01-01
31997-03-0131111997-01-021998-05-281997-01-01
41997-04-0131111997-01-021998-05-281997-01-01
51997-11-0132221997-01-021998-05-281997-01-01
61998-05-0131111997-01-021998-05-281997-01-01
71997-01-0142221997-01-011997-12-121997-01-01
81997-08-0141111997-01-011997-12-121997-01-01
91997-12-0141111997-01-011997-12-121997-01-01
101997-01-0152221997-01-011998-01-031997-01-01
111997-02-0151111997-01-011998-01-031997-01-01
121997-04-0151111997-01-011998-01-031997-01-01
131997-05-0151111997-01-011998-01-031997-01-01
141997-06-0151111997-01-011998-01-031997-01-01
151997-07-0151111997-01-011998-01-031997-01-01
161997-09-0151111997-01-011998-01-031997-01-01
171997-12-0152221997-01-011998-01-031997-01-01
181998-01-0151111997-01-011998-01-031997-01-01
191997-01-0161111997-01-011997-01-011997-01-01
201997-01-0171111997-01-011998-03-221997-01-01
211997-10-0171111997-01-011998-03-221997-01-01
221998-03-0171111997-01-011998-03-221997-01-01
231997-01-0181111997-01-011998-03-291997-01-01
241997-02-0181111997-01-011998-03-291997-01-01
251997-06-0181111997-01-011998-03-291997-01-01
261997-07-0181111997-01-011998-03-291997-01-01
271997-11-0182221997-01-011998-03-291997-01-01
281997-12-0181111997-01-011998-03-291997-01-01
291998-03-0181111997-01-011998-03-291997-01-01
...........................
553491997-06-01235561111997-03-251998-06-071997-03-01
553501997-07-01235562221997-03-251998-06-071997-03-01
553511997-09-01235561111997-03-251998-06-071997-03-01
553521998-01-01235561111997-03-251998-06-071997-03-01
553531998-06-01235561111997-03-251998-06-071997-03-01
553541997-03-01235571111997-03-251997-03-251997-03-01
553551997-03-01235581111997-03-251998-02-251997-03-01
553561997-05-01235581111997-03-251998-02-251997-03-01
553571997-06-01235581111997-03-251998-02-251997-03-01
553581998-02-01235581111997-03-251998-02-251997-03-01
553591997-03-01235591111997-03-251997-06-271997-03-01
553601997-05-01235591111997-03-251997-06-271997-03-01
553611997-06-01235591111997-03-251997-06-271997-03-01
553621997-03-01235601111997-03-251997-03-251997-03-01
553631997-03-01235611111997-03-251998-05-291997-03-01
553641998-01-01235611111997-03-251998-05-291997-03-01
553651998-05-01235611111997-03-251998-05-291997-03-01
553661997-03-01235621111997-03-251997-03-251997-03-01
553671997-03-01235631111997-03-251997-10-041997-03-01
553681997-10-01235631111997-03-251997-10-041997-03-01
553691997-03-01235641111997-03-251997-11-301997-03-01
553701997-05-01235641111997-03-251997-11-301997-03-01
553711997-11-01235641111997-03-251997-11-301997-03-01
553721997-03-01235651111997-03-251997-03-251997-03-01
553731997-03-01235661111997-03-251997-03-251997-03-01
553741997-03-01235671111997-03-251997-03-251997-03-01
553751997-03-01235681111997-03-251997-04-221997-03-01
553761997-04-01235682221997-03-251997-04-221997-03-01
553771997-03-01235691111997-03-251997-03-251997-03-01
553781997-03-01235702221997-03-251997-03-261997-03-01

55379 rows × 8 columns

In [24]:
user_life_month['isNew']=user_life_month.apply(lambda x:True if x.month==x.regDate else np.NaN,axis=1)
In [25]:
user_life_month_count=user_life_month.groupby('month').count().reset_index()
In [26]:
user_life_month_count
Out[26]:
 monthIDDATENUMTOTALminmaxregDateisNew
01997-01-0178467846784678467846784678467846
11997-02-0196339633963396339633963396338476
21997-03-0195249524952495249524952495247248
31997-04-0128222822282228222822282228220
41997-05-0122142214221422142214221422140
51997-06-0123392339233923392339233923390
61997-07-0121802180218021802180218021800
71997-08-0117721772177217721772177217720
81997-09-0117391739173917391739173917390
91997-10-0118391839183918391839183918390
101997-11-0120282028202820282028202820280
111997-12-0118641864186418641864186418640
121998-01-0115371537153715371537153715370
131998-02-0115511551155115511551155115510
141998-03-0120602060206020602060206020600
151998-04-0114371437143714371437143714370
161998-05-0114881488148814881488148814880
171998-06-0115061506150615061506150615060
In [27]:
user_life_month_count['isOld']=pd.DataFrame(user_life_month_count['ID']-user_life_month_count['isNew'])
user_life_month_count.drop(columns=['DATE','NUM','TOTAL','min','max','regDate'],inplace=True)
In [28]:
user_life_month_count.rename(columns={'ID':'all'},inplace=True)
In [29]:
user_life_month_count
Out[29]:
 monthallisNewisOld
01997-01-01784678460
11997-02-01963384761157
21997-03-01952472482276
31997-04-01282202822
41997-05-01221402214
51997-06-01233902339
61997-07-01218002180
71997-08-01177201772
81997-09-01173901739
91997-10-01183901839
101997-11-01202802028
111997-12-01186401864
121998-01-01153701537
131998-02-01155101551
141998-03-01206002060
151998-04-01143701437
161998-05-01148801488
171998-06-01150601506

新用户购买集中在前三个月,三个月后无新用户购买,均为老用户购买

In [30]:
rfm=df.pivot_table(index='ID',values=['NUM','TOTAL','DATE'],aggfunc={'NUM':'sum','TOTAL':'sum','DATE':'max'})
rfm
Out[30]:
 DATENUMTOTAL
ID   
11997-01-01111.77
21997-01-12689.00
31998-05-2816156.46
41997-12-127100.50
51998-01-0329385.61
61997-01-01120.99
71998-03-2218264.67
81998-03-2918197.66
91998-06-08695.85
101997-01-21339.31
111998-02-20458.55
121997-01-01457.06
131997-01-01472.94
141997-01-01229.92
151997-01-01452.87
161997-09-10879.87
171997-01-01573.22
181997-01-04114.96
191997-06-1011175.12
201997-01-1846653.01
211997-01-13475.11
221997-01-01114.37
231997-01-01224.74
241998-01-20457.77
251998-06-0812137.53
261997-01-266102.69
271997-01-1210135.87
281997-03-08790.99
291998-04-2628435.81
301997-02-14228.34
............
235411997-04-02257.34
235421997-03-25577.43
235431997-03-25250.76
235441998-01-2412134.63
235451997-03-25124.99
235461997-03-25113.97
235471997-04-07223.54
235481997-03-25223.54
235491997-03-25227.13
235501997-03-25225.28
235511997-09-1112264.63
235521997-04-03449.38
235531997-03-28898.58
235541998-02-01336.37
235551998-06-1014189.18
235561998-06-0715203.00
235571997-03-25114.37
235581998-02-2511145.60
235591997-06-278111.65
235601997-03-25118.36
235611998-05-29683.46
235621997-03-25229.33
235631997-10-04358.75
235641997-11-30570.01
235651997-03-25111.77
235661997-03-25236.00
235671997-03-25120.97
235681997-04-226121.70
235691997-03-25225.74
235701997-03-26594.08

23570 rows × 3 columns

In [31]:
rfm['R']=-(rfm['DATE']-rfm['DATE'].max())/np.timedelta64(1,'D')
rfm
Out[31]:
 DATENUMTOTALR
ID    
11997-01-01111.77545.0
21997-01-12689.00534.0
31998-05-2816156.4633.0
41997-12-127100.50200.0
51998-01-0329385.61178.0
61997-01-01120.99545.0
71998-03-2218264.67100.0
81998-03-2918197.6693.0
91998-06-08695.8522.0
101997-01-21339.31525.0
111998-02-20458.55130.0
121997-01-01457.06545.0
131997-01-01472.94545.0
141997-01-01229.92545.0
151997-01-01452.87545.0
161997-09-10879.87293.0
171997-01-01573.22545.0
181997-01-04114.96542.0
191997-06-1011175.12385.0
201997-01-1846653.01528.0
211997-01-13475.11533.0
221997-01-01114.37545.0
231997-01-01224.74545.0
241998-01-20457.77161.0
251998-06-0812137.5322.0
261997-01-266102.69520.0
271997-01-1210135.87534.0
281997-03-08790.99479.0
291998-04-2628435.8165.0
301997-02-14228.34501.0
...............
235411997-04-02257.34454.0
235421997-03-25577.43462.0
235431997-03-25250.76462.0
235441998-01-2412134.63157.0
235451997-03-25124.99462.0
235461997-03-25113.97462.0
235471997-04-07223.54449.0
235481997-03-25223.54462.0
235491997-03-25227.13462.0
235501997-03-25225.28462.0
235511997-09-1112264.63292.0
235521997-04-03449.38453.0
235531997-03-28898.58459.0
235541998-02-01336.37149.0
235551998-06-1014189.1820.0
235561998-06-0715203.0023.0
235571997-03-25114.37462.0
235581998-02-2511145.60125.0
235591997-06-278111.65368.0
235601997-03-25118.36462.0
235611998-05-29683.4632.0
235621997-03-25229.33462.0
235631997-10-04358.75269.0
235641997-11-30570.01212.0
235651997-03-25111.77462.0
235661997-03-25236.00462.0
235671997-03-25120.97462.0
235681997-04-226121.70434.0
235691997-03-25225.74462.0
235701997-03-26594.08461.0

23570 rows × 4 columns

In [32]:
rfm.rename(columns={'NUM':'F','TOTAL':'M'},inplace=True)
In [33]:
rfm.drop(columns=['DATE'],inplace=True)
rfm
Out[33]:
 FMR
ID   
1111.77545.0
2689.00534.0
316156.4633.0
47100.50200.0
529385.61178.0
6120.99545.0
718264.67100.0
818197.6693.0
9695.8522.0
10339.31525.0
11458.55130.0
12457.06545.0
13472.94545.0
14229.92545.0
15452.87545.0
16879.87293.0
17573.22545.0
18114.96542.0
1911175.12385.0
2046653.01528.0
21475.11533.0
22114.37545.0
23224.74545.0
24457.77161.0
2512137.5322.0
266102.69520.0
2710135.87534.0
28790.99479.0
2928435.8165.0
30228.34501.0
............
23541257.34454.0
23542577.43462.0
23543250.76462.0
2354412134.63157.0
23545124.99462.0
23546113.97462.0
23547223.54449.0
23548223.54462.0
23549227.13462.0
23550225.28462.0
2355112264.63292.0
23552449.38453.0
23553898.58459.0
23554336.37149.0
2355514189.1820.0
2355615203.0023.0
23557114.37462.0
2355811145.60125.0
235598111.65368.0
23560118.36462.0
23561683.4632.0
23562229.33462.0
23563358.75269.0
23564570.01212.0
23565111.77462.0
23566236.00462.0
23567120.97462.0
235686121.70434.0
23569225.74462.0
23570594.08461.0

23570 rows × 3 columns

In [34]:
def rfm_fumc(x):
    level = x.apply(lambda x:'1' if x>=1 else '0')
    label=level.R+level.F+level.M
    d={
        '111':'重要价值客户',
        '011':'重要保持客户',
        '101':'重要发展客户',
        '001':'重要挽留客户',
        '110':'一般价值客户',
        '010':'一般保持客户',
        '100':'一般发展客户',
        '000':'一般挽留客户',
    }
    result=d[label]
    return result
In [35]:
rfm['label']=rfm[['R','F','M']].apply(lambda x: x-x.mean()).apply(rfm_fumc,axis=1)
In [36]:
rfm.loc[rfm.label=='重要价值客户','color']='g'
rfm.loc[~(rfm.label=='重要价值客户'),'color']='r'
rfm.plot.scatter('F','R',c=rfm.color)
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6c26a550>

In [37]:
rfm.groupby('label').sum()
Out[37]:
 FMR
label   
一般价值客户1821767.118512.0
一般保持客户4925100.777782.0
一般发展客户29915445233.286983699.0
一般挽留客户15428215075.77621894.0
重要价值客户9849147180.09286676.0
重要保持客户1055091555586.51476502.0
重要发展客户232249905.80174340.0
重要挽留客户418480466.3096009.0

从RFM分层可知,大部分用户为重要保持客户,但是这是由于极值的影响,所有RFM的划分标准应该以业务为准

  • 尽量用小部分的用户覆盖大部分的额度
  • 不要为了数据好看划分等级
In [38]:
pivoted_counts=df.pivot_table(index='ID',columns='month',values='DATE',aggfunc='count').fillna(0)
pivoted_counts
Out[38]:
month1997-01-01 00:00:001997-02-01 00:00:001997-03-01 00:00:001997-04-01 00:00:001997-05-01 00:00:001997-06-01 00:00:001997-07-01 00:00:001997-08-01 00:00:001997-09-01 00:00:001997-10-01 00:00:001997-11-01 00:00:001997-12-01 00:00:001998-01-01 00:00:001998-02-01 00:00:001998-03-01 00:00:001998-04-01 00:00:001998-05-01 00:00:001998-06-01 00:00:00
ID                  
11.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
22.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
31.00.01.01.00.00.00.00.00.00.02.00.00.00.00.00.01.00.0
42.00.00.00.00.00.00.01.00.00.00.01.00.00.00.00.00.00.0
52.01.00.01.01.01.01.00.01.00.00.02.01.00.00.00.00.00.0
61.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
71.00.00.00.00.00.00.00.00.01.00.00.00.00.01.00.00.00.0
81.01.00.00.00.01.01.00.00.00.02.01.00.00.01.00.00.00.0
91.00.00.00.01.00.00.00.00.00.00.00.00.00.00.00.00.01.0
101.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
112.00.01.00.00.00.00.00.00.00.00.00.00.01.00.00.00.00.0
121.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
131.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
141.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
151.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
161.00.00.00.00.00.02.00.01.00.00.00.00.00.00.00.00.00.0
171.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
181.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
191.00.00.00.00.01.00.00.00.00.00.00.00.00.00.00.00.00.0
202.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
212.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
221.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
231.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
241.00.00.00.00.00.00.00.00.00.00.00.01.00.00.00.00.00.0
251.00.00.00.00.00.01.01.00.01.00.00.00.00.00.01.01.02.0
262.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
272.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
282.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
291.01.01.02.02.00.02.00.01.00.01.00.00.00.00.01.00.00.0
301.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
.........................................................
235410.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235420.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235430.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235440.00.01.00.01.00.00.00.00.00.00.00.01.00.00.00.00.00.0
235450.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235460.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235470.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235480.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235490.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235500.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235510.00.01.00.00.01.00.02.02.00.00.00.00.00.00.00.00.00.0
235520.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235530.00.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235540.00.01.00.00.00.00.00.00.00.00.00.00.01.00.00.00.00.0
235550.00.01.00.00.00.00.00.00.01.00.01.00.00.00.00.01.01.0
235560.00.01.00.00.01.02.00.01.00.00.00.01.00.00.00.00.01.0
235570.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235580.00.01.00.01.01.00.00.00.00.00.00.00.01.00.00.00.00.0
235590.00.01.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.0
235600.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235610.00.01.00.00.00.00.00.00.00.00.00.01.00.00.00.01.00.0
235620.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235630.00.01.00.00.00.00.00.00.01.00.00.00.00.00.00.00.00.0
235640.00.01.00.01.00.00.00.00.00.01.00.00.00.00.00.00.00.0
235650.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235660.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235670.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235680.00.01.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235690.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235700.00.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0

23570 rows × 18 columns

In [39]:
df_purchase=pivoted_counts.applymap(lambda x:1 if x>0 else 0)
df_purchase
Out[39]:
month1997-01-01 00:00:001997-02-01 00:00:001997-03-01 00:00:001997-04-01 00:00:001997-05-01 00:00:001997-06-01 00:00:001997-07-01 00:00:001997-08-01 00:00:001997-09-01 00:00:001997-10-01 00:00:001997-11-01 00:00:001997-12-01 00:00:001998-01-01 00:00:001998-02-01 00:00:001998-03-01 00:00:001998-04-01 00:00:001998-05-01 00:00:001998-06-01 00:00:00
ID                  
1100000000000000000
2100000000000000000
3101100000010000010
4100000010001000000
5110111101001100000
6100000000000000000
7100000000100001000
8110001100011001000
9100010000000000001
10100000000000000000
11101000000000010000
12100000000000000000
13100000000000000000
14100000000000000000
15100000000000000000
16100000101000000000
17100000000000000000
18100000000000000000
19100001000000000000
20100000000000000000
21100000000000000000
22100000000000000000
23100000000000000000
24100000000000100000
25100000110100000111
26100000000000000000
27100000000000000000
28101000000000000000
29111110101010000100
30110000000000000000
.........................................................
23541001100000000000000
23542001000000000000000
23543001000000000000000
23544001010000000100000
23545001000000000000000
23546001000000000000000
23547001100000000000000
23548001000000000000000
23549001000000000000000
23550001000000000000000
23551001001011000000000
23552001100000000000000
23553001000000000000000
23554001000000000010000
23555001000000101000011
23556001001101000100001
23557001000000000000000
23558001011000000010000
23559001011000000000000
23560001000000000000000
23561001000000000100010
23562001000000000000000
23563001000000100000000
23564001010000010000000
23565001000000000000000
23566001000000000000000
23567001000000000000000
23568001100000000000000
23569001000000000000000
23570001000000000000000

23570 rows × 18 columns

In [40]:
def active_status(data):
    status=[]
    for i in range(18):
        if data[i]==0:
            if len(status)>0:
                if status[i-1]=='unreg':
                    status.append('unreg')
                else:
                    status.append('unactive')
            else:
                status.append('unreg')
        else:
            if len(status)==0:
                status.append('new')
            else:
                if status[i-1]=='unactive':
                    status.append('return')                             
                elif status[i-1]=='unreg':
                     status.append('new')
                else:
                     status.append('active')
    return status    
    
In [41]:
puchaser_status=df_purchase.apply(active_status,axis=1)
In [42]:
puchaser_status
Out[42]:
month1997-01-01 00:00:001997-02-01 00:00:001997-03-01 00:00:001997-04-01 00:00:001997-05-01 00:00:001997-06-01 00:00:001997-07-01 00:00:001997-08-01 00:00:001997-09-01 00:00:001997-10-01 00:00:001997-11-01 00:00:001997-12-01 00:00:001998-01-01 00:00:001998-02-01 00:00:001998-03-01 00:00:001998-04-01 00:00:001998-05-01 00:00:001998-06-01 00:00:00
ID                  
1newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
2newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
3newunactivereturnactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactivereturnunactive
4newunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactiveunactive
5newactiveunactivereturnactiveactiveactiveunactivereturnunactiveunactivereturnactiveunactiveunactiveunactiveunactiveunactive
6newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
7newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactivereturnunactiveunactiveunactive
8newactiveunactiveunactiveunactivereturnactiveunactiveunactiveunactivereturnactiveunactiveunactivereturnunactiveunactiveunactive
9newunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturn
10newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
11newunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactive
12newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
13newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
14newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
15newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
16newunactiveunactiveunactiveunactiveunactivereturnunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
17newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
18newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
19newunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
20newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
21newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
22newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
24newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactive
25newunactiveunactiveunactiveunactiveunactivereturnactiveunactivereturnunactiveunactiveunactiveunactiveunactivereturnactiveactive
26newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
27newunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
28newunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
29newactiveactiveactiveactiveunactivereturnunactivereturnunactivereturnunactiveunactiveunactiveunactivereturnunactiveunactive
30newactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
.........................................................
23541unregunregnewactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23542unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23543unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23544unregunregnewunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactive
23545unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23546unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23547unregunregnewactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23548unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23549unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23550unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23551unregunregnewunactiveunactivereturnunactivereturnactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23552unregunregnewactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23553unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23554unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactive
23555unregunregnewunactiveunactiveunactiveunactiveunactiveunactivereturnunactivereturnunactiveunactiveunactiveunactivereturnactive
23556unregunregnewunactiveunactivereturnactiveunactivereturnunactiveunactiveunactivereturnunactiveunactiveunactiveunactivereturn
23557unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23558unregunregnewunactivereturnactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactive
23559unregunregnewunactivereturnactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23560unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23561unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactivereturnunactive
23562unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23563unregunregnewunactiveunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23564unregunregnewunactivereturnunactiveunactiveunactiveunactiveunactivereturnunactiveunactiveunactiveunactiveunactiveunactiveunactive
23565unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23566unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23567unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23568unregunregnewactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23569unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive
23570unregunregnewunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactiveunactive

23570 rows × 18 columns

In [43]:
purchase_static_ct=puchaser_status.replace('unreg',np.nan).apply(lambda x:pd.value_counts(x))
purchase_static_ct
Out[43]:
month1997-01-01 00:00:001997-02-01 00:00:001997-03-01 00:00:001997-04-01 00:00:001997-05-01 00:00:001997-06-01 00:00:001997-07-01 00:00:001997-08-01 00:00:001997-09-01 00:00:001997-10-01 00:00:001997-11-01 00:00:001997-12-01 00:00:001998-01-01 00:00:001998-02-01 00:00:001998-03-01 00:00:001998-04-01 00:00:001998-05-01 00:00:001998-06-01 00:00:00
activeNaN1157.016811773.0852.0747.0746.0604.0528.0532.0624.0632.0512.0472.0571.0518.0459.0446.0
new7846.08476.07248NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
returnNaNNaN5951049.01362.01592.01434.01168.01211.01307.01404.01232.01025.01079.01489.0919.01029.01060.0
unactiveNaN6689.01404620748.021356.021231.021390.021798.021831.021731.021542.021706.022033.022019.021510.022133.022082.022064.0
In [44]:
purchase_static_ct.fillna(0).T.apply(lambda x:x/x.sum(),axis=1)
Out[44]:
 activenewreturnunactive
month    
1997-01-010.0000001.0000000.0000000.000000
1997-02-010.0708860.5192990.0000000.409815
1997-03-010.0713190.3075100.0252440.595927
1997-04-010.0752230.0000000.0445060.880272
1997-05-010.0361480.0000000.0577850.906067
1997-06-010.0316930.0000000.0675430.900764
1997-07-010.0316500.0000000.0608400.907510
1997-08-010.0256260.0000000.0495550.924820
1997-09-010.0224010.0000000.0513790.926220
1997-10-010.0225710.0000000.0554520.921977
1997-11-010.0264740.0000000.0595670.913958
1997-12-010.0268140.0000000.0522700.920916
1998-01-010.0217230.0000000.0434870.934790
1998-02-010.0200250.0000000.0457790.934196
1998-03-010.0242260.0000000.0631740.912601
1998-04-010.0219770.0000000.0389900.939033
1998-05-010.0194740.0000000.0436570.936869
1998-06-010.0189220.0000000.0449720.936105

由上表可知,每月的用户消费状态变化

  • 活跃用户,持续消费的用户,对应的是消费运营的质量
  • 回流用户,之前不消费本月才消费,对应的是换回运营
  • 不活跃用户,对应的是流失
In [45]:
diff=grouped_user.apply(lambda x :x.DATE-x.DATE.shift())
In [46]:
diff.reset_index().DATE.describe()
Out[46]:
count                      46089
mean     68 days 23:22:13.567662
std      91 days 00:47:33.924168
min              0 days 00:00:00
25%             10 days 00:00:00
50%             31 days 00:00:00
75%             89 days 00:00:00
max            533 days 00:00:00
Name: DATE, dtype: object
In [47]:
(diff/np.timedelta64(1,'D')).hist(bins=20)
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6e42ad30>

  • 订单周期呈指数分布
  • 用户的平均购买周期是68天
  • 绝大部分用户的购买周期都低于100天
In [48]:
(user_life['max']-user_life['min']).describe()
Out[48]:
count                       23570
mean     134 days 20:55:36.987696
std      180 days 13:46:43.039788
min               0 days 00:00:00
25%               0 days 00:00:00
50%               0 days 00:00:00
75%             294 days 00:00:00
max             544 days 00:00:00
dtype: object
In [49]:
((user_life['max']-user_life['min'])/np.timedelta64(1,'D')).hist(bins=40)
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6c2622e8>

  • 用户的生命周期受只购买一次的用户影响比较厉害(可以排除)
  • 用户均消费134天, 中位数仅0天
In [50]:
u_1=((user_life['max']-user_life['min'])/np.timedelta64(1,'D')).reset_index()[0]
u_1[u_1>0].hist(bins=40)
Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6ef7b518>

4.复购率与回购率分析

  • 复购率
    • 自然月内,购买多次的用户占总用户比
  • 回购率
    • 曾经购买过的用户在某一时期内的再次购买的占比
In [51]:
pivoted_counts
Out[51]:
month1997-01-01 00:00:001997-02-01 00:00:001997-03-01 00:00:001997-04-01 00:00:001997-05-01 00:00:001997-06-01 00:00:001997-07-01 00:00:001997-08-01 00:00:001997-09-01 00:00:001997-10-01 00:00:001997-11-01 00:00:001997-12-01 00:00:001998-01-01 00:00:001998-02-01 00:00:001998-03-01 00:00:001998-04-01 00:00:001998-05-01 00:00:001998-06-01 00:00:00
ID                  
11.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
22.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
31.00.01.01.00.00.00.00.00.00.02.00.00.00.00.00.01.00.0
42.00.00.00.00.00.00.01.00.00.00.01.00.00.00.00.00.00.0
52.01.00.01.01.01.01.00.01.00.00.02.01.00.00.00.00.00.0
61.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
71.00.00.00.00.00.00.00.00.01.00.00.00.00.01.00.00.00.0
81.01.00.00.00.01.01.00.00.00.02.01.00.00.01.00.00.00.0
91.00.00.00.01.00.00.00.00.00.00.00.00.00.00.00.00.01.0
101.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
112.00.01.00.00.00.00.00.00.00.00.00.00.01.00.00.00.00.0
121.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
131.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
141.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
151.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
161.00.00.00.00.00.02.00.01.00.00.00.00.00.00.00.00.00.0
171.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
181.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
191.00.00.00.00.01.00.00.00.00.00.00.00.00.00.00.00.00.0
202.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
212.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
221.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
231.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
241.00.00.00.00.00.00.00.00.00.00.00.01.00.00.00.00.00.0
251.00.00.00.00.00.01.01.00.01.00.00.00.00.00.01.01.02.0
262.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
272.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
282.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
291.01.01.02.02.00.02.00.01.00.01.00.00.00.00.01.00.00.0
301.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
.........................................................
235410.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235420.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235430.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235440.00.01.00.01.00.00.00.00.00.00.00.01.00.00.00.00.00.0
235450.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235460.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235470.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235480.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235490.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235500.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235510.00.01.00.00.01.00.02.02.00.00.00.00.00.00.00.00.00.0
235520.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235530.00.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235540.00.01.00.00.00.00.00.00.00.00.00.00.01.00.00.00.00.0
235550.00.01.00.00.00.00.00.00.01.00.01.00.00.00.00.01.01.0
235560.00.01.00.00.01.02.00.01.00.00.00.01.00.00.00.00.01.0
235570.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235580.00.01.00.01.01.00.00.00.00.00.00.00.01.00.00.00.00.0
235590.00.01.00.01.01.00.00.00.00.00.00.00.00.00.00.00.00.0
235600.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235610.00.01.00.00.00.00.00.00.00.00.00.01.00.00.00.01.00.0
235620.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235630.00.01.00.00.00.00.00.00.01.00.00.00.00.00.00.00.00.0
235640.00.01.00.01.00.00.00.00.00.01.00.00.00.00.00.00.00.0
235650.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235660.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235670.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235680.00.01.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235690.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
235700.00.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0

23570 rows × 18 columns

In [52]:
(((pivoted_counts[pivoted_counts>1]).count())/((pivoted_counts[pivoted_counts>0]).count())).plot()
Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c70a66f28>


评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值