MySQL查询用户行为_用户行为分析

通过对某宝数据集的分析,本博客探讨了用户在MySQL数据库中的购物行为,包括整体用户购物情况、用户行为转化漏斗、购买率高和购买率为0的人群特征以及基于时间维度的用户行为习惯。通过数据清洗、处理和建模,揭示了用户点击、收藏、加购和购买的转化率,以及用户活跃时段和一周内用户活跃分布。最后,运用RFM模型对用户价值进行了评分。
摘要由CSDN通过智能技术生成

1、数据集,某宝数据。

下载数据后,利用kettle将文本文件导入mysql数据库。数据time从 2014-11-18 00到2014-12-18 23。包含如下字段:

68eae073a689781cf1f32ea3c8925fa2.png

item_id(产品ID),behavior_type(包含点击、收藏、加购物车、购买四种行为,分别用数字1、2、3、4表示),user_geohash(地理位置),item_category(品类ID),time(发生行为的时间)。

2、目标

整体用户购物情况

用户行为转化漏斗

购买率高和购买率为 0 的人群有什么特征

基于时间维度了解用户的行为习惯

基于RFM模型的用户分析

3、数据清洗处理

1)缺失值

user_geohash列含大量NULL值,后续不对该字段处理

2)数据一致化处理

time字段含年月日和小时,将小时单独分解为1列。time列之前新建1列time的副本date 存年月日,time列存时间。方法用到replace和substring_index(按特点分隔符,分隔后索要截取字符串)

alter table exc1 add date varchar(20) not null after item_category;

update exc1 set date = time

update exc1 set date = replace(date,date,substring_index(date,' ',1))

update exc1 set time = replace(time, time, substring_index(time, ' ', -1 ))

3)behavior_type列4个值(点击1、收藏2、加购物车3、购买4)转换为‘pv’,'fav','cart','buy'

update exc1 set behavior_type = replace(behavior_type ,1,'pv');

update exc1 set behavior_type = replace(behavior_type ,2,'fav')

update exc1 set behavior_type = replace(behavior_type ,3,'cart')

update exc1 set behavior_type = replace(behavior_type ,4,'buy')

4) 查看表结构,发现date类型不是date型,将其转换为date型

1749f3fe27bfad91725f5a37ed6c8cf8.png

alter table exc1 modify date date;

4、构建模型及分析

1)整体用户购物情况

pv总访问量

select count(behavior_type) as '总访问量' from exc1 group by behavior_type having behavior_type = 'pv';

3bc0066ae8428d1dc4173636ab64c4ea.png

日均访问量

select date, count(behavior_type) as '日均访问量' from exc1 where behavior_type = 'pv' group by date order by date ;

a01a51ec07612740a7237b060ab2dfc1.png

UV(用户总数)

select count(distinct user_id) '用户总数' from exc1;

d66c529e1f1e824e513fa4ede1d1c70f.png

有购买行为的用户数量

select count(distinct user_id) '购买用户数量' from exc1 where behavior_type = 'buy';

5e7dd48da57e14fbe61828585e0e1f53.png

用户的购物情况

create view user_behavior as

select user_id, count(behavior_type),

sum(case when behavior_type = 'pv' then 1 else 0 end) as '点击次数',

sum(case when behavior_type = 'fav' then 1 else 0 end) as '收藏次数',

sum(case when behavior_type = 'cart' then 1 else 0 end) as '加购数',

sum(case when behavior_type = 'buy' then 1 else 0 end) as '购买次数'

from exc1

group by user_id

order by count(behavior_type) desc;

01eae8deaf3a4e98b7ae5af1390f045e.png

复购率

select concat(round(sum(case when 购买次数>=2 then 1 else 0 end)/sum(case when 购买次数>0 then 1 else 0 end)*100), '%') as '复购率'

from user_behavior

0b1f790a94f589f2bb17d72d2f01a26b.png

2)用户购买行为漏斗

用户购物行为总计

select sum(点击次数) '总点击数', sum(收藏次数) '收藏总数',sum(加购数) '加购物车总数', sum(购买次数) '购买总数' from user_behavior;

2b9427e04af75166c59fe0f1e3a882e7.png

用户购买行为转换率

select CONCAT( round(sum(点击次数)*100/sum(点击次数),2),'%') as 'pv',

CONCAT( round((sum(收藏次数)+sum(加购数))*100/sum(点击次数),2),'%') as 'pv_to_favcart',

CONCAT( round(sum(购买次数)*100/sum(点击次数),2),'%') as 'pv_to_buy'

from user_behavior;

730fa17f2d8d47d36bfccf16a726a5c6.png

制作动态漏斗图:

import pandas as pd

import pyecharts as pec

dict = {'pv':100, 'pv_to_favcart':5.07,'pv_to_buy':1.04}

user = pd.DataFrame(data = dict,index= range(1))

attr = ['点击','收藏或加购物车','购买']

value = [user.ix[:,'pv'],user.ix[:,'pv_to_favcart'],user.ix[:,'pv_to_buy']]

funnel = pec.Funnel('用户行为漏斗图',width=800,height = 600,title_pos = 'left')

funnel.add(name = '用户行为情况' ,

attr = attr,

value = value,

is_label_show = True,

label_formatter ='{b}{c}%',

label_pos = 'outside',

is_legend_show = True

)

funnel.render()

6fa63a9067d4eefdfe69d926d17132e7.png

可以看出用户点击后收藏和加购物车的概率在5.04%左右,真正购买的只有1.04%,购买转化率比较低,说明后期还有很大的空间。

3)购买率高和购买率为0的人群特征分析

购买率高的人群特征(购买率降序排序):

select user_id, 点击次数,收藏次数,加购数,购买次数,

round(购买次数/点击次数*100,2) as 购买率

from user_behavior

group by user_id

order by 购买率 desc ;

fe7cd0d46bb6bd08eb5e55c3841c6bb5.png

按购买次数进行排序(购买次数降序排序)

select user_id, 点击次数,收藏次数,加购数,购买次数,

concat(round(购买次数/点击次数*100,2),'%') as 购买率

from user_behavior

group by user_id

order by 购买次数 desc;

ef375071ecef3186f2a7b3cc5c5ce7ff.png

可以发现购买率高点击次数并不高,一部分人点击2次就购买了,这部分人没有收藏也没有加入购物车,属于有目的的购物群体,缺啥买啥,一般属于理智型购物群体。

购买率为0的人群:

select user_id, 点击次数,收藏次数,加购数,购买次数,

round(购买次数/点击次数*100,2) as 购买率

from user_behavior

group by user_id

order by 购买率 asc

7239914d9a249d0493752ebaa70f7400.png

点击次数多,加购物车或收藏较多考虑有可能是为商家活动做准备。

点击次数多,购买率低或为0的客户为克制型客户,此类客户爱比较,思虑多,自制性比较强。或者说不会支付。

4)基于时间维度了解用户行为习惯

一天中用户活跃时段分布

select time, count(behavior_type),

sum(case when behavior_type = 'pv' then 1 else 0 end) as '点击次数',

sum(case when behavior_type = 'fav' then 1 else 0 end) as '收藏次数',

sum(case when behavior_type = 'cart' then 1 else 0 end) as '加购数',

sum(case when behavior_type = 'buy' then 1 else 0 end) as '购买次数'

from exc1

group by time

order by time

db2f9f947a6308a1600e56593a5761da.png

mysql数据导出为csv表:

select time, count(behavior_type),

sum(case when behavior_type = 'pv' then 1 else 0 end) as '点击次数',

sum(case when behavior_type = 'fav' then 1 else 0 end) as '收藏次数',

sum(case when behavior_type = 'cart' then 1 else 0 end) as '加购数',

sum(case when behavior_type = 'buy' then 1 else 0 end) as '购买次数'

from exc1

group by time

order by time

into outfile 'E:/Pro/users.csv' fields terminated by ','enclosed by '"'lines terminated by '\r\n';

绘制双轴图:

import pandas as pd

import pyecharts as pec

names = ['time','count(behavior_type)','点击次数','收藏次数','加购数','购买次数']

data = pd.read_csv('E:/Pro/users.csv',header=None,names = names)

attr = ["{}时".format(i) for i in range(24)]

bar = pec.Bar('用户行为-时间图',width =1500,height= 600)

bar.add('用户行为总数量',attr, data.iloc[:,1],is_label_show=False)

bar.add('点击总量',attr, data.iloc[:,2])

line = pec.Line()

for i in range(3,len(names)):

line.add(names[i],attr,data.iloc[:,i])

overlap = pec.Overlap()

overlap.add(bar)

overlap.add(line, yaxis_index=1, is_add_yaxis = True)

overlap.show_config()

overlap.render('./times.html')

b338ff341b1c2c7d5b1e104bcad8306c.png

由图形可以看出每天0-5点用户活跃度保存降低状态,4-5点至最低值,6-10点用户活跃度快速上升,10-18点活跃度保存平稳状态,17-23点活跃度持续上升,21-22时到达最高峰,23时有所下降。与用户的运营行为可以参与用户活跃时段进行。

一周中用户活跃分布

select date_format(date,'%W') as weeks, count(behavior_type),

sum(case when behavior_type = 'pv' then 1 else 0 end) as '点击次数',

sum(case when behavior_type = 'fav' then 1 else 0 end) as '收藏次数',

sum(case when behavior_type = 'cart' then 1 else 0 end) as '加购数',

sum(case when behavior_type = 'buy' then 1 else 0 end) as '购买次数'

from exc1

where date between '2014-11-23' and '2014-12-13'

group by weeks

order by weeks

955d1375a5bade2b570537d077ca7167.png

同样绘制双轴图:

127aa2a7609864fd26bb79f42c28d96e.png

用户活跃度到周五上升比较明显,周五到周末小幅度下降,到周日又有慢慢的回升趋势。星期4到星期5用户活跃度激增,考虑可能是双12带来的影响。

5) 基于RFM模型找出有价值的客户

R-Recency: 最近一次购买时间

F-Frequency: 消费频率

M-Money:消费金额

数据集没有消费金额,对最近一次购买时间和消费频率进行评分

针对R-Recency评分(间隔天数越少客户价值越大,间隔天数升序排序)

select a.* ,

(@r:=@r+1) as recent_rank from (

select user_id, datediff('2014-12-19', max(date)) as recent from exc1

where behavior_type = 'buy'

group by user_id

order by recent

)a,(select @r:=0)b ;

254d87309582a70642163d2e6ae58c24.png

针对消费频率(F-Frequency)(购买频率越大,客户价值越大)

select a.* ,

(@r2:=@r2+1) as freq_rank from (

select user_id, count(behavior_type) as frequency from exc1

where behavior_type = 'buy'

group by user_id

order by frequency desc

)a,(select @r2:=0)b

22fdca4e89cff01340098985f07f37b0.png

联合合并2者,并加入评分关系

select m.user_id , n.frequency , recent_rank, freq_rank,

concat(

case when recent_rank <= (4330)/4 then '4'

when recent_rank > (4330)/4 and recent_rank <= (4330)/2 then '3'

when recent_rank > (4330)/2 and recent_rank <= (4330)/4*3 then '2'

else '1' end ,

case when freq_rank <= (4330)/4 then '4'

when freq_rank > (4330)/4 and freq_rank <= (4330)/2 then '3'

when freq_rank > (4330)/2 and freq_rank <= (4330)/4*3 then '2'

else '1' end

) as user_value

from (

select a.*,(@r1:=@r1+1) as recent_rank from (

select user_id, datediff('2014-12-19',max(date)) as recent

from exc1

where behavior_type = 'buy'

group by user_id order by recent

) a, (select @r1:=0 )as b) m ,

( select a.* , (@r2:=@r2+1) as freq_rank from (

select user_id, count(behavior_type) as frequency

from exc1

where behavior_type = 'buy'

group by user_id order by frequency desc

)a ,(select @r2:=0) as b) as n

where m.user_id = n.user_id;

1cf5b8f18dcf71b2a6d7bfb09828f365.png

通过得分可以看出user_value 为‘41’的用户为关注频次比较高,购买能力不足的用户,可以选择适当促销打折或其他捆绑销售来促进客户下单

而user_value 为‘14’的用户这类关注度忠诚度不高,购买力强的客户需要关注其购物习性做精准化营销。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值