分析导览:分析目的
1-统计不同性别的消费频次、消费金额
2-统计不同年龄段用户的消费金额
3-统计不同月份的消费金额
4-统计多次消费的用户,第一次和最后一次消费时间的间隔
5-统计不同年龄段的用户消费金额
6-统计消费的二八法则,消费top20%的用户,贡献了多少额度分析过程
(一)文件导入
文件导入具体方法见如何在mySQL中文件导入CSV文件,这里就不在赘述。
主要步骤分为建表、导入文件,为了避免遇到很多坑,可以间我前一篇文章,这里的数据量有10w条,用命令行导入很方便。
创建的两张表的字段:
user:userid, sex,birth
orderinfo: orderid,userid,ispaid,price,paidtime
(二)用户消费者行为分析
1、统计不同月份的下单人数
select CONCAT_WS('-',left(paidtime,4),mid(paidtime,6,1)) as month_,count(distinct userid) as paidorder from test.orderinfo
where ispaid='已支付'
group by left(paidtime,4),mid(paidtime,6,1);
注:这里用到字符串拼接和选取字符串的函数,concat_ws,left,mid,还可以用substring函数。
从结果中看出,三四月份的下单人数最多,5月份竟然只有6个人。
2、统计男女的消费频次
思路是先将两个表联结,求出每个id的下单次数,再在该表基础上对男女进行分组,统计。这里有个需要注意的,我们可以统计下单的男女频次,也可以统计已支付和未支付的频次。
select a.userid,b.sex,count(1)as cs from test.orderinfo a,test.user b where a.userid=b.userid and ISPAID="已支付" and b.sex is not null
group by a.userid;
select c.sex ,round(avg(c.cs),2) as 平均次数 from (select a.userid,b.sex,count(1)as cs from test.orderinfo a,test.user b where a.userid=b.userid and ISPAID="已支付" and b.sex is not null
group by a.userid,b.sex) c group by c.sex ;
以上是一年男女平均频次,以下是以月为单位男女的平均频次
select t.month,t.sex,round(avg(t.ct),2) as 频次 from (
select left(a.paidtime,6) as month,a.userid,count(a.userid) as ct,b.sex
from test.orderinfo a,test.user b where a.ISPAID='已支付' and a.userid=b.userid
and b.sex is not null and left(a.paidtime,6) !='2016/5'
group by left(a.paidtime,6),a.userid) t
group by t.month,t.sex ;
3、统计不同年龄段的消费频次
先给不同年纪用户划分年龄段,这里用到一个新函数timestampdiff。返回时间序列的最大、最小时间差,由于我的时间是字符串类型,我用现在的时间-年份字段得到年龄。
CEIL(X) 返回不小于X的最小整数值。(天花板)
FLOOR(X) 返回不大于X的最大整数值。(地板)
这里由于我的时间是字符串,所以我用case when多条件查询,
select userid,year(now())-left(birth,6) as age,
case when year(now())-left(birth,6)<=12 then '0-12岁'
when year(now())-left(birth,6)<=25 then '13-25岁'
when year(now())-left(birth,6)<=30 then '26-30岁'
when year(now())-left(birth,6)<=50 then '30-50岁'
when year(now())-left(birth,6)<=70 then '51-70岁'
else 'over70' end as al
from test.user ;
先求出每个age对应的年龄段,再求出每个年龄段的次数。
select t.al,count(t.ct) as 次数 from
(select a.userid,count(a.userid) as ct,year(now())-left(b.birth,6) as age,
case when year(now())-left(b.birth,6)<=12 then '0-12岁'
when year(now())-left(b.birth,6)<=25 then '13-25岁'
when year(now())-left(b.birth,6)<=30 then '26-30岁'
when year(now())-left(b.birth,6)<=50 then '30-50岁'
when year(now())-left(b.birth,6)<=70 then '51-70岁'
else 'over70' end as al
from test.orderinfo a,test.user b
where a.userid= b.userid and a.ispaid='已支付'
and year(now())-left(b.birth,6) !='2020'
group by a.userid) t
group by t.al;
select t.al,avg(t.ct) as 频次 from
(select a.userid,count(a.userid) as ct,year(now())-left(b.birth,6) as age,
case when year(now())-left(b.birth,6)<=12 then '0-12岁'
when year(now())-left(b.birth,6)<=25 then '13-25岁'
when year(now())-left(b.birth,6)<=30 then '26-30岁'
when year(now())-left(b.birth,6)<=50 then '30-50岁'
when year(now())-left(b.birth,6)<=70 then '51-70岁'
else 'over70' end as al
from test.orderinfo a,test.user b
where a.userid= b.userid and a.ispaid='已支付'
and year(now())-left(b.birth,6) !='2020'
group by a.userid) t
group by t.al
order by avg(t.ct) desc;
4、消费top20%的用户贡献了多少消费额度
计算每个用户消费的总额,
select userid,sum(price) as sum from test.orderinfo
where ispaid ="已支付" group by userid order by sum desc;
top20用户的数量和消费额度
select floor(count(1)*0.2) from (select userid,sum(price) as sum
from test.orderinfo where ispaid ="已支付"
group by userid order by sum desc) t;
select sum(t.sum) as top20 from
(select userid,sum(price) as sum from test.orderinfo where ispaid ="已支付"
group by userid order by sum desc limit 17129 ) t;
计算全部的消费额
用top20的消费额/消费总额=2700/3683 约为73.9%。
select a.userid,a.sum,round(a.sum/(select sum(price) as total from test.orderinfo) ,4) as 贡献率 from
(select userid,sum(price) as sum
from test.orderinfo where ispaid ="已支付"
group by userid order by sum desc) a
order by a.sum desc;
5、统计多次消费用户,第一次和最后一次的消费间隔
select t.userid,timestampdiff(day,min(t.time),max(t.time)) as timediif from
(select userid,str_to_date(concat(left(paidtime,4),'-0',mid(paidtime,6,1),'-0',mid(paidtime,8,1)),'%Y-%m-%d') as time
from test.orderinfo) t
where t.time is not null
group by t.userid
having timestampdiff(day,min(t.time),max(t.time))!= 0;
这里用到了字符串拼接函数concat(),时间间隔函数timestampdiff(),以天为单位,消费多次的用户的时间间隔
6、统计三月份消费的复购率和回购率
复购率:本月消费一次以上用户的占比
思路:对用户进行分组,用总的用户数>1的数量/总数
select count(if (t.ct>1,1,null))/count(1) as 复购率
from (select userid,count(userid) as ct from test.orderinfo where ispaid="已支付" and mid(paidtime,6,1)="3"
group by userid) t
每个月购买用户的复购率
select s.time,s.复购人数, round(s.复购人数/(select count(distinct userid) as zs from test.orderinfo
where ispaid='已支付'),2) as 复购率
from (select a.userid,a.time,count(if(a.ct>1,1,0)) as 复购人数 from
(select userid,count(userid) as ct,left(paidtime,6) as time from
test.orderinfo where ispaid='已支付'
group by left(paidtime,6),userid) a
group by a.time) s
回购率:三月购买的人,4月依旧购买。
思路:先算出每月购买的用户,分成两个表,将两个表左联结。以时间为计数求出本月购买人数和下月回购人数。
select a.t1,count(a.t1) as 本月购买人数,count(b.t2) as 下月回购人数, round(count(b.t2)/count(a.t1),3) as 回购率
from
(select userid,left(paidtime,6) as t1 from test.orderinfo
where ispaid='已支付'
group by userid,left(paidtime,6)) a
left join (
select userid,left(paidtime,6) as t2 from test.orderinfo
where ispaid='已支付'
group by userid,left(paidtime,6)) b
on a.userid =b.userid and a.t1 = concat(left(b.t2,5),mid(b.t2,6,1)-1)
group by a.t1;
总结从每个月的订单数来看,3、4月份的订单数较多,5月份只有几个。
从每个月以及全年的男女消费频次来看,男女消费频次是差不多的,差别不大。
从各年龄段的消费频次来看,70岁以上的频次最高,其次是30-50岁用户,而30-50岁用户消费的总次数最高。
根据二八法则,分析得出top20%消费的用户消费总额占总额的73%。
用户第一次消费和最后一次消费的时间间隔多数为1天,说明用户生命周期较短,回头客较少。
3、4月份的用户复购率较大,,约为50-60%,,3月购买用户在4月的回购率较高,而4月回购率就非常低了。
利用sql和现有数据得出以上分析结论,可以从结论中推导为什么4月用户回购率变少了,用户流失较大,i由于数据有限,因此还需要作进一步分析。
本次用到的新韩淑:
concat()
str_to_date() 函数,(str1,str2....,'%Y%m%d')
case when then
…… end
窗口函数rank()和dense_rank() 两者的区别
更新表
update table tablename set