MySQL实战--用户行为数据分析--8小时跟做（2）

数据分析从入门到入土:з)))」∠)_

已于 2024-06-12 20:53:51 修改

阅读量671

点赞数 7

文章标签：数据分析

于 2024-06-11 22:02:40 首次发布

本文链接：https://blog.csdn.net/Faracross/article/details/139582736

版权

原视频链接：留存情况_哔哩哔哩_bilibili

4.留存情况

用户活跃分布

要查留存，先看用户活跃在哪些天，根据用户id和时间进行分组查询：

select user_id,dates 
from temp_behavior
group by user_id,dates;

自关联

(自己和自己连--起别名):方便自己和自己相减

-- 自关联+筛选 
select * from 
(select user_id,dates 
from temp_behavior
group by user_id,dates
) a
,(select user_id,dates 
from temp_behavior
group by user_id,dates
) b
where a.user_id=b.user_id and a.dates<b.dates;

前一天要在后一天前面：where a.user_id=b.user_id and a.dates<b.dates;

留存数

-- 留存数 ：当日留存（分母），次日留存，三日留存
select a.dates
,count(if(datediff(b.dates,a.dates)=0,b.user_id,null)) retention_0
,count(if(datediff(b.dates,a.dates)=1,b.user_id,null)) retention_1
,count(if(datediff(b.dates,a.dates)=3,b.user_id,null)) retention_3 from

--之前的表
(select user_id,dates 
from temp_behavior
group by user_id,dates
) a
,(select user_id,dates 
from temp_behavior
group by user_id,dates
) b
where a.user_id=b.user_id and a.dates<=b.dates
group by a.dates

DATEDIFF() 函数用于计算两个日期之间的差值，并以指定的时间单位返回该差值。

DATEDIFF(unit, start_date, end_date)

留存率

-- 留存率 
select a.dates
,count(if(datediff(b.dates,a.dates)=1,b.user_id,null))/count(if(datediff(b.dates,a.dates)=0,b.user_id,null)) retention_1

from
(select user_id,dates 
from temp_behavior
group by user_id,dates
) a
,(select user_id,dates 
from temp_behavior
group by user_id,dates
) b
where a.user_id=b.user_id and a.dates<=b.dates
group by a.dates

留存率=次日留存数/留存数：当日count()+if()的用法，null不计数。

5 时间序列分析

-- 统计日期-小时的行为
select dates,hours
,count(if(behavior_type='pv',behavior_type,null)) 'pv'
,count(if(behavior_type='cart',behavior_type,null)) 'cart'
,count(if(behavior_type='fav',behavior_type,null)) 'fav'
,count(if(behavior_type='buy',behavior_type,null)) 'buy'

from temp_behavior
group by dates,hours
order by dates,hours

-- 存储 
create table date_hour_behavior(
dates char(10),
hours char(2),
pv int,
cart int,
fav int,
buy int);

-- 结果插入 
insert into date_hour_behavior
select dates,hours
,count(if(behavior_type='pv',behavior_type,null)) 'pv'
,count(if(behavior_type='cart',behavior_type,null)) 'cart'
,count(if(behavior_type='fav',behavior_type,null)) 'fav'
,count(if(behavior_type='buy',behavior_type,null)) 'buy'

from user_behavior
group by dates,hours
order by dates,hours

select * from date_hour_behavior;

用tableau输出可视化图表

可以看到晚上22点左右是点击高峰期

创建分层结构，下钻（点列里字段前的+号）

6 用户转化率

-- 统计各类行为用户数 
select behavior_type
,count(DISTINCT user_id) user_num
from temp_behavior
group by behavior_type
order by behavior_type desc;

-- 存储 
create table behavior_user_num(
behavior_type varchar(5),
user_num int);

insert into behavior_user_num
select behavior_type
,count(DISTINCT user_id) user_num
from user_behavior
group by behavior_type
order by behavior_type desc;

select * from behavior_user_num

相除得到转化

7 行为路径分析

创建视图

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

CREATE VIEW：创建一个新的视图。
view_name：指定所要创建的视图的名称。
AS：用于将 SELECT 查询结果作为新视图的数据来源。

这里用到创建一个虚拟的表（视图），它是从一个或多个实际表中的数据导出的结果集，但并不存储实际数据，而是根据需要实时地从基础表中调用数据，加快运算效率。

create view user_behavior_view as
select user_id,item_id
,count(if(behavior_type='pv',behavior_type,null)) 'pv'
,count(if(behavior_type='fav',behavior_type,null)) 'fav'
,count(if(behavior_type='cart',behavior_type,null)) 'cart'
,count(if(behavior_type='buy',behavior_type,null)) 'buy'
from temp_behavior
group by user_id,item_id

用户行为标准化

-- 用户行为标准化  
create view user_behavior_standard as
select user_id
,item_id
,(case when pv>0 then 1 else 0 end) 浏览
,(case when fav>0 then 1 else 0 end) 收藏
,(case when cart>0 then 1 else 0 end) 加购
,(case when buy>0 then 1 else 0 end) 购买
from user_behavior_view

用到case进行条件判断，并给其起别名

-- 路径类型 
create view user_behavior_path as
select *,
concat(浏览,收藏,加购,购买) 购买路径类型
from user_behavior_standard as a
where a.购买了>0

使用concat将将多个字符串拼接，如这里1001是指浏览后购买这个行为

-- 统计各类购买行为数量 
create view path_count as
select 购买路径类型
,count(*) 数量
from user_behavior_path
group by 购买路径类型
order by 数量 desc

按购买路径分类统计，并创建表

create table renhua(
path_type char(4),
description varchar(40));

insert into renhua 
values('0001','直接购买'),
('1001','浏览后购买'),
('0011','加购后购买'),
('1011','浏览加购后购买'),
('0101','收藏后购买'),
('1101','浏览收藏后购买'),
('0111','收藏加购后购买'),
('1111','浏览收藏加购后购买')

select * from path_count p 
join renhua r 
on p.购买路径类型=r.path_type 
order by 数量 desc

连接统计结果的表，使用join on

对原表存储

之前先是小规模计算，现在对原表计算，为复用代码则需删除之前的视图

drop view user_behavior_view
drop view user_behavior_standard
drop view user_behavior_path
drop view path_count

一直报错，找了两小时没找到pv在哪，卡住

6.12：

发现是count这里，把behavior_type改成1就好了，是数据类型的问题

小结：

这样写不容易报错：使用SUM函数结合CASE统计behavior_type字段为'pv'的记录的数量：

SELECT SUM(CASE WHEN behavior_type = 'pv' THEN 1 ELSE 0 END) AS pv_count
FROM your_table_name;

安利一个GPT，这个gpt插件免费好用,同门都在用，可以直接阅读代码

Sider:ChatGPT侧边栏

8 RFM模型

9 商品按热度分类

10 商品转化率分析

11 商品特征分析

12 数据可视化

数据分析从入门到入土:з)))」∠)_

关注

7
点赞
踩
17

收藏

觉得还不错? 一键收藏
0
评论
MySQL实战--用户行为数据分析--8小时跟做（2）

这里用到创建一个虚拟的表（视图），它是从一个或多个实际表中的数据导出的结果集，但并不存储实际数据，而是根据需要实时地从基础表中调用数据，加快运算效率。DATEDIFF() 函数用于计算两个日期之间的差值，并以指定的时间单位返回该差值。留存率=次日留存数/留存数：当日count()+if()的用法，null不计数。之前先是小规模计算，现在对原表计算，为复用代码则需删除之前的视图。创建分层结构，下钻（点列里字段前的+号）一直报错，找了两小时没找到pv在哪，卡住。用tableau输出可视化图表。
复制链接

扫一扫