窗口函数
前后函数
- LEAD()
表user_time中字段是user_id,time(用户访问时间),求每个用户相邻两次浏览时间之差小于三分钟的次数。
查看表user_time
SELECT * FROM user_time;
LEAD函数查询当前字段的上一个值,若向上取值没有数据的时候显示为NULL
SELECT
user_id,
time,
LAG(time) OVER (PARTITION BY user_id ORDER BY time) AS prev_time
FROM
user_time;
完整代码
SELECT
user_id,
COUNT(*) AS num_visits_within_3_min_gap
FROM
(
SELECT
user_id,
time,
LAG(time) OVER (PARTITION BY user_id ORDER BY time) AS prev_time
FROM
user_time
) AS t
WHERE
TIMESTAMPDIFF(MINUTE, prev_time, time) <= 3
GROUP BY
user_id;
留存率
表user_log,有user_id, login_date,求每天用户新增数,次日留存率、7日留存率
select
first_login,
sum(
case
when date_diff = 0 then 1
else 0
end
) '当日新增',
sum(
case
when date_diff = 1 then 1
else 0
end
) '次日留存',
sum(
case
when date_diff = 6 then 1
else 0
end
) '7日留存' (
select
t2.uid,
login_date,
first_login,
datediff (login_date, first_login) date_diff
from
user_log as t2
left join (
SELECT
uid,
MIN(login_date) as first_login
FROM
user_log
GROUP BY
uid
) t1 using (uid)
order by
uid,
login_date
) t3
group by
first_login
连续登录
题目:查询连续3天登录的用户
表login_info,字段uid, login_date(登陆时间)
1.使用排序窗口函数求解
思路:
首先进行窗口排序分组,果某个用户是连续登录,那么他的active_time(登录时间)-rn(排序号)的日期应该是相等的。
SELECT *,
DATE_SUB(active_time,INTERVAL rn day) as ds
FROM(
SELECT *,
row_number() over(partition by uid order by active_time) as rn
FROM login_info) t
然后在外层做一个group by就可以了
SELECT
t.uid,
DATE_SUB(t.active_time,INTERVAL t.rn DAY) as date,
COUNT(1) as counts
FROM
(SELECT
uid,
active_time,
row_number() over(partition by uid order by active_time) as rn
FROM login_info) as t
GROUP BY t.uid,DATE_SUB(t.active_time,INTERVAL t.rn DAY)
HAVING COUNT(1)>=3
通过这个结果可以看出,用户001,从2021-09-30后的五天是活跃的(不包括2021-09-30)。如果只需要求uid,select改成select distinct uid即可。
2.使用偏移窗口函数
思路:
求3天连续登录的用户,按登陆日期降序,让用户登录时间往下偏移2个,只要偏移后的日期正好等于2天前日期,就说明该用户是连续3天登录的用户。
select
uid,
active_time,
lead(active_time,2) over(partition by uid order by active_time desc) as rn
from login_info
外层用where筛选一下,uid取重就可以了,完整代码如下:
select distinct uid
from
(select
uid,
active_time,
lead(active_time,2) over(partition by uid order by active_time desc) as rn
from login_info) t
where date_sub(active_time, INTERVAL 2 day) = t.rn