1,ER图
E是实体,实体有一组属性;R是关系。找到系统中的实体以及实体关系就可以绘制出ER图了。
例子:矩形的是实体,椭圆是属性,实体和实体时间的关系用菱形,关系也有属性,例如,学生选修课程,有成绩属性,当然如果系统需要,也可以记录选修的时间等属性信息。
1,between ‘2021-10-01’ and ‘2021-10-03’
‘2021-10-03 13:23:24’ 是不会统计到其中,因为’2021-10-01’指的是 ‘2021-10-01 00:00:00’
2,not and or in 顺序: not > and > or
3, in(a , b, c) 与 a or b or c 相同 ,但是in 的效率更高
4,distinct A,B 是对两个字段一起去重
5,count(1) 与 count(*)基本没差别,都是求表的总行数。但是效率会有影响
6,短视频用户粘性
6.1月活人数?
substr函数/date_format,月活要去重
select substr(load_dt,1,7) as '月份',count(1) as '登录人次',
count(distinct usr_id) as '月活'
from 表
group by 1
6.2 每天中哪个时间段用户数最高?
用case把时间段打标签,再计数
group by 1
select
case when hour(load_tm) between 7 and 8 or hour(load_tm) between 18 and 19 then 'commute'
when hour(load_tm) between 11 and 12 then 'lunch'
when hour(load_tm) in (22, 23, 0) then 'before_sleep'
end as time_prd
,count(distinct usr_id) as cst_dt
from
td_load_rcd where load_dt in ('2020-07-01', '2020-07-02',
'2020-07-03', '2020-07-06', '2020-07-07',
'2020-07-08', '2020-07-09', '2020-07-10')
group by 1
6.3 单日登录次数大于等于5次的用户数
group by 用户,日期 后,再计数
6.4 求留存率
6月12日的T+1日留存、6月15日的T+3日留存、6月20日的T+7日留存
1,先备份表a,用户+最小日期
2,备份表b,所有“用户+日期” 组合
3,a left join b on (用户 and 时间间隔差满足)
求n日留存率的方法:原表 a1 left join 原表 a2 on 用户id and (a1与a2间隔n天)
create view a as
select usr_id,min(load_dt) load_dt
from td_load_rcd
group by usr_id;
#对于每个用户来说,最早的登录日期
create view b as
select load_dt,usr_id
from td_load_rcd
group by load_dt,usr_id;
#把所有可能的用户+日期组合表示出来
select t0.load_dt,count(t0.usr_id) as cst_dt_0,
count(t1.usr_id) as cst_dt_1,
count(t1.usr_id)/count(t0.usr_id) as cst_dt_pct_1,
count(t2.usr_id) as cst_dt_2,
count(t2.usr_id)/count(t0.usr_id) as cst_dt_pct_2,
count(t3.usr_id) as cst_dt_3,
count(t3.usr_id)/count(t0.usr_id) as cst_dt_pct_7
from a t0
left join b t1
on t0.usr_id=t1.usr_id and t0.load_dt=date_sub(t1.load_dt,interval 1 day)
left join b t2
on t0.usr_id=t2.usr_id and t0.load_dt=date_sub(t2.load_dt,interval 3 day)
left join b t3
on t0.usr_id=t3.usr_id and t0.load_dt=date_sub(t3.load_dt,interval 7 day)
group by t0.load_dt
order by t0.load_dt;
6.5 连续登录N天的用户数
6月份连续7天登录的用户数?
先筛选6月份—窗口函数排个序rnk—日期减去序号rnk得新日期列load_dt2—按新日期列分组数数load_days— 对个数load_days> N 筛选 —统计满足用户的个数
select count(distinct usr_id)
from
(
select usr_id,load_dt2,count(1) load_days
from
(
select usr_id,load_dt,rnk,date_sub(load_dt,interval rnk day) as load_dt2
from
(
select a.usr_id,a.load_dt,row_number()over(partition by a.usr_id order by a.load_dt) rnk
from
(select usr_id,load_dt from td_load_rcd where substr(load_dt,1,7) = '2020-06'
group by usr_id,load_dt
) a
)b
)c
group by usr_id,load_dt2
having load_days >=7
)t
;