某音短视频
SQL1 各个视频的平均完播率
先理解题目意思:有播放记录的视频的id号和其完播率。条件是:2021年和根据其完播率降序排序。
拆解复杂指标:这里需要拆解的是完播率这个指标
注意:记得加上题目的限制条件
SELECT a.video_id ,
round(sum(if(end_time - start_time >= duration, 1, 0))/count(start_time ),3) as avg_comp_play_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
on a.video_id = b. video_id
WHERE year(start_time) = 2021
GROUP BY a.video_id
ORDER BY avg_comp_play_rate DESC;
SQL2 平均播放进度大于60%的视频类别
平均播放进度大于60%的视频类别
明确题意:
计算各类视频的平均播放进度,将进度大于60%的类别输出
问题分解:
关联用户-视频互动记录和短视频信息表:JOIN tb_video_info USING(video_id);
按视频类别分组:GROUP BY tag
计算每个类别的平均播放进度:
播放进度=播放时长÷视频时长*100%
播放时长=TIMESTAMPDIFF(SECOND, start_time, end_time);特殊情况:播放时长大于视频时长时,播放进度为100%(加个IF判断)
平均进度=AVG(每个进度)
结果保留2位小数:ROUND(x, 2)
百分比格式化:CONCAT(x, ‘%’)
计算公式:
ROUND(AVG(
IF(TIMESTAMPDIFF(SECOND, start_time, end_time) > duration, 1,
TIMESTAMPDIFF(SECOND, start_time, end_time) / duration)
) * 100, 2) as avg_play_progress
筛选播放进度>60%的视频类别:HAVING avg_play_progress > 60
细节问题:
表头重命名:as
按播放进度倒序排序:ORDER BY avg_play_progress DESC;
select tag,concat(avg_play_progress,"%") as avg_play_progress
from(
select tag,
round(avg(if(timestampdiff(second,start_time,end_time)>duration,1,
timestampdiff(second,start_time,end_time)/duration))*100,
2) as avg_play_progress
from tb_user_video_log
join tb_video_info using(video_id)
group by tag
having avg_play_progress>60
order by avg_play_progress desc
)as t_progress
SQL3 每类视频近一个月的转发量/率
首先是题意,视频播放日期start_time和整体的日期最大值之差小于30的,叫做“视频在有用户互动的最近一个月”。 第二,mark一下,上面要写成(SELECT MAX(start_time) FROM tb_user_video_log),而不能直接写MAX(start_time)。where后面不能跟集函数。顺便说一句,having后面倒是常跟集函数。
1.有交互的视频的最近一个月(就是有播放量的视频的最大日期 减去 30天)
2.每类视频在有用户互动的最近一个月(并不是对于各类视频计算最大值,而是整体的日期最大值)
select tag,sum(if_retweet),round(sum(if_retweet)/count(*),3) retweet_rate
from tb_user_video_log a
left join tb_video_info b
on a.video_id =b.video_id
where timestampdiff(day,start_time,(select max(start_time) from tb_user_video_log))<30
group by tag
order by retweet_rate desc
SQL4 每个创作者每月的涨粉率及截止当前的总粉丝量
第一个需要解决的地方是每个月状态2相当于掉粉,状态1相当于涨粉,这里不能用if去做,因为有多个状态,所以用case when去处理。
第二个需要解决的就是,我们需要计算每个月截止当前的粉丝量,这里自然地想到用窗口函数去处理即可,这里窗口函数需要去partition by author如果这样处理的话,如果有多个作者就会混乱。
SELECT
author,
date_format(start_time,'%Y-%m') month,
round(sum(case when if_follow=1 then 1
when if_follow=2 then -1
else 0 end)/count(author),3) fans_growth_rate,
sum(sum(case when if_follow=1 then 1
when if_follow=2 then -1
else 0 end)) over(partition by author order by date_format(start_time,'%Y-%m')) total_fans
FROM tb_user_video_log log
left join tb_video_info info on log.video_id=info.video_id
where year(start_time)=2021
group by author,month
order by author,total_fans
请问这个为啥要sum两次呢?只在窗口函数中放sum为啥不行?
因为一个是单月统计 一个是截止到当前的汇总
窗口函数是把单月新增累加一起
用户增长场景(某度信息流)
SQL7 2021年11月每天的人均浏览文章时长
select
date(in_time) dt,
round(sum(timestampdiff(second,in_time,out_time))/count(distinct uid),1) avg_lensec
from tb_user_log
where date_format(in_time,"%Y-%m") ="2021-11" and artical_id!=0
group by dt
order by avg_lensec
SQL8 每篇文章同一时刻最大在看人数
select artical_id, max(cnt) as ax
from
(select artical_id,
sum(Mark) over (partition by artical_id order by Time, Mark desc) as cnt
from
(select artical_id, in_time as Time , 1 Mark from tb_user_log where artical_id != 0
union all
select artical_id, out_time as Time , -1 Mark from tb_user_log where artical_id != 0) as a
)as b
group by artical_id
order by ax desc
SQL9 2021年11月每天新用户的次日留存率
流程:
先查询出每个用户第一次登陆时间(最小登陆时间)–每天新用户表
因为涉及到跨天活跃,所以要进行并集操作,将登录时间和登出时间取并集,这里union会去重–用户活跃表
将每天新用户表和用户活跃表左连接,只有是同一用户并且该用户第2天依旧登陆才会保留整个记录,否则右表记录为空
得到每天新用户第二天是否登陆表后,开始计算每天的次日留存率:根据日期分组计算,次日活跃用户个数/当天新用户个数
select t1.dt,round(count(t2.uid)/count(t1.uid),2) uv_left_rate
from(
select uid,
min(date(in_time)) dt
from tb_user_log
group by uid) t1 -- 每天用户表
left join
(select uid,date(in_time) dt
from tb_user_log
union
select uid,date(out_time) dt
from tb_user_log) t2 -- 用户活跃表
on t1.uid = t2.uid and t1.dt = date_sub(t2.dt,INTERVAL 1 day)
where date_format(t1.dt,"%Y-%m") ='2021-11'
group by t1.dt
order by t1.dt
电商场景(某东商城)
SQL13 计算商城中2021年每月的GMV
明确题意:
统计GMV大于10w的每月GMV
问题分解:
筛选满足条件的记录:
退款的金额不算(付款的记录还在,已算过一次):where status != 2
2021年的记录:and YEAR(event_time) = 2021
按月份分组:group by DATE_FORMAT(event_time, “%Y-%m”)
计算GMV:(sum(total_amount) as GMV
保留整数:ROUND(x, 0)
筛选GMV大于10w的分组:having GMV > 100000
另外有一个问题
group by 执行顺序不是在select 之前吗,为什么能直接用month啊
答:
mysql做了查询增强,别的不可以
select date_format(event_time,"%Y-%m") as month,
sum(total_amount) as GMV
from tb_order_overall
where year(event_time) =2021 and (status!=2)
group by month
having GMV>100000
order by GMV
SQL14 统计2021年10月每个退货率不大于0.5的商品的各项指标
明确题意:
统计2021年10月每个有展示记录的退货率不大于0.5商品的各项指标:
商品点展比=点击数÷展示数;加购率=加购数÷点击数;成单率=付款数÷加购数;退货率=退款数÷付款数,当分母为0时整体结果记为0。
结果中各项指标保留3位小数,并按商品ID升序排序。
问题分解:
计算各个维度的计数(生成子表t_product_index_cnt)
筛选时间窗内的记录:where DATE_FORMAT(event_time, ‘%Y%m’) = ‘202110’
按商品ID分组:group by product_id
统计各种计数:
展示数(每条记录就是一次展示):COUNT(1) as show_cnt
点击数:sum(if_click) as click_cnt
加购数:sum(if_cart) as cart_cnt
付款数:sum(if_payment) as payment_cnt
退款数:sum(if_refund) as refund_cnt
计算各种指标率(除了展示数其他均可能为0,要特殊处理!):
点击率:click_cnt/show_cnt as ctr
加购率:IF(click_cnt>0, cart_cnt/click_cnt, 0) as cart_rate
付款率:IF(cart_cnt>0, payment_cnt/cart_cnt, 0) as payment_rate
退款率:IF(payment_cnt>0, refund_cnt/payment_cnt, 0) as refund_rate
都保留3位小数:ROUND(x, 3)
筛选退款率不大于0.5的商品,需注意分母可能为0:where payment_cnt = 0 or refund_rate <= 0.5
select product_id,round(click_cnt/show_cnt,3) as ctr,
round(if(click_cnt>0,cart_cnt/click_cnt,0),3) as cart_rate,
round(if(cart_cnt>0,payment_cnt/cart_cnt,0),3) as payment_rate,
round(if(payment_cnt>0,refund_cnt/payment_cnt,0),3) as refund_rate
from(
select product_id,count(1) as show_cnt,
sum(if_click) as click_cnt,
sum(if_cart) as cart_cnt,
sum(if_payment) as payment_cnt,
sum(if_refund) as refund_cnt
from tb_user_event
where date_format(event_time,'%Y-%m') ='2021-10'
group by product_id
)as tb_user_e
where payment_cnt =0 or refund_cnt/payment_cnt<=0.5
order by product_id
SQL15 某店铺的各商品毛利率及店铺整体毛利率
select product_id,concat(profit_rate,"%") as profit_rate
from(
select ifnull(product_id,'店铺汇总') as product_id,
round(100*(1-sum(in_price*cnt)/sum(price*cnt)),1) as profit_rate
from(
select product_id,price,cnt,in_price
from tb_order_detail
join tb_order_overall using(order_id)
join tb_product_info using(product_id)
where shop_id =901 and date(event_time) >="2021-10-01"
)as t_product_in_each_order
group by product_id
with rollup
having profit_rate>24.9 or product_id is null
order by product_id
)as t1
SQL16 零食类商品中复购率top3高的商品
select product_id,
round(sum(repurchase)/count(repurchase),3) as repurchase_rate
from(
select uid,product_id,if(count(event_time)>1,1,0) as repurchase
from tb_order_detail
join tb_order_overall using(order_id)
join tb_product_info using(product_id)
where tag = "零食" and event_time>= (
select date_sub(max(event_time),INTERVAL 89 DAY)
from tb_order_overall
)
group by uid,product_id
)as tb
group by product_id
order by repurchase_rate desc,product_id
limit 3
出行场景(某滴打车)
SQL19 2021年国庆在北京接单3次及以上的司机统计
select "北京" as city,round(avg(order_num),3) as avg_order_num,
round(avg(income),3) as avg_income
from(
select driver_id,count(order_id) as order_num,
sum(fare) as income
from tb_get_car_order
join tb_get_car_record using(order_id)
where city ="北京"
and date_format(order_time,"%Y-%m-%d") between '2021-10-01' and '2021-10-07'
group by driver_id
having count(order_id)>=3
) as t
SQL20 有取消订单记录的司机平均评分
select ifnull(driver_id,"总体") as driver_id,
round(avg(grade),1) as avg_grade
from tb_get_car_order
where driver_id in (
select driver_id
from tb_get_car_order
where date_format(order_time,"%Y-%m")='2021-10' and isnull(fare)
) and not isnull(grade)
group by driver_id
with rollup
某宝店铺分析(电商模式)
SQL25 某宝店铺的SPU数量
select style_id,
count(distinct item_id) SPU_num
from product_tb
group by style_id
order by SPU_num desc
SQL26 某宝店铺的实际销售额与客单价
销售额:SUM(sales_price)
客单价:SUM(sales_price)/COUNT(DISTINCT user_id)
select sum(sales_price) sales_total,
round(sum(sales_price)/count(distinct user_id),2) pre_trans from sales_tb
牛客直播课分析(在线教育行业)
SQL30 牛客直播转换率
SELECT course_id,
course_name,
round(sum(if_sign)/ sum(if_vw)*100,2) as "sign_rate(%)"
from course_tb
join behavior_tb
using(course_id)
group by course_id,course_name
order by course_id
某乎问答(内容行业)
SQL35 某乎问答11月份日人均回答量
select answer_date,
round(count(issue_id)/count(distinct author_id),2) per_num
from answer_tb
where month(answer_date) = 11
group by answer_date