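Description: 550. Game Play Analysis
Write a solution to report the fraction of players who logged in again on the day after their first login, rounded to two decimal places. In other words, count the players who logged in on at least two consecutive days starting from their first login date, then divide by the total number of players.
Data preparation: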
Create table If Not Exists Activity (player_id int, device_id int, event_date date, games_played int);
Truncate table Activity;
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-01', '5');
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-02', '6');
insert into Activity (player_id, device_id, event_date, games_played) values ('2', '3', '2017-06-25', '1');
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '1', '2016-03-02', '0');
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '4', '2018-07-03', '5');
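Analysis:
① First, find each player's first login date.
select player_id, event_date, min(event_date) over (partition by player_id order by event_date) first from Activity
② How do we find the players who logged in on at least two consecutive days starting from their first login? Treating the query above as t1, filter with datediff(event_date, first) = 1.
select count(distinct player_id) num from t1 where datediff(event_date, first) = 1
③ Finally, treating the count above as t2, divide it by the total number of players and round the result.
select round(num / (select count(distinct player_id) from Activity), 2) fraction from t2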
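Code:
-- t1: every login row, tagged with that player's first login date
with t1 as (select player_id,
                   event_date,
                   min(event_date) over (partition by player_id order by event_date) first
            from Activity)
-- t2: number of players with a login exactly one day after their first login
   , t2 as (select count(distinct player_id) num
            from t1
            where datediff(event_date, first) = 1)
select round(num / (select count(distinct player_id) from Activity), 2) fraction
from t2;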
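Summary:
Filtering for players who logged in on at least two consecutive days starting from their first login can be hard to come up with. Instead of trying to handle the first-day rows themselves, look directly for a login on the day after the first login.
datediff(event_date, first) = 1 does exactly that.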
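As a quick sanity check, the query can be replayed on the sample rows. The sketch below is an assumption-laden stand-in, not the MySQL solution itself: it uses Python's built-in sqlite3 module (which needs an SQLite build with window-function support), replaces datediff with a julianday difference because SQLite has no datediff, renames the first column to first_login, and multiplies by 1.0 because SQLite would otherwise do integer division. The function name second_day_fraction is made up for illustration.

```python
import sqlite3

# Sample rows from the data preparation step above.
ROWS = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

# SQLite adaptation of the MySQL query (assumption: julianday difference
# stands in for datediff, and first_login replaces the alias "first").
QUERY = """
with t1 as (
    select player_id,
           event_date,
           min(event_date) over (partition by player_id) as first_login
    from Activity
),
t2 as (
    select count(distinct player_id) as num
    from t1
    where julianday(event_date) - julianday(first_login) = 1
)
select round(num * 1.0 / (select count(distinct player_id) from Activity), 2)
from t2
"""


def second_day_fraction(rows=ROWS):
    """Return the fraction of players who logged in the day after their first login."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Activity "
        "(player_id int, device_id int, event_date date, games_played int)"
    )
    conn.executemany("insert into Activity values (?, ?, ?, ?)", rows)
    (fraction,) = conn.execute(QUERY).fetchone()
    conn.close()
    return fraction


if __name__ == "__main__":
    print(second_day_fraction())  # only player 1 qualifies, so 1 / 3 rounds to 0.33
```

Only player 1 has a row dated one day after the first login, so the sample data gives 1 of 3 players.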
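Description: 585. Investments in 2016
Write a solution to report the sum of the 2016 insured amounts (tiv_2016) of all policyholders who meet both of the following conditions:
- Their 2015 insured amount (tiv_2015) is the same as that of at least one other policyholder.
- Their city is different from every other policyholder's city (that is, their (lat, lon) pair is not identical to any other policyholder's).
Round tiv_2016 to two decimal places.
Data preparation: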
Create Table If Not Exists Insurance (pid int, tiv_2015 float, tiv_2016 float, lat float, lon float);
Truncate table Insurance;
insert into Insurance (pid, tiv_2015, tiv_2016, lat, lon) values ('1', '10', '5', '10', '10');
insert into Insurance (pid, tiv_2015, tiv_2016, lat, lon) values ('2', '20', '20', '20', '20');
insert into Insurance (pid, tiv_2015, tiv_2016, lat, lon) values ('3', '10', '30', '20', '20');
insert into Insurance (pid, tiv_2015, tiv_2016, lat, lon) values ('4', '10', '40', '40', '40');
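Analysis:
① First condition: sharing tiv_2015 with at least one other policyholder means the value is not unique, so the count of rows partitioned by tiv_2015 must be greater than 1.
② Second condition: the (lat, lon) pair must not match anyone else's, so the count of rows partitioned by (lat, lon) must equal 1.
select *,
count(tiv_2015) over (partition by tiv_2015) t1,
count(pid) over (partition by lat,lon) t2
from Insurance
③ Finish up: sum tiv_2016 over the rows that pass both filters, using round(sum(tiv_2016), 2) to keep two decimal places.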
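Code:
-- t1 column: how many rows share this tiv_2015; t2 column: how many rows share this (lat, lon)
with t1 as (select *,
                   count(tiv_2015) over (partition by tiv_2015) t1,
                   count(pid) over (partition by lat, lon) t2
            from Insurance)
select round(sum(tiv_2016), 2) tiv_2016
from t1
where t1 > 1
  and t2 = 1;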
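Summary:
Takeaway: filtering on a windowed count() tells you whether a row's value is shared with other rows (count > 1) or unique to that row (count = 1).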
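The counting idea can be checked on the sample rows. This is a minimal sketch under stated assumptions rather than the MySQL solution itself: it runs the query through Python's built-in sqlite3 module (an SQLite build with window-function support is assumed), and the names counted, same_tiv, same_loc, window_counts and tiv_2016_sum are hypothetical, chosen here only to avoid reusing t1 for both a CTE and a column.

```python
import sqlite3

# Sample rows from the data preparation step: (pid, tiv_2015, tiv_2016, lat, lon).
ROWS = [
    (1, 10, 5, 10, 10),
    (2, 20, 20, 20, 20),
    (3, 10, 30, 20, 20),
    (4, 10, 40, 40, 40),
]

# Windowed counts: same_tiv = rows sharing tiv_2015, same_loc = rows sharing (lat, lon).
COUNTED = """
with counted as (
    select *,
           count(*) over (partition by tiv_2015) as same_tiv,
           count(*) over (partition by lat, lon) as same_loc
    from Insurance
)
"""


def _connect(rows):
    """Create an in-memory Insurance table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Insurance "
        "(pid int, tiv_2015 float, tiv_2016 float, lat float, lon float)"
    )
    conn.executemany("insert into Insurance values (?, ?, ?, ?, ?)", rows)
    return conn


def window_counts(rows=ROWS):
    """Return (pid, same_tiv, same_loc) per row, ordered by pid."""
    conn = _connect(rows)
    result = conn.execute(
        COUNTED + "select pid, same_tiv, same_loc from counted order by pid"
    ).fetchall()
    conn.close()
    return result


def tiv_2016_sum(rows=ROWS):
    """Sum tiv_2016 over rows with a shared tiv_2015 and a unique location."""
    conn = _connect(rows)
    (total,) = conn.execute(
        COUNTED
        + "select round(sum(tiv_2016), 2) from counted "
        "where same_tiv > 1 and same_loc = 1"
    ).fetchone()
    conn.close()
    return total


if __name__ == "__main__":
    print(window_counts())  # [(1, 3, 1), (2, 1, 2), (3, 3, 2), (4, 3, 1)]
    print(tiv_2016_sum())   # pids 1 and 4 pass both filters: 5 + 40 = 45.0
```

Printing the intermediate counts makes the two filters easy to read off: pid 2 fails because its tiv_2015 is unique, and pid 3 fails because it shares a location with pid 2.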