题目要求:
见pdf:链接:https://pan.baidu.com/s/1YYx7-1s8t0LJYHWcnABLsw 提取码:z6h3
由于作业测试数据量非常小,所以主要是以写sql为主,暂时没有调优步骤。
题目一:
思路:这题是连续N天登陆的同类问题,按照row_number编号–>year减rownum=gid -->按gid分组这个固定思路写sql即可。
sql:
WITH tmp AS (
SELECT
team,
YEAR,
YEAR - (
row_number () over (PARTITION BY team ORDER BY YEAR)
) gid
FROM
t1
) SELECT
team
FROM
(
SELECT
team,
gid,
count(1)
FROM
tmp
GROUP BY
team,
gid
HAVING
count(1) >= 3
) tmp1
GROUP BY
team
最终结果:
题目二:
思路:首先按照id分区,时间排序,然后使用lag和lead函数,将前一行和后一行的price都显示在同一行,使用case when判断,比前后2个值都大,则是波峰,比前后2个值都小,则为波谷。这里有个坑需要注意下,因为time的字段类型是string,直接使用order by time来按时间排序,会把9:35这些时间排在最下面,所以我这边的做法是先将时间字段拼接成日期格式,然后转为时间戳,然后按时间戳来排序。
sql:
WITH tmp AS (
SELECT
id,
time,
price,
nvl (
lag (price) over (
PARTITION BY id
ORDER BY
unix_timestamp(
concat('2021-05-28 ', time, ':00')
)
),
0
) lagprice,
nvl (
lead (price) over (
PARTITION BY id
ORDER BY
unix_timestamp(
concat('2021-05-28 ', time, ':00')
)
),
0
) ledprice
FROM
t2
) SELECT
id,
time,
flag
FROM
(
SELECT
id,
time,
CASE
WHEN price > lagprice
AND price > ledprice THEN '波峰'
WHEN price < lagprice
AND price < ledprice THEN '波谷'
END flag
FROM
tmp
) tmp1
WHERE
flag IS NOT NULL
结果:
id,time,flag
sh66688,9:35,波峰
sh66688,9:50,波谷
sh66688,9:55,波峰
sh66688,10:00,波谷
sh66688,10:05,波峰
sh66688,10:15,波谷
sh66688,10:25,波峰
sh66688,10:30,波谷
sh66688,10:40,波峰
sh66688,10:55,波谷
sh66688,11:00,波峰
sh66688,11:05,波谷
sh66688,11:25,波谷
sh66688,13:00,波峰
sh66688,13:20,波谷
sh66688,13:35,波峰
sh66688,13:45,波谷
sh66688,13:55,波峰
sh66688,14:05,波谷
sh66688,14:15,波峰
sh66688,14:20,波谷
sh66688,14:25,波峰
sh66688,14:30,波谷
sh66688,14:45,波峰
sh66688,14:50,波谷
sh66688,14:55,波峰
sh88888,9:35,波峰
sh88888,9:40,波谷
sh88888,10:00,波峰
sh88888,10:10,波谷
sh88888,10:30,波峰
sh88888,10:45,波谷
sh88888,11:00,波峰
sh88888,11:05,波谷
sh88888,11:10,波峰
sh88888,11:20,波谷
sh88888,13:30,波峰
sh88888,13:40,波谷
sh88888,13:50,波峰
sh88888,13:55,波谷
sh88888,14:05,波峰
sh88888,14:10,波谷
sh88888,14:15,波峰
sh88888,14:20,波谷
sh88888,14:30,波峰
sh88888,14:35,波谷
sh88888,14:40,波峰
sh88888,14:45,波谷
sh88888,14:50,波峰
题目三(1)
思路:直接按id分组,使用max和min求出最大和最小时间,求差即是预览时长,count(browserid)即是步长。
sql:
SELECT
id,
(
UNIX_TIMESTAMP(max(dt), "yyyy/MM/dd HH:mm") - UNIX_TIMESTAMP(min(dt), "yyyy/MM/dd HH:mm")
) / 60 viewtime,
count(*) AS step
FROM
t3
GROUP BY
id;
结果:
题目三(2)
思路:
1.先使用lag函数,计算2个时间之间的差值;
2.使用case when >30分钟的计为1,其它为0;
3.使用sum和over开窗函数,按id分区计算累加,比如
id,间隔,是否大于30分钟
1,2,0
1,32,1
1,45,1
则计算结果就为
1,2,0,0
1,32,1,1
1,45,1,2
4.按id和累加结果分组,即可计算出预览时间和步长
sql:
WITH tmp AS (
SELECT
id,
dt,
nvl (
(
UNIX_TIMESTAMP(dt, 'yyyy/MM/dd HH:mm') - UNIX_TIMESTAMP(
lag (dt) over (PARTITION BY id ORDER BY dt),
'yyyy/MM/dd HH:mm'
)
) / 60,
0
) min
FROM
t3
) SELECT
id,
sum(min) viewtime,
count(1) steps
FROM
(
SELECT
id,
dt,
min,
sum (flag) over (PARTITION BY id ORDER BY dt) sumflag
FROM
(
SELECT
id,
dt,
min,
CASE
WHEN min > 30 THEN
1
ELSE
0
END flag
FROM
tmp
) tmp1
) tmp2
GROUP BY
id,
sumflag
结果:
id,viewtime,steps
022f86d4533740ad914f233cbd9c4430,51.0,8
307d9dce3b7f495ab8ad6033f8c54930,45.0,8
307d9dce3b7f495ab8ad6033f8c54930,605.0,9
32258fe7130844399859aec54b6df5ff,46.0,6
32258fe7130844399859aec54b6df5ff,116.0,7
80ea80b2e5a64cbebfaf34aa797125f0,51.0,5
934e8bee978a42c7a8dbb4cfa8af0b4f,32.0,6
934e8bee978a42c7a8dbb4cfa8af0b4f,72.0,7
95273392ab1a4579914273cdd1f3a3ae,40.0,10
95273392ab1a4579914273cdd1f3a3ae,1202.0,9
de0096ad04ec4273b0462c7da7d79653,53.0,8
de0096ad04ec4273b0462c7da7d79653,605.0,9
f5ae36c6cdda40d5954e08a2d14954a7,40.0,6