select
sum(user_total_count),
sum(user_total_avg_age),
sum(twice_count),
sum(twice_count_avg_age)
from
(select
0 user_total_count,
0 user_total_avg_age,
count(*) twice_count,
cast(sum(age)/count(*) as decimal(10,2)) twice_count_avg_age
from
(
select
user_id,
min(age) age
from
(select
user_id,
min(age) age
from
(
select
user_id,
age,
date_sub(dt,rk) flag
from
(
select
dt,
user_id,
min(age) age,
rank() over(partition by user_id order by dt) rk
from
user_age
group by
dt,user_id
)t1
)t2
group by
user_id,flag
having
count(*)>=2)t3
group by
user_id
)t4
union all
select
count(*) user_total_count,
cast((sum(age)/count(*)) as decimal(10,1)),
0 twice_count,
0 twice_count_avg_age
from
(
select
user_id,
min(age) age
from
user_age
group by
user_id
)t5)t6;
这道题就flag哪里有点绕,我说下我的思路:
select
user_id,
age,
date_sub(dt,rk) flag
from
t1
//rk是rank()函数返回的结果,rank排名是由低到高并且不跳数,就是值相等会出现相同的排名,日期相同rank值就会相同,所以只要是日期是相邻的,它们和rk的差值就会相同,然后对结果count(*),相同值的个数累加大于2,说明它们的日期是相邻的,为活跃用户。