准备数据:
plant_carbon.txt 换取树苗所需碳排放量(换树表)
p001 梭梭树 17
p002 沙柳 19
p003 樟子树 146
p004 胡杨 215
user_low_carbon.txt 用户累计减少碳排放量表(用户积碳表)
u_001 2017/1/1 10
u_001 2017/1/2 150
u_001 2017/1/2 110
u_001 2017/1/2 10
u_001 2017/1/4 50
u_001 2017/1/4 10
u_001 2017/1/6 45
u_001 2017/1/6 90
u_002 2017/1/1 10
u_002 2017/1/2 150
u_002 2017/1/2 70
u_002 2017/1/3 30
u_002 2017/1/3 80
u_002 2017/1/4 150
u_002 2017/1/5 101
u_002 2017/1/6 68
u_003 2017/1/1 20
u_003 2017/1/2 10
u_003 2017/1/2 150
u_003 2017/1/3 160
u_003 2017/1/4 20
u_003 2017/1/5 120
u_003 2017/1/6 20
u_003 2017/1/7 10
u_003 2017/1/7 110
u_004 2017/1/1 110
u_004 2017/1/2 20
u_004 2017/1/2 50
u_004 2017/1/3 120
u_004 2017/1/4 30
u_004 2017/1/5 60
u_004 2017/1/6 120
u_004 2017/1/7 10
u_004 2017/1/7 120
u_005 2017/1/1 80
u_005 2017/1/2 50
u_005 2017/1/2 80
u_005 2017/1/3 180
u_005 2017/1/4 180
u_005 2017/1/4 10
u_005 2017/1/5 80
u_005 2017/1/6 280
u_005 2017/1/7 80
u_005 2017/1/7 80
u_006 2017/1/1 40
u_006 2017/1/2 40
u_006 2017/1/2 140
u_006 2017/1/3 210
u_006 2017/1/3 10
u_006 2017/1/4 40
u_006 2017/1/5 40
u_006 2017/1/6 20
u_006 2017/1/7 50
u_006 2017/1/7 240
u_007 2017/1/1 130
u_007 2017/1/2 30
u_007 2017/1/2 330
u_007 2017/1/3 30
u_007 2017/1/4 530
u_007 2017/1/5 30
u_007 2017/1/6 230
u_007 2017/1/7 130
u_007 2017/1/7 30
u_008 2017/1/1 160
u_008 2017/1/2 60
u_008 2017/1/2 60
u_008 2017/1/3 60
u_008 2017/1/4 260
u_008 2017/1/5 360
u_008 2017/1/6 160
u_008 2017/1/7 60
u_008 2017/1/7 60
u_009 2017/1/1 70
u_009 2017/1/2 70
u_009 2017/1/2 70
u_009 2017/1/3 170
u_009 2017/1/4 270
u_009 2017/1/5 70
u_009 2017/1/6 70
u_009 2017/1/7 70
u_009 2017/1/7 70
u_010 2017/1/1 90
u_010 2017/1/2 90
u_010 2017/1/2 90
u_010 2017/1/3 90
u_010 2017/1/4 90
u_010 2017/1/4 80
u_010 2017/1/5 90
u_010 2017/1/5 90
u_010 2017/1/6 190
u_010 2017/1/7 90
u_010 2017/1/7 90
u_011 2017/1/1 110
u_011 2017/1/2 100
u_011 2017/1/2 100
u_011 2017/1/3 120
u_011 2017/1/4 100
u_011 2017/1/5 100
u_011 2017/1/6 100
u_011 2017/1/7 130
u_011 2017/1/7 100
u_012 2017/1/1 10
u_012 2017/1/2 120
u_012 2017/1/2 10
u_012 2017/1/3 10
u_012 2017/1/4 50
u_012 2017/1/5 10
u_012 2017/1/6 20
u_012 2017/1/7 10
u_012 2017/1/7 10
u_013 2017/1/1 50
u_013 2017/1/2 150
u_013 2017/1/2 50
u_013 2017/1/3 150
u_013 2017/1/4 550
u_013 2017/1/5 350
u_013 2017/1/6 50
u_013 2017/1/7 20
u_013 2017/1/7 60
u_014 2017/1/1 220
u_014 2017/1/2 120
u_014 2017/1/2 20
u_014 2017/1/3 20
u_014 2017/1/4 20
u_014 2017/1/5 250
u_014 2017/1/6 120
u_014 2017/1/7 270
u_014 2017/1/7 20
u_015 2017/1/1 10
u_015 2017/1/2 20
u_015 2017/1/2 10
u_015 2017/1/3 10
u_015 2017/1/4 20
u_015 2017/1/5 70
u_015 2017/1/6 10
u_015 2017/1/7 80
u_015 2017/1/7 60
建表语句
create table user_low_carbon(
user_id String,
data_dt String,
low_carbon int
)
row format delimited fields terminated by '\t';
create table plant_carbon(
plant_id String,
plant_name String,
low_carbon int
)
row format delimited fields terminated by '\t';
导入语句:
load data local inpath '/usr/hdk/data/plant_carbon.txt' into table plant_carbon;
load data local inpath '/usr/hdk/data/user_low_carbon.txt' into table user_low_carbon;
第一题
假设2017年1月1日开始记录低碳数据,假设2017年10月1日前满足申领条件的用户都申领了一颗胡杨,其他的能量全部用来申领沙柳。
统计10月1日累计申领沙柳排名前10的用户信息,以及比后一名多领了几颗沙柳。
样式:
user_id plant_count less_count
u_002 1000 100
u_003 900 200
u_010 700 ...
- 第一步肯定是获取用户到10/1之前积累的碳排放量
select
user_id,
sum(low_carbon)
from
user_low_carbon
where
date_format(regexp_replace(data_dt,'/','-'),'yyyy-mm-dd') <= '2017/10/01'
group by
user_id
;
- 第二步是获得胡桃对应的碳排放量
select low_carbon from plant_carbon where plant_id = 'p004';
- 第三步是沙柳的
select low_carbon from plant_carbon where plant_id = 'p002';
- 把这三张表联合起来做计算
select
user_id,
floor((sum_low_carbon-t2.low_carbon)/t3.low_carbon) plant_count
from
(select
user_id,
sum(low_carbon) sum_low_carbon
from
user_low_carbon
where
date_format(regexp_replace(data_dt,'/','-'),'yyyy-mm-dd') <= '2017/10/01'
group by
user_id) t1,
(select low_carbon from plant_carbon where plant_id = 'p004') t2,
(select low_carbon from plant_carbon where plant_id = 'p002') t3
;
5.排序,并把下一行数据放在当前行,然后相减
select
user_id,
plant_count,
(plant_count-lead(t4.plant_count,1,'9999-99-99') over(order by plant_count desc)) lead_plant_count
from
(select
user_id,
floor((sum_low_carbon-t2.low_carbon)/t3.low_carbon) plant_count
from
(select
user_id,
sum(low_carbon) sum_low_carbon
from
user_low_carbon
where
date_format(regexp_replace(data_dt,'/','-'),'yyyy-mm-dd') <= '2017/10/01'
group by
user_id) t1,
(select low_carbon from plant_carbon where plant_id = 'p004') t2,
(select low_carbon from plant_carbon where plant_id = 'p002') t3)t4
limit 10
;
输出结果
u_007 66 3
u_013 63 10
u_008 53 7
u_005 46 1
u_010 45 1
u_014 44 5
u_011 39 2
u_009 37 5
u_006 32 9
u_002 23 1
第二题
查询user_local_carbon表中每日流水记录,条件为:
用户在2017年内,连续三天或以上的天数内,每天减少碳排放量都超过100g的低碳流水。
user_id data_dt low_carbon
u_002 2017
1.提取2017年单日积碳超过100g的记录
select
user_id,
data_dt,
row_number() over ( PARTITION BY user_id ORDER BY data_dt) rn
from
user_low_carbon
where
substring(data_dt,1,4) = '2017'
group by
user_id,data_dt
having
sum(low_carbon) >= 100
;
输出结果:
u_001 2017/1/2 1
u_001 2017/1/6 2
u_002 2017/1/2 1
u_002 2017/1/3 2
u_002 2017/1/4 3
u_002 2017/1/5 4
u_003 2017/1/2 1
u_003 2017/1/3 2
u_003 2017/1/5 3
u_003 2017/1/7 4
u_004 2017/1/1 1
u_004 2017/1/3 2
u_004 2017/1/6 3
u_004 2017/1/7 4
u_005 2017/1/2 1
u_005 2017/1/3 2
u_005 2017/1/4 3
u_005 2017/1/6 4
u_005 2017/1/7 5
u_006 2017/1/2 1
u_006 2017/1/3 2
u_006 2017/1/7 3
u_007 2017/1/1 1
u_007 2017/1/2 2
u_007 2017/1/4 3
u_007 2017/1/6 4
u_007 2017/1/7 5
u_008 2017/1/1 1
u_008 2017/1/2 2
u_008 2017/1/4 3
u_008 2017/1/5 4
u_008 2017/1/6 5
u_008 2017/1/7 6
u_009 2017/1/2 1
u_009 2017/1/3 2
u_009 2017/1/4 3
u_009 2017/1/7 4
u_010 2017/1/2 1
u_010 2017/1/4 2
u_010 2017/1/5 3
u_010 2017/1/6 4
u_010 2017/1/7 5
u_011 2017/1/1 1
u_011 2017/1/2 2
u_011 2017/1/3 3
u_011 2017/1/4 4
u_011 2017/1/5 5
u_011 2017/1/6 6
u_011 2017/1/7 7
u_012 2017/1/2 1
u_013 2017/1/2 1
u_013 2017/1/3 2
u_013 2017/1/4 3
u_013 2017/1/5 4
u_014 2017/1/1 1
u_014 2017/1/2 2
u_014 2017/1/5 3
u_014 2017/1/6 4
u_014 2017/1/7 5
u_015 2017/1/7 1
然后只需要判断他是否连续三天活跃。
可以看大数据SQL经典面试题系列(1) - 连续3天登录用户
SELECT
user_id,
count(1) cnt
FROM
(select
user_id,
data_dt,
row_number() over ( PARTITION BY user_id ORDER BY data_dt) rn
from
user_low_carbon
where
substring(data_dt,1,4) = '2017'
group by
user_id,data_dt
having
sum(low_carbon) >= 100
)t
GROUP BY
user_id,
date_sub(date_format(regexp_replace(data_dt,'/','-'),'yyyy-mm-dd'),rn)
HAVING
count( 1 ) >= 3
ORDER BY
user_id
;
u_002 4
u_005 3
u_008 4
u_009 3
u_010 4
u_011 7
u_013 4
u_014 3