impala-1
分析函数
over
- row_number():
select goods_price,row_number() over(order by goods_price desc) as id from h2_td_url_price limit 10; - max(x):
select spread_name, goods_price, max(goods_price) over (partition by spread_name) from h2_td_url_price order by spread_name; - 窗口子句受 AVG()、COUNT()、FIRST_VALUE()、LAST_VALUE() 和 SUM() 函数支持。对于 MAX() 和 MIN(),允许窗口子句的前提是开始边界为UNBOUNDED PRECEDING。
ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]的使用:
比如我想看收入的每天收入的5日累积变化可以像下面这么写:
select dt as '日期', price as '收入',
sum(price) over (partition by spread_name order by dt rows between current row and 1 following) as '+1收入',
sum(price) over (partition by spread_name order by dt rows between current row and 2 following) as '+2收入',
sum(price) over (partition by spread_name order by dt rows between current row and 3 following) as '+3收入',
sum(price) over (partition by spread_name order by dt rows between current row and 4 following) as '+4收入',
sum(price) over (partition by spread_name order by dt rows between current row and 5 following) as '+5收入'
from (
select dt,spread_name,sum(goods_price) price
from h2_td_url_price
where spread_name='广点通'
group by dt,spread_name
) t
order by dt;
结果
+------------+-------+--------+--------+--------+--------+--------+
| 日期 | 收入 | +1收入 | +2收入 | +3收入 | +4收入 | +5收入 |
+------------+-------+--------+--------+--------+--------+--------+
| 2015-08-27 | 290 | 3180 | 7818 | 13167 | 19627 | 28918 |
| 2015-08-28 | 2890 | 7528 | 12877 | 19337 | 28628 | 37639 |
| 2015-08-29 | 4638 | 9987 | 16447 | 25738 | 34749 | 40750 |
| 2015-08-30 | 5349 | 11809 | 21100 | 30111 | 36112 | 41393 |
| 2015-08-31 | 6460 | 15751 | 24762 | 30763 | 36044 | 48733 |
| 2015-09-01 | 9291 | 18302 | 24303 | 29584 | 42273 | 51612 |
| 2015-09-02 | 9011 | 15012 | 20293 | 32982 | 42321 | 49526 |
| 2015-09-03 | 6001 | 11282 | 23971 | 33310 | 40515 | 53029 |
| 2015-09-04 | 5281 | 17970 | 27309 | 34514 | 47028 | 57571 |
| 2015-09-05 | 12689 | 22028 | 29233 | 41747 | 52290 | 52290 |
| 2015-09-06 | 9339 | 16544 | 29058 | 39601 | 39601 | 39601 |
| 2015-09-07 | 7205 | 19719 | 30262 | 30262 | 30262 | 30262 |
| 2015-09-08 | 12514 | 23057 | 23057 | 23057 | 23057 | 23057 |
| 2015-09-09 | 10543 | 10543 | 10543 | 10543 | 10543 | 10543 |
+------------+-------+--------+--------+--------+--------+--------+
select dt as '日期', price as '收入',
sum(price) over (partition by spread_name order by dt range between unbounded preceding and current row) as '累积当今收入',
sum(price) over (partition by spread_name order by dt range between unbounded preceding and unbounded following) as '累积收入',
sum(price) over (partition by spread_name order by dt range between current row and unbounded following ) as '当日至今收入'
from (
select dt,spread_name,sum(goods_price) price
from h2_td_url_price
where spread_name='广点通'
group by dt,spread_name
) t
order by dt;
+------------+-------+--------------+----------+--------------+
| 日期 | 收入 | 累积当今收入 | 累积收入 | 当日至今收入 |
+------------+-------+--------------+----------+--------------+
| 2015-08-27 | 290 | 290 | 101501 | 101501 |
| 2015-08-28 | 2890 | 3180 | 101501 | 101211 |
| 2015-08-29 | 4638 | 7818 | 101501 | 98321 |
| 2015-08-30 | 5349 | 13167 | 101501 | 93683 |
| 2015-08-31 | 6460 | 19627 | 101501 | 88334 |
| 2015-09-01 | 9291 | 28918 | 101501 | 81874 |
| 2015-09-02 | 9011 | 37929 | 101501 | 72583 |
| 2015-09-03 | 6001 | 43930 | 101501 | 63572 |
| 2015-09-04 | 5281 | 49211 | 101501 | 57571 |
| 2015-09-05 | 12689 | 61900 | 101501 | 52290 |
| 2015-09-06 | 9339 | 71239 | 101501 | 39601 |
| 2015-09-07 | 7205 | 78444 | 101501 | 30262 |
| 2015-09-08 | 12514 | 90958 | 101501 | 23057 |
| 2015-09-09 | 10543 | 101501 | 101501 | 10543 |
+------------+-------+--------------+----------+--------------+
rank和dense_rank的区别
dense_rank
返回整数的升序顺序,从 1 开始。输出序列为 ORDER BY 表达式的重复值生成重复整数。为”连接的” 输入值生成重复输出值后,函数通过下一个更高的整数来继续序列。
因此,在输入包含重复值时,序列将包含重复值但无间隔。启动通过 PARTITIONED BY 子句生成的每个组的序列。
rank
返回整数的升序顺序,从 1 开始。输出序列为 ORDER BY 表达式的重复值生成重复整数。为”连接的” 输入值生成重复输出值后,函数按连接值的数量增加序列。
因此,在输入包含重复值时,序列将包含重复和间隔。启动通过 PARTITIONED BY 子句生成的每个组的序列。
举个例子:
select spread_name,
row_number() over(order by spread_name) id,
rank() over (order by spread_name) rank_id,
dense_rank() over (order by spread_name) dense_id
from h2_td_url_price group by spread_name,appkey
order by spread_name;
+---------------------+----+---------+----------+
| spread_name | id | rank_id | dense_id |
+---------------------+----+---------+----------+
| adSage | 1 | 1 | 1 |
| adwords | 2 | 2 | 2 |
| appTao | 3 | 3 | 3 |
| cocounion | 4 | 4 | 4 |
| gdt | 5 | 5 | 5 |
| tendcloud | 6 | 6 | 6 |
| 广点通 | 7 | 7 | 7 |
| 广点通 | 8 | 7 | 7 |
| 搜狗-无线专区0316 | 9 | 9 | 8 |
| 搜狗-无线关键字0316 | 10 | 10 | 9 |
| 百度-无线专区0316 | 11 | 11 | 10 |
| 百度-无线关键字0316 | 12 | 12 | 11 |
| 限时免费大全 | 13 | 13 | 12 |
+---------------------+----+---------+----------+
lead和leg的用法
select dt,sum(goods_price) as '收入',
lead(sum(goods_price),1) over (order by dt) as '加一日收入',
lag(sum(goods_price),1) over (order by dt) as '减一日收入'
from h2_td_url_price
group by dt;
+------------+-------+------------+------------+
| dt | 收入 | 加一日收入 | 减一日收入 |
+------------+-------+------------+------------+
| 2015-08-27 | 985 | 3954 | NULL |
| 2015-08-28 | 3954 | 7034 | 985 |
| 2015-08-29 | 7034 | 6281 | 3954 |
| 2015-08-30 | 6281 | 7199 | 7034 |
| 2015-08-31 | 7199 | 11874 | 6281 |
| 2015-09-01 | 11874 | 9637 | 7199 |
| 2015-09-02 | 9637 | 7277 | 11874 |
| 2015-09-03 | 7277 | 7652 | 9637 |
| 2015-09-04 | 7652 | 13592 | 7277 |
| 2015-09-05 | 13592 | 11036 | 7652 |
| 2015-09-06 | 11036 | 9399 | 13592 |
| 2015-09-07 | 9399 | 13299 | 11036 |
| 2015-09-08 | 13299 | 11928 | 9399 |
| 2015-09-09 | 11928 | NULL | 13299 |
+------------+-------+------------+------------+