Hive provides two ways to define a window frame: ROWS and RANGE. Both require a lower and an upper bound. For example, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW selects every row from the start of the partition through the current row, while SUM(close) RANGE BETWEEN 100 PRECEDING AND 200 FOLLOWING selects rows by the difference in the field's value: if the current row's close value is 200, the frame covers all rows in the partition whose close value falls between 100 and 400. All possible frame definitions are listed below. If no frame is specified, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Window frames can only be applied to the max, min, avg, count, sum, FIRST_VALUE and LAST_VALUE window functions.
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

-- the frame used in the second query below
range between 3 PRECEDING and 11 FOLLOWING
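To see the difference between the two frame types, here is a minimal sketch against the same new_score table used below (not part of the original transcript): the ROWS frame counts physical rows, while the RANGE frame compares the score value of each neighbouring row against the current row's value.

SELECT id
      ,score
      ,clazz
      -- physical window: the 2 rows before and the 2 rows after the current row
      ,sum(score) OVER (PARTITION BY clazz ORDER BY score ROWS  BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS sum_rows
      -- logical window: all rows whose score lies within [score - 2, score + 2]
      ,sum(score) OVER (PARTITION BY clazz ORDER BY score RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS sum_range
FROM new_score;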
hive> select id
> ,score
> ,clazz
> ,department
> ,row_number() over (partition by clazz order by score desc) as rn_rk
> ,dense_rank() over (partition by clazz order by score desc) as dense_rk
> ,rank() over (partition by clazz order by score desc) as rk
> ,percent_rank() over (partition by clazz order by score desc) as percent_rk
> ,round(cume_dist() over (partition by clazz order by score desc),3) as cume_rk
> ,NTILE(3) over (partition by clazz order by score desc) as ntile_num
> ,max(score) over (partition by clazz order by score desc ) as max_p
> from new_score;
Query ID = root_20220407142626_ff455662-731f-4d77-8ece-c23aefd920ff
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0030, Tracking URL = http://master:8088/proxy/application_1649211406017_0030/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0030
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-04-07 14:26:34,722 Stage-1 map = 0%, reduce = 0%
2022-04-07 14:26:39,946 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.78 sec
2022-04-07 14:26:46,184 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.18 sec
MapReduce Total cumulative CPU time: 2 seconds 180 msec
Ended Job = job_1649211406017_0030
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.18 sec HDFS Read: 13572 HDFS Write: 946 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 180 msec
OK
114 94 class1 department1 1 1 1 0.0 0.273 1 94
214 94 class1 department2 2 1 1 0.0 0.273 1 94
213 94 class1 department2 3 1 1 0.0 0.273 1 94
211 93 class1 department2 4 2 4 0.3 0.455 1 94
115 93 class1 department1 5 2 4 0.3 0.455 2 94
212 83 class1 department2 6 3 6 0.5 0.545 2 94
215 82 class1 department2 7 4 7 0.6 0.636 2 94
112 80 class1 department1 8 5 8 0.7 0.727 2 94
113 74 class1 department1 9 6 9 0.8 0.909 3 94
216 74 class1 department2 10 6 9 0.8 0.909 3 94
111 69 class1 department1 11 7 11 1.0 1.0 3 94
221 99 class2 department2 1 1 1 0.0 0.111 1 99
122 86 class2 department1 2 2 2 0.125 0.222 1 99
225 85 class2 department2 3 3 3 0.25 0.333 1 99
224 80 class2 department2 4 4 4 0.375 0.444 2 99
123 78 class2 department1 5 5 5 0.5 0.667 2 99
222 78 class2 department2 6 5 5 0.5 0.667 2 99
121 74 class2 department1 7 6 7 0.75 0.889 3 99
223 74 class2 department2 8 6 7 0.75 0.889 3 99
124 70 class2 department1 9 7 9 1.0 1.0 3 99
Time taken: 21.295 seconds, Fetched: 20 row(s)
hive> select id
> ,score
> ,clazz
> ,department
> ,row_number() over (partition by clazz order by score desc) as rn_rk
> ,dense_rank() over (partition by clazz order by score desc) as dense_rk
> ,rank() over (partition by clazz order by score desc) as rk
> ,percent_rank() over (partition by clazz order by score desc) as percent_rk
> ,round(cume_dist() over (partition by clazz order by score desc),3) as cume_rk
> ,NTILE(3) over (partition by clazz order by score desc) as ntile_num
> ,max(score) over (partition by clazz order by score desc range between 3 PRECEDING and 11 FOLLOWING) as max_p
> from new_score;
Query ID = root_20220407143129_980934ee-56f8-4606-9a2c-2c38e6fbea76
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0031, Tracking URL = http://master:8088/proxy/application_1649211406017_0031/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0031
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-04-07 14:31:37,503 Stage-1 map = 0%, reduce = 0%
2022-04-07 14:31:42,655 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.73 sec
2022-04-07 14:31:48,780 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.08 sec
MapReduce Total cumulative CPU time: 2 seconds 80 msec
Ended Job = job_1649211406017_0031
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.08 sec HDFS Read: 13652 HDFS Write: 946 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 80 msec
OK
114 94 class1 department1 1 1 1 0.0 0.273 1 94
214 94 class1 department2 2 1 1 0.0 0.273 1 94
213 94 class1 department2 3 1 1 0.0 0.273 1 94
211 93 class1 department2 4 2 4 0.3 0.455 1 94
115 93 class1 department1 5 2 4 0.3 0.455 2 94
212 83 class1 department2 6 3 6 0.5 0.545 2 83
215 82 class1 department2 7 4 7 0.6 0.636 2 83
112 80 class1 department1 8 5 8 0.7 0.727 2 83
113 74 class1 department1 9 6 9 0.8 0.909 3 74
216 74 class1 department2 10 6 9 0.8 0.909 3 74
111 69 class1 department1 11 7 11 1.0 1.0 3 69
221 99 class2 department2 1 1 1 0.0 0.111 1 99
122 86 class2 department1 2 2 2 0.125 0.222 1 86
225 85 class2 department2 3 3 3 0.25 0.333 1 86
224 80 class2 department2 4 4 4 0.375 0.444 2 80
123 78 class2 department1 5 5 5 0.5 0.667 2 80
222 78 class2 department2 6 5 5 0.5 0.667 2 80
121 74 class2 department1 7 6 7 0.75 0.889 3 74
223 74 class2 department2 8 6 7 0.75 0.889 3 74
124 70 class2 department1 9 7 9 1.0 1.0 3 70
Time taken: 20.529 seconds, Fetched: 20 row(s)
SELECT id
,score
,clazz
,SUM(score) OVER w as sum_w
,round(avg(score) OVER w,3) as avg_w
,count(score) OVER w as cnt_w
FROM new_score
WINDOW w AS (PARTITION BY clazz ORDER BY score rows between 2 PRECEDING and 2 FOLLOWING);
hive> select id
> ,score
> ,clazz
> ,department
> ,row_number() over w as rn_rk
> ,dense_rank() over w as dense_rk
> ,rank() over w as rk
> ,percent_rank() over w as percent_rk
> ,round(cume_dist() over w,3) as cume_rk
> ,NTILE(3) over w as ntile_num
> ,max(score) over (partition by clazz order by score desc range between 3 PRECEDING and 11 FOLLOWING) as max_p
> from new_score
> WINDOW w as (partition by clazz order by score desc);
Query ID = root_20220407144020_7058202b-1e77-467a-99fc-1a068420c361
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0032, Tracking URL = http://master:8088/proxy/application_1649211406017_0032/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0032
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-04-07 14:40:28,428 Stage-1 map = 0%, reduce = 0%
2022-04-07 14:40:33,542 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.61 sec
2022-04-07 14:40:39,692 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.97 sec
MapReduce Total cumulative CPU time: 1 seconds 970 msec
Ended Job = job_1649211406017_0032
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 1.97 sec HDFS Read: 12084 HDFS Write: 946 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 970 msec
OK
114 94 class1 department1 1 1 1 0.0 0.273 1 94
214 94 class1 department2 2 1 1 0.0 0.273 1 94
213 94 class1 department2 3 1 1 0.0 0.273 1 94
211 93 class1 department2 4 2 4 0.3 0.455 1 94
115 93 class1 department1 5 2 4 0.3 0.455 2 94
212 83 class1 department2 6 3 6 0.5 0.545 2 83
215 82 class1 department2 7 4 7 0.6 0.636 2 83
112 80 class1 department1 8 5 8 0.7 0.727 2 83
113 74 class1 department1 9 6 9 0.8 0.909 3 74
216 74 class1 department2 10 6 9 0.8 0.909 3 74
111 69 class1 department1 11 7 11 1.0 1.0 3 69
221 99 class2 department2 1 1 1 0.0 0.111 1 99
122 86 class2 department1 2 2 2 0.125 0.222 1 86
225 85 class2 department2 3 3 3 0.25 0.333 1 86
224 80 class2 department2 4 4 4 0.375 0.444 2 80
123 78 class2 department1 5 5 5 0.5 0.667 2 80
222 78 class2 department2 6 5 5 0.5 0.667 2 80
121 74 class2 department1 7 6 7 0.75 0.889 3 74
223 74 class2 department2 8 6 7 0.75 0.889 3 74
124 70 class2 department1 9 7 9 1.0 1.0 3 70
Time taken: 21.43 seconds, Fetched: 20 row(s)
hive> SELECT id
> ,score
> ,clazz
> ,SUM(score) OVER w as sum_w
> ,round(avg(score) OVER w,3) as avg_w
> ,count(score) OVER w as cnt_w
> FROM new_score
> WINDOW w AS (PARTITION BY clazz ORDER BY score rows between 2 PRECEDING and 2 FOLLOWING);
Query ID = root_20220407144114_45efcd00-44fa-4b5a-ae26-39c70d7e7ae4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0033, Tracking URL = http://master:8088/proxy/application_1649211406017_0033/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0033
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-04-07 14:41:21,539 Stage-1 map = 0%, reduce = 0%
2022-04-07 14:41:26,653 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.77 sec
2022-04-07 14:41:32,855 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.07 sec
MapReduce Total cumulative CPU time: 2 seconds 70 msec
Ended Job = job_1649211406017_0033
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.07 sec HDFS Read: 9895 HDFS Write: 506 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 70 msec
OK
111 69 class1 217 72.333 3
113 74 class1 297 74.25 4
216 74 class1 379 75.8 5
112 80 class1 393 78.6 5
215 82 class1 412 82.4 5
212 83 class1 431 86.2 5
211 93 class1 445 89.0 5
115 93 class1 457 91.4 5
213 94 class1 468 93.6 5
114 94 class1 375 93.75 4
214 94 class1 282 94.0 3
124 70 class2 218 72.667 3
121 74 class2 296 74.0 4
223 74 class2 374 74.8 5
222 78 class2 384 76.8 5
123 78 class2 395 79.0 5
224 80 class2 407 81.4 5
225 85 class2 428 85.6 5
122 86 class2 350 87.5 4
221 99 class2 270 90.0 3
Time taken: 20.661 seconds, Fetched: 20 row(s)
LAG(col, n): the value of col from the nth row before the current row
LEAD(col, n): the value of col from the nth row after the current row
FIRST_VALUE: after ordering within the partition, the first value up to the current row
LAST_VALUE: after ordering within the partition, the last value up to the current row; for rows that tie in the ordering, the last of the tied rows is taken
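Two details worth knowing, shown as a small sketch rather than in the transcript below: LAG and LEAD take an optional third argument that is returned instead of NULL when the offset runs past the partition boundary (the -1 default here is illustrative), and LAST_VALUE only returns the true last value of the partition if the frame is widened explicitly.

SELECT id
      ,score
      ,clazz
      -- return -1 instead of NULL when there is no row 2 positions back
      ,lag(id, 2, -1) OVER (PARTITION BY clazz ORDER BY score DESC) AS lag_num
      -- with the default frame, LAST_VALUE only sees rows up to the current one;
      -- widening the frame to the whole partition gives the real last value
      ,last_value(id) OVER (PARTITION BY clazz ORDER BY score DESC
                            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_in_clazz
FROM new_score;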
hive> select id
> ,score
> ,clazz
> ,department
> ,lag(id,2) over (partition by clazz order by score desc) as lag_num
> ,LEAD(id,2) over (partition by clazz order by score desc) as lead_num
> ,FIRST_VALUE(id) over (partition by clazz order by score desc) as first_v_num
> ,LAST_VALUE(id) over (partition by clazz order by score desc) as last_v_num
> ,NTILE(3) over (partition by clazz order by score desc) as ntile_num
> from new_score;
Query ID = root_20220407144805_f3dcdf6a-c4d4-49e0-9abc-5dbfb1edc129
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0034, Tracking URL = http://master:8088/proxy/application_1649211406017_0034/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0034
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-04-07 14:48:12,579 Stage-1 map = 0%, reduce = 0%
2022-04-07 14:48:17,775 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.6 sec
2022-04-07 14:48:23,978 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.62 sec
MapReduce Total cumulative CPU time: 1 seconds 620 msec
Ended Job = job_1649211406017_0034
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 1.62 sec HDFS Read: 12286 HDFS Write: 872 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 620 msec
OK
114 94 class1 department1 NULL 213 114 213 1
214 94 class1 department2 NULL 211 114 213 1
213 94 class1 department2 114 115 114 213 1
211 93 class1 department2 214 212 114 115 1
115 93 class1 department1 213 215 114 115 2
212 83 class1 department2 211 112 114 212 2
215 82 class1 department2 115 113 114 215 2
112 80 class1 department1 212 216 114 112 2
113 74 class1 department1 215 111 114 216 3
216 74 class1 department2 112 NULL 114 216 3
111 69 class1 department1 113 NULL 114 111 3
221 99 class2 department2 NULL 225 221 221 1
122 86 class2 department1 NULL 224 221 122 1
225 85 class2 department2 221 123 221 225 1
224 80 class2 department2 122 222 221 224 2
123 78 class2 department1 225 121 221 222 2
222 78 class2 department2 224 223 221 222 2
121 74 class2 department1 123 124 221 223 3
223 74 class2 department2 222 NULL 221 223 3
124 70 class2 department1 121 NULL 221 124 3
Time taken: 19.345 seconds, Fetched: 20 row(s)
Exercise: for each class, find the three students with the highest total score (output: student ID, name, class, total score).
hive>
> select tt1.id
> ,tt1.name
> ,tt1.clazz
> from(
> select t1.id
> ,t1.name
> ,t1.clazz
> ,t2.sum_score
> ,row_number() over(partition by clazz order by t2.sum_score desc)as rn
> from students t1
> left join(
> select id
> ,sum(score) as sum_score
> from score
> group by id
> )t2 on t1.id=t2.id
> )tt1 where tt1.rn<=3;
Query ID = root_20220407150222_5b8ca666-3ea8-4d00-b5a8-97a4310b0541
Total jobs = 4
Launching Job 1 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0038, Tracking URL = http://master:8088/proxy/application_1649211406017_0038/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0038
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-04-07 15:02:30,453 Stage-1 map = 0%, reduce = 0%
2022-04-07 15:02:35,572 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2022-04-07 15:02:41,819 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.01 sec
MapReduce Total cumulative CPU time: 2 seconds 10 msec
Ended Job = job_1649211406017_0038
Stage-7 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Execution log at: /tmp/root/root_20220407150222_5b8ca666-3ea8-4d00-b5a8-97a4310b0541.log
2022-04-07 15:02:45 Starting to launch local task to process map join; maximum memory = 477626368
2022-04-07 15:02:46 Dump the side-table for tag: 1 with group count: 1000 into file: file:/usr/local/soft/hive-1.2.1/tmp/e73989c3-35b9-4392-9752-72275f2987a9/hive_2022-04-07_15-02-22_341_4056382286241342530-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile11--.hashtable
2022-04-07 15:02:46 Uploaded 1 File to: file:/usr/local/soft/hive-1.2.1/tmp/e73989c3-35b9-4392-9752-72275f2987a9/hive_2022-04-07_15-02-22_341_4056382286241342530-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile11--.hashtable (26286 bytes)
2022-04-07 15:02:46 End of local task; Time Taken: 0.542 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 4
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649211406017_0039, Tracking URL = http://master:8088/proxy/application_1649211406017_0039/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0039
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
2022-04-07 15:02:55,174 Stage-5 map = 0%, reduce = 0%
2022-04-07 15:03:00,466 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 0.91 sec
MapReduce Total cumulative CPU time: 910 msec
Ended Job = job_1649211406017_0039
Launching Job 4 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1649211406017_0040, Tracking URL = http://master:8088/proxy/application_1649211406017_0040/
Kill Command = /usr/local/soft/hadoop-2.7.6/bin/hadoop job -kill job_1649211406017_0040
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2022-04-07 15:03:12,054 Stage-3 map = 0%, reduce = 0%
2022-04-07 15:03:17,154 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 0.74 sec
2022-04-07 15:03:24,394 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 2.12 sec
MapReduce Total cumulative CPU time: 2 seconds 120 msec
Ended Job = job_1649211406017_0040
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.01 sec HDFS Read: 145271 HDFS Write: 25234 SUCCESS
Stage-Stage-5: Map: 1 Cumulative CPU: 0.91 sec HDFS Read: 47423 HDFS Write: 48454 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 2.12 sec HDFS Read: 56369 HDFS Write: 1980 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 40 msec
OK
1500100308 黄初夏 文科一班
1500100875 马向南 文科一班
1500100943 许昌黎 文科一班
1500100160 云冰真 文科三班
1500100572 臧忆香 文科三班
1500100434 黎雨珍 文科三班
1500100418 蓟海昌 文科二班
1500100823 宓新曦 文科二班
1500100954 咸芷天 文科二班
1500100930 闻运凯 文科五班
1500100949 颜沛槐 文科五班
1500100904 阎元蝶 文科五班
1500100136 黎昆鹏 文科六班
1500100900 查思菱 文科六班
1500100716 丰冷霜 文科六班
1500100873 路鸿志 文科四班
1500100258 湛昌勋 文科四班
1500100116 文元蝶 文科四班
1500100235 沈香巧 理科一班
1500100773 傅元蝶 理科一班
1500100925 卞乐萱 理科一班
1500100929 满慕易 理科三班
1500100184 夔寻巧 理科三班
1500100598 宰金鹏 理科三班
1500100834 谷念薇 理科二班
1500100104 咸冰蝶 理科二班
1500100762 聂德明 理科二班
1500100080 巫景彰 理科五班
1500100547 廖向南 理科五班
1500100839 明雁桃 理科五班
1500100596 田晨潍 理科六班
1500100903 於依云 理科六班
1500100563 禄昆鹏 理科六班
1500100635 蓬怀绿 理科四班
1500100590 逄中震 理科四班
1500100939 耿智杰 理科四班
Time taken: 63.093 seconds, Fetched: 36 row(s)
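The requirement above also asks for the total score, which the outer query does not return. A minimal adjustment, sketched along the same lines and assuming the same aliases, is to add tt1.sum_score to the outer SELECT:

select tt1.id
      ,tt1.name
      ,tt1.clazz
      ,tt1.sum_score
from (
      select t1.id
            ,t1.name
            ,t1.clazz
            ,t2.sum_score
            ,row_number() over (partition by t1.clazz order by t2.sum_score desc) as rn
      from students t1
      left join (
            select id
                  ,sum(score) as sum_score
            from score
            group by id
      ) t2 on t1.id = t2.id
) tt1
where tt1.rn <= 3;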