1. Top-N example
1.1 Find the two oldest records for each gender
Create a table and load the data into Hive:
hive (ruozedata_d7)> create table hive_rolenumber(
> id int,
> age int,
> name string,
> gender string
> )
> row format delimited fields terminated by ',';
hive (ruozedata_d7)> load data local inpath '/home/hadoop/hive_row_number.txt' overwrite into table hive_rolenumber;
hive (ruozedata_d7)> select * from hive_rolenumber;
OK
hive_rolenumber.id hive_rolenumber.age hive_rolenumber.name hive_rolenumber.gender
1 18 zhangsan M
2 19 lisi M
3 22 wangwu F
4 16 zhaoliu F
5 30 tianqi M
6 26 wangba F
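For reference, given the comma-delimited row format declared above, the source file /home/hadoop/hive_row_number.txt would contain lines like the following (reconstructed from the query output; the actual file may differ in ordering or whitespace):

```
1,18,zhangsan,M
2,19,lisi,M
3,22,wangwu,F
4,16,zhaoliu,F
5,30,tianqi,M
6,26,wangba,F
```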
Use the window (analytic) functions: RANK, ROW_NUMBER, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE.
General syntax (using COUNT as an example): COUNT(DISTINCT a) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
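A quick sketch of how the three ranking functions differ on this table: ROW_NUMBER assigns consecutive numbers within each partition, RANK leaves gaps after ties, and DENSE_RANK does not. Running all three side by side makes the difference visible (a minimal sketch against the table created above):

```sql
-- Compare the ranking functions per gender, oldest first.
-- ROW_NUMBER: always 1,2,3,...
-- RANK:       ties share a rank, next rank skips (1,1,3)
-- DENSE_RANK: ties share a rank, no skip       (1,1,2)
SELECT id, age, name, gender,
       ROW_NUMBER() OVER (PARTITION BY gender ORDER BY age DESC) AS rn,
       RANK()       OVER (PARTITION BY gender ORDER BY age DESC) AS rk,
       DENSE_RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS drk
FROM hive_rolenumber;
```

With this sample data there are no age ties within a gender, so all three columns happen to agree; the distinction matters as soon as two rows share an age.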
hive (ruozedata_d7)> select id,age,name,gender,
> ROW_NUMBER() over(PARTITION BY gender order by age desc) as r
> from hive_rolenumber;
Query ID = hadoop_20190722113535_42da2976-7960-419e-a7b9-7cf3faeae12b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1563262646611_0024, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0024/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_15632
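To finish the top-N case from section 1.1, wrap the ranked query in a subquery and filter on the rank, since a window function cannot appear directly in a WHERE clause (a minimal sketch; the alias names t and r are arbitrary):

```sql
-- Top 2 oldest rows per gender: rank in the inner query,
-- filter in the outer query.
SELECT id, age, name, gender
FROM (
  SELECT id, age, name, gender,
         ROW_NUMBER() OVER (PARTITION BY gender ORDER BY age DESC) AS r
  FROM hive_rolenumber
) t
WHERE t.r <= 2;
```

Given the sample data above, this should keep tianqi (30) and lisi (19) for gender M, and wangba (26) and wangwu (22) for gender F.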