在hive中,想要实现分组内排序,一般都是自己写udf实现oracle中分析函数row_number() over(partition)的功能,如果不使用自定义udf,仅使用标准sql实现的话,毫无性能可言,仅做实验而已。
方便起见,以下语句为oracle中语句,但都属于标准sql,在hive中亦可:
CREATE TABLE lxw_t (user_id VARCHAR2(20),
class VARCHAR2(20),
score NUMBER
);
INSERT INTO liuxiaowen.lxw_t VALUES ('user_1','语文','90');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_1','数学','86');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_1','英语','96');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_1','体育','77');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_2','语文','88');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_2','数学','65');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_2','英语','67');
INSERT INTO liuxiaowen.lxw_t VALUES ('user_2','体育','98');
commit;
SELECT x.user_id,x.class,x.score,COUNT(1) AS seq
FROM (
SELECT a.*,b.score score2 FROM
liuxiaowen.lxw_t a ,
liuxiaowen.lxw_t b
WHERE a.user_id = b.user_id (+)
) x
WHERE x.score2 >= x.score --按照成绩由高到低排列
--WHERE x.score >= x.score2 --按照成绩由低到高排列
GROUP BY x.user_id,x.class,x.score
order BY 1,4
按照成绩由高到低排列的结果如下:
USER_IDCLASSSCORESEQ
user_1英语961
user_1语文902
user_1数学863
user_1体育774
user_2体育981
user_2语文882
user_2英语673
user_2数学654
按照成绩由低到高排列的结果如下:
USER_IDCLASSSCORESEQ
user_1体育771
user_1数学862
user_1语文903
user_1英语964
user_2数学651
user_2英语672
user_2语文883
user_2体育984