Example requirement
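There is an employee table test(id, name, dept, time), whose columns hold the employee ID, name, department, and hire date. Task: find the earliest-hired employee in each department. The table data is shown in the figure below: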
Hive implementation
Syntax: row_number() over (partition BY column_A order by column_B desc)
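partition by: splits the rows into groups (partitions) by the given column, somewhat like partitions in a Hive table; the numbering restarts at 1 in each partition;
order by: sorts the rows within each partition, ascending by default, descending if you add desc;
Here the rows are partitioned by column A and sorted by column B in descending order, and row_number() numbers them 1, 2, 3, ... within each partition.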
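To make the semantics concrete, here is a minimal Python sketch of what row_number() over (partition by A order by B [desc]) computes. The function name row_number and the sample rows are hypothetical (the article's real data is only in the figure), and the sketch only mimics the numbering logic; it says nothing about how Hive actually executes the query.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (id, name, dept, time); the article's real data was
# only shown as an image. ISO date strings sort in chronological order.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2013-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2014-05-20"),
    (5, "Eve", "HR", "2017-09-01"),
]


def row_number(rows, partition_key, order_key, descending=False):
    """Mimic row_number() over (partition by ... order by ...).

    Returns (row, row_num) pairs; row_num restarts at 1 in every partition.
    """
    result = []
    # groupby only merges adjacent rows, so sort by the partition key first.
    for _, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        # Order the rows inside this partition, then number them 1..n.
        ordered = sorted(group, key=order_key, reverse=descending)
        for n, row in enumerate(ordered, start=1):
            result.append((row, n))
    return result


if __name__ == "__main__":
    # partition by dept order by time desc: the most recent hire gets row_num 1.
    for row, n in row_number(ROWS, itemgetter(2), itemgetter(3), descending=True):
        print(row, n)
```

With ascending order (the default here), row_num 1 in each department marks the earliest hire, which is exactly what the query below relies on.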
Applying this syntax (partition by department, sort by hire time ascending) to compute row_num, the query is:
SELECT *,row_number() over (partition BY dept ORDER BY time ASC) AS row_num FROM test;
Query result:
Finally, keeping only the rows with row_num = 1 gives the desired result: the earliest hire in each department.
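If no Hive cluster is at hand, the row_num query can be tried locally. The sketch below assumes Python's built-in sqlite3 module is linked against SQLite 3.25 or newer (the first SQLite release with window functions); SQLite merely stands in for Hive here, the table contents are hypothetical, and the outer ORDER BY is added only to make the output order stable.

```python
import sqlite3

# Hypothetical sample rows (id, name, dept, time); not the article's real data.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2013-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2014-05-20"),
    (5, "Eve", "HR", "2017-09-01"),
]


def make_db():
    """Build an in-memory test table with the article's column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, name TEXT, dept TEXT, time TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", ROWS)
    return conn


def numbered_rows(conn):
    """Run the row_num query (needs SQLite >= 3.25 for window functions)."""
    sql = """
        SELECT *, row_number() OVER (PARTITION BY dept ORDER BY time ASC) AS row_num
        FROM test
        ORDER BY dept, row_num  -- only for a stable output order
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print("SQLite", sqlite3.sqlite_version)
    for row in numbered_rows(make_db()):
        print(row)  # rows whose last column is 1 are the earliest hires
```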
Complete query:
SELECT name,dept,time FROM (
SELECT *,row_number() over (partition BY dept ORDER BY time ASC) AS row_num FROM test ) t
WHERE row_num = 1;
Final result:
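The complete query can be checked the same way. This is a minimal sketch, again assuming SQLite 3.25+ behind Python's sqlite3 module and hypothetical rows; the helper name earliest_hires is illustrative, and ORDER BY dept is added only for deterministic output.

```python
import sqlite3

# Hypothetical sample rows (id, name, dept, time); not the article's real data.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2013-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2014-05-20"),
    (5, "Eve", "HR", "2017-09-01"),
]

# The complete query from the article, plus ORDER BY dept for stable output.
FULL_QUERY = """
    SELECT name, dept, time FROM (
        SELECT *, row_number() OVER (PARTITION BY dept ORDER BY time ASC) AS row_num
        FROM test
    ) t
    WHERE row_num = 1
    ORDER BY dept
"""


def earliest_hires(rows):
    """Load rows into an in-memory test table and run the complete query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER, name TEXT, dept TEXT, time TEXT)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        return conn.execute(FULL_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, dept, time in earliest_hires(ROWS):
        print(dept, name, time)
```

One design point worth knowing: if two employees in the same department share the earliest hire date, row_number() still returns only one of them, and which one is not guaranteed; rank() would return both.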
MySQL implementation
Some blog posts say MySQL also supports (partition BY column_A order by column_B desc), but my test returned this error:
[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(partition BY dept ORDER BY time ASC) AS row_num FROM test' at line 1
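The reason is that window functions such as row_number() are only available from MySQL 8.0 onward; older versions reject the syntax with this error. (On MySQL 8.0+, the Hive query above should work with little or no change.) Below is how I implemented the requirement without window functions.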
- Compute row_num
Query:
set @rownum=0;
SELECT *,@rownum:=@rownum+1 as row_num FROM test ORDER BY time;
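The user-variable trick is just a running counter over the rows in hire-time order. The Python sketch below mirrors it line by line (the function name and sample rows are hypothetical) and makes the key point visible: the number is global across the whole table, not restarted per department.

```python
# Hypothetical sample rows (id, name, dept, time); not the article's real data.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2013-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2014-05-20"),
    (5, "Eve", "HR", "2017-09-01"),
]


def global_row_num(rows):
    """Mimic: set @rownum=0; SELECT *, @rownum:=@rownum+1 AS row_num ... ORDER BY time."""
    rownum = 0  # set @rownum=0;
    numbered = []
    for row in sorted(rows, key=lambda r: r[3]):  # ORDER BY time
        rownum += 1  # @rownum:=@rownum+1
        numbered.append(row + (rownum,))
    return numbered


if __name__ == "__main__":
    for row in global_row_num(ROWS):
        print(row)
```

Because the numbering follows hire time across all departments, the smallest row_num within a department always belongs to that department's earliest hire, which is what the next step exploits.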
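Query result:
Note that this row_num is one running number over the whole table, ordered by hire time, not a per-department number. It also relies on MySQL applying ORDER BY before the variable assignments, although the MySQL manual states that the evaluation order of expressions involving user variables is undefined.
- Group by department and keep the record with the smallest row_num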
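To keep things clear, we save the numbered result into a table test_row and then, for each department, select the row with the smallest row_num.
Complete query: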
set @rownum=0;
CREATE TABLE test_row AS SELECT *,@rownum:=@rownum+1 as row_num FROM test ORDER BY time;
SELECT name,dept,time FROM test_row WHERE row_num in (SELECT min(row_num) FROM test_row GROUP BY dept);
Query result:
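The final SELECT itself is plain SQL, so it can be checked on SQLite. SQLite has no MySQL-style @variables, so in this sketch the global row_num is computed in Python before the rows are inserted into test_row; the sample rows and the helper name are hypothetical, and ORDER BY dept is added only for stable output.

```python
import sqlite3

# Hypothetical sample rows (id, name, dept, time); not the article's real data.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2013-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2014-05-20"),
    (5, "Eve", "HR", "2017-09-01"),
]


def earliest_per_dept(rows):
    """Build test_row with a global row_num, then run the article's min(row_num) query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test_row (id INTEGER, name TEXT, dept TEXT, time TEXT, row_num INTEGER)"
        )
        # Stand-in for @rownum:=@rownum+1 ... ORDER BY time (no user variables in SQLite).
        ordered = sorted(rows, key=lambda r: r[3])
        conn.executemany(
            "INSERT INTO test_row VALUES (?, ?, ?, ?, ?)",
            [row + (n,) for n, row in enumerate(ordered, start=1)],
        )
        sql = """
            SELECT name, dept, time FROM test_row
            WHERE row_num IN (SELECT min(row_num) FROM test_row GROUP BY dept)
            ORDER BY dept
        """
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, dept, time in earliest_per_dept(ROWS):
        print(dept, name, time)
```

The IN filter is safe here because row_num is unique across the whole table, so each minimum matches exactly one row.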