Hive中开窗函数的案例

最新推荐文章于 2022-02-17 12:24:28 发布

丿沐染烟忱丶

最新推荐文章于 2022-02-17 12:24:28 发布

阅读量201

点赞数

分类专栏： Hive 文章标签： hive sql 数据分析

本文链接：https://blog.csdn.net/weixin_45607513/article/details/103961907

版权

Hive 专栏收录该内容

10 篇文章 0 订阅

订阅专栏

数据信息

0: jdbc:hive2://Hbase:10000> select empno,ename,sal,deptno from t_employee;
+--------+---------+-------+---------+--+
| empno  |  ename  |  sal  | deptno  |
+--------+---------+-------+---------+--+
| 7369   | SMITH   | 800   | 20      |
| 7499   | ALLEN   | 1600  | 30      |
| 7521   | WARD    | 1250  | 30      |
| 7566   | JONES   | 2975  | 20      |
| 7654   | MARTIN  | 1250  | 30      |
| 7698   | BLAKE   | 2850  | 30      |
| 7782   | CLARK   | 2450  | 10      |
| 7788   | SCOTT   | 1500  | 20      |
| 7839   | KING    | 5000  | 10      |
| 7844   | TURNER  | 1500  | 30      |
| 7876   | ADAMS   | 1100  | 20      |
| 7900   | JAMES   | 950   | 30      |
| 7902   | FORD    | 3000  | 20      |
| 7934   | MILLER  | 1300  | 10      |
+--------+---------+-------+---------+--+

练习

1.查看每个部门员工的薪资排名:
语句：
select empno,ename,sal,rank() over(partition by deptno order by sal desc) from t_employee;
分析：rank() over(partition by deptno order by sal desc)：查看每个部门内员工的薪资，并进行排序。partition by后的字段是哪个，就表示查看哪个信息，并对此字段的对应信息进行操作。
结果：

0: jdbc:hive2://Hbase:10000> select empno,ename,deptno,sal,rank() over(partition by deptno order by sal desc) from t_employee;

+--------+---------+---------+-------+----------------+--+
| empno  |  ename  | deptno  |  sal  | rank_window_0  |
+--------+---------+---------+-------+----------------+--+
| 7839   | KING    | 10      | 5000  | 1              |
| 7782   | CLARK   | 10      | 2450  | 2              |
| 7934   | MILLER  | 10      | 1300  | 3              |
| 7902   | FORD    | 20      | 3000  | 1              |
| 7566   | JONES   | 20      | 2975  | 2              |
| 7788   | SCOTT   | 20      | 1500  | 3              |
| 7876   | ADAMS   | 20      | 1100  | 4              |
| 7369   | SMITH   | 20      | 800   | 5              |
| 7698   | BLAKE   | 30      | 2850  | 1              |
| 7499   | ALLEN   | 30      | 1600  | 2              |
| 7844   | TURNER  | 30      | 1500  | 3              |
| 7654   | MARTIN  | 30      | 1250  | 4              |
| 7521   | WARD    | 30      | 1250  | 4              |
| 7900   | JAMES   | 30      | 950   | 6              |
+--------+---------+---------+-------+----------------+--+

2.查询员工信息，展示该员工所在部门薪资排名？
此题和第一题的结果差不多，只是第1题是有序的，并且薪资相同的会进行并列排名。
而第2题只是查询每个员工在该部门中薪资排第几。
语句：

select e1.empno,e1.ename,e1.sal,e1.deptno,
sum(1) over (partition by e1.deptno order by e1.sal desc rows between unbounded preceding and current row )
from t_employee e1 order by e1.deptno;

               无限的     前面           无限的      后面
rows between  unbounded preceding and unbounded following  
               无限的                    当前行
rows between  unbounded preceding and current row 
              当前行             无限的   后续的
rows between  current row  and unbounded following  
               前1个           后一个
rows between  -1 preceding and 1 following  
               前1个           后一个
rows between  -1 preceding and 1 following

分析：该语句的思想是，对当前员工所在部门进行薪资排序，并且查看这一行之前的数据为几条，也就是当前员工在该部门的薪资排行。查询的结果如果有相同的薪资，也会进行排序，默认为字典排序。

结果：

+-----------+-----------+---------+------------+---------------+--+
| e1.empno  | e1.ename  | e1.sal  | e1.deptno  | sum_window_0  |
+-----------+-----------+---------+------------+---------------+--+
| 7839      | KING      | 5000    | 10         | 1             |
| 7782      | CLARK     | 2450    | 10         | 2             |
| 7934      | MILLER    | 1300    | 10         | 3             |
| 7876      | ADAMS     | 1100    | 20         | 4             |
| 7369      | SMITH     | 800     | 20         | 5             |
| 7788      | SCOTT     | 1500    | 20         | 3             |
| 7566      | JONES     | 2975    | 20         | 2             |
| 7902      | FORD      | 3000    | 20         | 1             |
| 7654      | MARTIN    | 1250    | 30         | 4             |
| 7521      | WARD      | 1250    | 30         | 5             |
| 7499      | ALLEN     | 1600    | 30         | 2             |
| 7698      | BLAKE     | 2850    | 30         | 1             |
| 7900      | JAMES     | 950     | 30         | 6             |
| 7844      | TURNER    | 1500    | 30         | 3             |
+-----------+-----------+---------+------------+---------------+--+

3.查询员工信息，展示该员工所在部门的平均薪资
语句：

select e.empno,e.ename,e.deptno,avg(e.sal) over(partition by e.deptno) from t_employee e;

分析：该语句是查询，每个员工的信息和他所在部门的平均薪资。
开窗函数：avg(e.sal) over(partition by e.deptno) 查看部门的平均薪资。

结果：

+----------+----------+-----------+---------------+--+
| e.empno  | e.ename  | e.deptno  | avg_window_0  |
+----------+----------+-----------+---------------+--+
| 7934     | MILLER   | 10        | 2916.666667   |
| 7839     | KING     | 10        | 2916.666667   |
| 7782     | CLARK    | 10        | 2916.666667   |
| 7876     | ADAMS    | 20        | 1875          |
| 7788     | SCOTT    | 20        | 1875          |
| 7369     | SMITH    | 20        | 1875          |
| 7566     | JONES    | 20        | 1875          |
| 7902     | FORD     | 20        | 1875          |
| 7844     | TURNER   | 30        | 1566.666667   |
| 7499     | ALLEN    | 30        | 1566.666667   |
| 7698     | BLAKE    | 30        | 1566.666667   |
| 7654     | MARTIN   | 30        | 1566.666667   |
| 7521     | WARD     | 30        | 1566.666667   |
| 7900     | JAMES    | 30        | 1566.666667   |
+----------+----------+-----------+---------------+--+