Big Data: Grouping and Partitioning Operations in Hive (a Data Warehouse Tool)

独创之上

Last modified 2022-03-05 15:54:39

First published 2022-02-28 11:28:43

Article link: https://blog.csdn.net/a18379692263/article/details/123170951

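Note: Hive's concepts, strengths and weaknesses, installation and deployment, and parameter configuration will be summarized in a later post. This section covers grouping and partitioning in Hive.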

I. Grouping

GROUP BY is usually used together with aggregate functions: it groups rows by one or more columns and then applies the aggregation to each group.

Example: compute the highest salary for each job within each department of the employee table.

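Note: in a grouped query, the SELECT list may contain only columns that appear in the GROUP BY clause or columns wrapped in an aggregate function.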

select t.deptno, t.job, max(t.sal) max_sal from emp t group by t.deptno, t.job;
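To make the grouping rule concrete, here is a minimal self-contained sketch. The `emp_sample` rows are hypothetical values invented for illustration (they are not the author's emp data), and the query assumes a Hive version that supports CTEs and SELECT without a FROM clause.

```sql
-- Hypothetical sample rows standing in for the emp table.
WITH emp_sample AS (
  SELECT 10 AS deptno, 'CLERK' AS job, 1300 AS sal
  UNION ALL SELECT 10, 'MANAGER', 2450
  UNION ALL SELECT 20, 'CLERK', 800
  UNION ALL SELECT 20, 'CLERK', 1100
  UNION ALL SELECT 20, 'ANALYST', 3000
)
-- Every selected column is either a GROUP BY column or inside an aggregate.
SELECT t.deptno, t.job, max(t.sal) AS max_sal, count(*) AS emp_cnt
FROM emp_sample t
GROUP BY t.deptno, t.job;
-- One row per (deptno, job) pair; row order is not guaranteed:
--   10  CLERK    1300  1
--   10  MANAGER  2450  1
--   20  ANALYST  3000  1
--   20  CLERK    1100  2

-- Adding a bare t.sal to the SELECT list would be rejected, because sal is
-- neither a grouping column nor wrapped in an aggregate function.
```

The two CLERK rows in department 20 collapse into a single output row, which is why only grouped columns and aggregates can be selected.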

Aggregate functions cannot be used in a WHERE clause, whereas they can be used in a HAVING clause. HAVING is used only with GROUP BY aggregation queries.

select column from table_name where condition group by column

or

select column from table_name group by column having filter_condition

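Note: a filter can be applied either with WHERE before GROUP BY, which filters individual rows before they are grouped, or with HAVING after GROUP BY, which filters whole groups after aggregation.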
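The two filter positions can also be combined in one query. The sketch below uses hypothetical `emp_sample` rows (invented for illustration) and assumes a Hive version that supports CTEs and SELECT without a FROM clause.

```sql
-- Hypothetical sample rows standing in for the emp table.
WITH emp_sample AS (
  SELECT 10 AS deptno, 'CLERK' AS job, 1300 AS sal
  UNION ALL SELECT 10, 'MANAGER', 2450
  UNION ALL SELECT 20, 'CLERK', 800
  UNION ALL SELECT 20, 'ANALYST', 3000
  UNION ALL SELECT 30, 'CLERK', 950
  UNION ALL SELECT 30, 'SALESMAN', 1600
)
SELECT deptno, avg(sal) AS avg_sal
FROM emp_sample
WHERE job <> 'CLERK'       -- row-level filter: runs before grouping
GROUP BY deptno
HAVING avg(sal) > 2000;    -- group-level filter: runs after aggregation
-- After WHERE, each department keeps one non-clerk row (2450, 3000, 1600).
-- HAVING then drops department 30, leaving departments 10 and 20.
```

Moving the `avg(sal) > 2000` condition into the WHERE clause would fail, since the average does not exist until the rows have been grouped.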

Example: find the departments whose average salary is greater than 2000.

select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;

The equivalent query without HAVING wraps the aggregation in a subquery and filters it with WHERE:

select deptno, avg_sal from (select deptno,avg(sal) avg_sal from emp group by deptno)t1 where avg_sal > 2000;

A JOIN combines two tables by matching a column that both tables share, and returns the combined rows.

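Example: join the employee table and the department table on equal department numbers, and query the employee number, employee name, department number, and department name: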

select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno=d.deptno;

Inner join: a row is kept only if both joined tables contain data that matches the join condition.

Example: query the rows whose shared column matches in both tables.

 select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;
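A small self-contained sketch shows which rows an inner join drops. Both `emp_sample` and `dept_sample` hold hypothetical rows invented for illustration, and the query assumes a Hive version that supports CTEs and SELECT without a FROM clause.

```sql
-- Hypothetical sample rows: employee 7999 points at a department (50) that
-- does not exist, and department 40 has no employees.
WITH emp_sample AS (
  SELECT 7369 AS empno, 'SMITH' AS ename, 20 AS deptno
  UNION ALL SELECT 7499, 'ALLEN', 30
  UNION ALL SELECT 7999, 'NODEPT', 50
),
dept_sample AS (
  SELECT 20 AS deptno, 'RESEARCH' AS dname
  UNION ALL SELECT 30, 'SALES'
  UNION ALL SELECT 40, 'OPERATIONS'
)
SELECT e.empno, e.ename, d.deptno, d.dname
FROM emp_sample e
JOIN dept_sample d ON e.deptno = d.deptno;
-- Only the matched pairs survive; row order is not guaranteed:
--   7369  SMITH  20  RESEARCH
--   7499  ALLEN  30  SALES
```

Neither the unmatched employee (deptno 50) nor the unmatched department (deptno 40) appears in the result.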

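Left outer join: every record from the table on the left of the JOIN operator that satisfies the WHERE clause is returned. Where the right table has no matching row, its columns are filled with NULL.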

select e.empno, e.ename, d.deptno from emp e left join dept d on e.deptno = d.deptno;
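For comparison with an inner join, the sketch below runs a left outer join over hypothetical `emp_sample` and `dept_sample` rows (invented for illustration), assuming a Hive version that supports CTEs and SELECT without a FROM clause.

```sql
-- Hypothetical sample rows: employee 7999 has no matching department.
WITH emp_sample AS (
  SELECT 7369 AS empno, 'SMITH' AS ename, 20 AS deptno
  UNION ALL SELECT 7499, 'ALLEN', 30
  UNION ALL SELECT 7999, 'NODEPT', 50
),
dept_sample AS (
  SELECT 20 AS deptno, 'RESEARCH' AS dname
  UNION ALL SELECT 30, 'SALES'
  UNION ALL SELECT 40, 'OPERATIONS'
)
SELECT e.empno, e.ename, d.deptno, d.dname
FROM emp_sample e
LEFT JOIN dept_sample d ON e.deptno = d.deptno;
-- Every left-side row is kept; row order is not guaranteed:
--   7369  SMITH   20    RESEARCH
--   7499  ALLEN   30    SALES
--   7999  NODEPT  NULL  NULL
```

Appending `WHERE d.deptno IS NULL` to this query would keep only the employees without a matching department, which is a common way to find orphaned rows.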
