hive查询where join_Hive的查询操作(group by , join, 多表连接)

Hive的查询操作(group by , join, 多表连接)

发布时间:2018-12-02 18:29,

浏览次数:1315

, 标签:

Hive

group

by

join

目录

数据准备:

分组实操练习:

Join连接实操练习:

多表的连接实操练习:

数据准备:

1,创建emp表和dept表,并向其中导入如下的数据: create database db_select; use db_select; create

table if not exists dept( deptno int, dname string, loc int ) row format

delimited fields terminated by '\t'; create table if not exists emp( empno int,

ename string, job string, mgr int, hiredata string, sal double, comm double,

deptno int ) row format delimited fields delimited by '\t'; 0:

jdbc:hive2://hadoop108:10000> show tables; +-----------+--+ | tab_name |

+-----------+--+ | dept | | emp | | location | +-----------+--+ [isea@hadoop108

datas]$ vim dept.txt 10 ACCOUNTING 1700 20 RESEARCH 1800 30 SALES 1900 40

OPERATIONS 1700 [isea@hadoop108 datas]$ vim emp.txt 7369 SMITH CLERK 7902

1980-12-17 800.00 20 7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30 7521

WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30 7566 JONES MANAGER 7839 1981-4-2

2975.00 20 7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30 7698 BLAKE

MANAGER 7839 1981-5-1 2850.00 30 7782 CLARK MANAGER 7839 1981-6-9 2450.00 10

7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20 7839 KING PRESIDENT 1981-11-17

5000.00 10 7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30 7876 ADAMS CLERK

7788 1987-5-23 1100.00 20 7900 JAMES CLERK 7698 1981-12-3 950.00 30 7902 FORD

ANALYST 7566 1981-12-3 3000.00 20 7934 MILLER CLERK 7782 1982-1-23 1300.00 10

[isea@hadoop108 datas]$ vim location.txt 1700 Beijing 1800 London 1900 Tokyo

load data local inpath '/opt/module/datas/dept.txt' into table dept; load data

local inpath '/opt/module/datas/emp.txt' into table emp; load data local inpath

'/opt/module/datas/location.txt' into table location;

分组实操练习:

1,计算每个部门的平均工资: select deptno,avg(sal) from emp group by deptno;

+---------+---------------------+--+ | deptno | _c1 |

+---------+---------------------+--+ | 10 | 2916.6666666666665 | | 20 | 2175.0

| | 30 | 1566.6666666666667 | +---------+---------------------+--+

2,计算emp每个部门的最高薪水: select deptno,max(sal) from emp group by deptno;

+---------+---------+--+ | deptno | _c1 | +---------+---------+--+ | 10 |

5000.0 | | 20 | 3000.0 | | 30 | 2850.0 | +---------+---------+--+

3,求每个部门的平均薪水大于2000的部门: select deptno,avg(sal) AVG from emp group by deptno

having AVG > 2000; +---------+---------------------+--+ | deptno | avg |

+---------+---------------------+--+ | 10 | 2916.6666666666665 | | 20 | 2175.0

| +---------+---------------------+--+ 4,根据员工表和部门表中的部门编号相等,查询员工编号、员工名称和部门名称;

select empno,ename,dname from emp left join dept on emp.deptno = dept.deptno;

+--------+---------+-------------+--+ | empno | ename | dname |

+--------+---------+-------------+--+ | 7369 | SMITH | RESEARCH | | 7499 |

ALLEN | SALES | | 7521 | WARD | SALES | | 7566 | JONES | RESEARCH | | 7654 |

MARTIN | SALES | | 7698 | BLAKE | SALES | | 7782 | CLARK | ACCOUNTING | | 7788

| SCOTT | RESEARCH | | 7839 | KING | ACCOUNTING | | 7844 | TURNER | SALES | |

7876 | ADAMS | RESEARCH | | 7900 | JAMES | SALES | | 7902 | FORD | RESEARCH | |

7934 | MILLER | ACCOUNTING | +--------+---------+-------------+--+

Join连接实操练习:

合并员工表和部门表:这里使用到了内连接 select e.ename,e.empno,e.sal,e.deptno,d.dname from emp e

join dept d on e.deptno = d.deptno;

+----------+----------+---------+-----------+-------------+--+ | e.ename |

e.empno | e.sal | e.deptno | d.dname |

+----------+----------+---------+-----------+-------------+--+ | SMITH | 7369 |

800.0 | 20 | RESEARCH | | ALLEN | 7499 | 1600.0 | 30 | SALES | | WARD | 7521 |

1250.0 | 30 | SALES | | JONES | 7566 | 2975.0 | 20 | RESEARCH | | MARTIN | 7654

| 1250.0 | 30 | SALES | | BLAKE | 7698 | 2850.0 | 30 | SALES | | CLARK | 7782 |

2450.0 | 10 | ACCOUNTING | | SCOTT | 7788 | 3000.0 | 20 | RESEARCH | | KING |

7839 | 5000.0 | 10 | ACCOUNTING | | TURNER | 7844 | 1500.0 | 30 | SALES | |

ADAMS | 7876 | 1100.0 | 20 | RESEARCH | | JAMES | 7900 | 950.0 | 30 | SALES | |

FORD | 7902 | 3000.0 | 20 | RESEARCH | | MILLER | 7934 | 1300.0 | 10 |

ACCOUNTING | +----------+----------+---------+-----------+-------------+--+

这里如果没一个员工刚来,暂时没有分配部门,将不会显示在查询结果中 左外连接:左表的所有记录都会被保留下来,没有匹配到的,以null显示: select

e.empno,e.ename,e.sal,d.dname from emp e left join dept d where e.deptno =

d.deptno; +----------+----------+---------+-------------+--+ | e.empno |

e.ename | e.sal | d.dname | +----------+----------+---------+-------------+--+

| 7369 | SMITH | 800.0 | RESEARCH | | 7499 | ALLEN | 1600.0 | SALES | | 7521 |

WARD | 1250.0 | SALES | | 7566 | JONES | 2975.0 | RESEARCH | | 7654 | MARTIN |

1250.0 | SALES | | 7698 | BLAKE | 2850.0 | SALES | | 7782 | CLARK | 2450.0 |

ACCOUNTING | | 7788 | SCOTT | 3000.0 | RESEARCH | | 7839 | KING | 5000.0 |

ACCOUNTING | | 7844 | TURNER | 1500.0 | SALES | | 7876 | ADAMS | 1100.0 |

RESEARCH | | 7900 | JAMES | 950.0 | SALES | | 7902 | FORD | 3000.0 | RESEARCH |

| 7934 | MILLER | 1300.0 | ACCOUNTING |

+----------+----------+---------+-------------+--+

这里如果没一个员工刚来,暂时没有分配部门,也会显示在查询结果中 右外连接:右表的所有记录都会被保留下来,没有匹配到的,以null显示: select

e.empno,e.ename,e.sal,d.dname from emp e right join dept d where e.deptno =

d.deptno; +----------+----------+---------+-------------+--+ | e.empno |

e.ename | e.sal | d.dname | +----------+----------+---------+-------------+--+

| 7782 | CLARK | 2450.0 | ACCOUNTING | | 7839 | KING | 5000.0 | ACCOUNTING | |

7934 | MILLER | 1300.0 | ACCOUNTING | | 7369 | SMITH | 800.0 | RESEARCH | |

7566 | JONES | 2975.0 | RESEARCH | | 7788 | SCOTT | 3000.0 | RESEARCH | | 7876

| ADAMS | 1100.0 | RESEARCH | | 7902 | FORD | 3000.0 | RESEARCH | | 7499 |

ALLEN | 1600.0 | SALES | | 7521 | WARD | 1250.0 | SALES | | 7654 | MARTIN |

1250.0 | SALES | | 7698 | BLAKE | 2850.0 | SALES | | 7844 | TURNER | 1500.0 |

SALES | | 7900 | JAMES | 950.0 | SALES |

+----------+----------+---------+-------------+--+

这里如果有一个刚陈成立的部门,暂时没有员工被调到该部门,那么这个部门仍然会显示在这个部门中 满外连接:左右表中的,没有匹配到的,都会以null显示

select e.empno,e.ename,e.sal,d.dname from emp e full join dept d where e.deptno

= d.deptno; +----------+----------+---------+-------------+--+ | e.empno |

e.ename | e.sal | d.dname | +----------+----------+---------+-------------+--+

| 7934 | MILLER | 1300.0 | ACCOUNTING | | 7839 | KING | 5000.0 | ACCOUNTING | |

7782 | CLARK | 2450.0 | ACCOUNTING | | 7876 | ADAMS | 1100.0 | RESEARCH | |

7788 | SCOTT | 3000.0 | RESEARCH | | 7369 | SMITH | 800.0 | RESEARCH | | 7566 |

JONES | 2975.0 | RESEARCH | | 7902 | FORD | 3000.0 | RESEARCH | | 7844 | TURNER

| 1500.0 | SALES | | 7499 | ALLEN | 1600.0 | SALES | | 7698 | BLAKE | 2850.0 |

SALES | | 7654 | MARTIN | 1250.0 | SALES | | 7521 | WARD | 1250.0 | SALES | |

7900 | JAMES | 950.0 | SALES |

+----------+----------+---------+-------------+--+ 没有部门的员工和没有员工的部门都会显示在这张表中

多表的连接实操练习:

1,多表的连接查询: select e.ename,d.dname,l.loc_name from emp e left join dept d on

e.deptno = d.deptno left join location l on d.loc = l.loc;

+----------+-------------+-------------+--+ | e.ename | d.dname | l.loc_name |

+----------+-------------+-------------+--+ | SMITH | RESEARCH | London | |

ALLEN | SALES | Tokyo | | WARD | SALES | Tokyo | | JONES | RESEARCH | London |

| MARTIN | SALES | Tokyo | | BLAKE | SALES | Tokyo | | CLARK | ACCOUNTING |

Beijing | | SCOTT | RESEARCH | London | | KING | ACCOUNTING | Beijing | |

TURNER | SALES | Tokyo | | ADAMS | RESEARCH | London | | JAMES | SALES | Tokyo

| | FORD | RESEARCH | London | | MILLER | ACCOUNTING | Beijing |

+----------+-------------+-------------+--+

大多数情况下,Hive会对每对JOIN连接对象启动一个MapReduce任务。本例中会首先启动一个MapReduce job对

表e和表d进行连接操作,然后会再启动一个MapReduce job将第一个MapReduce job的输出和表l;进行连接操作

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值