- Loading a file into a table
- LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
hive> load data local inpath "/home/hadoop/data/deptn.sql" overwrite into table dept;
Loading data to table default.dept
Table default.dept stats: [numFiles=1, numRows=0, totalSize=80, rawDataSize=0]
OK
Time taken: 2.401 seconds
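LOAD DATA does no parsing at load time; it just moves (or, with LOCAL, copies) a delimited text file into the table's warehouse directory. A minimal sketch of producing such an input file, assuming the table was created with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' (the delimiter and sample rows are illustrative, not taken from the transcript above):

```python
# Write a tab-delimited text file suitable for `LOAD DATA LOCAL INPATH`.
# Assumes the target table declares FIELDS TERMINATED BY '\t'.
rows = [
    (10, "ACCOUNTING", "NEW YORK"),
    (20, "RESEARCH", "DALLAS"),
    (30, "SALES", "CHICAGO"),
]

with open("dept.txt", "w") as f:
    for deptno, dname, loc in rows:
        # One table row per line; columns joined by the table's field delimiter.
        f.write(f"{deptno}\t{dname}\t{loc}\n")
```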
- Writing query results to the filesystem (exporting data)
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format]
SELECT … FROM …
insert overwrite local directory '/home/hadoop/data/outputemp2'
row format delimited fields terminated by "\t"
select * from emp;
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ... // export to multiple directories in a single scan
from emp
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select empno, ename
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select ename;
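The multi-insert form above scans emp once and writes each SELECT's projection to its own directory. The same single-pass effect sketched in plain Python over in-memory rows (sample data and output filenames are illustrative):

```python
# One pass over the source, two projected outputs -- the shape of Hive's
# FROM ... INSERT OVERWRITE DIRECTORY ... INSERT OVERWRITE DIRECTORY ... form.
emp = [
    (7369, "SMITH", 800.0),
    (7499, "ALLEN", 1600.0),
]

with open("hivetmp1.txt", "w") as out1, open("hivetmp2.txt", "w") as out2:
    for empno, ename, sal in emp:          # single scan of the source
        out1.write(f"{empno}\t{ename}\n")  # projection 1: empno, ename
        out2.write(f"{ename}\n")           # projection 2: ename
```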
Data can also be exported with hive -e, which runs a query non-interactively and prints the result to stdout, so it can be redirected to a file, e.g. hive -e "select * from emp" > emp_query.txt (output path illustrative).
- Queries
Average salary per department
select deptno, avg(salary) from emp group by deptno;
select ename, deptno, avg(salary) from emp group by deptno; -- fails:
Expression not in GROUP BY key 'ename' // note the SELECT/GROUP BY correspondence
Every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
Highest salary per department and job
select deptno, job, max(salary) from emp group by deptno, job;
Departments whose average salary is greater than 2000
select deptno, avg(salary) avg_sal from emp group by deptno having avg_sal>2000;
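The grouped queries above use standard SQL semantics, so they can be checked locally with SQLite in memory (sample rows are illustrative; note that SQLite, unlike Hive, silently tolerates non-grouped columns in the SELECT list, so only the valid forms are shown):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp (ename text, deptno int, sal real)")
con.executemany("insert into emp values (?,?,?)", [
    ("SMITH", 20, 800.0), ("ALLEN", 30, 1600.0),
    ("WARD", 30, 1250.0), ("CLARK", 10, 2450.0),
])

# Average salary per department; dept 30 averages (1600+1250)/2 = 1425.0.
avgs = dict(con.execute("select deptno, avg(sal) from emp group by deptno"))
print(sorted(avgs.items()))  # [(10, 2450.0), (20, 800.0), (30, 1425.0)]

# HAVING filters whole groups after aggregation, so only dept 10 survives.
high = con.execute(
    "select deptno, avg(sal) as avg_sal from emp "
    "group by deptno having avg_sal > 2000").fetchall()
print(high)  # [(10, 2450.0)]
```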
Simple bucketing of query results (CASE WHEN)
select ename,sal,
case
when sal>1 and sal<=1000 then 'lower'
when sal>1000 and sal <=2000 then 'moddle'
when sal>2000 and sal <=4000 then 'high'
else 'highest'
end
from emp;
hive> select ename,sal,
> case
> when sal>1 and sal<=1000 then 'lower'
> when sal>1000 and sal <=2000 then 'moddle'
> when sal>2000 and sal <=4000 then 'high'
> else 'highest'
> from emp;
FAILED: ParseException line 7:0 missing KW_END at 'from' near '<EOF>'
hive> select ename,sal,
> case
> when sal>1 and sal<=1000 then 'lower'
> when sal>1000 and sal <=2000 then 'moddle'
> when sal>2000 and sal <=4000 then 'high'
> else 'highest'
> end
> from emp;
OK
SMITH 800.0 lower
ALLEN 1600.0 moddle
WARD 1250.0 moddle
JONES 2975.0 high
MARTIN 1250.0 moddle
BLAKE 2850.0 high
CLARK 2450.0 high
SCOTT 3000.0 high
KING 5000.0 highest
TURNER 1500.0 moddle
ADAMS 1100.0 moddle
JAMES 950.0 lower
FORD 3000.0 high
MILLER 1300.0 moddle
Time taken: 2.191 seconds, Fetched: 14 row(s)
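The same CASE WHEN bucketing can be checked with SQLite in memory (sample salaries are illustrative; the bucket bounds match the query above, with the label spelled 'middle'):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp (ename text, sal real)")
con.executemany("insert into emp values (?,?)", [
    ("SMITH", 800.0), ("ALLEN", 1600.0), ("SCOTT", 3000.0), ("KING", 5000.0),
])

# CASE evaluates its WHEN branches in order and yields the first match.
rows = con.execute("""
    select ename, sal,
           case
             when sal > 1    and sal <= 1000 then 'lower'
             when sal > 1000 and sal <= 2000 then 'middle'
             when sal > 2000 and sal <= 4000 then 'high'
             else 'highest'
           end as grade
    from emp
    order by sal
""").fetchall()
print(rows)
```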
Merging query results (UNION ALL)
select count(1) from emp where empno=7566
union all
select count(1) from emp where empno=7654;
hive> select count(1) from emp where empno=7566
> union all
> select count(1) from emp where empno=7654;
Query ID = hadoop_20180109015050_8d760d00-2c6e-4cc4-a99e-5573e64bfc9b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1515472546059_0002, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1515472546059_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-09 01:52:52,767 Stage-1 map = 0%, reduce = 0%
2018-01-09 01:53:15,692 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.93 sec
2018-01-09 01:53:33,559 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.71 sec
MapReduce Total cumulative CPU time: 8 seconds 710 msec
Ended Job = job_1515472546059_0002
Launching Job 2 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1515472546059_0003, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0003/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1515472546059_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2018-01-09 01:53:52,732 Stage-3 map = 0%, reduce = 0%
2018-01-09 01:54:15,733 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 7.23 sec
2018-01-09 01:54:37,718 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 9.75 sec
MapReduce Total cumulative CPU time: 9 seconds 750 msec
Ended Job = job_1515472546059_0003
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1515472546059_0004, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0004/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1515472546059_0004
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
2018-01-09 01:54:56,536 Stage-2 map = 0%, reduce = 0%
2018-01-09 01:55:21,218 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 2.49 sec
2018-01-09 01:55:22,329 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
MapReduce Total cumulative CPU time: 4 seconds 810 msec
Ended Job = job_1515472546059_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.71 sec HDFS Read: 7881 HDFS Write: 114 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 9.75 sec HDFS Read: 7886 HDFS Write: 114 SUCCESS
Stage-Stage-2: Map: 2 Cumulative CPU: 4.81 sec HDFS Read: 5348 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 270 msec
OK
1
1
Time taken: 179.507 seconds, Fetched: 2 row(s)
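UNION ALL concatenates result sets without deduplication; each branch above yields one count row, so the combined result has two rows. The same semantics checked with SQLite in memory (sample data is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp (empno int, ename text)")
con.executemany("insert into emp values (?,?)",
                [(7566, "JONES"), (7654, "MARTIN")])

# Each SELECT produces one count row; UNION ALL keeps both, in branch order.
rows = con.execute("""
    select count(1) from emp where empno = 7566
    union all
    select count(1) from emp where empno = 7654
""").fetchall()
print(rows)  # [(1,), (1,)]
```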
Using IMPORT and EXPORT
The distinguishing feature of EXPORT and IMPORT is that the table's metadata is exported and imported together with its data, so a table can be moved to a different Hadoop cluster; this makes it portable.
export
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ]
import
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
FROM 'source_path'
[LOCATION 'import_target_path']
Examples
Export and import a table
export table emp to '/emp/emp.sql';
import table new_emp from '/emp/emp.sql';
Export and import a partitioned table
export table emp_dy_partition partition(deptno=30) to '/exprt';
import table new_emp_dy partition (deptno=30) from '/exprt';