Hive DML Syntax

  • Loading a file into a table (a non-LOCAL, partitioned variant is sketched after the example below)
    • LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)]
hive> load data local inpath "/home/hadoop/data/deptn.sql" overwrite into table dept;
Loading data to table default.dept
Table default.dept stats: [numFiles=1, numRows=0, totalSize=80, rawDataSize=0]
OK
Time taken: 2.401 seconds
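
For comparison, a minimal sketch of the non-LOCAL, partitioned form: without LOCAL the path is read from HDFS, and Hive moves the file into the table's directory rather than copying it. The table name emp_partition and the input path here are assumptions for illustration only.
-- emp_partition and /data/emp_d30.txt are hypothetical; the file is moved into the deptno=30 partition directory
load data inpath '/data/emp_d30.txt' overwrite into table emp_partition partition (deptno=30);
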
  • Writing query results to the filesystem (exporting data; a non-LOCAL variant is sketched after the example below)
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1
    [ROW FORMAT row_format] [STORED AS file_format]
    SELECT … FROM …
insert overwrite local directory '/home/hadoop/data/outputemp2' 
row format delimited fields terminated by "\t"
select * from emp; 
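
Without LOCAL the target is an HDFS directory; a sketch assuming the hypothetical path /tmp/outputemp (ROW FORMAT on INSERT OVERWRITE DIRECTORY is supported from Hive 0.11 onward):
-- /tmp/outputemp is a hypothetical HDFS output directory
insert overwrite directory '/tmp/outputemp'
row format delimited fields terminated by "\t"
select * from emp;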

FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] … // write to multiple directories in a single pass over the source table

from emp
INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select empno, ename  
INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select ename;   

Exporting data with hive -e
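
A minimal sketch of this approach (the output file paths are assumptions): hive -e runs the quoted query and exits, so a shell redirect captures the result; adding -S (silent mode) suppresses the informational log lines.
# output paths are hypothetical; adjust as needed
hive -e "select * from emp" > /home/hadoop/data/emp_hive_e.txt
hive -S -e "select * from emp where deptno=10" > /home/hadoop/data/emp_dept10.txt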

  • Query statements
    Average salary per department
    select deptno, avg(salary) from emp group by deptno;

select ename, deptno, avg(salary) from emp group by deptno;
Expression not in GROUP BY key 'ename' // note the correspondence between SELECT and GROUP BY

A column that appears in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
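
The failing query above can therefore be corrected in either of two ways; this is a sketch using the same column names as the queries above:
select deptno, ename, avg(salary) from emp group by deptno, ename;   -- add ename to the GROUP BY
select deptno, avg(salary) from emp group by deptno;                 -- or keep only grouped columns and aggregates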

Highest salary per department and job
select deptno, job, max(salary) from emp group by deptno, job;

Departments whose average salary is greater than 2000
select deptno, avg(salary) avg_sal from emp group by deptno having avg_sal>2000;
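
For reference, the same filter written with a subquery and WHERE instead of HAVING (a sketch using the same column names; the derived-table alias t is arbitrary):
-- t is an arbitrary alias for the derived table
select t.deptno, t.avg_sal
from (select deptno, avg(salary) avg_sal from emp group by deptno) t
where t.avg_sal > 2000;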

Simple classification of query results (CASE WHEN)
select ename,sal,
case
when sal>1 and sal<=1000 then 'lower'
when sal>1000 and sal <=2000 then 'moddle'
when sal>2000 and sal <=4000 then 'high'
else 'highest'
end
from emp;

hive> select ename,sal,
    > case
    > when sal>1 and sal<=1000 then 'lower'
    > when sal>1000 and sal <=2000 then 'moddle'
    > when sal>2000 and sal <=4000 then 'high'
    > else 'highest'
    > from emp;
FAILED: ParseException line 7:0 missing KW_END at 'from' near '<EOF>'
hive> select ename,sal,
    > case
    > when sal>1 and sal<=1000 then 'lower'
    > when sal>1000 and sal <=2000 then 'moddle'
    > when sal>2000 and sal <=4000 then 'high'
    > else 'highest'
    > end
    > from emp;
OK
SMITH   800.0   lower
ALLEN   1600.0  moddle
WARD    1250.0  moddle
JONES   2975.0  high
MARTIN  1250.0  moddle
BLAKE   2850.0  high
CLARK   2450.0  high
SCOTT   3000.0  high
KING    5000.0  highest
TURNER  1500.0  moddle
ADAMS   1100.0  moddle
JAMES   950.0   lower
FORD    3000.0  high
MILLER  1300.0  moddle
Time taken: 2.191 seconds, Fetched: 14 row(s)

Combining query results (UNION ALL)
select count(1) from emp where empno=7566
union all
select count(1) from emp where empno=7654;

hive> select count(1) from emp where empno=7566
    > union all
    > select count(1) from emp where empno=7654;
Query ID = hadoop_20180109015050_8d760d00-2c6e-4cc4-a99e-5573e64bfc9b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1515472546059_0002, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1515472546059_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-09 01:52:52,767 Stage-1 map = 0%,  reduce = 0%
2018-01-09 01:53:15,692 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.93 sec
2018-01-09 01:53:33,559 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.71 sec
MapReduce Total cumulative CPU time: 8 seconds 710 msec
Ended Job = job_1515472546059_0002
Launching Job 2 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1515472546059_0003, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0003/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1515472546059_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2018-01-09 01:53:52,732 Stage-3 map = 0%,  reduce = 0%
2018-01-09 01:54:15,733 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 7.23 sec
2018-01-09 01:54:37,718 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 9.75 sec
MapReduce Total cumulative CPU time: 9 seconds 750 msec
Ended Job = job_1515472546059_0003
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1515472546059_0004, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0004/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1515472546059_0004
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
2018-01-09 01:54:56,536 Stage-2 map = 0%,  reduce = 0%
2018-01-09 01:55:21,218 Stage-2 map = 50%,  reduce = 0%, Cumulative CPU 2.49 sec
2018-01-09 01:55:22,329 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
MapReduce Total cumulative CPU time: 4 seconds 810 msec
Ended Job = job_1515472546059_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.71 sec   HDFS Read: 7881 HDFS Write: 114 SUCCESS
Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 9.75 sec   HDFS Read: 7886 HDFS Write: 114 SUCCESS
Stage-Stage-2: Map: 2   Cumulative CPU: 4.81 sec   HDFS Read: 5348 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 270 msec
OK
1
1
Time taken: 179.507 seconds, Fetched: 2 row(s)


Using IMPORT and EXPORT
EXPORT and IMPORT carry a table's metadata along with its data, so a table can be moved to a different Hadoop cluster; the exported dump is portable.

  • export

    EXPORT TABLE tablename [PARTITION (part_column="value"[, …])]
    TO 'export_target_path' [ FOR replication('eventid') ]

  • import

IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, …])]]
FROM 'source_path'
[LOCATION 'import_target_path']

  • Examples

    Export and import a table
    export table emp to '/emp/emp.sql';
    import table new_emp from '/emp/emp.sql';

Export and import a partitioned table
export table emp_dy_partition partition(deptno=30) to '/exprt';
import table new_emp_dy partition (deptno=30) from '/exprt';
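
Per the IMPORT syntax above, a dump can also be imported as an external table at a chosen HDFS location; the table name new_emp_ext and the location path below are assumptions for illustration:
-- new_emp_ext and the LOCATION path are hypothetical
import external table new_emp_ext from '/emp/emp.sql' location '/user/hadoop/new_emp_ext';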

若泽大数据 discussion group: 671914634
