- Loading a file into a table
- LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
hive> load data local inpath "/home/hadoop/data/deptn.sql" overwrite into table dept;
Loading data to table default.dept
Table default.dept stats: [numFiles=1, numRows=0, totalSize=80, rawDataSize=0]
OK
Time taken: 2.401 seconds
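LOAD DATA does no parsing at load time; it just moves (or, with LOCAL, copies) a delimited text file into the table's warehouse directory. A minimal sketch of producing such an input file, assuming the table was created with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' (the delimiter and sample rows are illustrative, not taken from the transcript above):

```python
# Write a tab-delimited text file suitable for `LOAD DATA LOCAL INPATH`.
# Assumes the target table declares FIELDS TERMINATED BY '\t'.
rows = [
    (10, "ACCOUNTING", "NEW YORK"),
    (20, "RESEARCH", "DALLAS"),
    (30, "SALES", "CHICAGO"),
]

with open("dept.txt", "w") as f:
    for deptno, dname, loc in rows:
        # One table row per line; columns joined by the table's field delimiter.
        f.write(f"{deptno}\t{dname}\t{loc}\n")
```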
- Writing query results to the filesystem (exporting data)
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format]
SELECT … FROM …
insert overwrite local directory '/home/hadoop/data/outputemp2'
row format delimited fields terminated by "\t"
select * from emp;
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ... // export to multiple directories in a single scan
from emp
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select empno, ename
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select ename;
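The multi-insert form above scans emp once and writes each SELECT's projection to its own directory. The same single-pass effect sketched in plain Python over in-memory rows (sample data and output filenames are illustrative):

```python
# One pass over the source, two projected outputs -- the shape of Hive's
# FROM ... INSERT OVERWRITE DIRECTORY ... INSERT OVERWRITE DIRECTORY ... form.
emp = [
    (7369, "SMITH", 800.0),
    (7499, "ALLEN", 1600.0),
]

with open("hivetmp1.txt", "w") as out1, open("hivetmp2.txt", "w") as out2:
    for empno, ename, sal in emp:          # single scan of the source
        out1.write(f"{empno}\t{ename}\n")  # projection 1: empno, ename
        out2.write(f"{ename}\n")           # projection 2: ename
```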
Data can also be exported with hive -e, which runs a query non-interactively and prints the result to stdout, so it can be redirected to a file, e.g. hive -e "select * from emp" > emp_query.txt (output path illustrative).
- Queries
Average salary per department
select deptno, avg(salary) from emp group by deptno;
select ename, deptno, avg(salary) from emp group by deptno; -- fails:
Expression not in GROUP BY key 'ename' // note the SELECT/GROUP BY correspondence
Every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
Highest salary per department and job
select deptno, job, max(salary) from emp group by deptno, job;
Departments whose average salary is greater than 2000
select deptno, avg(salary) avg_sal from emp group by deptno having avg_sal>2000;
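The grouped queries above use standard SQL semantics, so they can be checked locally with SQLite in memory (sample rows are illustrative; note that SQLite, unlike Hive, silently tolerates non-grouped columns in the SELECT list, so only the valid forms are shown):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp (ename text, deptno int, sal real)")
con.executemany("insert into emp values (?,?,?)", [
    ("SMITH", 20, 800.0), ("ALLEN", 30, 1600.0),
    ("WARD", 30, 1250.0), ("CLARK", 10, 2450.0),
])

# Average salary per department; dept 30 averages (1600+1250)/2 = 1425.0.
avgs = dict(con.execute("select deptno, avg(sal) from emp group by deptno"))
print(sorted(avgs.items()))  # [(10, 2450.0), (20, 800.0), (30, 1425.0)]

# HAVING filters whole groups after aggregation, so only dept 10 survives.
high = con.execute(
    "select deptno, avg(sal) as avg_sal from emp "
    "group by deptno having avg_sal > 2000").fetchall()
print(high)  # [(10, 2450.0)]
```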
Simple bucketing of query results (CASE WHEN)
select ename,sal,
case
when sal>1 and sal<=1000 then 'lower'
when sal>1000 and sal <=2000 then 'moddle'
when sal>2000 and sal <=4000 then 'high'
else 'highest'
end
from emp;
hive> select ename,sal,
> case
> when sal>1 and sal<=1000 then 'lower'
> when sal>1000 and sal <=2000 then 'moddle'
> when sal>2000 and sal <=4000 then 'high'
> else 'highest'
> from emp;
FAILED: ParseException line 7:0 missing KW_END at 'from' near '<EOF>'
hive> select ename,sal,
> case
> when sal>1 and sal<=1000 then 'lower'
> when sal>1000 and sal <=2000 then 'moddle'
> when sal>2000 and sal <=4000 then 'high'
> else 'highest'
> end
> from emp;
OK
SMITH 800.0 lower
ALLEN 1600.0 moddle
WARD 1250.0 moddle
JONES 2975.0 high
MARTIN 1250.0 moddle
BLAKE 2850.0 high
CLARK 2450.0 high
SCOTT 3000.0 high
KING 5000.0 highest
TURNER 1500.0 moddle
ADAMS 1100.0 moddle
JAMES 950.0 lower
FORD 3000.0 high
MILLER 1300.0 moddle
Time taken: 2.191 seconds, Fetched: 14 row(s)
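The same CASE WHEN bucketing can be checked with SQLite in memory (sample salaries are illustrative; the bucket bounds match the query above, with the label spelled 'middle'):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp (ename text, sal real)")
con.executemany("insert into emp values (?,?)", [
    ("SMITH", 800.0), ("ALLEN", 1600.0), ("SCOTT", 3000.0), ("KING", 5000.0),
])

# CASE evaluates its WHEN branches in order and yields the first match.
rows = con.execute("""
    select ename, sal,
           case
             when sal > 1    and sal <= 1000 then 'lower'
             when sal > 1000 and sal <= 2000 then 'middle'
             when sal > 2000 and sal <= 4000 then 'high'
             else 'highest'
           end as grade
    from emp
    order by sal
""").fetchall()
print(rows)
```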
Merging query results (UNION ALL)
select count(1) from emp where empno=7566
union all
select count(1) from emp where empno=7654;
hive> select count(1) from emp where empno=7566
> union all
> select count(1) from emp where empno=7654;
Query ID = hadoop_20180109015050_8d760d00-2c6e-4cc4-a99e-5573e64bfc9b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1515472546059_0002, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1515472546059_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-09 01:52:52,767 Stage-1 map = 0%, reduce = 0%
2018-01-09 01:53:15,692 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.93 sec
2018-01-09 01:53:33,559 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.71 sec
MapReduce Total cumulative CPU time: 8 seconds 710 msec
Ended Job = job_1515472546059_0002
Launching Job 2 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1515472546059_0003, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0003/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1515472546059_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2018-01-09 01:53:52,732 Stage-3 map = 0%, reduce = 0%
2018-01-09 01:54:15,733 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 7.23 sec
2018-01-09 01:54:37,718 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 9.75 sec
MapReduce Total cumulative CPU time: 9 seconds 750 msec
Ended Job = job_1515472546059_0003
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1515472546059_0004, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0004/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1515472546059_0004
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
2018-01-09 01:54:56,536 Stage-2 map = 0%, reduce = 0%
2018-01-09 01:55:21,218 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 2.49 sec
2018-01-09 01:55:22,329 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
MapReduce Total cumulative CPU time: 4 seconds 810 msec
Ended Job = job_1515472546059_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.71 sec HDFS Read: 7881 HDFS Write: 114 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 9.75 sec HDFS Read: 7886 HDFS Write: 114 SUCCESS
Stage-Stage-2: Map: 2 Cumulative CPU: 4.81 sec HDFS Read: 5348 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 270 msec
OK
1
1
Time taken: 179.507 seconds, Fetched: 2 row(s)
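UNION ALL concatenates result sets without deduplication; each branch above yields one count row, so the combined result has two rows. The same semantics checked with SQLite in memory (sample data is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp (empno int, ename text)")
con.executemany("insert into emp values (?,?)",
                [(7566, "JONES"), (7654, "MARTIN")])

# Each SELECT produces one count row; UNION ALL keeps both, in branch order.
rows = con.execute("""
    select count(1) from emp where empno = 7566
    union all
    select count(1) from emp where empno = 7654
""").fetchall()
print(rows)  # [(1,), (1,)]
```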
Using IMPORT and EXPORT
The distinguishing feature of EXPORT and IMPORT is that the table's metadata is exported and imported together with its data, so a table can be moved to a different Hadoop cluster; this makes it portable.
export
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ]
import
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
FROM 'source_path'
[LOCATION 'import_target_path']
Examples
Export and import a table
export table emp to '/emp/emp.sql';
import table new_emp from '/emp/emp.sql';
Export and import a partitioned table
export table emp_dy_partition partition(deptno=30) to '/exprt';
import table new_emp_dy partition (deptno=30) from '/exprt';