1 Hive import and export
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
1.1 export
- The export target path here refers to a path inside HDFS
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ]
export table emp to '/user/hive/warehouse/test_export/emp_exp';
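This section covers import as well. Per the linked manual, the matching IMPORT statement restores a table from an exported path. A minimal sketch (the target table name `emp_imported` is a hypothetical name chosen for illustration):

```sql
-- Syntax from the Hive manual:
-- IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
--   FROM 'source_path' [LOCATION 'import_target_location']

-- Re-create the exported emp table under a new name (emp_imported is hypothetical)
import table emp_imported from '/user/hive/warehouse/test_export/emp_exp';
```

If the table name is omitted, Hive imports the table under its original name, which fails if a table called emp already exists in the current database.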
2 Sort
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
order by
: global sort; always uses a single reducer
sort by
: sorts within each reducer; the result is not globally sorted
distribute by
: similar to partition in MapReduce; distributes rows across reducers, used together with sort by
cluster by
: usable when the distribute by and sort by columns are the same
2.1 order by
(use with caution)
- Sorts all data globally; the job always runs with a single reducer
SELECT * FROM emp order by empno desc;
hive (default)> SELECT * FROM emp order by empno desc;
Query ID = hadoop_20190117103838_eb7585e6-fbbc-4fc8-978a-fb5216cc70d0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1547692669080_0013, Tracking URL = http://node1:8088/proxy/application_1547692669080_0013/
Kill Command = /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1547692669080_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-17 22:04:54,709 Stage-1 map = 0%, reduce = 0%
2019-01-17 22:04:58,936 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.34 sec
2019-01-17 22:05:05,106 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.78 sec
MapReduce Total cumulative CPU time: 2 seconds 780 msec
Ended Job = job_1547692669080_0013
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.78 sec HDFS Read: 8185 HDFS Write: 674 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 780 msec
OK
emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
7934 Miller clerk 7782 1982-1-23 1300.0 10.0 NULL
7902 Ford analyst 7566 1981-12-3 3000.0 20.0 NULL
7900 Jamas clerk 7698 1981-12-3 950.0 30.0 NULL
7876 Adams clerk 7788 1987-5-23 1100.0 20.0 NULL
7844 Turner salesman 7698 1981-9-8 1500.0 0.0 30
7839 King president NULL 5000.0 10.0 NULL NULL
7788 Scott analyst 7566 1987-4-19 3000.0 20.0 NULL
7782 Clark manager 7839 1981-6-9 2450.0 10.0 NULL
7698 Blake manager 7839 1981-5-1 2850.0 30.0 NULL
7654 Martin salesman 7698 1981-9-28 1250.0 1400.0 30
7566 Jones manager 7839 1981-4-2 2975.0 20.0 NULL
7521 Ward salsesman 7698 1981-2-22 1250.0 500.0 30
7499 Allen salesman 7698 1981-2-20 1600.0 300.0 30
7369 Smith clerk 7902 1980-12-17 800.0 20.0 NULL
Time taken: 17.23 seconds, Fetched: 14 row(s)
2.2 sort by
- Sorts the data within each reducer
set mapreduce.job.reduces=3;
INSERT overwrite local directory '/home/hadoop/tempdata/sortby_res'
SELECT * FROM emp sort by empno asc;
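With mapreduce.job.reduces=3, each of the three reducers writes its own output file into the target directory; rows are sorted by empno within each file, but not across files. A quick way to verify (file names like 000000_0 follow Hive's default reducer-output naming, so the exact names may vary):

```shell
# List the per-reducer output files written by the INSERT above
ls /home/hadoop/tempdata/sortby_res
# typically 000000_0, 000001_0, 000002_0 -- one file per reducer
```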
2.3 distribute by
- Partitioning (like partition in MapReduce): distributes rows across reducers; used together with sort by
- distribute by must appear before sort by
INSERT overwrite local directory '/home/hadoop/tempdata/distby_res'
SELECT * FROM emp distribute by deptno sort by empno asc;
2.4 cluster by
- Can be used when the distribute by and sort by columns are the same
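A sketch of the equivalence: when the same column is used for both distributing and sorting, the two statements below produce the same result. Note that cluster by only supports ascending order and does not accept asc/desc.

```sql
-- These two statements are equivalent:
SELECT * FROM emp distribute by empno sort by empno;
SELECT * FROM emp cluster by empno;
```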