1 Hive 中的数据类型
主要有 5 大类
1.primitive_type
2. array_type
3. map_type
4. struct_type
5. union_type
1.1 primitive_type
tinyint
smallint
int
bigint
boolean
float
2 以雇员表和部门表为例介绍 Hive 表操作
2.1 创建表
- 创建员工表
create table if not exists default.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) row format delimited fields terminated by '\t';
- 创建部门表
create table if not exists default.dept(
deptno int,
dname string,
loc string
) row format delimited fields terminated by '\t';
2.2 导入数据
load data local inpath '/home/hadoop/emp.txt' overwrite into table emp;
load data local inpath '/home/hadoop/dept.txt' overwrite into table dept;
emp.txt
7369 Smith clerk 7902 1980-12-17 800.0 20
7499 Allen salesman 7698 1981-2-20 1600.0 300.0 30
7521 Ward salsesman 7698 1981-2-22 1250.0 500.0 30
7566 Jones manager 7839 1981-4-2 2975.0 20
7654 Martin salesman 7698 1981-9-28 1250.0 1400.0 30
7698 Blake manager 7839 1981-5-1 2850.0 30
7782 Clark manager 7839 1981-6-9 2450.0 10
7788 Scott analyst 7566 1987-4-19 3000.0 20
7839 King president 1981-11-17 5000.0 10
7844 Turner salesman 7698 1981-9-8 1500.0 0.0 30
7876 Adams clerk 7788 1987-5-23 1100.0 20
7900 Jamas clerk 7698 1981-12-3 950.0 30
7902 Ford analyst 7566 1981-12-3 3000.0 20
7934 Miller clerk 7782 1982-1-23 1300.0 10
dept.txt
10 ACCOUNTING New YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
2.3 通过子查询创建表
create table if not exists default.dept_cats
as
select * from dept;
hive (default)> create table if not exists default.dept_cats
> as
> select * from dept;
Query ID = hadoop_20190117103838_eb7585e6-fbbc-4fc8-978a-fb5216cc70d0
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1547692669080_0001, Tracking URL = http://node1:8088/proxy/application_1547692669080_0001/
Kill Command = /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1547692669080_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-01-17 10:39:16,544 Stage-1 map = 0%, reduce = 0%
2019-01-17 10:39:22,776 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1547692669080_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://node1:8020/user/hive/warehouse/.hive-staging_hive_2019-01-17_10-39-04_541_749900913359101249-1/-ext-10001
Moving data to: hdfs://node1:8020/user/hive/warehouse/dept_cats
Table default.dept_cats stats: [numFiles=1, numRows=4, totalSize=80, rawDataSize=76]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.72 sec HDFS Read: 2781 HDFS Write: 153 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
dept.deptno dept.dname dept.loc
Time taken: 19.583 seconds
hive (default)> select * from dept_cats;
OK
dept_cats.deptno dept_cats.dname dept_cats.loc
10 ACCOUNTING New YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
Time taken: 0.21 seconds, Fetched: 4 row(s)
2.3 清空表
truncate table dept_cats;
2.4 通过 like
创建表
create table if not exists default.dept_like
like
default.dept;
2.5 重命名表
alter table dept_like rename to dept_like_rename;
2.6 删除表
drop table if exists dept_like_rename;