Notes from studying Programming Hive (《Hive编程指南》)
1. Loading data into managed tables
hive (default)> load data local inpath "/home/hadoop/workspace/student.txt"
> overwrite into table student1;
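For a partitioned table, add partition (key1 = v1, key2 = v2, …) after the table name.
With local: the file is copied from the local path to HDFS.
Without local: the file is moved from its current HDFS location to the new HDFS path.
With overwrite: any data already in the target directory is deleted first.
Without overwrite: the new files are added to the target directory and the existing data is kept.
The path given after inpath must not contain any subdirectories.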
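The partition clause and the local / non-local variants can be sketched as follows. This is only an illustration: the table student_part, its columns, and the HDFS path /user/hadoop/input/student.txt are hypothetical names that do not appear in the notes.

```sql
-- Hypothetical partitioned table; the partition columns are not stored in the data file.
create table if not exists student_part (
  id int,
  name string)
partitioned by (country string, sex string)
row format delimited fields terminated by '\t';

-- With local: the file is copied from the local file system,
-- and overwrite replaces whatever the target partition already holds.
load data local inpath '/home/hadoop/workspace/student.txt'
overwrite into table student_part
partition (country = 'china', sex = 'male');

-- Without local: the file is moved from its HDFS location (hypothetical path),
-- and without overwrite it is added next to the existing files.
load data inpath '/user/hadoop/input/student.txt'
into table student_part
partition (country = 'china', sex = 'female');
```

load data does not parse or transform the file, so the file has to match the table's declared row format already.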
2. Inserting data into tables from queries
hadoop@dblab-VirtualBox:~/workspace$ cat stu.txt
1 michael male china
2 ming male china1
3 haha female china
4 huahua female china1
- Create the table and load the data
hive (default)> create table stu(
> id int,
> name string,
> sex string,
> country string)
> row format delimited fields terminated by '\t';
hive (default)> load data local inpath '/home/hadoop/workspace/stu.txt'
> into table stu;
- Populate another table with a select statement
hive (default)> create table employee(
> name string,
> country string)
> row format delimited fields terminated by '\t';
hive (default)> from stu s
> insert overwrite table employee
> select s.name, s.country where s.id%2=1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20210408224138_1df23614-7945-40c0-9a4d-df88e4f58ea1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2021-04-08 22:41:40,081 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1437521177_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/employee/.hive-staging_hive_2021-04-08_22-41-38_345_1863326332876590299-1/-ext-10000
Loading data to table default.employee
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 83 HDFS Write: 180 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
hive (default)> select * from employee;
OK
michael china
haha china
- Insert into multiple tables
hive (default)> from stu s
> insert into table employee
> select s.name, s.country where s.sex='female'
> insert into table employee1
> select s.name, s.country where s.sex='male';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20210408230623_bc69bccf-348e-467d-b88e-498664f27017
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2021-04-08 23:06:24,405 Stage-2 map = 100%, reduce = 0%
Ended Job = job_local2065691620_0003
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/employee/.hive-staging_hive_2021-04-08_23-06-23_001_7974131043339100692-1/-ext-10000
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/employee1/.hive-staging_hive_2021-04-08_23-06-23_001_7974131043339100692-1/-ext-10002
Loading data to table default.employee
Loading data to table default.employee1
MapReduce Jobs Launched:
Stage-Stage-2: HDFS Read: 470 HDFS Write: 474 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
hive (default)> select * from employee;
ming china1
huahua china1
haha china
huahua china1
hive (default)> select * from employee1;
michael china
ming china1
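The multi-table insert above writes to employee1, but the notes do not show how that table was created. A minimal sketch, assuming employee1 has the same schema as employee:

```sql
-- Assumption: employee1 has the same columns and row format as employee.
create table if not exists employee1 like employee;

-- A single scan of stu feeds both tables.
-- Each insert clause chooses overwrite or into independently.
from stu s
insert overwrite table employee
  select s.name, s.country where s.sex = 'female'
insert overwrite table employee1
  select s.name, s.country where s.sex = 'male';
```

With insert overwrite, running the statement again leaves the same rows in each table. With insert into, every run appends another copy of the selected rows.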
3. Dynamic partition inserts
hive (default)> from stu s
> insert overwrite table employee2
> partition (country, sex)
> select s.id, s.name, s.country, s.sex;
hive (default)> select * from employee2;
OK
3 haha china female
1 michael china male
4 huahua china1 female
2 ming china1 male
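The definition of employee2 and the session settings are not shown in the notes. The following is a sketch under assumptions: the column layout is inferred from the query output above, and the set commands are the standard Hive dynamic-partition properties, whose default values vary between Hive versions.

```sql
-- Enable dynamic partitioning. nonstrict mode allows every partition column to be dynamic;
-- strict mode requires at least one static partition column.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Assumed schema: two data columns plus the two partition columns.
create table if not exists employee2 (
  id int,
  name string)
partitioned by (country string, sex string);

-- Mixing static and dynamic keys: static keys must come first.
-- The dynamic key (sex) takes its value from the last column of the select.
from stu s
insert overwrite table employee2
partition (country = 'china', sex)
select s.id, s.name, s.sex
where s.country = 'china';

-- List the partitions that were created.
show partitions employee2;
```

Dynamic partition columns are matched by position, not by name, so the select list must end with the partition values in the same order as the partition clause.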
4. Creating a table and loading data with a single query
The table schema is derived from the select clause.
hive (default)> create table employee3
> as select id, name from stu
> where country='china';
hive (default)> select * from employee3;
1 michael
3 haha
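This feature cannot be used to create an external table (an external table's data is not loaded into the warehouse; it stays in its external location).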
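Because the schema comes from the select clause, column aliases determine the column names of the new table. A small sketch, where employee3_named, emp_id, and emp_name are hypothetical names chosen for illustration:

```sql
-- The aliases become the column names; the types are taken from stu.
-- The row format clause is optional and only controls how the new table's files are written.
create table employee3_named
row format delimited fields terminated by '\t'
as
select id as emp_id, name as emp_name
from stu
where country = 'china';

-- Inspect the generated schema.
describe employee3_named;
```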
5. Exporting data
hive (default)> from stu s
> insert overwrite local directory '/tmp/employee'
> select s.id, s.name, s.sex
> where country='china';
Several directories can be written in one pass by repeating the insert clause.
hive (default)> ! ls /tmp/employee -r;
000000_0
hive (default)> ! cat /tmp/employee/000000_0;
1michaelmale
3hahafemale
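The fields in the exported file look run together because Hive's default field delimiter is the non-printing Ctrl-A character (\001). The sketch below shows one way to get a readable export, and how repeating the insert clause writes several directories from a single scan. The directory names are hypothetical, and the row format clause on a directory export assumes Hive 0.11 or later.

```sql
-- Export with a tab delimiter instead of the default Ctrl-A.
-- Assumption: Hive 0.11 or later.
insert overwrite local directory '/tmp/employee_tsv'
row format delimited fields terminated by '\t'
select s.id, s.name, s.sex
from stu s
where s.country = 'china';

-- Write two export directories from one scan of stu.
from stu s
insert overwrite local directory '/tmp/employee_china'
  select s.id, s.name, s.sex where s.country = 'china'
insert overwrite local directory '/tmp/employee_china1'
  select s.id, s.name, s.sex where s.country = 'china1';
```

insert overwrite on a directory replaces whatever the directory already contains, so it is safer to point it at a dedicated directory.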