1. Loading a file into a table. With LOCAL, the client-side file is copied; with a file already on HDFS, the file is moved (it disappears from its original location).
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
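As a contrast to the HDFS walk-through below, a LOCAL load might look like this (the local path here is illustrative, not from this environment):

```sql
-- LOCAL: the file on the client machine is copied; the source file is kept.
-- OVERWRITE: existing data in the table is replaced instead of appended.
LOAD DATA LOCAL INPATH '/opt/software/data/student.txt'
OVERWRITE INTO TABLE student;
```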
Loading an HDFS file into a table: the HDFS file is moved away from its source path.
First truncate the table's existing data:
hive> truncate table student;
OK
Time taken: 0.347 seconds
Load the HDFS file into the table:
hive> load data inpath '/user/age=10/student' into table student;
Loading data to table default.student
OK
Time taken: 0.514 seconds
Query the imported data:
hive> select * from student;
OK
1 小红1 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
2 小红2 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"}
3 小红3 ["吃鸡","book","movie"] {"chongqing":"renminglu","shenzheng":"futian"}
4 小红4 ["王者","book","movie"] {"modu":"renminglu","dongguang":"changan"}
5 小红5 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
6 小红6 ["王者","book","movie"] {"nanchang":"renminglu","shenzheng":"futian"}
7 小红7 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
8 小红8 ["walking","book","movie"] {"nanchang":"renminglu","guangzhou":"niwan"}
9 小红9 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
10 小红10 ["王者","book","movie"] {"shanghai":"renminglu","shenzheng":"futian"}
Time taken: 0.185 seconds, Fetched: 10 row(s)
hive>
2. Inserting query results into a Hive table
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
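A minimal dynamic-partition sketch, assuming a hypothetical source table src that carries an age column (the two SET options are the standard switches that must be enabled first):

```sql
-- Enable dynamic partitioning; nonstrict allows every partition column to be dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column (age) must come last in the SELECT list.
INSERT INTO TABLE student_dml PARTITION (age)
SELECT id, name, age FROM src;
```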
1. Create two target tables for the insert:
create external table student_dml
(
id int,
name String
)
partitioned by (age int)
row format delimited
fields terminated by ',';
create external table student_dml2
(
id int,
name String,
address map<String, String>
)
partitioned by (age int)
row format delimited
fields terminated by ','
map keys terminated by ':';
2. Run the multi-insert query:
hive> from student
> insert into table student_dml partition(age=12) select id,name
> insert into table student_dml2 partition(age=13) select id,name,address;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20210626163657_faeea658-b9ce-4da2-bf62-a7186d5ba3b2
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1624671436790_0003, Tracking URL = http://node03:8088/proxy/application_1624671436790_0003/
Kill Command = /opt/software/hadoop-2.10.1/bin/hadoop job -kill job_1624671436790_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2021-06-26 16:37:13,831 Stage-2 map = 0%, reduce = 0%
2021-06-26 16:37:25,455 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.49 sec
MapReduce Total cumulative CPU time: 1 seconds 490 msec
Ended Job = job_1624671436790_0003
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/student_dml/age=12/.hive-staging_hive_2021-06-26_16-36-57_079_5102616348730834889-1/-ext-10000
Moving data to directory hdfs://mycluster/user/hive/warehouse/student_dml2/age=13/.hive-staging_hive_2021-06-26_16-36-57_079_5102616348730834889-1/-ext-10002
Loading data to table default.student_dml partition (age=12)
Loading data to table default.student_dml2 partition (age=13)
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Cumulative CPU: 1.49 sec HDFS Read: 6028 HDFS Write: 704 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 490 msec
OK
Time taken: 31.724 seconds
3. Query the results:
hive> select * from student_dml;
OK
1 小红1 12
2 小红2 12
3 小红3 12
4 小红4 12
5 小红5 12
6 小红6 12
7 小红7 12
8 小红8 12
9 小红9 12
10 小红10 12
Time taken: 0.262 seconds, Fetched: 10 row(s)
hive> select * from student_dml2;
OK
1 小红1 {"modu":"renminglu","shenzheng":"futian"} 13
2 小红2 {"modu":"renminglu","xizhang":"lasha"} 13
3 小红3 {"chongqing":"renminglu","shenzheng":"futian"} 13
4 小红4 {"modu":"renminglu","dongguang":"changan"} 13
5 小红5 {"modu":"renminglu","shenzheng":"futian"} 13
6 小红6 {"nanchang":"renminglu","shenzheng":"futian"} 13
7 小红7 {"modu":"renminglu","shenzheng":"futian"} 13
8 小红8 {"nanchang":"renminglu","guangzhou":"niwan"} 13
9 小红9 {"modu":"renminglu","shenzheng":"futian"} 13
10 小红10 {"shanghai":"renminglu","shenzheng":"futian"} 13
Time taken: 0.209 seconds, Fetched: 10 row(s)
hive>
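The new partitions can also be listed directly as a quick check (from the run above, student_dml should show age=12 and student_dml2 age=13):

```sql
SHOW PARTITIONS student_dml;
SHOW PARTITIONS student_dml2;
```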
3. Hive SerDe (Serializer and Deserializer) - used in more complex cases where the data does not follow a regular delimited-table layout.
A SerDe performs serialization and deserialization of row data.
It sits between the storage layer and the execution engine, decoupling the two.
Hive reads and writes row content via ROW FORMAT DELIMITED or a custom SERDE.
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
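Putting the DELIMITED clauses together, a table with array and map columns could declare all of them (a sketch; the table name and delimiters chosen here are arbitrary):

```sql
CREATE TABLE t_delim_demo
(
    id      int,
    tags    array<string>,
    address map<string, string>
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    COLLECTION ITEMS TERMINATED BY '_'
    MAP KEYS TERMINATED BY ':'
    LINES TERMINATED BY '\n'
    NULL DEFINED AS '\\N';
```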
1. Create the table:
CREATE TABLE apachelog
(
host STRING,
identity STRING,
t_user STRING,
time STRING,
request STRING,
status STRING,
size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)"
)
STORED AS TEXTFILE;
2. Prepare a log data file:
[root@node04 data]# cat apachelog.txt
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
3. Load the data:
hive> load data local inpath '/opt/software/data/apachelog.txt' into table apachelog;
Loading data to table default.apachelog
OK
Time taken: 0.422 seconds
hive>
4. Query the parsed data:
hive> select * from apachelog;
OK
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /bg-upper.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /bg-nav.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /asf-logo.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /bg-button.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /bg-middle.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET / HTTP/1.1 200 11217
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET / HTTP/1.1 200 11217
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /tomcat.css HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /tomcat.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /asf-logo.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /bg-middle.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /bg-button.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /bg-nav.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /bg-upper.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET / HTTP/1.1 200 11217
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /tomcat.css HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /tomcat.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET / HTTP/1.1 200 11217
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /tomcat.css HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /tomcat.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /bg-button.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:36 +0800 GET /bg-upper.png HTTP/1.1 304 -
Time taken: 0.188 seconds, Fetched: 22 row(s)
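Since the regex has already split each line into columns, ordinary SQL works on the log; for example, counting hits per request line (output omitted; it depends on the loaded file):

```sql
SELECT request, count(*) AS hits
FROM apachelog
GROUP BY request
ORDER BY hits DESC;
```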