Hive DML and SerDe

Hive DML

1. Load a file into a table. With LOCAL, the file is copied from the local filesystem; with a file already on HDFS, the file is moved (it is removed from its original location).

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
 
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

When loading an HDFS file into a table, the HDFS file is moved away from its original path.
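
For contrast, a minimal sketch of the two forms (the local path is hypothetical; the HDFS path is the one used in the transcript below):

-- LOCAL: the file is copied; the hypothetical source /opt/software/data/student.txt stays in place.
LOAD DATA LOCAL INPATH '/opt/software/data/student.txt' INTO TABLE student;

-- No LOCAL: the HDFS file is moved; afterwards /user/age=10/student no longer exists at its original path.
LOAD DATA INPATH '/user/age=10/student' INTO TABLE student;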

Truncate the existing table data first:
hive> truncate table student;
OK
Time taken: 0.347 seconds

Load the HDFS file into the table:
hive> load data inpath '/user/age=10/student' into table student;
Loading data to table default.student
OK
Time taken: 0.514 seconds

Query the loaded data:
hive> select * from student;
OK
1	小红1	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
2	小红2	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}
3	小红3	["吃鸡","book","movie"]	{"chongqing":"renminglu","shenzheng":"futian"}
4	小红4	["王者","book","movie"]	{"modu":"renminglu","dongguang":"changan"}
5	小红5	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
6	小红6	["王者","book","movie"]	{"nanchang":"renminglu","shenzheng":"futian"}
7	小红7	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
8	小红8	["walking","book","movie"]	{"nanchang":"renminglu","guangzhou":"niwan"}
9	小红9	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
10	小红10	["王者","book","movie"]	{"shanghai":"renminglu","shenzheng":"futian"}
Time taken: 0.185 seconds, Fetched: 10 row(s)
hive> 

2. Insert query results into a Hive table (INSERT ... SELECT)

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
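
INTO appends to the target, while OVERWRITE first clears whatever the target table or partition already holds. A minimal sketch of the contrast, reusing the table names from the example further below:

-- Appends: existing rows in partition age=12 are kept and the query result is added.
INSERT INTO TABLE student_dml PARTITION (age=12)
SELECT id, name FROM student;

-- Replaces: partition age=12 is emptied first, then filled with the query result.
INSERT OVERWRITE TABLE student_dml PARTITION (age=12)
SELECT id, name FROM student;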
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
 
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
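
In a dynamic partition insert, any partition column without a literal value takes its value from the trailing columns of the SELECT. A minimal sketch, assuming a hypothetical source table src(id, name, age); dynamic partitioning must be enabled first:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- age has no literal, so each row lands in the partition named by its own age value.
INSERT INTO TABLE student_dml PARTITION (age)
SELECT id, name, age FROM src;  -- src(id, name, age) is hypothetical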

1. Create the two target tables:
create external table student_dml
(
id int,
name String
)
partitioned by (age int)
row format delimited
fields terminated by ',';

create external table student_dml2
(
id int,
name String,
address map<String, String>
)
partitioned by (age int)
row format delimited
fields terminated by ','
map keys terminated by ':';

2. Run the multi-insert statement:
hive> from student
    > insert into table student_dml partition(age=12) select id,name
    > insert into table student_dml2 partition(age=13) select id,name,address;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20210626163657_faeea658-b9ce-4da2-bf62-a7186d5ba3b2
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1624671436790_0003, Tracking URL = http://node03:8088/proxy/application_1624671436790_0003/
Kill Command = /opt/software/hadoop-2.10.1/bin/hadoop job  -kill job_1624671436790_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2021-06-26 16:37:13,831 Stage-2 map = 0%,  reduce = 0%
2021-06-26 16:37:25,455 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.49 sec
MapReduce Total cumulative CPU time: 1 seconds 490 msec
Ended Job = job_1624671436790_0003
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/student_dml/age=12/.hive-staging_hive_2021-06-26_16-36-57_079_5102616348730834889-1/-ext-10000
Moving data to directory hdfs://mycluster/user/hive/warehouse/student_dml2/age=13/.hive-staging_hive_2021-06-26_16-36-57_079_5102616348730834889-1/-ext-10002
Loading data to table default.student_dml partition (age=12)
Loading data to table default.student_dml2 partition (age=13)
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 1.49 sec   HDFS Read: 6028 HDFS Write: 704 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 490 msec
OK
Time taken: 31.724 seconds


3. Query the results:
hive> select * from student_dml;
OK
1	小红1	12
2	小红2	12
3	小红3	12
4	小红4	12
5	小红5	12
6	小红6	12
7	小红7	12
8	小红8	12
9	小红9	12
10	小红10	12
Time taken: 0.262 seconds, Fetched: 10 row(s)


hive> select * from student_dml2;
OK
1	小红1	{"modu":"renminglu","shenzheng":"futian"}	13
2	小红2	{"modu":"renminglu","xizhang":"lasha"}	13
3	小红3	{"chongqing":"renminglu","shenzheng":"futian"}	13
4	小红4	{"modu":"renminglu","dongguang":"changan"}	13
5	小红5	{"modu":"renminglu","shenzheng":"futian"}	13
6	小红6	{"nanchang":"renminglu","shenzheng":"futian"}	13
7	小红7	{"modu":"renminglu","shenzheng":"futian"}	13
8	小红8	{"nanchang":"renminglu","guangzhou":"niwan"}	13
9	小红9	{"modu":"renminglu","shenzheng":"futian"}	13
10	小红10	{"shanghai":"renminglu","shenzheng":"futian"}	13
Time taken: 0.209 seconds, Fetched: 10 row(s)
hive> 

Hive SerDe

Hive SerDe (Serializer and Deserializer) is used in more complex scenarios, where the data does not form a simply structured, regular table.

        A SerDe performs serialization and deserialization of row data.

        It sits between the data storage layer and the execution engine, decoupling the two.

        Hive reads and writes row content through ROW FORMAT DELIMITED or a custom SERDE:

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
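
A sketch of the DELIMITED branch with every clause spelled out (the table name and delimiter characters are illustrative, not taken from the examples in this post):

CREATE TABLE demo_delimited
(
  id int,
  tags array<String>,
  props map<String, String>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '_'
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
  NULL DEFINED AS '\\N';  -- NULL DEFINED AS requires Hive 0.13 or later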

 

1. Create the table. Each capture group in input.regex maps, in order, to one column; the last two fields of the sample log lines are the HTTP status code and the response size, so the columns are named accordingly:
CREATE TABLE apachelog 
(
  host STRING,
  identity STRING,
  t_user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)"
)
STORED AS TEXTFILE;

2. Prepare a log data file:
[root@node04 data]# cat apachelog.txt 
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -

3. Load the data:
hive> load  data local inpath '/opt/software/data/apachelog.txt' into table apachelog;
Loading data to table default.apachelog
OK
Time taken: 0.422 seconds
hive> 


4. Query the parsed data:
hive> select * from apachelog;
OK
192.168.57.4	-	-	29/Feb/2016:18:14:35 +0800	GET /bg-upper.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:35 +0800	GET /bg-nav.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:35 +0800	GET /asf-logo.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:35 +0800	GET /bg-button.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:35 +0800	GET /bg-middle.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET / HTTP/1.1	200	11217
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET / HTTP/1.1	200	11217
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /tomcat.css HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /tomcat.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /asf-logo.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /bg-middle.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /bg-button.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /bg-nav.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /bg-upper.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET / HTTP/1.1	200	11217
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /tomcat.css HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /tomcat.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET / HTTP/1.1	200	11217
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /tomcat.css HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /tomcat.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /bg-button.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2016:18:14:36 +0800	GET /bg-upper.png HTTP/1.1	304	-
Time taken: 0.188 seconds, Fetched: 22 row(s)
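
Once the log is parsed into columns, ordinary SQL applies. A small follow-up sketch counting requests per HTTP status code (using the column names from the CREATE TABLE above):

select status, count(*) as cnt
from apachelog
group by status;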
