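Hive SerDe
Hive SerDe (serializer/deserializer) handles serialization and deserialization. It sits between the data storage layer and the execution engine and decouples the two.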
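Use cases:
- Hive mainly stores structured data. When the stored format is deeply nested, a SerDe can read the data by matching it against a regular expression. For example, a table with the fields: id, name, map<string,array<map<string,string>>>
- Some characters in the raw data should not show up in query results. For example, in 192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 - you may not want the displayed values to include the [] or "". A SerDe is worth considering here as well.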
Example:
- Data file
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
- Basic operations
-- Create the table
-- Each capture group in input.regex fills one column, in order.
-- With this sample data, the last two columns (referer, agent) receive the status code and the response size.
CREATE TABLE logtbl (
  host STRING,
  identity STRING,
  t_user STRING,
  time STRING,
  request STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)"
)
STORED AS TEXTFILE;

-- Load the data
load data local inpath '/root/data/log' into table logtbl;

-- Query
select * from logtbl;

-- Output (no [] and no ")
192.168.57.4 - - 29/Feb/2019:18:14:35 +0800 GET /bg-upper.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2019:18:14:35 +0800 GET /bg-nav.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2019:18:14:35 +0800 GET /asf-logo.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2019:18:14:35 +0800 GET /bg-button.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2019:18:14:35 +0800 GET /bg-middle.png HTTP/1.1 304 -
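To see how the input.regex above carves a log line into the seven table columns, here is a minimal Python sketch that imitates the matching step. It is not RegexSerDe itself (which is implemented in Java); the helper name `parse_log_line` is made up for illustration, and returning `None` for every column of a non-matching line is an assumption modeled on RegexSerDe's behavior of producing NULL columns for rows that do not match.

```python
import re

# Same pattern as "input.regex" in the table definition, written as a Python
# raw string (the doubled backslashes in HiveQL are string-literal escaping only).
INPUT_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[(.*)\] "(.*)" (-|[0-9]*) (-|[0-9]*)'
)

# Column names copied from the CREATE TABLE statement, in order.
COLUMNS = ["host", "identity", "t_user", "time", "request", "referer", "agent"]


def parse_log_line(line):
    """Map one log line to a {column: value} dict (hypothetical helper)."""
    # The whole line must match, not just a prefix of it.
    match = INPUT_REGEX.fullmatch(line)
    if match is None:
        # Assumption: imitate a non-matching row by setting every column to None.
        return dict.fromkeys(COLUMNS)
    return dict(zip(COLUMNS, match.groups()))


if __name__ == "__main__":
    sample = (
        '192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] '
        '"GET /bg-upper.png HTTP/1.1" 304 -'
    )
    for column, value in parse_log_line(sample).items():
        print(f"{column:>8}: {value}")
```

The square brackets and double quotes sit outside the capture groups, which is why they are absent from the query output.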
HiveServer2
Basic concepts (JDBC connections are only possible through HiveServer2)
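- HiveServer2 overview: HiveServer2 is a service interface that lets remote clients submit SQL requests and retrieve the results.
- Beeline: HiveServer2 comes with a new command-line interface, Beeline, for submitting SQL statements.
- HiveServer2 offers the same Hive functionality as the Hive CLI; it exists to make Hive easier for third parties to use. In other words: the Hive CLI is for data analysts, HiveServer2 is for developers.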
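- It is called HiveServer2 rather than HiveServer because the original HiveServer was abandoned and replaced by HiveServer2 (HiveServer was removed from Hive as of release 1.0.0).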
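Setting up and using HiveServer2
- Configure a superuser (proxy user). Connecting through beeline requires either a superuser or impersonation, so add the following configuration: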
-- Add the following to core-site.xml on the HDFS cluster (this makes root a superuser that may impersonate other users)
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>

-- Distribute the file to the other nodes
scp core-site.xml node02:`pwd`
scp core-site.xml node03:`pwd`
scp core-site.xml node04:`pwd`

-- Then restart the cluster, or run the following on the NameNode hosts
node01: hdfs dfsadmin -fs hdfs://node01:8020 -refreshSuperUserGroupsConfiguration
node02: hdfs dfsadmin -fs hdfs://node02:8020 -refreshSuperUserGroupsConfiguration
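The two properties above follow the naming pattern hadoop.proxyuser.<user>.groups and hadoop.proxyuser.<user>.hosts. As a rough sketch, the Python snippet below renders that pair for any user name, which can help when HiveServer2 runs as an account other than root. The helper name `proxyuser_properties` is hypothetical, and the `*` defaults simply mirror the permissive values used in these notes.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user, groups="*", hosts="*"):
    """Return the two <property> elements for a Hadoop proxy user (hypothetical helper)."""
    settings = {
        f"hadoop.proxyuser.{user}.groups": groups,
        f"hadoop.proxyuser.{user}.hosts": hosts,
    }
    elements = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


if __name__ == "__main__":
    for prop in proxyuser_properties("root"):
        ET.indent(prop)  # pretty-print; requires Python 3.9+
        print(ET.tostring(prop, encoding="unicode"))
```

The printed elements are meant to be pasted inside the existing <configuration> element of core-site.xml; the sketch does not edit the file itself.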
- node03: (ss -nal lists listening ports, like netstat; the service port is 10000 and the web UI port is 10002)
  - Start: hive --service hiveserver2
  - The web UI is then reachable in a browser at http://node03:10002/
- node04:
  - Start: hive --service hiveserver2
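As an alternative to reading the ss -nal output by hand, the following Python sketch checks whether the two HiveServer2 ports accept TCP connections. The host name node03 and ports 10000 and 10002 come from the notes above; the helper name `port_is_open` is made up for illustration. A successful connection only shows that something is listening, not that it is HiveServer2.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # node03 and the two ports come from the setup described above.
    checks = [("HiveServer2 service", 10000), ("HiveServer2 web UI", 10002)]
    for label, port in checks:
        state = "open" if port_is_open("node03", port) else "not reachable"
        print(f"{label} (node03:{port}): {state}")
```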
- node04 (start beeline)
  - Login method 1:
    - beeline
    - !connect jdbc:hive2://node03:10000/default root 123 (the username and password may be omitted)
  - Login method 2: beeline -u jdbc:hive2://node03:10000/defa
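Both login methods rely on a JDBC URL of the form jdbc:hive2://<host>:<port>/<database>. The Python sketch below assembles that URL and the matching non-interactive beeline command line. The helper names are hypothetical; `-u`, `-n`, and `-p` are Beeline's options for the URL, user name, and password, and the sample values (node03, 10000, default, root, 123) are taken from the !connect example.

```python
import shlex


def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host, user=None, password=None, **url_options):
    """Return the beeline argument list for a non-interactive login (hypothetical helper)."""
    args = ["beeline", "-u", hive_jdbc_url(host, **url_options)]
    if user is not None:
        args += ["-n", user]
    if password is not None:
        args += ["-p", password]
    return args


if __name__ == "__main__":
    # Values taken from the !connect example above.
    print(shlex.join(beeline_command("node03", user="root", password="123")))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.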
-