Hands-On Integration of Hive and HBase

1. Environment Overview

1-1 Nodes

| OS Version | IP | Hostname | Role |
| --- | --- | --- | --- |
| CentOS Stream 9 | 10.10.10.100 | master | Master |
| CentOS Stream 9 | 10.10.10.101 | slave1 | Slave1 |
| CentOS Stream 9 | 10.10.10.102 | slave2 | Slave2 |

1-2 Component Versions

| Component | Version |
| --- | --- |
| Hadoop | 3.3.6 |
| Java | 1.8 |
| Flume | 1.11.0 |
| Hive | 3.1.2 |
| Zookeeper | 3.6.3 |
| Flink | 1.18.1 |
| HBase | 2.5.4 |
| Hudi | 0.15.0 |
| Kafka | 3.6.2 |
| Sqoop | 1.4.7 |

2. Integrating Hive with HBase

Cluster setup is not covered separately here; please refer to the column article 《Hadoop分布式集群搭建(全网最详细包含解析)》.

2.1 Modifying the Hive Configuration File

### Note: before starting the integration, make sure your Hive, HBase, and Zookeeper components are all running normally
Master
# Note the working directory
[root@master conf]# pwd
/opt/hive/conf
[root@master conf]# vim hive-site.xml
# Append the following at the end of the file
        <property>
                <name>hive.zookeeper.quorum</name>
                <value>master:2181,slave1:2181,slave2:2181</value>
        </property>
# The jar versions here must match your own installation; double-check the paths as well
        <property>
                <name>hive.aux.jars.path</name>
                <value>
                        file:///opt/hbase/lib/hbase-common-2.5.4.jar
                        file:///opt/hbase/lib/hbase-client-2.5.4.jar
                        file:///opt/hbase/lib/hbase-server-2.5.4.jar
                        file:///opt/hbase/lib/hbase-hadoop2-compat-2.5.4.jar
                        file:///opt/hbase/lib/netty-buffer-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-codec-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-common-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-handler-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-resolver-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-transport-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-transport-native-epoll-4.1.45.Final.jar
                        file:///opt/hbase/lib/netty-transport-native-unix-common-4.1.45.Final.jar
                        file:///opt/hbase/lib/hbase-protocol-2.5.4.jar
                        file:///opt/hbase/lib/zookeeper-3.5.7.jar
                </value>
        </property>
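Typos in these long jar paths are a common source of failure. One way to avoid them is to generate the value from the jars actually present on disk. The sketch below demonstrates the idea on a temporary directory with dummy jar files so it runs anywhere; on a real node you would point `LIBDIR` at `/opt/hbase/lib`. Note that `hive.aux.jars.path` is commonly written as a comma-separated list of `file://` URIs.

```shell
#!/bin/sh
# Sketch: build a hive.aux.jars.path value from the jars found in a lib dir.
# A temporary directory with empty dummy jars stands in for /opt/hbase/lib
# so this demo is self-contained.
LIBDIR=$(mktemp -d)
touch "$LIBDIR/hbase-client-2.5.4.jar" \
      "$LIBDIR/hbase-common-2.5.4.jar" \
      "$LIBDIR/netty-common-4.1.45.Final.jar"

aux_path=""
for jar in "$LIBDIR"/*.jar; do
    # one file:// URI per jar, comma-separated
    aux_path="${aux_path:+$aux_path,}file://$jar"
done
echo "$aux_path"
rm -rf "$LIBDIR"
```

Pasting the generated value into the `<value>` element also guards against listing a jar that is not actually on disk.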

2.2 Testing the Hive-HBase Integration

### Watch the capitalization here; there are a lot of letters, and a single wrong case anywhere will make this fail, so check carefully
hive> create table hive_student(id int, name string)
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > with serdeproperties ("hbase.columns.mapping" = ":key,cf1:name")
    > tblproperties("hbase.table.name" = "hive_student");
OK
Time taken: 2.441 seconds
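Note that the statement above makes Hive create the HBase table, and it will report an error if a table named `hive_student` already exists in HBase. To map an HBase table that already exists, the usual pattern is an EXTERNAL table with the same storage handler. A hedged sketch, in which the table name below is hypothetical:

```sql
-- Sketch: expose an already-existing HBase table to Hive.
-- 'existing_table' and the cf1:name column are hypothetical names.
CREATE EXTERNAL TABLE hive_existing(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name")
TBLPROPERTIES ("hbase.table.name" = "existing_table");
```

Dropping an external table only removes the Hive metadata; the underlying HBase table is left untouched.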

### Open a new window and enter the HBase shell to view the newly created table
HBase
hbase:001:0> list
TABLE                                                                                     
hive_student                                                                              
t1                                                                                        
2 row(s)
Took 0.2752 seconds                                                                       
=> ["hive_student", "t1"]
### Insert data from Hive
Hive
hive> insert into hive_student values(1,'zhangsan');
Query ID = root_20241123090913_ffb6149b-0225-4d95-ad7c-39747ab9dc9e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1732322224947_0001, Tracking URL = http://master:8088/proxy/application_1732322224947_0001/
Kill Command = /opt/hadoop/bin/mapred job  -kill job_1732322224947_0001
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2024-11-23 09:09:28,089 Stage-2 map = 0%,  reduce = 0%
2024-11-23 09:09:36,398 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 3.33 sec
MapReduce Total cumulative CPU time: 3 seconds 330 msec
Ended Job = job_1732322224947_0001
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 3.33 sec   HDFS Read: 13150 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
Time taken: 24.734 seconds
### View the newly inserted data in HBase
HBase
hbase:004:0> scan 'hive_student'
ROW                     COLUMN+CELL                                                       
 1                      column=cf1:name, timestamp=2024-11-23T09:09:36.212, value=zhangsan
1 row(s)
Took 0.0077 seconds 
### Modify the data from HBase
hbase:010:0> put 'hive_student','1','cf1:name','lisi'
Took 0.0060 seconds                                                                       
hbase:011:0> scan 'hive_student'
ROW                     COLUMN+CELL                                                       
 1                      column=cf1:name, timestamp=2024-11-23T09:14:39.666, value=lisi    
1 row(s)
Took 0.0081 seconds    
### View the data from Hive
hive> select * from hive_student;
OK
1	lisi
Time taken: 0.215 seconds, Fetched: 1 row(s)
### The data has been updated: the change made in HBase is immediately visible in Hive, because the storage handler reads live HBase data at query time

2.3 Analyzing Sogou User Search Logs with Hive

1. Open the log in Notepad++ and choose View > Show Symbol > Show All Characters, so that the tab and space separators in each record become visible


2. Upload the log to the Hadoop directory
3. Convert the separators
[root@master hadoop]# sed -i "s/\t/,/g" sougou.log 
[root@master hadoop]# sed -i "s/ /,/g" sougou.log 
## After the replacement, every field in each record is separated by a comma
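The effect of the two `sed` commands can be checked on a single sample line. A self-contained sketch using a temporary file; the record below is invented, shaped like a Sogou log line:

```shell
#!/bin/sh
# Demo of the tab/space -> comma replacement on one sample record.
# The record below is a made-up example in the Sogou log layout.
f=$(mktemp)
printf '00:00:00\t12345\t[test]\t1 2\turl.example.com\n' > "$f"
sed -i "s/\t/,/g" "$f"   # replace tabs with commas
sed -i "s/ /,/g" "$f"    # replace remaining spaces with commas
result=$(cat "$f")
echo "$result"
rm -f "$f"
```

The printed line is fully comma-delimited, which is what the `FIELDS TERMINATED BY ','` clause in the table definition below expects.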

4. Create the Hive table
## Mind the case
hive> CREATE TABLE activelog (
    >     event_time STRING,
    >     user_id STRING,
    >     keyword STRING,
    >     page_rank INT,
    >     click_order INT,
    >     url STRING
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ',';
OK
Time taken: 0.31 seconds
### event_time: access time
### user_id: user ID
### keyword: search keyword
### page_rank: rank of the link in the search results
### click_order: sequence number of the user's click
### url: URL the user clicked
### ROW FORMAT DELIMITED specifies that row data is stored as delimiter-separated fields.
### FIELDS TERMINATED BY ',' declares that fields are separated by commas, i.e. every column value in the table's data files is comma-delimited.


### Load the data
hive> load data local inpath '/opt/hadoop/sougou.log' into table activelog;
Loading data to table default.activelog
OK
Time taken: 0.486 seconds

### Data analysis
hive> select * from activelog limit 10;
OK
00:00:00	2982199073774412	[360安全卫士]	8	3	download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html
00:00:00	07594220010824798	[哄抢救灾物资]	1	1	news.21cn.com/social/daqian/2008/05/29/4777194_1.shtml
00:00:00	5228056822071097	[75810部队]	14	5	www.greatoo.com/greatoo_cn/list.asp?link_id=276&title=%BE%DE%C2%D6%D0%C2%CE%C5
00:00:00	6140463203615646	[绳艺]	62	36	www.jd-cd.com/jd_opus/xx/200607/706.html
00:00:00	8561366108033201	[汶川地震原因]	3	2	www.big38.net/
00:00:00	23908140386148713	[莫衷一是的意思]	1	2	www.chinabaike.com/article/81/82/110/2007/2007020724490.html
00:00:00	1797943298449139	[星梦缘全集在线观看]	8	5	www.6wei.net/dianshiju/????\xa1\xe9|????do=index
00:00:00	00717725924582846	[闪字吧]	1	2	www.shanziba.com/
00:00:00	41416219018952116	[霍震霆与朱玲玲照片]	2	6	bbs.gouzai.cn/thread-698736.html
00:00:00	9975666857142764	[电脑创业]	2	2	ks.cn.yahoo.com/question/1307120203719.html
Time taken: 0.103 seconds, Fetched: 10 row(s)
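With the table loaded, the typical next step is aggregation, for example counting searches per keyword in Hive with `SELECT keyword, COUNT(*) FROM activelog GROUP BY keyword`. The same aggregation can be sanity-checked locally on the comma-separated file with awk. A self-contained sketch on a few invented records:

```shell
#!/bin/sh
# Mirror a Hive "GROUP BY keyword" locally with awk:
# count occurrences of the 3rd comma-separated field.
# The records below are invented sample data.
f=$(mktemp)
cat > "$f" <<'EOF'
00:00:00,111,[news],1,1,a.example.com
00:00:01,222,[news],2,1,b.example.com
00:00:02,333,[music],1,1,c.example.com
EOF
summary=$(awk -F',' '{n[$3]++} END {for (k in n) print k, n[k]}' "$f" | sort)
echo "$summary"
rm -f "$f"
```

Running the local version on a sample of the real log is a quick way to verify that the Hive query returns plausible counts before scaling up.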
