Connecting Hive to HBase to Work with Data

How Hive Integrates with HBase

Hive is a data warehouse tool built on top of Hadoop. It maps structured data files onto database tables and provides full SQL querying by translating SQL statements into MapReduce jobs. Its big advantage is the low learning curve: simple MapReduce-style aggregations can be written as SQL-like statements without developing a dedicated MapReduce application, which makes it a good fit for data warehouse analytics.

Hive and HBase are integrated through the public APIs each system exposes. The communication between the two is handled by the hive-hbase-handler.jar utility class shipped in Hive's lib directory, which acts as the bridge between Hive and HBase.
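If the handler jar is not already on Hive's classpath, it can be registered per session. A minimal sketch, assuming Hive lives under /usr/tools/hive (both the path and the exact jar file name, which usually carries a version suffix, will vary by installation):

hive> ADD JAR /usr/tools/hive/lib/hive-hbase-handler.jar;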


Prerequisites: Hadoop, HBase, and Hive are all properly installed on the cluster.

Hadoop, HBase, and Hive are all up and running.


Connect to HBase from the command line:

[root@node1 bin]# ./hbase shell

Run list to see the tables:

hbase(main):006:0* list
TABLE                                                                                                                                                                                
test                                                                                                                                                                                 
user                                                                                                                                                                                 
2 row(s) in 0.4750 seconds
Check the table schema:

hbase(main):007:0> describe 'user'
Table user is ENABLED                                                                                                                                                                
user                                                                                                                                                                                 
COLUMN FAMILIES DESCRIPTION                                                                                                                                                          
{NAME => 'account', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'userid', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.7020 seconds
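For reference, an HBase table with these four column families could have been created like this (a sketch only; the table already exists in this walkthrough):

hbase(main):001:0> create 'user', 'account', 'address', 'info', 'userid'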


Then scan the user table to look at the data:

hbase(main):004:0> scan 'user'
ROW                                            COLUMN+CELL                                                                                                                           
 lisi                                          column=account:name, timestamp=1495708477345, value=lisi                                                                              
 lisi                                          column=account:passport, timestamp=1495708477353, value=96857123123231                                                                
 lisi                                          column=account:password, timestamp=1495708477349, value=654321                                                                        
 lisi                                          column=address:city, timestamp=1495708477381, value=\xE6\xB7\xB1\xE5\x9C\xB3                                                          
 lisi                                          column=address:province, timestamp=1495708477377, value=\xE5\xB9\xBF\xE4\xB8\x9C                                                      
 lisi                                          column=info:age, timestamp=1495708477358, value=38                                                                                    
 lisi                                          column=info:sex, timestamp=1495708477363, value=\xE5\xA5\xB3                                                                          
 lisi                                          column=userid:id, timestamp=1495708477330, value=002                                                                                  
 zhangsan                                      column=account:name, timestamp=1495708405658, value=zhangsan                                                                          
 zhangsan                                      column=account:passport, timestamp=1495708405699, value=968574321                                                                     
 zhangsan                                      column=account:password, timestamp=1495708405669, value=123456                                                                        
 zhangsan                                      column=address:city, timestamp=1495708405773, value=\xE6\xB7\xB1\xE5\x9C\xB3                                                          
 zhangsan                                      column=address:province, timestamp=1495708405764, value=\xE5\xB9\xBF\xE4\xB8\x9C                                                      
 zhangsan                                      column=info:age, timestamp=1495708405712, value=26                                                                                    
 zhangsan                                      column=info:sex, timestamp=1495708405755, value=\xE7\x94\xB7                                                                          
 zhangsan                                      column=userid:id, timestamp=1495708405444, value=001                                                                                  
2 row(s) in 0.2020 seconds

The \xE6... byte sequences in the scan output are simply the raw UTF-8 encodings of the Chinese strings stored in those cells (深圳, 广东, and so on).

Next, start the Hive CLI from hive/bin:

[root@master bin]# ./hive

Run the DDL statement that creates an hbase_user table mapped to the HBase user table.

It fails with an error: return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations

hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,
    > province string,city string,age string,sex string ,id string)   
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'   
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")   
    > TBLPROPERTIES("hbase.table.name" = "user");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
	at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
	at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
	at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
	at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
	at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
	at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
	at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:303)
	at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:313)
	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:205)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:742)
"Can't get the locations" and the stack trace below it suggest that Hive cannot reach HBase at all, so check HBase's processes with jps:

[root@master conf]# jps
3945 HMaster
18681 RunJar
2699 NameNode
3330 NodeManager
2951 SecondaryNameNode
3226 ResourceManager
3874 HQuorumPeer
18901 Jps
Everything looks normal.

Next, check the Hive log. It turns out to be full of failed ZooKeeper connections:

 Opening socket connection to server localhost/127.0.0.1:2181

2017-05-25T03:06:12,259  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:12,260  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-25T03:06:13,362  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:13,363  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-25T03:06:13,465  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:13,466  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-25T03:06:14,568  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:14,569  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
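Before changing any configuration, a quick sanity check is to ask ZooKeeper directly whether it is listening where Hive is looking. A hedged check using ZooKeeper's ruok four-letter command (hostnames are this cluster's; nc must be installed):

[root@master ~]# echo ruok | nc localhost 2181   # what Hive is trying
[root@master ~]# echo ruok | nc master 2181      # the actual quorum host

A healthy ZooKeeper replies "imok"; getting no answer on 2181 anywhere points at the port mismatch uncovered below.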
Hive is falling back to localhost:2181 because no ZooKeeper quorum is configured for it. Open hive-site.xml and configure:

<property>
    <name>hive.zookeeper.quorum</name>
    <value>master,node1,node2</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for:
      1. Read/write locks - when hive.lock.manager is set to
      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
    </description>
  </property>
<property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
    <description>
      The port of ZooKeeper servers to talk to.
      If the list of Zookeeper servers specified in hive.zookeeper.quorum
      does not contain port numbers, this value is used.
    </description>
  </property>

It turned out that ZooKeeper in this HBase cluster was configured on port 2222. To unify the ports, change the client port in hbase-site.xml to 2181, and remember to restart the services.
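Concretely, the property to change is the standard HBase one (a sketch; value shown after the fix):

<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>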

Alternatively, simply copy hbase-site.xml into Hive's conf directory; Hive will then pick up HBase's hbase.zookeeper.quorum and client port settings from it.
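For example (the install paths are assumptions based on this cluster's /usr/tools layout):

[root@master ~]# cp /usr/tools/hbase/conf/hbase-site.xml /usr/tools/hive/conf/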

Either approach solves the problem.


Now run the CREATE TABLE statement in Hive again:

hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,
    > province string,city string,age string,sex string ,id string)   
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'   
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")   
    > TBLPROPERTIES("hbase.table.name" = "user");
OK
Time taken: 20.323 seconds
hive> show tables;
OK
apachelog
hbase_user
Time taken: 2.75 seconds, Fetched: 2 row(s)

This time it succeeds.
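To double-check that the column mapping was stored as intended, the table definition can be dumped (output omitted here):

hive> show create table hbase_user;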

Next, query the data with HiveQL:

hive> select * from hbase_user;
OK
lisi	NULL	96857123123231	NULL	lisi	654321	广东	深圳	38	女	002
zhangsan	NULL	968574321	NULL	zhangsan	123456	广东	深圳	26	男	001
Time taken: 5.798 seconds, Fetched: 2 row(s)
hive> describe hbase_user;
OK
key                 	string              	                    
idcard              	string              	                    
passport            	string              	                    
country             	string              	                    
name                	string              	                    
password            	string              	                    
province            	string              	                    
city                	string              	                    
age                 	string              	                    
sex                 	string              	                    
id                  	string              	                    
Time taken: 3.785 seconds, Fetched: 11 row(s)
hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;
OK
lisi	NULL	654321	NULL	lisi	96857123123231	广东	深圳	38	女	002
zhangsan	NULL	123456	china	zhangsan	968574321	广东	深圳	26	男	001
Time taken: 2.341 seconds, Fetched: 2 row(s)

The NULLs show up because those HBase columns were never set: the rows have no value for account:idcard or account:country, so Hive returns NULL for them.

Let's put idcard and country values into the HBase user table and look again:

./hbase shell

hbase(main):003:0> put 'user','zhangsan','account:idcard','420923156366998855';
hbase(main):004:0* put 'user','lisi','account:idcard','520369856366998855';
hbase(main):005:0* put 'user','lisi','account:country','china';

hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;
OK
lisi	520369856366998855	654321	china	lisi	96857123123231	广东	深圳	38	女	002
zhangsan	420923156366998855	123456	china	zhangsan	968574321	广东	深圳	26	男	001
Time taken: 2.388 seconds, Fetched: 2 row(s)	
hive> select * from hbase_user where name='zhangsan';
OK
zhangsan	420923156366998855	968574321	china	zhangsan	123456	广东	深圳	26	男	001
Time taken: 2.651 seconds, Fetched: 1 row(s)
hive> select count(key) from hbase_user;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20170525040249_f808c765-79f6-43c0-aa94-ebfed7751091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1495621107567_0001, Tracking URL = http://master:8088/proxy/application_1495621107567_0001/
Kill Command = /usr/tools/hadoop/bin/hadoop job  -kill job_1495621107567_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space



Running the count launches a MapReduce job, which promptly dies with PermGen space... wonderful.

The setup is a Windows 7 machine with 8 GB of RAM running three Linux VMs that host the Hadoop, HBase, and Hive cluster; memory usage is pegged flat at the ceiling.

In fact the count does sometimes succeed; presumably Windows just has too many other processes competing for memory.
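A hedged workaround, assuming Java 7 (-XX:MaxPermSize was removed in Java 8): give the Hive client and the MapReduce child JVMs a larger PermGen before re-running the query. The sizes below are guesses to adjust for the machine:

[root@master bin]# export HADOOP_CLIENT_OPTS="-XX:MaxPermSize=256m $HADOOP_CLIENT_OPTS"
[root@master bin]# ./hive
hive> set mapreduce.map.java.opts=-Xmx1024m -XX:MaxPermSize=128m;
hive> set mapreduce.reduce.java.opts=-Xmx1024m -XX:MaxPermSize=128m;
hive> select count(key) from hbase_user;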





