Connecting Hive to HBase to Work with Data


 

Copyright notice: this is an original post by the author and may not be reproduced without permission. When reposting, please credit https://blog.csdn.net/lr131425/article/details/72722932

How Hive Integrates with HBase

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables, provides full SQL query support, and translates SQL statements into MapReduce jobs. Its main advantage is a low learning curve: simple MapReduce-style aggregations can be expressed with SQL-like statements, without writing a dedicated MapReduce application, which makes it well suited to data warehouse reporting and analysis.

The Hive/HBase integration works through the public APIs each system exposes. The communication is handled by the hive-hbase-handler.jar utility shipped in Hive's lib directory, which acts as the bridge between Hive and HBase.
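If Hive ever complains that the storage handler class cannot be found, the jar can be registered for the current session. A minimal sketch, assuming a typical install layout (the path and version number below are assumptions; check your own lib directory):

-- Register the HBase storage handler jar for this Hive session.
-- The exact path and version are assumptions; adjust to your install.
ADD JAR /usr/tools/hive/lib/hive-hbase-handler-2.1.0.jar;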

 

Prerequisites: the Hadoop, HBase, and Hive clusters are all installed correctly.

Hadoop, HBase, and Hive have all been started and are running normally.

 

Connect to HBase in command-line mode:

[root@node1 bin]# ./hbase shell

Run list to see the tables:

 

 
hbase(main):006:0* list
TABLE
test
user
2 row(s) in 0.4750 seconds

Look at the table schema:

 

 
hbase(main):007:0> describe 'user'
Table user is ENABLED
user
COLUMN FAMILIES DESCRIPTION
{NAME => 'account', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'userid', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.7020 seconds

 

Then scan the user table. (The HBase shell prints non-ASCII values as escaped UTF-8 bytes: \xE6\xB7\xB1\xE5\x9C\xB3 is 深圳 and \xE5\xB9\xBF\xE4\xB8\x9C is 广东.)

 

 
hbase(main):004:0> scan 'user'
ROW COLUMN+CELL
lisi column=account:name, timestamp=1495708477345, value=lisi
lisi column=account:passport, timestamp=1495708477353, value=96857123123231
lisi column=account:password, timestamp=1495708477349, value=654321
lisi column=address:city, timestamp=1495708477381, value=\xE6\xB7\xB1\xE5\x9C\xB3
lisi column=address:province, timestamp=1495708477377, value=\xE5\xB9\xBF\xE4\xB8\x9C
lisi column=info:age, timestamp=1495708477358, value=38
lisi column=info:sex, timestamp=1495708477363, value=\xE5\xA5\xB3
lisi column=userid:id, timestamp=1495708477330, value=002
zhangsan column=account:name, timestamp=1495708405658, value=zhangsan
zhangsan column=account:passport, timestamp=1495708405699, value=968574321
zhangsan column=account:password, timestamp=1495708405669, value=123456
zhangsan column=address:city, timestamp=1495708405773, value=\xE6\xB7\xB1\xE5\x9C\xB3
zhangsan column=address:province, timestamp=1495708405764, value=\xE5\xB9\xBF\xE4\xB8\x9C
zhangsan column=info:age, timestamp=1495708405712, value=26
zhangsan column=info:sex, timestamp=1495708405755, value=\xE7\x94\xB7
zhangsan column=userid:id, timestamp=1495708405444, value=001
2 row(s) in 0.2020 seconds
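For reference, rows like these could have been loaded with HBase shell put commands along the following lines (a sketch; the original load script is not shown in this post):

put 'user', 'zhangsan', 'userid:id',        '001'
put 'user', 'zhangsan', 'account:name',     'zhangsan'
put 'user', 'zhangsan', 'account:passport', '968574321'
put 'user', 'zhangsan', 'account:password', '123456'
put 'user', 'zhangsan', 'address:province', '广东'
put 'user', 'zhangsan', 'address:city',     '深圳'
put 'user', 'zhangsan', 'info:age',         '26'
put 'user', 'zhangsan', 'info:sex',         '男'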

 

Run the Hive CLI from hive/bin:

[root@master bin]# ./hive

Run the DDL statement that creates an hbase_user table mapped to the HBase user table.

It fails with: return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations

 

 
hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,
    > province string,city string,age string,sex string ,id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES("hbase.table.name" = "user");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
  at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
  at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
  at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
  at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
  at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
  at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
  at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
  at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
  at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:303)
  at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:313)
  at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:205)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:742)

"Can't get the locations" and the stack trace below it suggest Hive cannot connect to HBase. Check how HBase is running with jps:

 

 
[root@master conf]# jps
3945 HMaster
18681 RunJar
2699 NameNode
3330 NodeManager
2951 SecondaryNameNode
3226 ResourceManager
3874 HQuorumPeer
18901 Jps

Everything looks normal there.

Next, check the Hive log; it is full of ZooKeeper connection failures:

Opening socket connection to server localhost/127.0.0.1:2181

 

 
2017-05-25T03:06:12,259 INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:12,260 WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-25T03:06:13,362 INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:13,363 WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-25T03:06:13,465 INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:13,466 WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-25T03:06:14,568 INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:14,569 WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
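A quick way to confirm where ZooKeeper is actually listening is its ruok four-letter command; a sketch, assuming nc (netcat) is installed on the node:

# A live ZooKeeper server answers 'imok'; a refused connection means
# nothing is listening on that host/port combination.
echo ruok | nc master 2181
echo ruok | nc master 2222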

So check the hive-site.xml configuration file and set the following:

 

 
<property>
  <name>hive.zookeeper.quorum</name>
  <value>master,node1,node2</value>
  <description>
    List of ZooKeeper servers to talk to. This is needed for:
    1. Read/write locks - when hive.lock.manager is set to
       org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
    2. When HiveServer2 supports service discovery via Zookeeper.
    3. For delegation token storage if zookeeper store is used, if
       hive.cluster.delegation.token.store.zookeeper.connectString is not set
    4. LLAP daemon registry service
  </description>
</property>

<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
  <description>
    The port of ZooKeeper servers to talk to.
    If the list of Zookeeper servers specified in hive.zookeeper.quorum
    does not contain port numbers, this value is used.
  </description>
</property>


The ZooKeeper used by this HBase cluster was on port 2222, so to keep the ports consistent, change the port in hbase-site.xml to 2181, and remember to restart the services.

Alternatively, just copy hbase-site.xml into Hive's conf directory; Hive will then read the ZooKeeper quorum and client port from HBase's configuration.

Either approach solves the problem.
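For the first approach, the property to change in hbase-site.xml is hbase.zookeeper.property.clientPort; a minimal sketch (the rest of the file is omitted):

<!-- hbase-site.xml: have HBase's managed ZooKeeper listen on 2181 -->
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

For the second approach, a copy along these lines works (the install paths are assumptions based on this cluster's layout):

cp /usr/tools/hbase/conf/hbase-site.xml /usr/tools/hive/conf/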


Then run the CREATE TABLE statement in Hive again:

 

 
hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,
    > province string,city string,age string,sex string ,id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES("hbase.table.name" = "user");
OK
Time taken: 20.323 seconds
hive> show tables;
OK
apachelog
hbase_user
Time taken: 2.75 seconds, Fetched: 2 row(s)


This time it succeeds.

Now query the data with HiveQL. Note that the Hive columns map positionally to the entries in hbase.columns.mapping: key is the HBase row key (:key), idcard maps to account:idcard, and so on.

 

 
hive> select * from hbase_user;
OK
lisi NULL 96857123123231 NULL lisi 654321 广东 深圳 38 女 002
zhangsan NULL 968574321 NULL zhangsan 123456 广东 深圳 26 男 001
Time taken: 5.798 seconds, Fetched: 2 row(s)
hive> describe hbase_user;
OK
key string
idcard string
passport string
country string
name string
password string
province string
city string
age string
sex string
id string
Time taken: 3.785 seconds, Fetched: 11 row(s)
hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;
OK
lisi NULL 654321 NULL lisi 96857123123231 广东 深圳 38 女 002
zhangsan NULL 123456 china zhangsan 968574321 广东 深圳 26 男 001
Time taken: 2.341 seconds, Fetched: 2 row(s)

The NULLs appear because those HBase rows have no value in the idcard column or the country column.

Set the country and idcard columns on the HBase user table and check again:

./hbase shell

 

 
hbase(main):003:0> put 'user','zhangsan','account:idcard','420923156366998855';
hbase(main):004:0* put 'user','lisi','account:idcard','520369856366998855';
hbase(main):005:0* put 'user','lisi','account:country','china';

 

 
hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;
OK
lisi 520369856366998855 654321 china lisi 96857123123231 广东 深圳 38 女 002
zhangsan 420923156366998855 123456 china zhangsan 968574321 广东 深圳 26 男 001
Time taken: 2.388 seconds, Fetched: 2 row(s)
hive> select * from hbase_user where name='zhangsan';
OK
zhangsan 420923156366998855 968574321 china zhangsan 123456 广东 深圳 26 男 001
Time taken: 2.651 seconds, Fetched: 1 row(s)
hive> select count(key) from hbase_user;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20170525040249_f808c765-79f6-43c0-aa94-ebfed7751091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1495621107567_0001, Tracking URL = http://master:8088/proxy/application_1495621107567_0001/
Kill Command = /usr/tools/hadoop/bin/hadoop job -kill job_1495621107567_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space
 


The count query launches a MapReduce job, which dies with PermGen space. Exasperating.

The setup is a Hadoop/HBase/Hive cluster of three Linux VMs running on a Windows 7 machine with 8 GB of RAM, and memory usage is pinned flat at the ceiling.

In practice the count does sometimes succeed; most likely Windows simply had too many other processes running.
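One workaround to try is giving the Hive client JVM a larger PermGen before launching it; a sketch, assuming Java 7 (PermGen no longer exists on Java 8+, where the equivalent knob is Metaspace), and the 512m value is an assumption to tune:

# Enlarge PermGen for the Hive CLI process before starting it.
export HADOOP_CLIENT_OPTS="-XX:MaxPermSize=512m $HADOOP_CLIENT_OPTS"
./hive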

 
