Reading data from HBase in Python through the HBase Thrift client

The Python version used here is 3.5.2, and I ran into several pitfalls along the way.

First, install thrift and hbase-thrift:

pip install thrift
pip install hbase-thrift

After installation, the first run fails with this error:

 
in <module>
    from hbase import Hbase
  File "C:\Users\tianxiao\AppData\Local\Programs\Python\Python36\lib\site-packages\hbase\Hbase.py", line 2066
    except IOError, io:
                  ^
SyntaxError: invalid syntax

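The cause is a syntax incompatibility between Python versions: the Hbase.py shipped with hbase-thrift uses the Python 2 form "except IOError, io:", which is a syntax error in Python 3. To fix it, download Python 3 versions of Hbase.py and ttypes.py and use them to replace the files in the installed hbase package, for example /usr/local/lib/python3.6/dist-packages/hbase/Hbase.py and ttypes.py in the same directory.
Download location: https://github.com/626626cdllp/infrastructure/tree/master/hbase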

That solves the first problem.
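
For reference, here is a minimal sketch of the syntax difference behind that error. It compiles a Python 2 style handler and a Python 3 style handler from strings and records which one the running interpreter accepts. The names old_style, new_style and compiles are only for illustration and do not come from hbase-thrift.

```python
# Python 2 wrote "except IOError, io:"; Python 3 only accepts "except IOError as io:".
old_style = "try:\n    pass\nexcept IOError, io:\n    pass\n"
new_style = "try:\n    pass\nexcept IOError as io:\n    pass\n"


def compiles(source):
    """Return True if the running interpreter accepts the source text."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


old_ok = compiles(old_style)  # rejected on Python 3
new_ok = compiles(new_style)  # accepted on Python 3
print("old style accepted:", old_ok)
print("new style accepted:", new_ok)
```

If you would rather patch the installed Hbase.py by hand than replace it, this is the kind of line that needs changing, although the generated file may contain other Python 2 constructs as well.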

Next, I tried the following code:

from thrift.transport import TSocket,TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# The default Thrift port is 9090
socket = TSocket.TSocket('192.1.1.1',9090)
socket.setTimeout(5000)

transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)

client = Hbase.Client(protocol)
socket.open()

print(client.getTableNames())
print(client.get('test','row1','cf:a'))

This raises a different error, thrift.transport.TTransport.TTransportException: TSocket read 0 bytes. The full traceback is:

Traceback (most recent call last):
  File "...../test2.py", line 69, in <module>
    print(client.getTableNames())
  File "C:\Program Files (x86)\python3.5\lib\site-packages\hbase\Hbase.py", line 738, in getTableNames
    return self.recv_getTableNames()
  File "C:\Program Files (x86)\python3.5\lib\site-packages\hbase\Hbase.py", line 748, in recv_getTableNames
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "C:\Program Files (x86)\python3.5\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 134, in readMessageBegin
    sz = self.readI32()
  File "C:\Program Files (x86)\python3.5\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 217, in readI32
    buff = self.trans.readAll(4)
  File "C:\Program Files (x86)\python3.5\lib\site-packages\thrift\transport\TTransport.py", line 61, in readAll
    chunk = self.read(sz - have)
  File "C:\Program Files (x86)\python3.5\lib\site-packages\thrift\transport\TTransport.py", line 163, in read
    self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File "C:\Program Files (x86)\python3.5\lib\site-packages\thrift\transport\TSocket.py", line 132, in read
    message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

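The cause is a mismatch between the transport and protocol used by the Thrift server and those used by the client. The server in this setup expects the framed transport and the compact protocol, so the Python client has to use TFramedTransport with TCompactProtocol rather than TBufferedTransport with TBinaryProtocol. If your HBase Thrift server was started without the framed and compact options, the original combination may be the right one; the client simply has to match the server.

So I changed the code to: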

from thrift.transport import TSocket
from thrift.transport.TTransport import TFramedTransport
from thrift.protocol import TCompactProtocol
from hbase import Hbase

# The default Thrift port is 9090
socket = TSocket.TSocket('10.1.1.1',9090)
socket.setTimeout(5000)

transport = TFramedTransport(socket)
protocol = TCompactProtocol.TCompactProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

# print(client.getTableNames())
print(client.get('dev_user_tags','0000f00437f8','basic_info:age'))

With this change the code runs normally.
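
To make the mismatch more concrete, here is a rough sketch, using only the standard library, of what the first few bytes on the wire look like. It assumes the usual layouts: strict TBinaryProtocol starts a call with a 4-byte version-and-type word, and TFramedTransport prefixes every message with a 4-byte big-endian length. The function names are made up for this illustration, only the message header is built (not a complete request), and how a particular server reacts to a bad frame length is an assumption; in my case the connection was closed, which shows up on the client as "TSocket read 0 bytes".

```python
import struct

VERSION_1 = 0x80010000  # strict-mode version marker used by TBinaryProtocol
CALL = 1                # Thrift message type for a request


def binary_message_begin(name, seqid=0):
    """Header bytes that strict TBinaryProtocol writes at the start of a call."""
    encoded = name.encode("utf-8")
    return (struct.pack(">I", VERSION_1 | CALL)   # version word plus message type
            + struct.pack(">i", len(encoded))     # length of the method name
            + encoded                             # the method name itself
            + struct.pack(">i", seqid))           # sequence id


def frame(payload):
    """What a framed transport adds: a 4-byte big-endian length prefix."""
    return struct.pack(">i", len(payload)) + payload


def frame_length_seen_by_server(raw):
    """How a server expecting frames would interpret the first four bytes."""
    return struct.unpack(">i", raw[:4])[0]


unframed = binary_message_begin("getTableNames")
bad_length = frame_length_seen_by_server(unframed)          # version word misread as a length
good_length = frame_length_seen_by_server(frame(unframed))  # the real payload length
print("unframed message read as frame length:", bad_length)
print("framed message read as frame length:", good_length)
```

The first value is negative because the version word has its top bit set, so a server that expects frames cannot treat it as a valid size. The same reasoning applies in the other direction if the client sends frames to a server that does not expect them.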

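One last problem: if TFramedTransport or TCompactProtocol cannot be found, check whether the corresponding files exist under site-packages/thrift. If they are missing, download the source package from the official Thrift website and overwrite the site-packages/thrift/ directory with the contents of its lib/py/src/ directory.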
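
A quick way to run that check without browsing the directory is to try the imports from Python. This is a small sketch; the helper name has_attr is my own, and the module paths are the ones used in the working code above.

```python
import importlib


def has_attr(module_name, attr=None):
    """Return True if the module imports and, when given, defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a missing submodule.
        return False
    return attr is None or hasattr(module, attr)


checks = [
    ("thrift.transport.TTransport", "TFramedTransport"),
    ("thrift.protocol.TCompactProtocol", "TCompactProtocol"),
]
for module_name, attr in checks:
    status = "found" if has_attr(module_name, attr) else "missing"
    print(module_name + "." + attr + ":", status)
```

If either line prints "missing", the installed thrift package is incomplete and the overwrite described above, or a reinstall, is worth trying.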
