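This post covers some problems I ran into when connecting to Hive remotely.
I. Connecting to Hive from CentOS
I tried connecting to Hive remotely from CentOS 7. First, install the required packages: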
pip install pyhive
pip install thrift
yum install cyrus-sasl-devel.x86_64
pip install sasl
pip install thrift_sasl
Connect to Hive:
$ python
>>> from pyhive import hive
>>> conn = hive.Connection(host='192.168.12.5', port=10000, username='root', database='behavior_labels')
>>> cur=conn.cursor()
>>> cur.execute('SHOW TABLES')
>>> cur.fetchall()
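The interactive session above can also be wrapped in a small helper. This is a minimal sketch, not the only way to do it: list_tables and connect_pyhive are hypothetical names, and it assumes SHOW TABLES returns one column per row, as Hive normally does. The helper only relies on the standard DB-API cursor methods, so it works with any connection object that provides them.

```python
from contextlib import closing


def list_tables(conn):
    """Run SHOW TABLES on a DB-API connection and return the table names."""
    # closing() calls cur.close() even if execute() or fetchall() raises.
    with closing(conn.cursor()) as cur:
        cur.execute('SHOW TABLES')
        # Assumption: each row is a 1-tuple holding the table name.
        return [row[0] for row in cur.fetchall()]


def connect_pyhive():
    """Open the same connection as in the session above."""
    # Imported lazily so list_tables() stays usable without pyhive installed.
    from pyhive import hive
    return hive.Connection(host='192.168.12.5', port=10000,
                           username='root', database='behavior_labels')
```

Calling list_tables(connect_pyhive()) should then return the table names as a plain list.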
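II. Connecting to Hive from Windows
1. Connecting to Hive with pyhive (the connection did not succeed; you can skip this part)
Install the packages needed to connect to Hive: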
pip install pyhive
pip install thrift
pip install sasl  # install visualcppbuildtools_full.exe first; if this still fails, download the package from https://www.lfd.uci.edu/~gohlke/pythonlibs/
pip install thrift_sasl
C:\Users\Administrator> python
>>> from pyhive import hive
>>> conn = hive.Connection(host='192.168.12.5', port=10000, username='root', database='behavior_labels')
Error: thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
After reading through GitHub and Stack Overflow, my guess is that the sasl package does not support Windows, so I switched to impala (the impyla package) to connect to Hive.
2. Connecting to Hive with impala
1) With Python 2.7
pip2 install impyla
pip2 install thrift_sasl
>>> from impala.dbapi import connect
>>> conn = connect(host='192.168.12.5', port=10000, auth_mechanism='PLAIN', user='root', password='*', database='behavior_labels')
>>> cur=conn.cursor()
>>> cur.execute('SHOW TABLES')
>>> cur.fetchall()
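If the tables in Hive are listed, the connection succeeded.
2) With Python 3.6
Before installing, uninstall the related packages completely, then reinstall the matching versions:
pip3 uninstall sasl  # if you later get "module 'sasl' has no attribute 'Client'" at runtime, the package was not fully removed and its files must be deleted by hand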
pip3 install impyla
pip3 install pure-sasl
pip3 install thrift_sasl==0.2.1 --no-deps
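To check whether leftover files from the old sasl package are still on the import path, a small helper can report where Python would load the package from. This is only a sketch using the standard library; package_locations is a hypothetical name, and what counts as a leftover is for you to judge from the printed paths.

```python
import importlib.util


def package_locations(name):
    """Return the paths Python would import top-level `name` from, or [] if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []
    # Packages (including leftover directories) expose their search locations.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)
    # Single-file modules only have an origin.
    return [spec.origin]


# Print any remaining locations of the old sasl package so they can be removed by hand.
for path in package_locations('sasl'):
    print(path)
```

An empty result for 'sasl' suggests the uninstall left nothing behind.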
>>> from impala.dbapi import connect
Running this raises: thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'. To fix it, edit line 488 of "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\thriftpy\parser\parser.py", changing
if url_scheme == '':
    with open(path) as fh:
        data = fh.read()
to
if len(url_scheme) <= 1:
    with open(path) as fh:
        data = fh.read()
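The reason for the 'c' in the error message can be reproduced with the standard library alone. The sketch below assumes thriftpy derives url_scheme from urllib.parse.urlparse, which matches the error message; the file names are hypothetical. A Windows drive letter followed by a colon is parsed as a one-letter URL scheme, so a check for an empty scheme never matches a path like C:\..., while a check for a scheme of length at most one does.

```python
from urllib.parse import urlparse


def looks_like_local_path(path):
    """Mirror the patched check: an empty or one-letter scheme means a local file."""
    return len(urlparse(path).scheme) <= 1


windows_path = r'C:\Users\Administrator\example.thrift'  # hypothetical file name
posix_path = '/home/user/example.thrift'                 # hypothetical file name

print(repr(urlparse(windows_path).scheme))  # the drive letter, lowercased
print(repr(urlparse(posix_path).scheme))    # empty string
print(looks_like_local_path(windows_path), looks_like_local_path(posix_path))
```

Real schemes such as http or file are longer than one character, so they are unaffected by the relaxed check.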
>>> conn = connect(host='192.168.12.5', port=10000, auth_mechanism='PLAIN', user='root', password='*', database='behavior_labels')
Error: TypeError: can't concat str to bytes. To fix it, edit line 94 of "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift_sasl\__init__.py", changing
def _send_message(self, status, body):
    header = struct.pack(">BI", status, len(body))
    self._trans.write(header + body)
    self._trans.flush()
to
def _send_message(self, status, body):
    # On Python 3 the body may arrive as str; encode it first so the length counts bytes
    if isinstance(body, str):
        body = body.encode()
    header = struct.pack(">BI", status, len(body))
    self._trans.write(header + body)
    self._trans.flush()
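The TypeError comes from joining the bytes header produced by struct.pack with a str body. The framing can be tried outside thrift_sasl with a standalone sketch; frame_sasl_message is a hypothetical name and not part of the library. It encodes the body before measuring it, so the length field counts bytes rather than characters.

```python
import struct


def frame_sasl_message(status, body):
    """Build a 1-byte status + 4-byte big-endian length header, followed by the body."""
    if isinstance(body, str):
        # struct.pack returns bytes, so the body must be bytes before concatenation.
        body = body.encode()
    header = struct.pack('>BI', status, len(body))
    return header + body


print(frame_sasl_message(1, 'PLAIN'))  # b'\x01\x00\x00\x00\x05PLAIN'
```

Passing bytes instead of str gives the same result, so callers that already send bytes are unaffected.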
At this point things should basically work. Open python3 and connect to Hive:
>>> from impala.dbapi import connect
>>> conn = connect(host='192.168.12.5', port=10000, auth_mechanism='PLAIN', user='root', password='*', database='behavior_labels')
>>> cur=conn.cursor()
>>> cur.execute('SHOW TABLES')
>>> cur.fetchall()
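For use in a script rather than the interactive prompt, the connection settings can be collected in one place so the password does not have to be typed into the source. This is a sketch under stated assumptions: the environment variable names (HIVE_HOST, HIVE_PORT, HIVE_USER, HIVE_PASSWORD, HIVE_DATABASE) and both function names are hypothetical, and the defaults simply repeat the values used in this post.

```python
import os


def hive_connect_kwargs(env=os.environ):
    """Build keyword arguments for impala.dbapi.connect from environment variables."""
    return {
        'host': env.get('HIVE_HOST', '192.168.12.5'),
        'port': int(env.get('HIVE_PORT', '10000')),
        'auth_mechanism': 'PLAIN',
        'user': env.get('HIVE_USER', 'root'),
        'password': env.get('HIVE_PASSWORD', ''),
        'database': env.get('HIVE_DATABASE', 'behavior_labels'),
    }


def show_tables():
    """Connect with the settings above and return the rows of SHOW TABLES."""
    # Imported lazily so hive_connect_kwargs() works without impyla installed.
    from impala.dbapi import connect
    conn = connect(**hive_connect_kwargs())
    try:
        cur = conn.cursor()
        cur.execute('SHOW TABLES')
        return cur.fetchall()
    finally:
        conn.close()
```

Set HIVE_PASSWORD in the environment before calling show_tables(); the other variables are optional.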