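At present PyHive and impyla are not compatible: the same Python installation cannot use both libraries at the same time.
Recommendation: connect with impyla.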
Connecting to Impala
Source code of the connect function: https://github.com/cloudera/impyla/blob/master/impala/dbapi.py
Example:
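from impala.dbapi import connect
# Username/password authentication (PLAIN)
mcon = connect(host='bd-slave01-pe2.f-pro.cn', port=21050, user='username', password='password', auth_mechanism='PLAIN')
# Kerberos authentication (GSSAPI); needs a valid ticket obtained with kinit
mcon = connect(host='bd-slave01-pe2.f-pro.cn', port=21050, user='username', password='password', auth_mechanism='GSSAPI')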
If the VM has Kerberos credentials, you can use either auth_mechanism='GSSAPI' or auth_mechanism='PLAIN'.
If it has no Kerberos credentials, use auth_mechanism='PLAIN'.
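The rule above can be turned into a small helper that picks the arguments for `connect`. This is a minimal sketch, assuming MIT Kerberos `klist` is on the PATH (`klist -s` exits with status 0 when it finds a valid, unexpired ticket cache). `has_kerberos_ticket` and `impala_connect_kwargs` are hypothetical helpers, not part of impyla; the returned dict is meant to be passed as `connect(**kwargs)`.

```python
import shutil
import subprocess
from typing import Dict, Optional


def has_kerberos_ticket() -> bool:
    """Return True if `klist -s` reports a valid ticket cache.

    Assumes MIT Kerberos klist; returns False when klist is not installed.
    """
    if shutil.which("klist") is None:
        return False
    return subprocess.run(["klist", "-s"]).returncode == 0


def impala_connect_kwargs(host: str, port: int, user: Optional[str] = None,
                          password: Optional[str] = None,
                          kerberos: bool = False) -> Dict[str, object]:
    """Build keyword arguments for impala.dbapi.connect (hypothetical helper).

    With Kerberos credentials prefer GSSAPI (no password needed);
    otherwise fall back to PLAIN, which needs a username and password.
    """
    if kerberos:
        return {"host": host, "port": port,
                "auth_mechanism": "GSSAPI",
                "kerberos_service_name": "impala"}
    if user is None or password is None:
        raise ValueError("PLAIN authentication needs user and password")
    return {"host": host, "port": port, "user": user,
            "password": password, "auth_mechanism": "PLAIN"}


# Usage (requires impyla and a reachable impalad):
# from impala.dbapi import connect
# kwargs = impala_connect_kwargs("bd-slave01-pe2.f-pro.cn", 21050,
#                                user="username", password="password",
#                                kerberos=has_kerberos_ticket())
# mcon = connect(**kwargs)
```

Keeping the decision in a plain function makes it easy to check without a cluster; only the final `connect(**kwargs)` call touches the network.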
For reference, connecting to impalad from the command line:
# run kinit first
impala-shell -u username -k
thrift_sasl releases: https://github.com/cloudera/thrift_sasl/releases
The working setup for connecting to Impala from Python 3
# Installation for Python 3.5.1 and 3.7
# Python 3.9 is not supported (installation fails with an error).
sudo pip3 install impyla
sudo pip3 install thrift_sasl
pip3 install pure-sasl==0.5.1
pip3 install thrift-sasl==0.2.1 --no-deps
pip3 install thrift==0.9.3
pip3 install impyla==0.14.1
pip3 install bitarray==0.8.3
pip3 install thriftpy==0.3.9
# Fix for: TypeError: can't concat str to bytes
vi /opt/python3.5/lib/python3.5/site-packages/thrift_sasl/__init__.py
# Go to the last entry of the traceback, line 94 of __init__.py (mind the indentation):
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)
Change it to:
# Encode str to bytes first, so that len(body) counts bytes, not characters
if isinstance(body, str):
    body = body.encode()
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)
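A standalone sketch of the same framing (a 1-byte status followed by a 4-byte big-endian length, as in `struct.pack(">BI", ...)`) helps show what the fix does. `frame_message` is a hypothetical name for illustration, not a thrift_sasl function. The body should be encoded before its length is taken: for non-ASCII text, `len()` of a `str` counts characters, while the length on the wire must count bytes.

```python
import struct
from typing import Union


def frame_message(status: int, body: Union[str, bytes]) -> bytes:
    """Frame a payload as status (1 byte) + length (4 bytes, big-endian) + body."""
    # Encode first, so the length field matches the number of bytes sent.
    if isinstance(body, str):
        body = body.encode("utf-8")
    header = struct.pack(">BI", status, len(body))
    return header + body


print(frame_message(1, "abc"))  # b'\x01\x00\x00\x00\x03abc'
# "\u00e9" is one character but two UTF-8 bytes.
print(len("\u00e9"), len(frame_message(1, "\u00e9")) - 5)  # 1 2
```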
Newer version combination:
impyla 0.16.2
thrift 0.13.0
thrift-sasl 0.4.2
thriftpy 0.3.9
thriftpy2 0.4.11
pure-sasl 0.6.2
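Because these errors depend on the exact combination of packages, it can help to compare what is actually installed against the combination listed above. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; `installed_versions` and `mismatches` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional

# The combination listed above (distribution name -> version).
EXPECTED = {
    "impyla": "0.16.2",
    "thrift": "0.13.0",
    "thrift-sasl": "0.4.2",
    "thriftpy": "0.3.9",
    "thriftpy2": "0.4.11",
    "pure-sasl": "0.6.2",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def mismatches(expected: Dict[str, str],
               installed: Dict[str, Optional[str]]) -> List[str]:
    """Describe every package whose installed version differs from the expected one."""
    report = []
    for name, want in expected.items():
        have = installed.get(name)
        if have != want:
            report.append(f"{name}: expected {want}, found {have or 'not installed'}")
    return report


if __name__ == "__main__":
    for line in mismatches(EXPECTED, installed_versions(EXPECTED)):
        print(line)
```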
Problem 1:
from thriftpy.transport import TTransportException, TTransportBase, readall
ImportError: cannot import name 'TTransportException'
Problem 2:
AttributeError: 'TSocket' object has no attribute 'isOpen' (bug: https://github.com/cloudera/impyla/issues/268)
AttributeError: 'TSaslClientTransport' object has no attribute 'readAll' (see https://github.com/dropbox/PyHive/issues/151)
Solution:
https://github.com/dropbox/PyHive/commit/5322d8f1420b033ba7446449b5cca2cbf9f6fbc4
pip3 install git+https://github.com/cloudera/thrift_sasl
When using impala (impyla) and PyHive in the same program, pay attention to the import order.
Connecting to Hive
Python library versions:
thrift 0.11.0
thrift-sasl 0.3.0 (not the release version; install it from the Git URL above instead)
thriftpy 0.3.9
PyHive 0.6.1
Kerberos + LDAP authentication
from pyhive import hive
mcon=hive.connect(host='bd-master01-pe2.f.cn',port=10000,username='someone',password='password',auth='LDAP')
cs = mcon.cursor()
cs.execute('show databases')
print(cs.fetchall())
cs.close()
mcon.close()
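The open/execute/fetch/close sequence above leaves the cursor and connection open if the query raises. A minimal sketch that closes both in every case using `contextlib.closing`; `fetch_all` is a hypothetical helper that accepts any zero-argument callable returning a DB-API 2.0 connection. Here `sqlite3` stands in for `hive.connect` so the example runs without a cluster.

```python
import sqlite3
from contextlib import closing
from typing import Any, Callable, List, Sequence


def fetch_all(connect: Callable[[], Any], sql: str) -> List[Sequence[Any]]:
    """Run one query and return all rows, always closing cursor and connection."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cs:
            cs.execute(sql)
            return cs.fetchall()


print(fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1, 'ok'"))
# [(1, 'ok')]

# With PyHive (not run here):
# from pyhive import hive
# rows = fetch_all(lambda: hive.connect(host='bd-master01-pe2.f.cn', port=10000,
#                                        username='someone', password='password',
#                                        auth='LDAP'),
#                  'show databases')
```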
Kerberos authentication
from pyhive import hive
import pandas as pd
hcon = hive.connect(host='bd-master01-pe2.f.cn', port=10000, auth='KERBEROS', kerberos_service_name='hive')
hdata = pd.read_sql('show databases',hcon)
print(hdata)
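`pd.read_sql` gets its column names from the DB-API cursor's `description` attribute, where the first item of each entry is the column name. Recent pandas versions emit a UserWarning when `read_sql` is given a DB-API connection other than sqlite3, so a sketch of doing the same by hand may be useful. `rows_as_dicts` is a hypothetical helper, and `sqlite3` stands in for the Hive connection.

```python
import sqlite3
from contextlib import closing
from typing import Any, Dict, List


def rows_as_dicts(conn: Any, sql: str) -> List[Dict[str, Any]]:
    """Execute `sql` and return each row as a {column_name: value} dict."""
    with closing(conn.cursor()) as cs:
        cs.execute(sql)
        # DB-API 2.0: description holds one 7-item sequence per column; item 0 is the name.
        columns = [d[0] for d in cs.description]
        return [dict(zip(columns, row)) for row in cs.fetchall()]


conn = sqlite3.connect(":memory:")
print(rows_as_dicts(conn, "SELECT 'default' AS database_name"))
# [{'database_name': 'default'}]
conn.close()

# With pandas and the Kerberos Hive connection (not run here):
# import pandas as pd
# hdata = pd.DataFrame(rows_as_dicts(hcon, 'show databases'))
```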
Installing impyla on Python 2, prerequisites:
sudo pip install --upgrade setuptools
sudo yum install -y gcc libffi-devel python-devel openssl-devel gcc-c++
sudo yum install python-devel openldap-devel
Installing PyHive on Python 2
Run the following command in a terminal:
pip install pyhive[hive]
Note the [hive] suffix: without it some dependent packages are not installed, which leads to errors. I ran into the following one:
ImportError: cannot import name TFrozenDict
impyla 0.14.2.2 requires thrift<=0.9.3, while PyHive 0.6.1 is not compatible with thrift 0.9.3 and uses thrift 0.13.0 instead, so pip reports:
impyla 0.14.2.2 has requirement thrift<=0.9.3, but you'll have thrift 0.13.0 which is incompatible.
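The conflict comes down to comparing release numbers: thrift 0.13.0 is newer than 0.9.3, so it violates impyla's `thrift<=0.9.3` bound. A simplified sketch with hypothetical helpers that handle plain dotted versions only (pip itself uses PEP 440 parsing from the `packaging` library); it also shows why comparing version strings as text gives the wrong answer.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '0.13.0' into a tuple of ints.

    Simplified: pre-releases, post-releases and local tags are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def satisfies_max(installed: str, maximum: str) -> bool:
    """Check an upper-bound requirement such as thrift<=0.9.3."""
    return parse_version(installed) <= parse_version(maximum)


print(satisfies_max("0.9.3", "0.9.3"))   # True
print(satisfies_max("0.13.0", "0.9.3"))  # False, because 13 > 9
print("0.13.0" <= "0.9.3")               # True, because naive string comparison is wrong
```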