linux 连接hive数据库失败,windows和linux连接hive问题

本文详细介绍了在CentOS和Windows环境下,使用pyhive和impala.dbapi连接Hive数据库时遇到的问题及解决办法,包括安装依赖、配置参数、错误排查等,旨在帮助用户成功建立连接。

本文讲述下远程连接Hive遇到的一些问题

一、CentOS连接Hive

尝试使用CentOS 7远程连接Hive,首先安装好需要的相关包pip install pyhive

pip install thrift

yum install cyrus-sasl-devel.x86_64

pip install sasl

pip install thrift-sasl==0.2.1

1.连接Hive$python

>>> from pyhive import hive

>>> conn = hive.Connection(host='192.168.12.5', port=10000, username='root', database='behavior_labels')

>>> cur=conn.cursor()

>>> cur.execute('SHOW TABLES')

>>> cur.fetchall()

另外:当使用create_engine连接hive时需要安装pyhive,如果报错ImportError: cannot import name TFrozenDict时,升级thrift版本。

目前pandas使用to_sql存入数据到hive只能存入一行就失败。from sqlalchemy import create_engine

engine = create_engine('hive://root@192.168.12.67:10000/item_recommand')

data.to_sql('user', con=engine,index=False, if_exists='append')#只能存入一行数据

from impala.dbapi import connect

from impala.util import as_pandas

conn = connect(host='192.168.12.67', port=10000, auth_mechanism='PLAIN', user='root', password='20150228', database='item_recommand')

cur=conn.cursor()

cur.execute("insert into user values('4','1',1,'5','2'),('5','1',1,'5','2')")#hive版本1.1,插入速度很慢

2.插入数据

读取数据还OK,但是集群中用pyspark插入数据的话,速度很慢,而且容易锁表,还是在集群中生成临时表,然后插入hive比较合适from pyspark import SparkContext,SparkConf

from pyspark.sql import HiveContext

conf = SparkConf().setAppName(u'recommend')#.setMaster("local[*]")

#sparkContext = SparkContext(conf=conf)

sc= SparkContext.getOrCreate()

hive_context = HiveContext(sc)

hive_context.sql('use recommand')

df=hive_context.createDataFrame([(1,2,3),(4,5,6)],['vipid','recommand','rank'])

df.registerTempTable("df")

hive_context.sql('insert into table itemcf select * from df')

二、Windows连接Hive

1.使用pyhive连接hive

(连接未成功,可跳过)

安装好连接hive所需的包pip install pyhive

pip install thrift

pip install sasl #此步需要先安装visualcppbuildtools_full.exe,不成功的话下载https://www.lfd.uci.edu/~gohlke/pythonlibs/的包

pip install thrift_saslC:\Users\Administrator> python

>>> from pyhive import hive

>>> conn = hive.Connection(host='192.168.12.5', port=10000, username='root', database='behavior_labels')

报错:thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'

经过在GitHub和Stack Overflow查看资料,推测是sasl不支持windows版本,则使用impala连接hive

2.使用impala连接hive

1)当python版本是2.7时pip2 install impyla

pip2 install thrift==0.9.3

pip2 install thrift_sasl #失败的话先安装相关依赖包sudo yum install cyrus-sasl-devel>>> from impala.dbapi import connect

>>> conn = connect(host='192.168.12.5', port=10000, auth_mechanism='PLAIN', user='root', password='*', database='behavior_labels')

>>> cur=conn.cursor()

>>> cur.execute('SHOW TABLES')

>>> cur.fetchall()

显示hive中的table表即成功

2)python版本是3.6时

安装前需把相关的包卸载干净,然后重新安装对应的版本pip3 uninstall sasl #运行时报错module 'sasl' has no attribute 'Client',说明该包没有删除干净,需要手动删除文件

pip3 install impyla

pip3 install pure-sasl

pip3 install thrift_sasl==0.2.1 --no-deps>>> from impala.dbapi import connect

运行报错:thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c' ,需要在文件"C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\thriftpy\parser\parser.py"中第488行代码if url_scheme == '':

with open(path) as fh:

data = fh.read()

改为if len(url_scheme) <=1:

with open(path) as fh:

data = fh.read()>>> conn = connect(host='192.168.12.5', port=10000, auth_mechanism='PLAIN', user='root', password='5606603', database='behavior_labels')

报错:TypeError: can't concat str to bytes    需要在File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-p

ackages\thrift_sasl\__init__.py"第94行代码def _send_message(self, status, body):

header = struct.pack(">BI", status, len(body))

self._trans.write(header + body)

self._trans.flush()

改为def _send_message(self, status, body):

header = struct.pack(">BI", status, len(body))

if(type(body) is str):

body = body.encode()

self._trans.write(header + body)

self._trans.flush()

到此基本就没问题了,打开python3,开始连接hive>>> from impala.dbapi import connect

>>> conn = connect(host='192.168.12.5', port=10000, auth_mechanism='PLAIN', user='root', password='*', database='behavior_labels')

>>> cur=conn.cursor()

>>> cur.execute('SHOW TABLES')

>>> cur.fetchall()

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值