python调用hive与java调用区别,如何通过Python访问Hive?

When I add this to /etc/profile:

export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py

I can then do the imports as listed in the link, with the exception of from hive import ThriftHive which actually need to be:

from hive_service import ThriftHive

Next the port in the example was 10000, which when I tried caused the program to hang. The default Hive Thrift port is 9083, which stopped the hanging.

So I set it up like so:

from thrift import Thrift

from thrift.transport import TSocket

from thrift.transport import TTransport

from thrift.protocol import TBinaryProtocol

try:

transport = TSocket.TSocket('', 9083)

transport = TTransport.TBufferedTransport(transport)

protocol = TBinaryProtocol.TBinaryProtocol(transport)

client = ThriftHive.Client(protocol)

transport.open()

client.execute("CREATE TABLE test(c1 int)")

transport.close()

except Thrift.TException, tx:

print '%s' % (tx.message)

I received the following error:

Traceback (most recent call last):

File "", line 1, in

File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute

self.recv_execute()

File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute

raise x

thrift.Thrift.TApplicationException: Invalid method name: 'execute'

But inspecting the ThriftHive.py file reveals the method execute within the Client class.

How may I use Python to access Hive?

解决方案

I believe the easiest way is to use PyHive.

To install you'll need these libraries:

pip install sasl

pip install thrift

pip install thrift-sasl

pip install PyHive

Please note that although you install the library as PyHive, you import the module as pyhive, all lower-case.

If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager for your distribution. For Windows there are some options on GNU.org, you can download a binary installer. On a Mac SASL should be available if you've installed xcode developer tools (xcode-select --install in Terminal)

After installation, you can connect to Hive like this:

from pyhive import hive

conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")

Now that you have the hive connection, you have options how to use it. You can just straight-up query:

cursor = conn.cursor()

cursor.execute("SELECT cool_stuff FROM hive_table")

for result in cursor.fetchall():

use_result(result)

...or to use the connection to make a Pandas dataframe:

import pandas as pd

df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值