Python 连接 Hive & Impala

lq1205990816

已于 2023-09-19 21:00:11 修改

阅读量475

点赞数

文章标签： python 开发语言 hive 数据仓库

于 2023-09-19 20:51:55 首次发布

本文链接：https://blog.csdn.net/lq1205990816/article/details/133044712

版权

基础环境

OS：win10

Python：3.9

IDE工具：pycharm

Hive：2.1.1+cdh6.1.1

Impala：3.1.0+cdh6.1.1

Hadoop：3.0.0+cdh6.1.1

pip command

# 安装本地文件

pip install sasl-0.3.1-cp39-cp39-win_amd64.whl

#默认源安装

pip install package_name==version

#指定从清华源安装

pip install thrift -i https://pypi.tuna.tsinghua.edu.cn/simple

#指定从清华源安装指定版本

pip install thrift_sasl==0.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

#卸载指定包

pip uninstall package_name

#查询已经安装的版本

pip show package_name

同时支持Hive与Impala版本

#需要安装的依赖

pip install pyhive==0.6.5

pip install SQLAlchemy==1.3.24

pip install pandas==1.2.5

pip install thrift_sasl==0.4.2

pip install impyla==0.16.2

经测试：

thrift_sasl-0.2.1 #仅支持impala，连接hive报错
thrift_sasl-0.3.0 #仅支持hive，连接impala报错

Python连接Hive

from pyhive import hive

# 建立数据库连接
conn = hive.Connection(host='IP', port=10000, database="dmp",username='hiveTest')
cursor = conn.cursor()
query_sql = "show tables"
cursor.execute(query_sql)
result = cursor.fetchall()
cursor.close()
conn.close()
for res in result:
    print(res)

Python连接Impala（也支持连接Hive）

from impala.dbapi import connect

# 建立数据库连接
#connect impala
conn = connect(host='IP', port=21050, auth_mechanism='NOSASL')
# connect hive
# conn = connect(host='IP', port=10000,database='default',user="dmp", auth_mechanism='PLAIN')
cursor = conn.cursor()
cursor.execute('show tables')
result = cursor.fetchall()
conn.close()
for res in result:
    print(res)

Python基于Sqlalchemy与pandas访问Hive

from sqlalchemy import create_engine
import pandas as pd

# 读取数据 : jdbc:hive2://ip:10000  dmp
hive_engine = create_engine('hive://hiveTest@ip:10000/database?auth=NONE')
data = pd.read_sql("show tables",con=hive_engine)

print(data)