Using Hive from Python 3: A Summary

Author: 在奋斗的大道

Article link: https://blog.csdn.net/zhouzhiwengang/article/details/132330997

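This article describes how to configure HiveServer2, including the Thrift service port and the authentication type, and how to connect to HiveServer2 from Python with the PyHive library. It pays particular attention to the problem of installing the sasl library and how it was worked around.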

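Starting the HiveServer2 Service

HiveServer2 is an optional service built into Hive. It lets remote clients written in different programming languages submit requests to Hive and receive the results.

Thrift service configuration

This article assumes Hive is already installed; if it is not, see the article "Hive 一文读懂". Before starting HiveServer2, a few settings need to be configured: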

Setting	Default	Description
hive.server2.transport.mode	binary	HiveServer2 transport mode: binary or http
hive.server2.thrift.port	10000	Port of the Thrift interface when the transport mode is binary
hive.server2.thrift.http.port	10001	Port of the Thrift interface when the transport mode is http
hive.server2.thrift.bind.host	localhost	Host that the Thrift service binds to
hive.server2.thrift.min.worker.threads	5	Minimum number of Thrift worker threads
hive.server2.thrift.max.worker.threads	500	Maximum number of Thrift worker threads
hive.server2.authentication	NONE	Client authentication type: NONE, LDAP, KERBEROS, CUSTOM, PAM, or NOSASL
hive.server2.thrift.client.user	anonymous	Thrift client user name
hive.server2.thrift.client.password	anonymous	Thrift client password
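
Settings like these are normally placed in hive-site.xml as property entries. The following is a minimal sketch, using only the Python standard library, of how a handful of them could be rendered in that layout. The helper name render_hive_site is hypothetical, and the values are examples rather than recommendations; in particular, the bind host 0.0.0.0 (listen on all interfaces) is an assumption and not a value taken from the table.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of settings as hive-site.xml style <property> elements."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Example values only (assumptions): binary transport on the default port,
# listening on all interfaces, no authentication.
settings = {
    "hive.server2.transport.mode": "binary",
    "hive.server2.thrift.port": 10000,
    "hive.server2.thrift.bind.host": "0.0.0.0",
    "hive.server2.authentication": "NONE",
}
xml_text = render_hive_site(settings)
print(xml_text)
```

The printed fragment can be compared against an existing hive-site.xml before merging anything by hand.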

Starting HiveServer2

Method 1: $HIVE_HOME/bin/hiveserver2

[root@Hadoop3-master bin]# hiveserver2
2023-08-16 23:14:00: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 0ba8eb07-5f63-43a1-aa4d-61954f6e244f

Method 2: $HIVE_HOME/bin/hive --service hiveserver2

Check whether HiveServer2 started successfully

netstat -nl | grep 10000
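
The same check can be made from the client machine in Python, which also confirms that the port is reachable over the network and not just listening locally. This is a minimal sketch using the standard socket module; is_port_open is a hypothetical helper name, and the host and port in the comment are the ones used in this article.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Example call (adjust host and port for your cluster):
#     is_port_open("192.168.43.11", 10000)
```

A True result only shows that something accepts TCP connections on that port; it does not prove the service is HiveServer2.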

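Accessing the Hive web UI after HiveServer2 has started

Default address: http://192.168.43.11:10002/ (the host is this article's server; 10002 is the default HiveServer2 web UI port)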


Connecting to Hive from Python

Required third-party packages

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

Note: PyHive depends on three other packages (sasl, thrift, and thrift-sasl), so those must install successfully before PyHive can be used.
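
A quick way to see which of these packages are still missing is to ask Python whether each one can be found for import. This is a minimal sketch; missing_modules is a hypothetical helper name, and the list uses the import names of the four packages (thrift_sasl is imported with an underscore).

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names of the four packages installed with pip above.
required = ["sasl", "thrift", "thrift_sasl", "pyhive"]
print("Missing:", missing_modules(required))
```

An empty list means all four can be located; it does not guarantee that a compiled extension such as sasl actually loads.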

Problems encountered when installing the sasl package

The installation fails with the following error:

 saslwrapper.cpp
      C:\Users\zzg\AppData\Local\Temp\pip-install-1vw7hyr4\sasl_05859569d9c14648abbe3a8901ed3627\sasl\saslwrapper.h(22): fatal error C1083: 无法打开包括文件: “sasl/sasl.h”: No such file or directory

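While compiling saslwrapper.cpp, the compiler cannot find the header sasl/sasl.h, which is included from saslwrapper.h (line 22). Error C1083 means "cannot open include file".

Solutions found through Google and Baidu

Download a prebuilt sasl .whl file from the University of California, Irvine page https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl . Status: the site has since been shut down.

Download a sasl .whl file from the Tsinghua University mirror https://pypi.tuna.tsinghua.edu.cn/simple/sasl/ . Status: it has no wheel for Python 3.10 on 64-bit Windows.

Note: the sasl files offered on the Tsinghua mirror are mainly:

Wheels for Python 3.5 through 3.9, for Linux only.
Source distributions of sasl, versions 0.1.1 through 0.3.1.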

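Building sasl 0.3.1 from source to produce a .whl file

Download the sasl source archive from the Tsinghua mirror and extract it.

Change into the sasl source directory and run: python setup.py bdist_wheel

The source build fails with the same error as installing sasl with pip.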

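Installing sasl successfully with a prebuilt wheel, following the approach used by others

Environment:

Python version: 3.10

cp310: the Python tag, meaning CPython 3.10

win_amd64: the platform tag, meaning 64-bit Windows

Matching sasl wheel: sasl-0.3.1-cp310-cp310-win_amd64.whl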
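
The tags can be read directly off the wheel filename, which follows the pattern name-version-pythontag-abitag-platformtag.whl (with an optional build tag after the version). The sketch below splits a filename into those parts and builds the Python tag of the running interpreter for comparison. parse_wheel_filename is a hypothetical helper, and the current_tag line assumes a CPython interpreter.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel filename: " + filename)
    # The last three fields are always the Python, ABI and platform tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename("sasl-0.3.1-cp310-cp310-win_amd64.whl")
print(info)

# Python tag of the running interpreter (assumes CPython), e.g. cp310 on 3.10.
current_tag = "cp{}{}".format(*sys.version_info[:2])
print("Wheel matches this interpreter:", info["python_tag"] == current_tag)
```

If the Python tag or the platform tag does not match the local interpreter, pip will refuse to install the wheel.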

Run the following command:

pip install sasl-0.3.1-cp310-cp310-win_amd64.whl

Install thrift

pip install thrift

Install thrift_sasl

pip install thrift_sasl

Install PyHive

pip install pyhive

Python code for connecting to Hive

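import pandas as pd
from pyhive import hive
 
# Run a query against Hive and return the result as a DataFrame
def select_pyhive(sql):
    # Open the Hive connection; host, port, username and database are the
    # values used in this article and must be adapted to your environment
    conn = hive.Connection(host='192.168.43.11', port=10000, username='默认', database='user')
    try:
        df = pd.read_sql(sql, conn)
        return df
    finally:
        # Always release the connection, even if the query fails
        conn.close()
 
sql = "show databases"
df = select_pyhive(sql)
print(df)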
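
As an alternative to pandas, rows can be fetched through a plain DB-API 2.0 cursor, which PyHive connections provide. The following is a minimal sketch; fetch_rows is a hypothetical helper, and an in-memory sqlite3 database stands in for Hive so that the example runs without a server. With Hive, the connection argument would be a hive.Connection like the one created above.

```python
import sqlite3


def fetch_rows(conn, sql):
    """Run sql on a DB-API connection and return (column_names, rows)."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # cursor.description holds one entry per column; item 0 is its name.
        columns = [col[0] for col in cur.description]
        return columns, cur.fetchall()
    finally:
        cur.close()


# Stand-in connection (assumption): sqlite3 instead of a real Hive server.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER, name TEXT)")
demo.execute("INSERT INTO t VALUES (1, 'a')")
columns, rows = fetch_rows(demo, "SELECT id, name FROM t")
print(columns, rows)
demo.close()
```

This avoids the pandas dependency when only a few rows are needed.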