How to Run HQL Files from Python on Hadoop

AKA_PROGRAMER

Last modified: 2023-03-16 11:17:40

Tags: python, programming languages, big data, hadoop

First published: 2023-02-21 10:08:01

Article link: https://blog.csdn.net/dieson1027/article/details/129136706

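This article describes two ways to run Hive queries from Python. Option 1 uses the pyhive library to connect to Hive and execute the HQL directly, with the date passed as a command-line argument. Option 2 calls the beeline command-line tool: the processed HQL is written to a temporary file, and beeline executes that file. Points to watch include the file format, encoding compatibility, and the handling of the date parameter.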

(Summary generated automatically by CSDN.)

(1) Option 1: run the HQL with pyhive
Install the required packages:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

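# In the HQL file, use ${etl_date} as the date placeholder
# and ;; as the separator between SQL statements
import sys
from pyhive import hive

# Connect to Hive. The port defaults to 10000; if high availability is
# enabled, use the corresponding host and port instead
conn = hive.Connection(host='xx.xx.xx.xx', port=10000, username='hive', database='default')
cursor = conn.cursor()
filepath = sys.argv[1]  # argument 1: path to the HQL file
etl_date = sys.argv[2]  # argument 2: batch date, yyyymmdd
with open(filepath, 'r') as fr:
    content = fr.read()
# Statements are separated by ;; rather than ; so that a ; inside a
# statement does not cause a wrong split
sqls = content.replace('${etl_date}', etl_date).split(';;')
for sql in sqls:
    if sql.strip():  # skip empty segments, e.g. after a trailing ;;
        cursor.execute(sql)
cursor.close()
conn.close()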
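
The substitute-and-split step in Option 1 can be pulled out into a small function and checked without a Hive connection. The following is a minimal sketch, not part of the original script: the function name `render_statements` and the yyyymmdd validation are assumptions added for illustration.

```python
from datetime import datetime


def render_statements(content, etl_date):
    """Substitute ${etl_date} and split the HQL text on ';;'.

    Hypothetical helper (not from the original script).
    Returns only the non-empty statements, stripped of whitespace.
    """
    # Require exactly eight digits, then let strptime reject impossible dates
    if len(etl_date) != 8 or not etl_date.isdigit():
        raise ValueError('etl_date must be in yyyymmdd format: %r' % etl_date)
    datetime.strptime(etl_date, '%Y%m%d')

    rendered = content.replace('${etl_date}', etl_date)
    return [part.strip() for part in rendered.split(';;') if part.strip()]


if __name__ == '__main__':
    hql = "use default;;\nselect * from t where dt='${etl_date}';;\n"
    for stmt in render_statements(hql, '20230221'):
        print(stmt)
```

Each returned string could then be passed to `cursor.execute` as in the script above. Validating the date first means a malformed argument fails before any statement reaches Hive.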

(2) Option 2: run the HQL by calling beeline

# In the HQL file, use ${etl_date} as the date placeholder
# Some of the beeline options depend on your environment
import sys
import os
from datetime import datetime

filepath = sys.argv[1]  # argument 1: path to the HQL file
etl_date = sys.argv[2]  # argument 2: batch date, yyyymmdd
with open(filepath, 'r') as fr:
    content = fr.read()
now = datetime.now().strftime('%Y%m%d,%H%M%S')
# Substitute the date parameter and write the result to a temporary file
tmpfile = filepath + '.tmp' + now
with open(tmpfile, 'w') as fw:
    fw.write(content.replace('${etl_date}', etl_date))
beeline = "beeline -u jdbc:hive2://127.0.0.1:10000 -f " + tmpfile
os.system(beeline)
os.remove(tmpfile)
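
One possible variation on Option 2 is to pass beeline its arguments as a list through `subprocess.run`, so paths containing spaces need no shell quoting, and to remove the temporary file even when beeline fails. This is a sketch under stated assumptions: the helper names `write_rendered_hql`, `build_beeline_command`, and `run_hql` are hypothetical, and only the `-u` and `-f` options from the script above are used.

```python
import os
import subprocess
import tempfile


def write_rendered_hql(content, etl_date):
    """Write the HQL with ${etl_date} substituted to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix='.hql')
    with os.fdopen(fd, 'w') as fw:
        fw.write(content.replace('${etl_date}', etl_date))
    return path


def build_beeline_command(jdbc_url, script_path):
    """Return the beeline invocation as an argument list."""
    return ['beeline', '-u', jdbc_url, '-f', script_path]


def run_hql(content, etl_date, jdbc_url='jdbc:hive2://127.0.0.1:10000'):
    """Render the HQL, run it through beeline, and always clean up."""
    path = write_rendered_hql(content, etl_date)
    try:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(build_beeline_command(jdbc_url, path), check=True)
    finally:
        os.remove(path)
```

Calling `run_hql(content, etl_date)` requires beeline on the `PATH`; the two helper functions can be exercised on their own without it.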