Importing data into Hive with Python: inserting many rows into a Hive table using Python

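This article describes how to use Hive's Streaming API from a Python script to insert large volumes of data into Hive tables efficiently, avoiding the performance problems of the standard INSERT INTO operation. It shows by example how to write a Python script that processes input data and converts it into a format Hive can read, and how to run this on Linux and on Windows HDInsight.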

Hive is a data warehouse designed for querying and aggregating large datasets that reside on HDFS.

The standard INSERT INTO syntax performs poorly because:

Each statement requires a Map/Reduce job to be executed.

Each statement results in a new file being added to HDFS; over time this leads to very poor performance when reading from the table.
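
To make the per-statement cost concrete, here is a minimal sketch of grouping rows so that many rows share one `INSERT INTO ... VALUES` statement (multi-row `VALUES` is supported from Hive 0.14 onward). The table name `events`, its two columns, and the helper names `hive_literal` and `build_batched_inserts` are assumptions made for illustration, not part of any library, and the quoting is deliberately simplified.

```python
def hive_literal(value):
    """Render a Python value as a Hive SQL literal (simplified, illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape backslashes first, then single quotes.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def build_batched_inserts(table, rows, batch_size=1000):
    """Yield one INSERT statement per batch of rows instead of one per row."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ", ".join(
            "(" + ", ".join(hive_literal(v) for v in row) + ")" for row in batch
        )
        yield "INSERT INTO TABLE {} VALUES {}".format(table, values)


if __name__ == "__main__":
    # Hypothetical rows for a hypothetical table events(id INT, name STRING).
    rows = [(1, "alice"), (2, "bob"), (3, "o'neil")]
    for statement in build_batched_inserts("events", rows, batch_size=2):
        print(statement)
```

Batching does not remove the cost described above: each generated statement still launches its own job and writes its own file. It only reduces how many statements, and therefore how many jobs and files, a given number of rows produces.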

With that said, there is now a Streaming API for Hive / HCatalog, as detailed here.

I am faced with the need to insert data at velocity into Hive, using Python. I am aware of the pyhive and pyhs2 libraries, but neither of them appears to make use of the Streaming API.

Has anyone successfully managed to get Python to insert many rows into Hive using the Streaming API, and how was this done?

I look forward to your insights!

Solution

Hive user can stream table through script to tra
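
The summary at the top mentions a Python script that reads input rows and emits them in a format Hive can read. A minimal sketch of such a script follows. It assumes it is invoked through Hive's `TRANSFORM ... USING` clause, which by default passes rows to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The two-column layout (id, name), the uppercase step, and the function name `transform_line` are hypothetical and only stand in for whatever processing is actually needed.

```python
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    fields = line.rstrip("\n").split("\t")
    # Hypothetical layout: first column is an id, second column is a name.
    record_id, name = fields[0], fields[1]
    # Hive's default TRANSFORM serialization passes NULL as the literal \N.
    if name != "\\N":
        name = name.strip().upper()
    return "\t".join([record_id, name])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write transformed rows to stdout, skipping blank lines."""
    for line in stdin:
        if line.strip():
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Saved as, say, `transform.py` (a hypothetical file name), the script could be registered with `ADD FILE transform.py;` and then used in a query of the form `INSERT INTO TABLE target SELECT TRANSFORM (id, name) USING 'python transform.py' AS (id, name) FROM source;`, where `target` and `source` are placeholder table names. Because the whole load runs as a single statement, many rows reach the target table without issuing one INSERT per row.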
