Decompressing Hadoop Snappy files

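Decompressing a Snappy file copied from HDFS with the python-snappy library fails with 'invalid input'. Because hadoop fs -text decompresses the same file successfully, the first suspicion was that the Snappy version used by Hadoop was incompatible with the one used from Python. The eventual fix was python-snappy 0.5.2, which adds support for Hadoop's framing format and can therefore decompress Hadoop Snappy files correctly.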


So I'm having some issues decompressing a snappy file from HDFS. If I use hadoop fs -text I am able to uncompress and output the file just fine. However, if I use hadoop fs -copyToLocal and try to uncompress the file with python-snappy I get

snappy.UncompressError: Error while decompressing: invalid input

My python program is very simple and looks like this:

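import snappy

# snappy_file is the local path of the file copied out of HDFS.
# Open in binary mode: compressed data is bytes, not text.
with open(snappy_file, "rb") as input_file:
    data = input_file.read()

# Raw-mode decompression; this is the call that raises UncompressError.
uncompressed = snappy.uncompress(data)
print(uncompressed)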

This fails miserably for me. So I tried another test: I took the output from hadoop fs -text, compressed it using the python-snappy library, and wrote the result to a file. I was then able to read this file back in and uncompress it just fine.

AFAIK snappy is backwards compatible between versions. My python code is using the latest snappy version and I'm guessing hadoop is using an older snappy version. Could this be a problem? Or is there something else I am missing here?

Solution

Okay, well, I figured it out. It turns out I was using raw-mode decompression on a file that had been compressed with hadoop's framing format. Even when I tried the StreamDecompressor in 0.5.1, it still failed with a framing error. python-snappy 0.5.1 defaults to the new snappy framing format and thus can't decompress the hadoop snappy files.
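To see which of these formats a file is in before picking a decompressor, a rough check on the leading bytes can help. The sketch below is only a heuristic and rests on assumptions that are not from the answer above: the snappy framing format starts with the stream identifier bytes ff 06 00 00 "sNaPpY", and a hadoop-framed file starts with two 4-byte big-endian lengths (uncompressed block length, then compressed chunk length). The function name guess_snappy_format is made up for illustration, and a raw snappy buffer can in principle be misclassified.

```python
import struct

# Stream identifier chunk that opens a file in the snappy framing format.
FRAMING_MAGIC = b"\xff\x06\x00\x00sNaPpY"


def guess_snappy_format(data):
    """Heuristically classify a buffer as 'framing', 'hadoop' or 'unknown'.

    Hypothetical helper: it only inspects the leading bytes and can be wrong.
    """
    if data.startswith(FRAMING_MAGIC):
        return "framing"
    if len(data) >= 8:
        # Assumed hadoop layout: block length, then first chunk length,
        # both 4-byte big-endian unsigned integers.
        block_len, chunk_len = struct.unpack_from(">II", data)
        # A plausible first chunk must fit in what follows the two headers.
        if block_len > 0 and 0 < chunk_len <= len(data) - 8:
            return "hadoop"
    # Raw snappy has no magic bytes, so it cannot be told apart from noise.
    return "unknown"
```

A file reported as 'hadoop' is a candidate for the hadoop-specific decompressor, while 'unknown' may well be raw snappy, which is what snappy.uncompress expects.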

Turns out that the master version, 0.5.2, has added support for the hadoop framing format. Once I built this and imported it I was able to decompress the file easily:

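import snappy.hadoop_snappy  # import the submodule explicitly

# Open in binary mode: compressed data is bytes, not text.
with open(snappy_file, "rb") as input_file:
    data = input_file.read()

# Decompressor that understands hadoop's framing format.
decompressor = snappy.hadoop_snappy.StreamDecompressor()
uncompressed = decompressor.decompress(data)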

Now the only issue is that this version isn't technically available from pip yet, so I guess I'll have to either wait for a release or keep using the build from source.
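Until a packaged release is available, one possible workaround is to undo hadoop's framing by hand and pass each piece to the raw decompressor. This is a sketch under assumptions that are not from the post: a hadoop-framed file is taken to be a sequence of blocks, each starting with a 4-byte big-endian uncompressed length, followed by chunks that each carry a 4-byte big-endian compressed length and a raw snappy payload. The helper names read_varint and split_hadoop_snappy are hypothetical. The splitter relies on raw snappy data beginning with its uncompressed length as a little-endian varint, so it needs no snappy library itself.

```python
import struct


def read_varint(buf):
    """Decode the little-endian varint at the start of buf."""
    result = 0
    shift = 0
    for byte in buf:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return result
        shift += 7
    raise ValueError("truncated varint")


def split_hadoop_snappy(data):
    """Yield the raw snappy chunks in a hadoop-framed buffer.

    Assumed layout (not confirmed by the post): repeated blocks of
    [uncompressed block length][chunk length][chunk]..., all lengths
    being 4-byte big-endian unsigned integers.
    """
    pos = 0
    while pos < len(data):
        (remaining,) = struct.unpack_from(">I", data, pos)
        pos += 4
        # Read chunks until the block's uncompressed size is accounted for.
        while remaining > 0:
            (chunk_len,) = struct.unpack_from(">I", data, pos)
            pos += 4
            chunk = data[pos:pos + chunk_len]
            if len(chunk) != chunk_len:
                raise ValueError("truncated chunk")
            pos += chunk_len
            # Raw snappy starts with its uncompressed length as a varint.
            remaining -= read_varint(chunk)
            yield chunk
```

With python-snappy installed, the original content should then be recoverable as b"".join(snappy.uncompress(chunk) for chunk in split_hadoop_snappy(data)), since each yielded chunk is plain raw-mode snappy data.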
