
LZO for Java

Introduction

There is no version of LZO in pure Java. The obvious solution is to take the C source code, and feed it to the Java compiler, modifying the Java compiler as necessary to make it compile.

This package is an implementation of that obvious solution, for which I can only apologise to the world.

It turns out, however, that the compression performance on a single 2.4GHz laptop CPU is in excess of 500Mb/sec, and decompression runs at 815Mb/sec, which seems to be more than adequate. Run PerformanceTest on an appropriate file to reproduce these figures.

Example

Compression:

    OutputStream out = ...;
    LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
    LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(algorithm, null);
    LzoOutputStream stream = new LzoOutputStream(out, compressor, 256);
    stream.write(...);
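
For reference, a self-contained version of the compression sketch might look like the following; the org.anarres.lzo package name and the in-memory source data are assumptions for illustration, not part of the snippet above:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.anarres.lzo.LzoAlgorithm;
    import org.anarres.lzo.LzoCompressor;
    import org.anarres.lzo.LzoLibrary;
    import org.anarres.lzo.LzoOutputStream;

    public class CompressExample {
        // Compresses the given bytes with LZO1X and returns the compressed stream.
        public static byte[] compress(byte[] data) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
            LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(algorithm, null);
            LzoOutputStream stream = new LzoOutputStream(out, compressor, 256);
            stream.write(data);
            stream.close(); // flushes the final block to 'out'
            return out.toByteArray();
        }
    }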

Decompression:

    InputStream in = ...;
    LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
    LzoDecompressor decompressor = LzoLibrary.getInstance().newDecompressor(algorithm, null);
    LzoInputStream stream = new LzoInputStream(in, decompressor);
    stream.read(...);
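
And a matching decompression sketch under the same assumptions, reading until end of stream:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.anarres.lzo.LzoAlgorithm;
    import org.anarres.lzo.LzoDecompressor;
    import org.anarres.lzo.LzoInputStream;
    import org.anarres.lzo.LzoLibrary;

    public class DecompressExample {
        // Reads the entire compressed stream and returns the original bytes.
        public static byte[] decompress(byte[] compressed) throws IOException {
            LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
            LzoDecompressor decompressor = LzoLibrary.getInstance().newDecompressor(algorithm, null);
            LzoInputStream stream = new LzoInputStream(new ByteArrayInputStream(compressed), decompressor);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = stream.read(buf)) != -1)
                out.write(buf, 0, n);
            stream.close();
            return out.toByteArray();
        }
    }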

Documentation

The JavaDoc API is available.

Hadoop Notes

Notes on BlockCompressorStream, as of Hadoop 0.21.x:

If you write 1 byte, then a large block, BlockCompressorStream will flush the single-byte block before compressing the large block. This is inefficient.

If you write a large block to a fresh stream, BlockCompressorStream will flush existing data, which will write a zero uncompressed length to the file, but follow it with no blocks, thus breaking the ulen-clen-data format. This is wrong. There is no contract for the finished() method to avoid this, since it must return false at the top of write(), then must (with no other mutator calls) return true in BlockCompressorStream.finish() in order to avoid the empty block; having returned true there, compress() must be able to return a nonempty block, even though we have no data. This is wrong.

Large blocks are written (ulen (clen data)) not (ulen clen data), due to the loop in compress(): a single uncompressed-length header is followed by several compressed chunks, rather than one ulen-clen-data triple per chunk. This is not the same as the format for lzop, thus a data file written using LzopCodec cannot be read by lzop. See lzop-1.03/src/p_lzo.c, method lzo_compress, which contains a single very simple loop; that is how Hadoop's BlockCompressorStream should be written. This is both inefficient and wrong.
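
To make the point concrete, a single simple loop that keeps one ulen-clen-data triple per block might look like the sketch below. The 32-bit big-endian length fields, the worst-case output sizing, and the low-level compress(in, inOff, inLen, out, outOff, lzo_uintp) call are assumptions for illustration, not lzop's exact on-disk layout:

    import java.io.DataOutputStream;
    import java.io.IOException;

    import org.anarres.lzo.LzoCompressor;
    import org.anarres.lzo.LzoTransformer;
    import org.anarres.lzo.lzo_uintp;

    public class BlockWriterSketch {
        // Writes each input block as a self-contained (ulen, clen, data) triple,
        // the way lzop's single simple loop does.
        public static void writeBlocks(byte[] input, int blockSize,
                                       LzoCompressor compressor,
                                       DataOutputStream out) throws IOException {
            // Worst-case LZO expansion bound: len + len/16 + 64 + 3.
            byte[] dst = new byte[blockSize + (blockSize >> 4) + 64 + 3];
            for (int off = 0; off < input.length; off += blockSize) {
                int ulen = Math.min(blockSize, input.length - off);
                lzo_uintp clen = new lzo_uintp(dst.length);
                int code = compressor.compress(input, off, ulen, dst, 0, clen);
                if (code != LzoTransformer.LZO_E_OK)
                    throw new IOException("LZO compression failed: " + code);
                out.writeInt(ulen);            // uncompressed length
                out.writeInt(clen.value);      // compressed length
                out.write(dst, 0, clen.value); // compressed data: one triple per block
            }
        }
    }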

If the LZO compressor needs to use its holdover field (or, equivalently in other people's code, setInputFromSavedData()), then the ulen-clen-data format is broken, because getBytesRead() MUST return the full number of bytes passed to setInput(), not just the number of bytes actually compressed so far. If there is holdover data, there is nowhere for it to go but into the data returned by a second call to compress(), at which point the API has forced us to break ulen-clen-data, as per lzop's file format. This is wrong, and badly designed.

The number of uncompressed bytes is written to the stream in lzop. There is therefore no excuse for a "Buffer too small" error in decompression. However, this value is NOT used to resize the decompressor's output buffer, and so the error occurs. One cannot, as a rule, know the size of output buffer required to decompress a given file, so Hadoop must be configured by trial and error. This is badly designed, and harder to use.
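
Since the uncompressed length precedes every block, a decompressor can size its output buffer from the stream itself rather than from configuration. A sketch of that idea, under the same assumed 32-bit length fields and low-level decompress(...) call as above:

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;

    import org.anarres.lzo.LzoDecompressor;
    import org.anarres.lzo.LzoTransformer;
    import org.anarres.lzo.lzo_uintp;

    public class BlockReaderSketch {
        // Reads (ulen, clen, data) triples, allocating the output buffer from the
        // ulen field in the stream, so "Buffer too small" cannot arise.
        public static void readBlocks(DataInputStream in,
                                      LzoDecompressor decompressor) throws IOException {
            while (true) {
                int ulen;
                try {
                    ulen = in.readInt();  // uncompressed length, taken from the stream
                } catch (EOFException e) {
                    return;               // clean end of input
                }
                int clen = in.readInt();  // compressed length
                byte[] cbuf = new byte[clen];
                in.readFully(cbuf);
                byte[] ubuf = new byte[ulen]; // sized exactly; no trial and error
                lzo_uintp outLen = new lzo_uintp(ulen);
                int code = decompressor.decompress(cbuf, 0, clen, ubuf, 0, outLen);
                if (code != LzoTransformer.LZO_E_OK)
                    throw new IOException("LZO decompression failed: " + code);
                // ... hand ubuf[0 .. outLen.value) to the consumer ...
            }
        }
    }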
