
LZO for Java

Introduction

There is no version of LZO in pure Java. The obvious solution is to take the C source code, and feed it to the Java compiler, modifying the Java compiler as necessary to make it compile.

This package is an implementation of that obvious solution, for which I can only apologise to the world.

It turns out, however, that the compression performance on a single 2.4GHz laptop CPU is in excess of 500Mb/sec, and decompression runs at 815Mb/sec, which seems to be more than adequate. Run PerformanceTest on an appropriate file to reproduce these figures.

Example

Compression:

    OutputStream out = ...;
    LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
    LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(algorithm, null);
    LzoOutputStream stream = new LzoOutputStream(out, compressor, 256);
    stream.write(...);
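
For reference, a self-contained version of the compression sketch might look like the following; the org.anarres.lzo package name and the in-memory source data are assumptions for illustration, not part of the snippet above:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.anarres.lzo.LzoAlgorithm;
    import org.anarres.lzo.LzoCompressor;
    import org.anarres.lzo.LzoLibrary;
    import org.anarres.lzo.LzoOutputStream;

    public class CompressExample {
        // Compresses the given bytes with LZO1X and returns the compressed stream.
        public static byte[] compress(byte[] data) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
            LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(algorithm, null);
            LzoOutputStream stream = new LzoOutputStream(out, compressor, 256);
            stream.write(data);
            stream.close(); // flushes the final block to 'out'
            return out.toByteArray();
        }
    }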

Decompression:

    InputStream in = ...;
    LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
    LzoDecompressor decompressor = LzoLibrary.getInstance().newDecompressor(algorithm, null);
    LzoInputStream stream = new LzoInputStream(in, decompressor);
    stream.read(...);
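
And a matching decompression sketch under the same assumptions, reading until end of stream:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.anarres.lzo.LzoAlgorithm;
    import org.anarres.lzo.LzoDecompressor;
    import org.anarres.lzo.LzoInputStream;
    import org.anarres.lzo.LzoLibrary;

    public class DecompressExample {
        // Reads the entire compressed stream and returns the original bytes.
        public static byte[] decompress(byte[] compressed) throws IOException {
            LzoAlgorithm algorithm = LzoAlgorithm.LZO1X;
            LzoDecompressor decompressor = LzoLibrary.getInstance().newDecompressor(algorithm, null);
            LzoInputStream stream = new LzoInputStream(new ByteArrayInputStream(compressed), decompressor);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = stream.read(buf)) != -1)
                out.write(buf, 0, n);
            stream.close();
            return out.toByteArray();
        }
    }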

Documentation

The JavaDoc API is available.

Hadoop Notes

Notes on BlockCompressorStream, as of Hadoop 0.21.x:

If you write 1 byte, then a large block, BlockCompressorStream will flush the single-byte block before compressing the large block. This is inefficient.

If you write a large block to a fresh stream, BlockCompressorStream will flush existing data, which will write a zero uncompressed length to the file, but follow it with no blocks, thus breaking the ulen-clen-data format. This is wrong. There is no contract for the finished() method to avoid this, since it must return false at the top of write(), then must (with no other mutator calls) return true in BlockCompressorStream.finish() in order to avoid the empty block; having returned true there, compress() must be able to return a nonempty block, even though we have no data. This is wrong.

Large blocks are written (ulen (clen data)) not (ulen clen data), due to the loop in compress(): a single uncompressed-length header is followed by several compressed chunks, rather than one ulen-clen-data triple per chunk. This is not the same as the format for lzop, thus a data file written using LzopCodec cannot be read by lzop. See lzop-1.03/src/p_lzo.c, method lzo_compress, which contains a single very simple loop; that is how Hadoop's BlockCompressorStream should be written. This is both inefficient and wrong.
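
To make the point concrete, a single simple loop that keeps one ulen-clen-data triple per block might look like the sketch below. The 32-bit big-endian length fields, the worst-case output sizing, and the low-level compress(in, inOff, inLen, out, outOff, lzo_uintp) call are assumptions for illustration, not lzop's exact on-disk layout:

    import java.io.DataOutputStream;
    import java.io.IOException;

    import org.anarres.lzo.LzoCompressor;
    import org.anarres.lzo.LzoTransformer;
    import org.anarres.lzo.lzo_uintp;

    public class BlockWriterSketch {
        // Writes each input block as a self-contained (ulen, clen, data) triple,
        // the way lzop's single simple loop does.
        public static void writeBlocks(byte[] input, int blockSize,
                                       LzoCompressor compressor,
                                       DataOutputStream out) throws IOException {
            // Worst-case LZO expansion bound: len + len/16 + 64 + 3.
            byte[] dst = new byte[blockSize + (blockSize >> 4) + 64 + 3];
            for (int off = 0; off < input.length; off += blockSize) {
                int ulen = Math.min(blockSize, input.length - off);
                lzo_uintp clen = new lzo_uintp(dst.length);
                int code = compressor.compress(input, off, ulen, dst, 0, clen);
                if (code != LzoTransformer.LZO_E_OK)
                    throw new IOException("LZO compression failed: " + code);
                out.writeInt(ulen);            // uncompressed length
                out.writeInt(clen.value);      // compressed length
                out.write(dst, 0, clen.value); // compressed data: one triple per block
            }
        }
    }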

If the LZO compressor needs to use its holdover field (or, equivalently in other people's code, setInputFromSavedData()), then the ulen-clen-data format is broken, because getBytesRead() MUST return the full number of bytes passed to setInput(), not just the number of bytes actually compressed so far. If there is holdover data, there is nowhere for it to go but into the data returned by a second call to compress(), at which point the API has forced us to break ulen-clen-data, as per lzop's file format. This is wrong, and badly designed.

The number of uncompressed bytes is written to the stream in lzop. There is therefore no excuse for a "Buffer too small" error in decompression. However, this value is NOT used to resize the decompressor's output buffer, and so the error occurs. One cannot, as a rule, know the size of output buffer required to decompress a given file, so Hadoop must be configured by trial and error. This is badly designed, and harder to use.
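
Since the uncompressed length precedes every block, a decompressor can size its output buffer from the stream itself rather than from configuration. A sketch of that idea, under the same assumed 32-bit length fields and low-level decompress(...) call as above:

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;

    import org.anarres.lzo.LzoDecompressor;
    import org.anarres.lzo.LzoTransformer;
    import org.anarres.lzo.lzo_uintp;

    public class BlockReaderSketch {
        // Reads (ulen, clen, data) triples, allocating the output buffer from the
        // ulen field in the stream, so "Buffer too small" cannot arise.
        public static void readBlocks(DataInputStream in,
                                      LzoDecompressor decompressor) throws IOException {
            while (true) {
                int ulen;
                try {
                    ulen = in.readInt();  // uncompressed length, taken from the stream
                } catch (EOFException e) {
                    return;               // clean end of input
                }
                int clen = in.readInt();  // compressed length
                byte[] cbuf = new byte[clen];
                in.readFully(cbuf);
                byte[] ubuf = new byte[ulen]; // sized exactly; no trial and error
                lzo_uintp outLen = new lzo_uintp(ulen);
                int code = decompressor.decompress(cbuf, 0, clen, ubuf, 0, outLen);
                if (code != LzoTransformer.LZO_E_OK)
                    throw new IOException("LZO decompression failed: " + code);
                // ... hand ubuf[0 .. outLen.value) to the consumer ...
            }
        }
    }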
