Understanding Hadoop's Compression Mechanism

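Compression is the process of re-encoding data with an algorithm so that it occupies less storage space; the reverse process is decompression.
Every compression tool has to trade time (compression and decompression speed) against space (compression ratio).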
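As a rough illustration of the time/space trade-off, the sketch below uses the JDK's java.util.zip.Deflater instead of a Hadoop codec, so it runs without a cluster. The class name DeflateLevelDemo and the generated sample data are invented for this example, and the timings it prints depend on the machine and the input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLevelDemo {
    // Compress data at the given Deflater level (1 favors speed, 9 favors size).
    static byte[] compress(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes from a complete DEFLATE stream.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && inflater.needsInput()) {
                break; // input ended early; stop instead of looping forever
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // Hypothetical sample input: repetitive text that compresses well.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("hadoop compression line ").append(i % 100).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = sampleData();
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] packed = compress(data, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("level=%d original=%d compressed=%d time=%dus%n",
                    level, data.length, packed.length, micros);
        }
    }
}
```

Running main prints the compressed size and elapsed time for the fastest and the strongest level, which makes the trade-off visible on whatever input is substituted for sampleData().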
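In big data, the splittability of a compressed file also has to be considered, that is, whether separate tasks can each process a different part of the same file in parallel.
The compression formats Hadoop supports include DEFLATE, gzip, bzip2, and Snappy.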
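To make splittability concrete, the sketch below checks whether a gzip stream can be decoded from its midpoint. It uses the JDK's java.util.zip classes rather than Hadoop's codecs. The class name GzipSplitDemo and the sample data are invented for this example. A reader that starts in the middle of a gzip stream finds no header to synchronize on, which is why a gzip file is normally handled by a single task. Of the formats listed, bzip2 is the one Hadoop can generally split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {
    // Compress a byte array into a single gzip stream held in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    // Returns true if a gzip reader can decode the stream starting at 'offset'.
    static boolean canDecodeFrom(byte[] gz, int offset) {
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // drain the stream; only success or failure matters here
            }
            return true;
        } catch (IOException e) {
            // A missing gzip header or corrupt data ends up here.
            return false;
        }
    }

    // Hypothetical sample input standing in for a large text file.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip(sampleData());
        System.out.println("decode from start:    " + canDecodeFrom(gz, 0));
        System.out.println("decode from midpoint: " + canDecodeFrom(gz, gz.length / 2));
    }
}
```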
Compression and decompression: CompressTest.java

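import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressTest {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Comment this line out when decompressing.
        //compress("block.txt", "org.apache.hadoop.io.compress.GzipCodec");

        // Codec class for each compression format:
        // gzip    => org.apache.hadoop.io.compress.GzipCodec
        // bzip2   => org.apache.hadoop.io.compress.BZip2Codec
        // snappy  => org.apache.hadoop.io.compress.SnappyCodec
        // DEFLATE => org.apache.hadoop.io.compress.DefaultCodec

        // Comment this line out when compressing.
        decompress(new File("block.txt.gz"));
    }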

    private static File compress(String fileName, String compressClassName) throws ClassNotFoundException, IOException {
        // Instantiate the codec reflectively from its fully qualified class name.
        Class<?> codecClass = Class.forName(compressClassName);
        Configuration configuration = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, configuration);

        // Name the output after the input plus the codec's default extension, e.g. block.txt.gz.
        File fileOut = new File(fileName + codec.getDefaultExtension());
        fileOut.delete();

        // try-with-resources closes both streams even if the copy fails.
        try (InputStream in = new FileInputStream(fileName);
             CompressionOutputStream cout = codec.createOutputStream(new FileOutputStream(fileOut))) {
            IOUtils.copyBytes(in, cout, 4096, false);
        }

        return fileOut;
    }

    private static void decompress(File file) throws IOException {
        Configuration configuration = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(configuration);

        // Infer the codec from the file extension (for example .gz or .bz2).
        CompressionCodec codec = factory.getCodec(new Path(file.getName()));

        if (codec == null) {
            System.out.println("Cannot find codec for file " + file);
            return;
        }

        // The output is written to the working directory, e.g. block.txt.gz-.txt.
        File fileOut = new File(file.getName() + "-.txt");

        // try-with-resources closes both streams even if the copy fails.
        try (InputStream in = codec.createInputStream(new FileInputStream(file));
             OutputStream outputStream = new FileOutputStream(fileOut)) {
            IOUtils.copyBytes(in, outputStream, 4096, false);
        }
    }
}

Add the compression settings to the main function of WordCount.java:

FileOutputFormat.setCompressOutput(job,true);
FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
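For context, the two calls above could be wrapped in a small helper that the WordCount driver calls before submitting the job. The sketch below is an assumption-laden illustration, not the original author's code. The class name CompressionSettings and method name enableCompression are hypothetical, and it assumes the new MapReduce API (org.apache.hadoop.mapreduce) on Hadoop 2 or later. It also turns on compression of intermediate map output, which the original post does not configure.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical helper; call enableCompression(job) from the driver's main method.
public class CompressionSettings {
    static void enableCompression(Job job) {
        Configuration conf = job.getConfiguration();

        // Optional addition (not in the original post): compress the
        // intermediate map output that is shuffled to the reducers.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        // The settings from the post: compress the final job output with Snappy.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
    }
}
```

Whether SnappyCodec works out of the box depends on the Hadoop version and installation. If it is unavailable, swapping in GzipCodec or BZip2Codec is a reasonable fallback.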
 
 
 
 

Reposted from: https://www.cnblogs.com/jichui/p/10444941.html
