Compressing Map and Reduce Output in Hadoop (with Hand-Written Compression and Decompression Code)

IT小强哥

Article link: https://blog.csdn.net/jackfeng86/article/details/117753971

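In Hadoop, compression of both the Map output and the Reduce output is switched on through configuration properties; the exact settings are shown below.
Hand-written file compression and decompression are each demonstrated with code as well. Enjoy.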

1. Compressing the Map output

// Enable compression of the map output
        conf.set("mapreduce.map.output.compress", "true");

// Set the compression codec used for the map output
        conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.DefaultCodec");

2. Compressing the Reduce output

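// Enable compression of the reduce (final job) output
        conf.set("mapreduce.output.fileoutputformat.compress", "true");
        // Set the compression codec used for the job output
        // (note: this is a different property from the map-output codec)
        conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.DefaultCodec");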

3. Hand-written file compression

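    /**
     * Hand-written compression.
     * Idea: obtain a compression codec and wrap the output stream with it,
     * so that everything written to the stream is compressed.
     */
    @Test
    public void testCompress() throws IOException, ClassNotFoundException {
        // Input path
        String srcPath = "D:\\hadoop_in\\jianai\\ja.txt";
        // Output path (the codec's extension is appended below)
        String destPath = "D:\\hadoop_in\\jianai\\ja";
        // Input stream
        FileInputStream in = new FileInputStream(new File(srcPath));

        // Obtain the compression codec by class name
        Class<?> codecClass = Class.forName("org.apache.hadoop.io.compress.DefaultCodec");
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

        // Output stream; getDefaultExtension() returns ".deflate" for DefaultCodec
        FileOutputStream out = new FileOutputStream(new File(destPath + codec.getDefaultExtension()));
        // Wrap the output stream with the codec
        CompressionOutputStream outputStream = codec.createOutputStream(out);

        // Copy the data; this overload also closes both streams when it finishes
        IOUtils.copyBytes(in, outputStream, conf);

        // Close the streams (a harmless safeguard after the copy above)
        IOUtils.closeStream(in);
        IOUtils.closeStream(outputStream);
    }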
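The stream-wrapping idea above can also be tried without a Hadoop installation. The following is only a minimal sketch of the same pattern, assuming the JDK's `java.util.zip` classes as a stand-in for Hadoop's codec; the class name `StreamWrapSketch` and its methods are hypothetical and not part of the original article or of Hadoop. It compresses by wrapping the output stream and decompresses by wrapping the input stream, entirely in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration class; uses java.util.zip, not Hadoop's CompressionCodec.
class StreamWrapSketch {

    // Compress: wrap the destination stream, then write plain bytes into the wrapper.
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(plain);
        } // closing the wrapper finishes the compressed stream
        return sink.toByteArray();
    }

    // Decompress: wrap the source stream, then read plain bytes out of the wrapper.
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build some repetitive sample text so that compression has something to gain.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("hadoop map reduce ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(original);
        byte[] restored = decompress(packed);

        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + Arrays.equals(original, restored));
    }
}
```

In the Hadoop version, `codec.createOutputStream(out)` and `codec.createInputStream(in)` play the role that `DeflaterOutputStream` and `InflaterInputStream` play in this sketch.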

4. Hand-written decompression

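    /**
     * Hand-written decompression.
     * Idea: obtain a codec and wrap the input stream with it so the data is
     * decompressed as it is read, then write it out through a normal stream.
     */
    @Test
    public void testDecompress() throws IOException {
        // Input path
        String srcPath = "D:\\hadoop_in\\jianai\\ja.deflate";
        // Output path
        String destPath = "D:\\hadoop_in\\jianai\\ja.txt";
        // Input stream
        FileInputStream in = new FileInputStream(new File(srcPath));
        Configuration conf = new Configuration();
        // Look up the codec from the file extension
        CompressionCodec codec =
                new CompressionCodecFactory(conf).getCodec(new Path(srcPath));
        // getCodec returns null when no codec matches the extension
        if (codec == null) {
            IOUtils.closeStream(in);
            throw new IOException("No compression codec found for " + srcPath);
        }
        // Wrap the input stream with the codec
        CompressionInputStream cin = codec.createInputStream(in);
        // Output stream
        FileOutputStream out = new FileOutputStream(new File(destPath));
        // Copy the decompressed data; this overload also closes both streams
        IOUtils.copyBytes(cin, out, conf);
        // Close the streams (a harmless safeguard after the copy above)
        IOUtils.closeStream(cin);
        IOUtils.closeStream(out);
    }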
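The decompression code picks its codec from the file name, which is why the input must keep an extension such as `.deflate`. As a rough illustration of that lookup idea only, here is a simplified sketch; the class `CodecByExtensionSketch`, its table, and its methods are hypothetical, and this is not how `CompressionCodecFactory` is actually implemented. The three suffixes listed are the default extensions of the corresponding Hadoop codecs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; a simplified model of suffix-based codec lookup.
class CodecByExtensionSketch {

    // Assumed lookup table: file suffix -> Hadoop codec class name.
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
    }

    // Returns the codec class name whose suffix matches the file name, or null if none does.
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> entry : CODECS.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    // Strips the given suffix to derive a name for the decompressed file.
    static String removeSuffix(String fileName, String suffix) {
        if (fileName.endsWith(suffix)) {
            return fileName.substring(0, fileName.length() - suffix.length());
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("ja.deflate"));
        System.out.println(codecFor("ja.txt"));
        System.out.println(removeSuffix("ja.deflate", ".deflate"));
    }
}
```

A file with an unrecognized suffix yields no codec in this sketch, which mirrors the case where `getCodec` returns `null` and the caller has to handle it before wrapping the stream.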