Shandong University Software Engineering Application and Practice: Pig Code Analysis (9)

Tdqiu

Published 2021-12-09 14:27:15

Tags: pig

Article link: https://blog.csdn.net/weixin_54263893/article/details/121821219

2021SC@SDUSC

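Overview
This installment continues the analysis of Pig, a lightweight scripting language for working with Hadoop. It looks at the MapReducePOStoreImpl class in the mapReduceLayer package under executionengine.

This class lets a POStore write to the DFS through an output collector/record writer. It sets up a modified job configuration that forces writes into a specific subdirectory of the main output directory, which makes it possible to use multiple output directories within a single job.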

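The MapReducePOStoreImpl constructor
The constructor takes a copy of the configuration so that later changes to it (such as setting the output location) do not affect the caller's copy.
It also creates a copy of the context for use here. A single task (map or reduce) can contain several stores, so each one needs its own copy; otherwise different stores would overwrite the same context.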

public MapReducePOStoreImpl(TaskInputOutputContext<?,?,?,?> context) {
    // Copy the configuration so changes made here do not reach the caller's copy.
    Configuration outputConf = new Configuration(context.getConfiguration());
    reporter = PigStatusReporter.getInstance();
    reporter.setContext(new MRTaskContext(context));
    // Build a separate task attempt context so each store in the task has its own.
    this.context = HadoopShims.createTaskAttemptContext(outputConf,
            context.getTaskAttemptID());
}
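
The defensive copy described above can be illustrated without Hadoop. The following is only a minimal sketch: a plain Map stands in for Hadoop's Configuration, and the class name ConfigCopySketch, the method configForStore, and the key "output.dir" are all hypothetical names chosen for illustration, not part of Pig.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of a job configuration.
// None of these names come from Pig or Hadoop.
class ConfigCopySketch {

    // Returns an independent copy with the output location set,
    // leaving the caller's map untouched.
    static Map<String, String> configForStore(Map<String, String> callerConf,
                                              String outputDir) {
        Map<String, String> copy = new HashMap<>(callerConf);
        copy.put("output.dir", outputDir); // illustrative key, an assumption
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("job.name", "demo");

        // Two stores in the same task each get their own copy.
        Map<String, String> storeA = configForStore(jobConf, "/out/a");
        Map<String, String> storeB = configForStore(jobConf, "/out/b");

        System.out.println(jobConf.containsKey("output.dir"));
        System.out.println(storeA.get("output.dir"));
        System.out.println(storeB.get("output.dir"));
    }
}
```

Because every store works on its own copy, setting a second output location cannot overwrite the first, which is the property the constructor relies on.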

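The createStoreFunc method
setStoreLocation is called on the storeFunc with a Job, which causes the storeFunc to record the output location in that Job's configuration. PigOutputFormat.setLocation then merges the modified configuration into the configuration of the context held by this class.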

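public StoreFuncInterface createStoreFunc(POStore store)
        throws IOException {

    StoreFuncInterface storeFunc = store.getStoreFunc();

    // Record the store's output location in this object's context configuration.
    PigOutputFormat.setLocation(context, store);
    OutputFormat<?,?> outputFormat = storeFunc.getOutputFormat();
    try {
        writer = outputFormat.getRecordWriter(context);
    } catch (InterruptedException e) {
        // Surface the interruption through the method's declared IOException.
        throw new IOException(e);
    }

    // Hand the record writer to the store function before any tuples are written.
    storeFunc.prepareToWrite(writer);

    return storeFunc;
}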
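
The order of calls in createStoreFunc (obtain a writer from the output format, wrap an InterruptedException in an IOException, then pass the writer to the store function) can be sketched with stand-in types. Everything here is hypothetical: Writer, Format, Store, and createStore are simplified illustrations and are not the Pig or Hadoop interfaces.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// All types below are hypothetical stand-ins, not Pig or Hadoop classes.
class StoreFlowSketch {

    interface Writer {
        void write(String record) throws IOException;
    }

    interface Format {
        Writer getWriter() throws IOException, InterruptedException;
    }

    static class Store {
        private Writer writer;

        void prepareToWrite(Writer w) {
            this.writer = w;
        }

        void put(String record) throws IOException {
            writer.write(record);
        }
    }

    // Mirrors the shape of createStoreFunc: get the writer, convert an
    // interruption into an IOException, then hand the writer to the store.
    static Store createStore(Format format) throws IOException {
        Writer writer;
        try {
            writer = format.getWriter();
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
        Store store = new Store();
        store.prepareToWrite(writer);
        return store;
    }

    public static void main(String[] args) throws IOException {
        List<String> sink = new ArrayList<>();
        // The format hands out a writer that appends records to the list.
        Store store = createStore(() -> record -> sink.add(record));
        store.put("first record");
        System.out.println(sink.size());
    }
}
```

Wrapping the InterruptedException keeps the method signature limited to IOException, at the cost of callers having to inspect the cause to detect an interruption.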