2021SC@SDUSC
Overview
This post continues the analysis of Pig, Hadoop's lightweight scripting layer, by examining the InputSizeReducerEstimator class in the executionengine package.
The InputSizeReducerEstimator class
This class estimates the number of reducers a job needs based on the size of its input.
The estimateNumberOfReducers method
Decides how many reducers the job should use:
public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
    Configuration conf = job.getConfiguration();

    long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
    int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);

    List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
    long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);

    log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
            + maxReducers + " totalInputFileSize=" + totalInputFileSize);

    if (totalInputFileSize == -1) { return -1; }

    int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
    reducers = Math.max(1, reducers);
    reducers = Math.min(maxReducers, reducers);

    return reducers;
}
Here,
if (totalInputFileSize == -1) { return -1; }
means that when totalInputFileSize == -1 the input size could not be determined, so the number of reducers cannot be estimated and -1 is returned to the caller.
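The clamping logic above can be isolated in a minimal, dependency-free sketch. The defaults below mirror Pig's pig.exec.reducers.bytes.per.reducer (1 GB) and pig.exec.reducers.max (999) settings; the class and method names are my own for illustration:

```java
public class ReducerEstimateSketch {
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000L * 1000L; // 1 GB
    static final int DEFAULT_MAX_REDUCER_COUNT = 999;

    // Returns -1 when the total input size is unknown, otherwise
    // ceil(totalInputFileSize / bytesPerReducer) clamped to [1, maxReducers].
    static int estimate(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        if (totalInputFileSize == -1) {
            return -1; // input size unknown: estimation is impossible
        }
        int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
        reducers = Math.max(1, reducers);      // at least one reducer
        reducers = Math.min(maxReducers, reducers); // never exceed the cap
        return reducers;
    }

    public static void main(String[] args) {
        // 2.5 GB of input at 1 GB per reducer rounds up to 3 reducers.
        System.out.println(estimate(2_500_000_000L, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCER_COUNT)); // 3
        // Empty input still gets one reducer.
        System.out.println(estimate(0L, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCER_COUNT)); // 1
    }
}
```

Note how the order of the two clamps matters: Math.max first guarantees at least one reducer even for empty input, and Math.min then enforces the upper bound.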
The getTotalInputFileSize method
Obtains the total input size as best it can; inputs whose size cannot be reported may have their file size left out of the total. (The three-argument call shown above appears to go through an overload in the Pig source that delegates to this four-argument version with Long.MAX_VALUE as the max cap.)
static long getTotalInputFileSize(Configuration conf,
        List<POLoad> lds, Job job, long max) throws IOException {
    long totalInputFileSize = 0;
    for (POLoad ld : lds) {
        long size = getInputSizeFromLoader(ld, job);
        if (size > -1) {
            totalInputFileSize += size;
            continue;
        } else {
            // The input location may be a comma-separated list of paths.
            for (String location : LoadFunc.getPathStrings(ld.getLFile().getFileName())) {
                if (UriUtil.isHDFSFileOrLocalOrS3N(location, conf)) {
                    Path path = new Path(location);
                    FileSystem fs = path.getFileSystem(conf);
                    FileStatus[] status = fs.globStatus(path);
                    if (status != null) {
                        for (FileStatus s : status) {
                            totalInputFileSize += MapRedUtil.getPathLength(fs, s, max);
                            if (totalInputFileSize > max) {
                                break;
                            }
                        }
                    } else {
                        // File not found: report -1.
                        return -1;
                    }
                } else {
                    // Size of this location cannot be estimated: report -1.
                    return -1;
                }
            }
        }
    }
    return totalInputFileSize;
}
There are two places that report -1: the first fires when the files cannot be found (globStatus returns null), and the second when a location's size cannot be estimated at all (it is not an HDFS, local, or S3N path). In either case the total input size is unknowable, so the method reports -1 and estimateNumberOfReducers gives up on the estimate.
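The control flow above can be sketched without Hadoop dependencies. In this simplified model (my own class and method names), each input's size is a Long, with null standing in for an input whose size cannot be determined; the max cap mirrors the early exit once the running total exceeds it:

```java
import java.util.Arrays;
import java.util.List;

public class TotalInputSizeSketch {
    // Sums known sizes; a single unknown (null) input makes the whole
    // estimate report -1, matching getTotalInputFileSize's behavior.
    static long totalInputSize(List<Long> sizes, long max) {
        long total = 0;
        for (Long size : sizes) {
            if (size == null) {
                return -1; // one unknown input invalidates the whole estimate
            }
            total += size;
            if (total > max) {
                break; // past the cap, the exact total no longer matters
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalInputSize(Arrays.asList(100L, 200L), Long.MAX_VALUE));       // 300
        System.out.println(totalInputSize(Arrays.asList(100L, null, 200L), Long.MAX_VALUE)); // -1
    }
}
```

The asymmetry is deliberate: exceeding max merely stops the summation early (an over-estimate is still usable for picking a reducer count), whereas a single unknown input poisons the whole total, because underestimating the input could allocate too few reducers.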