Getting the input file path in a Hadoop MapReduce job that uses MultipleInputs

License: CC 4.0 BY-SA

This article shows how to obtain the input file path via reflection when a Hadoop MapReduce job uses MultipleInputs, which works around a common ClassCastException.

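When a Hadoop MapReduce job uses MultipleInputs, getting the path of the file a mapper is reading is awkward. The split returned by context.getInputSplit() is then a TaggedInputSplit that wraps the real FileSplit, so casting it directly to FileSplit throws a ClassCastException. TaggedInputSplit is not a public class, so the wrapped split has to be extracted via reflection, using code like the following: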

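// Imports needed at the top of the Mapper source file:
import java.io.IOException;
import java.lang.reflect.Method;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Inside the Mapper subclass:
@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();

    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
        // Plain single-input job: the split is already a FileSplit
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // MultipleInputs wraps the real split in a non-public
        // TaggedInputSplit, so unwrap it via reflection
        try {
            Method getInputSplitMethod = splitClass
                    .getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }
    }

    if (fileSplit != null) {
        // Path of the file this mapper is reading
        String filePath = fileSplit.getPath().toString();
    }
}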
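
To see why the direct cast fails and why reflection gets around it, here is a minimal self-contained sketch that runs without Hadoop on the classpath. All class and method names in it (SplitUnwrapDemo, Split, FileSplitStandIn, TaggedSplitStandIn, wrap, pathOf) are hypothetical stand-ins invented for illustration; they only mimic the relationship between InputSplit, FileSplit and TaggedInputSplit and are not Hadoop's actual classes.

```java
import java.lang.reflect.Method;

final class SplitUnwrapDemo {

    // Hypothetical stand-in for Hadoop's InputSplit base class.
    static abstract class Split {}

    // Hypothetical stand-in for FileSplit: it knows the file path.
    static final class FileSplitStandIn extends Split {
        private final String path;
        FileSplitStandIn(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Hypothetical stand-in for TaggedInputSplit: a hidden wrapper class
    // that is a Split but NOT a FileSplitStandIn.
    private static final class TaggedSplitStandIn extends Split {
        private final Split inner;
        TaggedSplitStandIn(Split inner) { this.inner = inner; }
        public Split getInputSplit() { return inner; }
    }

    // Plays the role of MultipleInputs handing the mapper a wrapped split.
    static Split wrap(Split inner) {
        return new TaggedSplitStandIn(inner);
    }

    // Same idea as the setup() code above: recognise the wrapper by name,
    // call its getInputSplit() reflectively, then read the path.
    static String pathOf(Split split) throws ReflectiveOperationException {
        Split current = split;
        if (current.getClass().getSimpleName().equals("TaggedSplitStandIn")) {
            Method getInputSplit =
                    current.getClass().getDeclaredMethod("getInputSplit");
            getInputSplit.setAccessible(true);
            current = (Split) getInputSplit.invoke(current);
        }
        return (current instanceof FileSplitStandIn)
                ? ((FileSplitStandIn) current).getPath()
                : null;
    }

    public static void main(String[] args) throws Exception {
        Split wrapped = wrap(new FileSplitStandIn("/data/a.txt"));
        try {
            // The wrapper is not a FileSplitStandIn, so this cast fails.
            FileSplitStandIn direct = (FileSplitStandIn) wrapped;
            System.out.println(direct.getPath());
        } catch (ClassCastException e) {
            System.out.println("direct cast failed with ClassCastException");
        }
        System.out.println(pathOf(wrapped)); // prints /data/a.txt
    }
}
```

The sketch returns null for any split that is neither the file stand-in nor the wrapper, mirroring how fileSplit stays null in the setup() method when the split type is unrecognised.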

Reference: http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception