Hadoop output control: writing output to a specified file

longshenlmj

Category: hadoop; Tags: hadoop

Article link: https://blog.csdn.net/longshenlmj/article/details/8991032

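This post looks at how to control the output of a Hadoop WordCount program so that its results are written to a specified folder. With the MultipleOutputs class, a job can create several files under its output directory and write different data to each of them, which avoids one set of data overwriting another. The post shows the TokenizerMapper and IntSumReducer implementations and how to configure the Job to achieve this.

Summary generated automatically by CSDN.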

I have recently been looking into how to write Hadoop output into a specified folder.

(To be continued)

Taking the WordCount program as an example:
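To make the goal concrete before the Hadoop code, here is a minimal plain-Java sketch (no Hadoop dependency) of routing records into separately named files under one output directory. It is not Hadoop's API: the class `OutputRouter`, its methods, and the base names `frequent` and `rare` are hypothetical and exist only for illustration. The file naming imitates the `<baseName>-r-<five-digit partition>` pattern that Hadoop uses for reduce-side output, which is an assumption about how the real job's files would be named. In an actual job this role is played by the MultipleOutputs class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

/** Hypothetical helper: routes key/value records to named files in one directory. */
public class OutputRouter {
    private final Path outputDir;
    private final int partition;

    public OutputRouter(Path outputDir, int partition) throws IOException {
        // Create the output directory if it does not exist yet.
        this.outputDir = Files.createDirectories(outputDir);
        this.partition = partition;
    }

    /** Builds a file name such as "frequent-r-00000" (assumed naming pattern). */
    public static String fileName(String baseName, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseName, partition);
    }

    /** Appends one tab-separated record to the file selected by baseName. */
    public void write(String baseName, String key, int value) throws IOException {
        Path target = outputDir.resolve(fileName(baseName, partition));
        String line = key + "\t" + value + "\n";
        // APPEND keeps earlier records, so different writes never overwrite each other.
        Files.write(target, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wordcount-out");
        OutputRouter router = new OutputRouter(dir, 0);
        router.write("frequent", "hadoop", 7);
        router.write("rare", "zookeeper", 1);
        router.write("frequent", "hdfs", 5);
        // Print each file the router produced together with its records.
        for (String base : new String[] {"frequent", "rare"}) {
            Path file = dir.resolve(fileName(base, 0));
            System.out.println(file.getFileName() + ": " + Files.readAllLines(file));
        }
    }
}
```

Because each base name maps to its own file, records sent to `frequent` and `rare` land in different files, and repeated writes to the same base name accumulate rather than replace what is already there.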

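import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class wordcount {
    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable>
    {
        // Reused across calls to avoid allocating new objects per token.
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }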


    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {