Hadoop Beginner Series (4) -- MapReduce in Detail

Latest recommended article published on 2023-07-22 10:10:11

weishantc


Category: BigData  Tags: hadoop, mapreduce, cloud computing / big data

Article link: https://blog.csdn.net/weishantc/article/details/46731145


Map Phase

The map step in the WordCount example is as follows:

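public static class TokenizerMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused for every token so that no new object is allocated per word.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // With the default TextInputFormat, key is the byte offset of the line
    // in the input file and value is the text of that line.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit (word, 1) for each token.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}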
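
To make the mapper's output concrete, here is a minimal sketch in plain Java, with no Hadoop dependency, that mimics what the map method emits for a single input line. The class name MapSketch, the method emit, and the sample line are hypothetical and exist only for illustration; a real job writes its pairs through Context instead of returning a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapSketch {
    // Hypothetical stand-in for TokenizerMapper.map: returns the (word, 1)
    // pairs instead of writing them to a Hadoop Context.
    public static List<Map.Entry<String, Integer>> emit(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print one tab-separated (word, count) pair per line.
        for (Map.Entry<String, Integer> pair : emit("hello world hello")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

The map step does no aggregation: a word that appears twice in a line is emitted twice, each time with the value 1. Adding the counts together is left to the reduce phase.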

Reduce Phase

The reduce step in the WordCount example is as follows:

public static class IntSumReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused output value, set once per key.
    private IntWritable result = new IntWritable();

    // Called once per distinct word; values holds every count emitted for it.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        // Emit (word, total count).
        context.write(key, result);
    }
}
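
Before reduce is called, the framework groups all values emitted for the same key and hands them to the reducer as one Iterable. The following is a minimal sketch of that grouping and summing in plain Java, with no Hadoop dependency. The names ReduceSketch, sum, and groupAndSum are hypothetical, and the TreeMap is only an assumed stand-in for the framework's shuffle and sort, not how Hadoop implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {
    // Mirrors the loop in IntSumReducer.reduce: add up all counts for one key.
    public static int sum(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Hypothetical stand-in for shuffle plus reduce: collect a 1 for every
    // occurrence of a word, grouped by word, then sum each group.
    public static Map<String, Integer> groupAndSum(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), sum(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groupAndSum(Arrays.asList("hello", "world", "hello")));
    }
}
```

In this sketch the word "hello" reaches sum with the values [1, 1] and "world" with [1], which corresponds to the Iterable<IntWritable> that the real reduce method receives for each key.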

Invoking the Job

The main function invokes the job as follows:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length !=