HDPCD-Java Review Notes (4)

Map Aggregation


Aggregation

Map aggregation refers to a Mapper combining its <key, value> pairs before they leave the map side, with the goal of reducing the amount of network traffic between the Mapper and the Reducer.

There are two ways to perform Map Aggregation in Hadoop:

Combiners --- The MapReduce framework has the concept of a Combiner, where you write a class that defines the aggregation, and the framework decides when to perform the aggregation.

In-map Aggregation --- The Mapper contains logic that aggregates records, typically accomplished by buffering records in memory prior to writing them out.
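
As a rough illustration of why either approach cuts network traffic, the sketch below compares how many word-count records a single mapper would send with and without aggregation. It is plain Java and uses no Hadoop classes; the class name `MapAggregationSketch` and its method are hypothetical, invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; these names are not part of the Hadoop API.
class MapAggregationSketch {

    // Sums the counts per word, as a Combiner or an in-map buffer would do
    // before anything is sent to the Reducer.
    static Map<String, Integer> aggregate(String[] tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = "to be or not to be".split(" ");
        Map<String, Integer> combined = aggregate(tokens);

        // Without aggregation there is one <word, 1> record per token.
        System.out.println("records without aggregation: " + tokens.length);
        // With aggregation there is one <word, count> record per distinct word.
        System.out.println("records with aggregation:    " + combined.size());
    }
}
```

For this input the mapper would send six records without aggregation and four with it; the saving grows as words repeat more often within one mapper's input.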


Overview of Combiners


The <key, value> records output by the Mapper are serialized, so the Combiner has to deserialize them.

A Combiner only aggregates data on one node. It does not combine the output of multiple Mappers.
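
To make that scope concrete, here is a small plain-Java simulation (hypothetical names, no Hadoop classes). Each mapper's output is combined separately, so a key emitted by two mappers still reaches the Reducer as two values, one per mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; these names are not part of the Hadoop API.
class PerMapperCombineSketch {

    // Combines the output of ONE mapper: each key counts as 1, summed per key.
    static Map<String, Integer> combine(List<String> mapperOutputKeys) {
        Map<String, Integer> combined = new TreeMap<>();
        for (String key : mapperOutputKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    // Collects the values a Reducer would receive for a key:
    // one value from every mapper that emitted that key.
    static List<Integer> valuesSeenByReducer(String key,
                                             List<Map<String, Integer>> combinedOutputs) {
        List<Integer> values = new ArrayList<>();
        for (Map<String, Integer> output : combinedOutputs) {
            Integer value = output.get(key);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapper1 = combine(Arrays.asList("a", "b", "a"));
        Map<String, Integer> mapper2 = combine(Arrays.asList("a", "c"));
        // Key "a" arrives twice: 2 from the first mapper, 1 from the second.
        System.out.println(valuesSeenByReducer("a", Arrays.asList(mapper1, mapper2)));
    }
}
```

The Reducer still has to add the two partial sums for "a"; the Combiner only shortened each mapper's own output.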


Reduce-side Combining


The Combiner is also used in the reduce phase if the intermediate <key, value> pairs from the Mappers are spilled to disk.

The Reducer's use of the Combiner is a behind-the-scenes detail that improves file I/O.


Counters

The pre-defined counters include useful information, such as the number of map input records or the number of bytes written to HDFS.

Hadoop counters are global: they are a summation of events that occur across the entire cluster.
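
The "global" part can be pictured as a sum over per-task values. The following plain-Java sketch adds up the local counts reported by three imaginary tasks. The class `GlobalCounterSketch` is hypothetical and the counter name is used only as a label here; this is not how the Hadoop API is called.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; these names are not part of the Hadoop API.
class GlobalCounterSketch {

    // Adds up per-task counter values into one job-wide total per counter name.
    static Map<String, Long> aggregate(List<Map<String, Long>> perTaskCounters) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> task : perTaskCounters) {
            for (Map.Entry<String, Long> entry : task.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Three imaginary map tasks, each reporting its own local count.
        List<Map<String, Long>> tasks = Arrays.asList(
                Collections.singletonMap("MAP_INPUT_RECORDS", 100L),
                Collections.singletonMap("MAP_INPUT_RECORDS", 200L),
                Collections.singletonMap("MAP_INPUT_RECORDS", 50L));
        System.out.println(aggregate(tasks).get("MAP_INPUT_RECORDS"));
    }
}
```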

 

User-defined Counters

There are two ways to define your own counter in Hadoop:

1. Use an enum to define a group, and the elements in the enum become the counter names.

2. Use strings for the group name and counter name.
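
In the Hadoop API these two styles correspond to `context.getCounter(someEnumConstant)` and `context.getCounter(groupName, counterName)`. The sketch below is a plain-Java stand-in rather than Hadoop code: the `CounterRegistry` class and the "Quality"/"SKIPPED_LINES" names are hypothetical. It shows how both styles end up addressing a (group, name) pair, under the assumption that an enum counter's group is the enum's class name.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; CounterRegistry is not part of the Hadoop API.
class CounterRegistry {

    enum MyCounters { GOOD_RECORDS, BAD_RECORDS }

    private final Map<String, Long> values = new HashMap<>();

    // Style 2: the caller supplies the group name and the counter name as strings.
    void increment(String group, String name, long amount) {
        values.merge(group + ":" + name, amount, Long::sum);
    }

    // Style 1: the enum type supplies the group, the constant supplies the name.
    void increment(Enum<?> counter, long amount) {
        increment(counter.getDeclaringClass().getName(), counter.name(), amount);
    }

    long get(String group, String name) {
        return values.getOrDefault(group + ":" + name, 0L);
    }

    public static void main(String[] args) {
        CounterRegistry registry = new CounterRegistry();
        registry.increment(MyCounters.GOOD_RECORDS, 1);
        registry.increment("Quality", "SKIPPED_LINES", 3);

        System.out.println(registry.get(MyCounters.class.getName(), "GOOD_RECORDS"));
        System.out.println(registry.get("Quality", "SKIPPED_LINES"));
    }
}
```

The enum style gives compile-time checking of counter names, while the string style allows names that are only known at run time.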



Combiner Example


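import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A Combiner is written as a Reducer whose input and output types both match the Mapper's output types.
public class WordCountCombiner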
   extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outputValue = new IntWritable();
    @Override
    protected void reduce(Text key,
             Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
        int sum = 0;
        for(IntWritable count : values) {
            sum += count.get();
        }
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}


In-Map Aggregation



In-Map Aggregation Example

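    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.PriorityQueue;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.StringUtils;

    // Buffers word counts in memory during map() and writes only the top results in cleanup().
    public class TopResultsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {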
        private ArrayList<Word> words = new ArrayList<Word>();
        private PriorityQueue<Word> queue;
        private int maxResults;
    
        @Override
        protected void setup(Context context)
            throws IOException, InterruptedException {
            maxResults = Integer.parseInt(context.getConfiguration()
                                                 .get("maxResults"));
        }
    
        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
            String[] input = StringUtils.split(value.toString(), '\\', ' ');
    
            for (String word : input) {
                Word currentWord = new Word(word, 1);
    
                if (words.contains(currentWord)) {
                    //increment the existing Word's frequency
                    for (Word w : words) {
                        if (w.equals(currentWord)) {
                            w.frequency++;
    
                            break;
                        }
                    }
                } else {
                    words.add(currentWord);
                }
            }
        }
    
        @Override
        protected void cleanup(Context context)
            throws IOException, InterruptedException {
            Text outputKey = new Text();
            IntWritable outputValue = new IntWritable();
            queue = new PriorityQueue<Word>(words.size());
            queue.addAll(words);
    
            for (int i = 1; i <= maxResults; i++) {
                Word tail = queue.poll();
    
                if (tail != null) {
                    outputKey.set(tail.value);
                    outputValue.set(tail.frequency);
                    context.write(outputKey, outputValue);
                }
            }
        }
    }

public class Word implements Comparable<Word> {
    public String value;
    public int frequency;

    public Word(String value, int frequency) {
        this.value = value;
        this.frequency = frequency;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Word) {
            return value.equalsIgnoreCase(((Word) obj).value);
        } else {
            return false;
        }
    }

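    // Orders Words by descending frequency, so PriorityQueue.poll() returns the most frequent Word first.
    @Override
    public int compareTo(Word w) {
        return Integer.compare(w.frequency, this.frequency);
    }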
}
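
The mapper above finds an existing `Word` by scanning the list, which costs time proportional to the number of distinct words for every token. One possible alternative, sketched here in plain Java with hypothetical names, buffers the counts in a `HashMap` and selects the top results at the end. This is not the original author's code and it uses no Hadoop classes; `add` stands in for `map()` and `top` stands in for `cleanup()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java illustration only; these names are not part of the Hadoop API.
class TopWordsSketch {

    private final Map<String, Integer> counts = new HashMap<>();

    // Plays the role of map(): buffer counts in memory instead of emitting records.
    // Words are lower-cased so that matching ignores case, like Word.equals() above
    // (unlike the original, the stored key is the lower-cased form).
    void add(String line) {
        for (String token : line.split(" ")) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
    }

    // Plays the role of cleanup(): return up to maxResults words, most frequent first.
    // Ties are broken alphabetically so the result is deterministic.
    List<Map.Entry<String, Integer>> top(int maxResults) {
        Comparator<Map.Entry<String, Integer>> byFrequencyDesc = (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(byFrequencyDesc);
        queue.addAll(counts.entrySet());

        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        while (result.size() < maxResults && !queue.isEmpty()) {
            result.add(queue.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        TopWordsSketch sketch = new TopWordsSketch();
        sketch.add("the cat and the hat");
        sketch.add("The end");
        System.out.println(sketch.top(2));
    }
}
```

Either way, the buffer holds one entry per distinct word seen by the mapper, so the approach only works while that set fits in the task's memory.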


User-defined Counter Example

public enum MyCounters {
     GOOD_RECORDS, BAD_RECORDS
}

// Inside map() or reduce(), where a Context is available:
context.getCounter(MyCounters.GOOD_RECORDS).increment(1);




