Cloud Computing(2)_Basic MapReduce Algorithm Design_Local Aggregation

In MapReduce, the programmer needs only to implement the mapper, the reducer, and optionally, the combiner and the partitioner.
The execution framework handles everything else.

Local Aggregation

Local aggregation of intermediate results is one of the keys to efficient algorithms.
Through use of the combiner and by taking advantage of the ability to preserve state across multiple inputs, it is often possible to substantially reduce both the number and size of key-value pairs that need to be shuffled from the mappers to the reducers.

Importance of Local Aggregation
  • Ideal scaling characteristics: Twice the data, twice the running time; Twice the resources, half the running time
  • Why can’t we achieve this? Synchronization requires communication; Communication kills performance
  • Thus, avoid communication where possible: Reduce intermediate data via local aggregation; Combiners can help, too
Example1: Word Count
//Baseline
class MAPPER
    method MAP(docid a, doc d) //the words needed are in the doc d 
        for all term t∈doc d do
            EMIT(term t, count 1)
class REDUCER
    method REDUCE(term t, counts[c1, c2,...])
        sum = 0
        for all count c∈counts[c1, c2,...] do
            sum = sum + c
        EMIT(term t, count sum)
//Version 1
class MAPPER
    method MAP(docid a, doc d) //the words needed are in the doc d 
        H = new ASSOCIATIVEARRAY //H is defined inside MAP: combine within a single map call (one document)
        for all term t∈doc d do
            H{t} = H{t} + 1    //Tally counts for entire document
        for all term t∈H do
            EMIT(term t, count H{t})
//Version 2
class MAPPER
    method INITIALIZE
        H = new ASSOCIATIVEARRAY //H is defined outside MAP: combine across all map calls of one mapper
    method MAP(docid a, doc d) //the words needed are in the doc d 
        for all term t∈doc d do
            H{t} = H{t} + 1    //Tally counts across documents
    method CLOSE
        for all term t∈H do
            EMIT(term t, count H{t})
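The effect of Version 2 on the amount of intermediate data can be sketched in Python. This is only a single-process simulation, not real MapReduce code: the function names and the two-document input split are made up for illustration, and a `Counter` stands in for the associative array H.

```python
from collections import Counter, defaultdict

def map_baseline(docs):
    """Baseline mapper: emit (term, 1) for every term occurrence."""
    for doc in docs:
        for term in doc.split():
            yield term, 1

def map_in_mapper_combining(docs):
    """Version 2 mapper: tally in H across all map calls, emit once at the end."""
    H = Counter()                  # plays the role of the associative array H
    for doc in docs:               # each iteration stands in for one MAP call
        for term in doc.split():
            H[term] += 1
    yield from H.items()           # corresponds to the CLOSE method

def reduce_counts(pairs):
    """Reducer: sum the counts received for each term."""
    totals = defaultdict(int)
    for term, count in pairs:
        totals[term] += count
    return dict(totals)

docs = ["a b a", "b c a"]          # hypothetical input split of one mapper
baseline_pairs = list(map_baseline(docs))
combined_pairs = list(map_in_mapper_combining(docs))

print(len(baseline_pairs), len(combined_pairs))
print(reduce_counts(baseline_pairs) == reduce_counts(combined_pairs))
```

Both mappers lead to the same final counts; the in-mapper version simply hands fewer key-value pairs to the shuffle.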
Design Pattern for Local Aggregation
  • “In-mapper combining”
    • Fold the functionality of the combiner into the mapper by preserving state across multiple map calls
  • Advantages
    • Speed
  • Disadvantages
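    • Explicit memory management required: variable H must stay in memory until CLOSE is called, so it cannot be freed after each map call and may grow large
    • Potential for order-dependent bugs: because state is preserved across map calls, behavior may depend on the order in which the inputs arrive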
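One common way to keep the memory cost under control is to flush H whenever it grows past a limit instead of waiting for CLOSE. The sketch below assumes this strategy; the class name, the `emit` callback, and the `max_keys` parameter are hypothetical and do not belong to any real framework API.

```python
from collections import Counter

class BoundedWordCountMapper:
    """In-mapper combining with a cap on the number of distinct keys held in H."""

    def __init__(self, emit, max_keys=2):
        self.emit = emit           # hypothetical callback that receives (term, count)
        self.max_keys = max_keys   # hypothetical memory limit, counted in distinct terms
        self.H = Counter()

    def map(self, docid, doc):
        for term in doc.split():
            self.H[term] += 1
            if len(self.H) > self.max_keys:
                self.flush()       # emit partial counts early to bound memory use

    def flush(self):
        for term, count in self.H.items():
            self.emit(term, count)
        self.H.clear()

    def close(self):
        self.flush()               # emit whatever is still buffered

emitted = []
mapper = BoundedWordCountMapper(lambda t, c: emitted.append((t, c)), max_keys=2)
mapper.map(1, "a b a")
mapper.map(2, "b c a")
mapper.close()

totals = Counter()
for term, count in emitted:
    totals[term] += count
print(emitted, dict(totals))
```

Flushing early emits more partial counts than a single flush in CLOSE would, but the reducer still sums them to the same totals, so correctness is unaffected.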
Combiner Design
  • Combiners and Reducers share the same method signature
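    - However, a combiner usually aggregates intermediate results locally (a “mini-reducer”), whereas a reducer aggregates the outputs of different mappers
  • Combiners are optional optimizations
    - Should not affect algorithm correctness
    - May be run 0, 1 or multiple times: so the combiner's output format must match the mapper's output format, which is also the reducer's input format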
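The "0, 1 or multiple times" requirement can be checked with a small Python sketch. It assumes intermediate values are (sum, count) pairs, so that the combiner's input and output share one format; the function names and the sample values are hypothetical.

```python
def combine(pairs):
    """Fold a list of (sum, count) pairs into a single (sum, count) pair."""
    total, count = 0, 0
    for s, c in pairs:
        total += s
        count += c
    return (total, count)

def reduce_average(pairs):
    """Reducer: merge all (sum, count) pairs, then divide."""
    total, count = combine(pairs)
    return total / count

# Hypothetical mapper output for one key: each value r is emitted as (r, 1)
mapper_output = [(20, 1), (30, 1), (40, 1), (50, 1)]

# Combiner not run at all
zero_runs = reduce_average(mapper_output)

# Combiner run once on each half of the mapper output
partial = [combine(mapper_output[:2]), combine(mapper_output[2:])]
one_run = reduce_average(partial)

# Combiner run again on its own output
two_runs = reduce_average([combine(partial)])

print(zero_runs, one_run, two_runs)
```

Because the combiner consumes and produces the same pair format, the reducer computes the same average no matter how many times the combiner was applied.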

Example2: Find average of integers associated with the same key (e.g., find the average age of everyone named “Zhang San”)
//Version 1
class MAPPER
    method MAP(string t, integer r)
        EMIT(string t, integer r)
class REDUCER
    method REDUCE(string t, integers[r1, r2, ...])
        sum = 0
        cnt = 0
        for all  integer r∈integers[r1, r2, ...] do
            sum = sum + r
            cnt = cnt + 1
        r_avg = sum / cnt
        EMIT(string t, integer r_avg)
//Version 2
//In fact, this version does not work
//If the combiner is not run, the reducer cannot run either, because its input format differs from the mapper's output format
class MAPPER
    method MAP(string t, integer r)
        EMIT(string t, integer r)

//Aggregate the ages of the people named “Zhang San” within one mapper
class COMBINER
    method COMBINE(string t, integers[r1, r2, ...])
        sum = 0
        cnt = 0
        for all  integer r∈integers[r1, r2, ...] do
            sum = sum + r
            cnt = cnt + 1
        EMIT(string t, pair(sum, cnt)) 

class REDUCER
    method REDUCE(string t, pairs[(s1, c1), (s2, c2), ...])
        sum = 0
        cnt = 0
        for all pair (s, c)∈pairs[(s1, c1), (s2, c2), ...] do
            sum = sum + s    //Add up the partial sums
            cnt = cnt + c    //Add up the partial counts
        r_avg = sum / cnt
        EMIT(string t, integer r_avg)
//Version 3
//Change the mapper's output format to match the combiner's output format
class MAPPER
    method MAP(string t, integer r)
        EMIT(string t, pair(r, 1))

class COMBINER
    method COMBINE(string t, pairs[(s1, c1), (s2, c2), ...])
        sum = 0
        cnt = 0
        for all pair (s, c)∈pairs[(s1, c1), (s2, c2), ...] do
            sum = sum + s
            cnt = cnt + c
        EMIT(string t, pair(sum, cnt)) 

class REDUCER
    method REDUCE(string t, pairs[(s1, c1), (s2, c2), ...])
        sum = 0
        cnt = 0
        for all pair (s, c)∈pairs[(s1, c1), (s2, c2), ...] do
            sum = sum + s    //Add up the partial sums
            cnt = cnt + c    //Add up the partial counts
        r_avg = sum / cnt
        EMIT(string t, integer r_avg)
//Version 4
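//Two variables S and C are kept so that they persist across the map calls of one mapper
//Since S and C aggregate over all inputs of the mapper, a separate combiner is no longer needed
//The reducer is the same as in Version 3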
class MAPPER
    method INITIALIZE
        S = new ASSOCIATIVEARRAY
        C = new ASSOCIATIVEARRAY
    method MAP(string t, integer r)
        S{t} = S{t} + r
        C{t} = C{t} + 1
    method CLOSE
        for all term t∈S do
            EMIT(term t, pair(S{t}, C{t}))
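A minimal Python sketch of Version 4 is given here, again as a single-process simulation rather than real framework code. The class name, the two mapper instances, and the sample names and ages are hypothetical; the reducer follows the pair format of Version 3.

```python
from collections import defaultdict

class AverageMapper:
    """In-mapper combining for averages: keep a running sum and count per key."""

    def __init__(self):                 # corresponds to INITIALIZE
        self.S = defaultdict(int)       # running sum per key
        self.C = defaultdict(int)       # running count per key

    def map(self, t, r):
        self.S[t] += r
        self.C[t] += 1

    def close(self):                    # corresponds to CLOSE
        return [(t, (self.S[t], self.C[t])) for t in self.S]

def reduce_average(t, pairs):
    """Reducer: add up partial sums and partial counts, then divide."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return t, total / count

# Two hypothetical mappers, each processing its own input split
m1 = AverageMapper()
m1.map("zhangsan", 20)
m1.map("zhangsan", 30)
m1.map("lisi", 40)
m2 = AverageMapper()
m2.map("zhangsan", 40)

# Simulated shuffle: group the emitted pairs by key
grouped = defaultdict(list)
for mapper in (m1, m2):
    for t, pair in mapper.close():
        grouped[t].append(pair)

averages = dict(reduce_average(t, pairs) for t, pairs in grouped.items())
print(averages)
```

Each mapper emits one pair per distinct key, and the reducer still recovers the exact average because it combines sums and counts rather than averaging partial averages.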