MapReduce Features

- Counters (values are definitive only once the job has completed successfully)

  • Task Counters
  • Filesystem Counters
  • Job Counters (maintained by the application master, so unlike other counters they never need to be sent across the network; they hold job-level statistics, mainly about tasks)
  • FileInputFormat Counters
  • FileOutputFormat Counters
  • User-defined counters
  1. By enum (the enum class supplies the group name, each constant a counter name)
context.getCounter(Temperature.MALFORMED).increment(1);
  2. By group name and counter name (dynamic counters, no enum required)
public Counter getCounter(String groupName, String counterName)
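
The relationship between the two styles can be sketched in plain Java. This is a minimal sketch, not the Hadoop API: the class CounterSketch and its methods are hypothetical, and it assumes (as Hadoop does for enum counters) that the group name is the enum's class name and the counter name is the constant's name.

```java
import java.util.Map;
import java.util.TreeMap;

public class CounterSketch {
    // Hypothetical counter enum, mirroring the Temperature example above.
    enum Temperature { MISSING, MALFORMED }

    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Dynamic style: address a counter by group name and counter name.
    void increment(String groupName, String counterName, long amount) {
        groups.computeIfAbsent(groupName, g -> new TreeMap<>())
              .merge(counterName, amount, Long::sum);
    }

    // Enum style: derive both names from the enum constant.
    void increment(Enum<?> counter, long amount) {
        increment(counter.getDeclaringClass().getName(), counter.name(), amount);
    }

    long get(String groupName, String counterName) {
        Map<String, Long> group = groups.get(groupName);
        if (group == null) return 0L;
        Long value = group.get(counterName);
        return value == null ? 0L : value;
    }

    public static void main(String[] args) {
        CounterSketch counters = new CounterSketch();
        counters.increment(Temperature.MALFORMED, 1);
        // Same counter, addressed by strings instead of the enum constant.
        counters.increment(Temperature.class.getName(), "MALFORMED", 2);
        System.out.println(counters.get(Temperature.class.getName(), "MALFORMED")); // prints 3
    }
}
```

Both calls land on the same counter, which is why enum counters are convenient when the set of counters is known at compile time, and string-named counters are useful when names are only known at run time.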


- Sorting

  • Partial sort (the default: each reducer's output is sorted, but with multiple reduce tasks the output files are not ordered relative to one another)
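  • Total sort (TotalOrderPartitioner, with partition boundaries chosen by sampling the input so that reducers receive roughly equal shares)
job.setPartitionerClass(TotalOrderPartitioner.class);
// Sample keys with probability 0.1, up to 10,000 samples from at most 10 splits
InputSampler.Sampler<IntWritable, Text> sampler =
    new InputSampler.RandomSampler<IntWritable, Text>(0.1, 10000, 10);
InputSampler.writePartitionFile(job, sampler);
// Add the partition file to the distributed cache so every task can read it
Configuration conf = job.getConfiguration();
String partitionFile = TotalOrderPartitioner.getPartitionFile(conf);
URI partitionUri = new URI(partitionFile);
job.addCacheFile(partitionUri);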
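
To illustrate what the partition file is for, here is a minimal plain-Java sketch of how a total-order partitioner can map a key to a reducer using sorted split points. The class TotalOrderSketch, the method partitionFor, and the split points are hypothetical; real split points come from the sampled partition file, and this sketch assumes integer keys and distinct split points.

```java
import java.util.Arrays;

public class TotalOrderSketch {
    // With n sorted split points there are n + 1 partitions.
    // Keys below the first split point go to partition 0, keys at or above
    // the last split point go to partition n.
    static int partitionFor(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        int[] splitPoints = {-50, 0, 50}; // hypothetical sampled boundaries
        for (int key : new int[] {-60, -50, 10, 50, 100}) {
            System.out.println(key + " -> partition " + partitionFor(key, splitPoints));
        }
    }
}
```

Because every key in partition i is smaller than every key in partition i + 1, concatenating the sorted reducer outputs in partition order yields a globally sorted result.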

  • Secondary sort (controls the order in which values arrive within each reduce call)
  1. Make the key a composite of the natural key and the natural value.
  2. The sort comparator should order by the composite key (i.e., the natural key and natural value).
  3. The partitioner and grouping comparator for the composite key should consider only the natural key for partitioning and grouping.
job.setPartitionerClass(FirstPartitioner.class);
job.setSortComparatorClass(KeyComparator.class);
job.setGroupingComparatorClass(GroupComparator.class);
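
The three steps above can be sketched in plain Java, without the Hadoop API. This is a minimal sketch under stated assumptions: the class YearTemp, the comparators SORT and GROUP, the partition method, and the sample data are all hypothetical, with the year as the natural key and the temperature as the natural value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
    // Step 1: composite key = natural key (year) + natural value (temperature).
    static final class YearTemp {
        final int year;
        final int temp;
        YearTemp(int year, int temp) { this.year = year; this.temp = temp; }
    }

    // Step 2: sort comparator orders by year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT = (a, b) -> {
        int byYear = Integer.compare(a.year, b.year);
        return byYear != 0 ? byYear : Integer.compare(b.temp, a.temp);
    };

    // Step 3: grouping comparator and partitioner look at the natural key only.
    static final Comparator<YearTemp> GROUP = (a, b) -> Integer.compare(a.year, b.year);

    static int partition(YearTemp key, int numPartitions) {
        return Math.abs(key.year * 127) % numPartitions;
    }

    // Simulates one reducer: the first key of each group carries the maximum temperature.
    static Map<Integer, Integer> maxTempPerYear(List<YearTemp> keys) {
        List<YearTemp> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        YearTemp previous = null;
        for (YearTemp current : sorted) {
            if (previous == null || GROUP.compare(previous, current) != 0) {
                result.put(current.year, current.temp); // start of a new group
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        List<YearTemp> keys = Arrays.asList(
            new YearTemp(1950, 0), new YearTemp(1949, 111),
            new YearTemp(1950, 22), new YearTemp(1949, 78));
        System.out.println(maxTempPerYear(keys)); // prints {1949=111, 1950=22}
    }
}
```

Partitioning on the year alone guarantees that all records for a year reach the same reducer, while the sort comparator decides which record that reducer sees first.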


- Join 

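  • Map-side join (strict requirements: every source must be divided into the same number of partitions and sorted by the join key, so that all records for a given key sit in the same partition of each source)
  • Reduce-side join (more general, but less efficient because both datasets go through the shuffle)
  1. Multiple inputs -> a separate mapper for each source, each tagging its records with the source they came from
  2. Secondary sort -> ensures that, for each join key, records from one source reach the reducer before records from the other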
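
The reduce-side join can be sketched in plain Java as follows. This is a minimal sketch, not Hadoop code: the class ReduceSideJoinSketch, the Tagged record type, the tag values 0 and 1, and the sample data are hypothetical. It assumes at most one station record per join key and performs an inner join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReduceSideJoinSketch {
    // A map output record: join key, source tag (0 = station, 1 = reading), payload.
    static final class Tagged {
        final String joinKey;
        final int tag;
        final String value;
        Tagged(String joinKey, int tag, String value) {
            this.joinKey = joinKey; this.tag = tag; this.value = value;
        }
    }

    static List<String> join(List<String[]> stations, List<String[]> readings) {
        // "Map" phase: one tagging step per source.
        List<Tagged> shuffled = new ArrayList<>();
        for (String[] s : stations) shuffled.add(new Tagged(s[0], 0, s[1]));
        for (String[] r : readings) shuffled.add(new Tagged(r[0], 1, r[1]));

        // "Shuffle": sort by join key, then by tag, so the station record comes first.
        shuffled.sort(Comparator.comparing((Tagged t) -> t.joinKey)
                                .thenComparingInt(t -> t.tag));

        // "Reduce": group by join key; remember the station, then emit one row per reading.
        List<String> out = new ArrayList<>();
        String currentKey = null;
        String stationName = null;
        for (Tagged t : shuffled) {
            if (!t.joinKey.equals(currentKey)) {
                currentKey = t.joinKey;
                stationName = null;
            }
            if (t.tag == 0) {
                stationName = t.value;
            } else if (stationName != null) {
                out.add(t.joinKey + "\t" + stationName + "\t" + t.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> stations = Arrays.<String[]>asList(
            new String[] {"S1", "Alpha"}, new String[] {"S2", "Beta"});
        List<String[]> readings = Arrays.<String[]>asList(
            new String[] {"S2", "r1"}, new String[] {"S1", "r2"}, new String[] {"S1", "r3"});
        for (String row : join(stations, readings)) System.out.println(row);
    }
}
```

Because the station record is guaranteed to arrive first, the reducer only needs to hold one station name in memory rather than buffering all records for a key.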


- Side data distribution

  • Small data in the job configuration -> it must stay small (no more than a few kilobytes) because the job configuration is always read by the client, the application master, and the task JVM, and each time the configuration is read, all of its entries are read into memory.

  • Distributed cache -> files, archives, and JARs passed with -files, -archives, and -libjars are copied to each task node once per job