0917 MapTask and ReduceTask source code

Latest recommended article published on 2024-05-14 16:03:04

ruanmianmian1

Article link: https://blog.csdn.net/ruanmianmian1/article/details/100939157

YarnChild.class // runs the task's run() method

MapTask -> run() -> runNewMapper() // creates the mapper, the input format, and the input split

NewOutputCollector()

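mapContext bundles everything the map needs: the input, the output, the split, the job, and the task ID.

mapperContext wraps mapContext and is passed to the mapper when the map method is run.

mapper.run(mapperContext); // calls nextKeyValue(), getCurrentKey(), getCurrentValue(), then TaskInputOutputContextImpl -> write()

nextKeyValue(), getCurrentKey(), and getCurrentValue() are implemented in LineRecordReader.class.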
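
The call sequence above amounts to a pull loop: ask the reader for the next record, hand the current key and value to map(), and let map() write its output through the context. The following is a minimal sketch in plain Java, not Hadoop's actual code. MapperRunSketch, SimpleContext, and runMapper are hypothetical stand-ins for the mapper, mapperContext, and Mapper.run(), and the record reader is replaced by an in-memory list of ASCII lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the record loop; every class name here is hypothetical.
class MapperRunSketch {

    // Stand-in for mapperContext: exposes the reader methods and write().
    static class SimpleContext {
        private final List<String> lines;
        private int index = -1;      // index of the current record
        private long offset = 0;     // byte offset of the current line (the key)
        private long nextOffset = 0; // byte offset where the next line starts
        final List<String> output = new ArrayList<>();

        SimpleContext(List<String> lines) { this.lines = lines; }

        // Advances to the next record; returns false when the input is exhausted.
        boolean nextKeyValue() {
            if (index + 1 >= lines.size()) return false;
            index++;
            offset = nextOffset;
            nextOffset += lines.get(index).length() + 1; // +1 for '\n', assumes ASCII
            return true;
        }

        long getCurrentKey() { return offset; }
        String getCurrentValue() { return lines.get(index); }
        void write(String key, int value) { output.add(key + "\t" + value); }
    }

    // Stand-in for a user-defined map(): emits (word, 1) for every word in the line.
    static void map(long key, String value, SimpleContext context) {
        for (String word : value.split(" ")) {
            if (!word.isEmpty()) context.write(word, 1);
        }
    }

    // Same shape as the loop described above: pull records until none remain.
    static List<String> runMapper(SimpleContext context) {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        return context.output;
    }

    public static void main(String[] args) {
        SimpleContext ctx = new SimpleContext(Arrays.asList("a b", "c"));
        System.out.println(runMapper(ctx));
    }
}
```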

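input.initialize(split, mapperContext); -> NewTrackingRecordReader -> initialize() -> real.initialize(split, context);

LineRecordReader.class -> initialize()

If the split handled by the current map task is not the first one, the first line is discarded and start is advanced by that line's length in bytes. The line is read into an anonymous new Text() object, so its value cannot be retrieved afterwards.

To compensate, a reader that reaches the end of its split reads one more line, the one the next split discards. That extra line may sit in a different block, in which case it has to be read across the network.

If it is the first split, the first line is not discarded, because it cannot be a broken line. This is how the method handles lines that are cut in half by a split boundary.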
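
The boundary rule can be checked with a small simulation. This is a sketch over an in-memory byte array, not LineRecordReader itself; SplitLineSketch, endOfLine, and readSplit are hypothetical names, and '\n' is assumed to be the only line terminator. A split is the byte range from start (inclusive) to end (exclusive), and a line that begins at or before end is treated as belonging to the current split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simulates how lines are assigned to splits; all names are hypothetical.
class SplitLineSketch {

    // Returns the index just past the next '\n' at or after pos, or data.length.
    static int endOfLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') pos++;
        return Math.min(pos + 1, data.length);
    }

    // Reads the lines that belong to the split [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: discard the first (possibly partial) line.
            pos = endOfLine(data, pos);
        }
        // A line starting at or before 'end' is read here, even if it runs past 'end'.
        while (pos <= end && pos < data.length) {
            int next = endOfLine(data, pos);
            boolean hasNewline = next > pos && data[next - 1] == '\n';
            int contentEnd = hasNewline ? next - 1 : next;
            lines.add(new String(data, pos, contentEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 5 cuts "bbb" in half.
        System.out.println(readSplit(data, 0, 5));
        System.out.println(readSplit(data, 5, 12));
    }
}
```

With the boundary in the middle of "bbb", the first split reads the whole line and the second split skips the remainder, so each line is processed exactly once.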

context.write(outputKey, outputValue); -> the write method of TaskInputOutputContextImpl.class

output.write(key, value);

if (job.getNumReduceTasks() == 0) {
  output =
    new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
} else {
  output = new NewOutputCollector(taskContext, job, umbilical, reporter);
}

With no reducer, the output is written directly to HDFS by the write method of LineRecordWriter: it first writes the key through writeObject, then the key-value separator, then the value, and finally a newline.
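
The four write steps can be sketched as follows. This is a simplified model rather than the real LineRecordWriter: LineWriterSketch is a hypothetical class, an in-memory stream stands in for the HDFS output stream, and the tab separator used in main is an assumption about the configured separator.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a line-oriented record writer.
class LineWriterSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final byte[] separator;

    LineWriterSketch(String separator) {
        this.separator = separator.getBytes(StandardCharsets.UTF_8);
    }

    // Writes the object's string form as UTF-8 bytes.
    private void writeObject(Object o) {
        byte[] bytes = o.toString().getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    // key, separator, value, newline: the order described in the text.
    void write(Object key, Object value) {
        writeObject(key);
        out.write(separator, 0, separator.length);
        writeObject(value);
        out.write('\n');
    }

    String contents() {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        LineWriterSketch writer = new LineWriterSketch("\t");
        writer.write("hello", 1);
        writer.write("world", 2);
        System.out.print(writer.contents());
    }
}
```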

When reduce tasks exist

NewOutputCollector -> write() computes the partition number

collector.collect(key, value,
                  partitioner.getPartition(key, value, partitions));

collector is an instance of the MapOutputBuffer class.
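
As an illustration of how a partition number can be derived from a key, here is a sketch of the hash-based rule that Hadoop's default HashPartitioner applies when no custom partitioner is configured. PartitionSketch is a hypothetical class name, and plain Java objects stand in for Writable keys.

```java
// Hash partitioning sketch; the class name is hypothetical.
class PartitionSketch {

    // Masks off the sign bit so the result is never negative, then takes the
    // remainder by the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + getPartition(key, 3));
        }
    }
}
```

Equal keys always land in the same partition, which is what lets a single reduce task see every value for a given key.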

MapOutputBuffer implements the circular buffer.

final float spillper =
  job.getFloat(JobContext.MAP_SORT_SPILL_PERCENT, (float)0.8);
final int sortmb = job.getInt(JobContext.IO_SORT_MB, 100);
indexCacheMemoryLimit = job.getInt(JobContext.INDEX_CACHE_MEMORY_LIMIT,
                                   INDEX_CACHE_MEMORY_LIMIT_DEFAULT);

spillper is the spill threshold of the buffer, 80% by default. It can be configured.

sortmb is the size of the circular buffer, 100 MB by default.
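
To make the two defaults concrete, the sketch below works out the buffer size in bytes and the point at which a spill starts. It follows my reading of how MapOutputBuffer derives these values (megabytes shifted to bytes, rounded down to a multiple of the 16-byte per-record metadata size, then multiplied by the threshold). SpillThresholdSketch and its method names are hypothetical. The corresponding configuration keys are mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent.

```java
// Arithmetic sketch for the circular buffer limits; names are hypothetical.
class SpillThresholdSketch {
    // Assumed metadata size per record: 4 ints of 4 bytes each.
    static final int METASIZE = 16;

    // Buffer size in bytes for a given size in megabytes.
    static int bufferBytes(int sortmb) {
        int max = sortmb << 20;          // MB -> bytes
        return max - (max % METASIZE);   // round down to a metadata multiple
    }

    // Number of buffered bytes at which a spill is triggered.
    static int softLimit(int sortmb, float spillper) {
        return (int) (bufferBytes(sortmb) * spillper);
    }

    public static void main(String[] args) {
        System.out.println("buffer bytes: " + bufferBytes(100));
        System.out.println("spill starts at: " + softLimit(100, 0.8f));
    }
}
```

With the defaults, the buffer holds 104857600 bytes and spilling begins once about 80 MB of it is in use, leaving the remaining 20% free for the map to keep writing while the spill runs.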

sorter = ReflectionUtils.newInstance(job.getClass("map.sort.class",
      QuickSort.class, IndexedSorter.class), job);

sorter.sort(MapOutputBuffer.this, mstart, mend, reporter);

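sorter is the sorter: if one is configured, that one is used; otherwise the default, QuickSort, is used.

QuickSort calls comparator.compare(), which returns a negative value, zero, or a positive value (for example -1, 0, or 1) to order two keys.

Text comparator

LongWritable comparator

These are the methods that sorting ultimately calls.
You can also define a custom sort comparator.
Another option is to register a comparator when defining a custom key. A comparator registered on the key is overridden by a custom sort comparator.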
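
The precedence between the two kinds of comparator can be modelled in plain Java. This is a sketch only: ComparatorChoiceSketch, MyKey, and chooseComparator are hypothetical, a java.util.Comparator stands in for Hadoop's RawComparator, and the key's compareTo stands in for a comparator registered on the key class. In a real job the job-level comparator would be supplied through Job.setSortComparatorClass and the key-level one through WritableComparator.define.

```java
import java.util.Arrays;
import java.util.Comparator;

// Models which comparator wins when both exist; all names are hypothetical.
class ComparatorChoiceSketch {

    // Custom key whose own ordering is ascending by id.
    static class MyKey implements Comparable<MyKey> {
        final int id;
        MyKey(int id) { this.id = id; }
        @Override
        public int compareTo(MyKey other) { return Integer.compare(id, other.id); }
    }

    // A job-level sort comparator, when present, takes precedence over the key's ordering.
    static Comparator<MyKey> chooseComparator(Comparator<MyKey> jobSortComparator) {
        return jobSortComparator != null
                ? jobSortComparator
                : Comparator.<MyKey>naturalOrder();
    }

    // Sorts the ids with whichever comparator was chosen.
    static int[] sortedIds(int[] ids, Comparator<MyKey> jobSortComparator) {
        MyKey[] keys = new MyKey[ids.length];
        for (int i = 0; i < ids.length; i++) keys[i] = new MyKey(ids[i]);
        Arrays.sort(keys, chooseComparator(jobSortComparator));
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) result[i] = keys[i].id;
        return result;
    }

    public static void main(String[] args) {
        int[] ids = {3, 1, 2};
        // Key's own ordering (ascending).
        System.out.println(Arrays.toString(sortedIds(ids, null)));
        // Job-level comparator (descending) overrides it.
        System.out.println(Arrays.toString(
                sortedIds(ids, (a, b) -> Integer.compare(b.id, a.id))));
    }
}
```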

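For the combiner to run when the spill files are merged, there must be at least 3 spill files.

This value can be customized by setting it on the job.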
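
The rule reduces to a single comparison, sketched below. CombineDecisionSketch and combineDuringMerge are hypothetical names. The threshold of 3 comes from the text, and to my knowledge the configuration key that controls it is mapreduce.map.combine.minspills.

```java
// Decision sketch for running the combiner during the spill merge; names are hypothetical.
class CombineDecisionSketch {
    static final int DEFAULT_MIN_SPILLS_FOR_COMBINE = 3;

    // The combiner is applied during the merge only if one is configured
    // and enough spill files exist.
    static boolean combineDuringMerge(boolean hasCombiner, int numSpills,
                                      int minSpillsForCombine) {
        return hasCombiner && numSpills >= minSpillsForCombine;
    }

    public static void main(String[] args) {
        for (int spills = 1; spills <= 4; spills++) {
            System.out.println(spills + " spill file(s): combine = "
                    + combineDuringMerge(true, spills, DEFAULT_MIN_SPILLS_FOR_COMBINE));
        }
    }
}
```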
