Spark 2.3 Source Code Analysis: SortShuffleWriter

zhifeng687

Article link: https://blog.csdn.net/qq_26222859/article/details/81562272

SortShuffleWriter

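Overview

SortShuffleWriter's main job is to decide whether a local combine is needed on the map side. If aggregation is needed, records are kept in memory in a PartitionedAppendOnlyMap; if no combine is performed, they are appended to a PartitionedPairBuffer. In both cases it has to check whether memory is sufficient: if memory runs short and no more can be acquired, the data is spilled to a temporary file on local disk. Finally, the in-memory data and the spilled temporary files are merged. A merge sort is performed only if a spill occurred; if nothing was spilled, everything is still in memory and no merge sort is needed. The result is a merged data file and an index file.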
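
The choice between the two in-memory structures can be illustrated with a minimal sketch in plain Scala collections. This is not Spark's implementation: `MapSideBufferSketch`, `partitionOf`, and the two insert functions are hypothetical names, a `HashMap` stands in for PartitionedAppendOnlyMap, an `ArrayBuffer` stands in for PartitionedPairBuffer, and the combine function is assumed to be integer addition.

```scala
import scala.collection.mutable

object MapSideBufferSketch {
  // Hypothetical stand-in for a partitioner: non-negative hash modulo partition count.
  def partitionOf(key: String, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // With map-side combine: one entry per (partitionId, key); values are merged on insert.
  def insertWithCombine(
      records: Iterator[(String, Int)],
      numPartitions: Int): mutable.Map[(Int, String), Int] = {
    val map = mutable.HashMap.empty[(Int, String), Int]
    records.foreach { case (k, v) =>
      val id = (partitionOf(k, numPartitions), k)
      map(id) = map.getOrElse(id, 0) + v
    }
    map
  }

  // Without combine: every record is appended unchanged, tagged with its partition ID.
  def insertWithoutCombine(
      records: Iterator[(String, Int)],
      numPartitions: Int): mutable.ArrayBuffer[((Int, String), Int)] = {
    val buffer = mutable.ArrayBuffer.empty[((Int, String), Int)]
    records.foreach { case (k, v) =>
      val entry = ((partitionOf(k, numPartitions), k), v)
      buffer += entry
    }
    buffer
  }
}
```

With combine, the number of entries is bounded by the number of distinct keys; without it, the buffer grows by one entry per record, so it tends to reach the memory limit sooner.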

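The write method

The method works as follows:

1. Create an ExternalSorter. Whether the aggregator and keyOrdering arguments are passed in depends on whether a local (map-side) combine is needed.

2. Call the ExternalSorter's insertAll method to insert the records.

If the in-memory collection that the ExternalSorter uses to hold records reaches its size threshold, the records are spilled to a disk file in sorted order.

3. Build the final output file. Its name (reduceId is 0) is: "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId + ".data".

4. Append a UUID to the output file name to mark the file as being written; it is renamed once writing finishes.

5. Call the ExternalSorter's writePartitionedFile method, which sorts the records inserted into the sorter and writes them to the output file.

The records inserted into the sorter may be in the in-memory collection or in spill files.

6. Write each partition's offset to the index file so that the reduce side can fetch its data.

7. Wrap part of this information in a MapStatus and return it.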

 /** Write a bunch of records to this task's output */
  override def write(records: Iterator[Product2[K, V]]): Unit = {
    sorter = if (dep.mapSideCombine) {
      require(dep.aggregator.isDefined, "Map-side combine without Aggregator specified!")
      new ExternalSorter[K, V, C](
        context, dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer)
    } else {
      // In this case we pass neither an aggregator nor an ordering to the sorter, because we don't
      // care whether the keys get sorted in each partition; that will be done on the reduce side
      // if the operation being run is sortByKey.
      new ExternalSorter[K, V, V](
        context, aggregator = None, Some(dep.partitioner), ordering = None, dep.serializer)
    }
    sorter.insertAll(records)

    // Don't bother including the time to open the merged output file in the shuffle write time,
    // because it just opens a single file, so is typically too fast to measure accurately
    // (see SPARK-3570).
    // Build the final output file; its name (reduceId is 0) is
    // "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId + ".data"
    val output = shuffleBlockResolver.getDataFile(dep.shuffleId, mapId)
    // Append a UUID to the output file name to mark it as being written; it is renamed when done
    val tmp = Utils.tempFileWith(output)
    try {
      val blockId = ShuffleBlockId(dep.shuffleId, mapId, IndexShuffleBlockResolver.NOOP_REDUCE_ID)
      // Write the sorted records to the output file
      val partitionLengths = sorter.writePartitionedFile(blockId, tmp)
      // Write each partition's offset to the index file so the reduce side can fetch its data
      shuffleBlockResolver.writeIndexFileAndCommit(dep.shuffleId, mapId, partitionLengths, tmp)
      mapStatus = MapStatus(blockManager.shuffleServerId, partitionLengths)
    } finally {
      if (tmp.exists() && !tmp.delete()) {
        logError(s"Error while deleting temp file ${tmp.getAbsolutePath}")
      }
    }
  }
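
How the partition lengths returned by writePartitionedFile relate to the offsets stored in the index file can be sketched as follows. This is a minimal illustration, assuming the index holds cumulative byte offsets starting at 0; `IndexOffsetsSketch`, `offsets`, and `byteRange` are hypothetical names, not Spark APIs.

```scala
object IndexOffsetsSketch {
  // Cumulative offsets: entry i is where partition i starts in the data file,
  // and the last entry is the total length of the data file.
  def offsets(partitionLengths: Array[Long]): Array[Long] =
    partitionLengths.scanLeft(0L)(_ + _)

  // Byte range [start, end) that a reducer would read for one partition.
  def byteRange(offsets: Array[Long], reduceId: Int): (Long, Long) =
    (offsets(reduceId), offsets(reduceId + 1))
}
```

For partition lengths 10, 0 and 5, the offsets are 0, 10, 10 and 15, so the reducer for partition 2 reads bytes 10 to 15. An empty partition simply has equal start and end offsets.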

ExternalSorter

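Overview

ExternalSorter sorts, and potentially merges, a large number of (K, V) key-value pairs to produce key-combiner pairs of type (K, C). It first uses a partitioner to group the keys into partitions, and then optionally sorts the keys within each partition using a custom comparator. It can output a single partitioned file in which each partition occupies a different byte range, which is suitable for shuffle fetches.

If combining is disabled, the type C must equal V; the objects are cast at the end.

Note: although ExternalSorter is a fairly generic sorter, some of its configuration is tied to its use in sort-based shuffle. For example, block compression is controlled by "spark.shuffle.compress". If ExternalSorter is ever used in a non-shuffle context, the class may need to be revisited so that different configuration settings can be used.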

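The most important constructor parameters of this class are:

@param aggregator optional; an Aggregator with the combine functions used to merge data
@param partitioner optional; if given, records are sorted by partition ID and then by key
@param ordering optional; the Ordering used to sort keys within each partition; it should be a total ordering
@param serializer the serializer to use when spilling to disk

Note that if an ordering is given, it is always used for sorting, so provide it only if you really want the output keys to be sorted. In a map task without map-side aggregation, for example, you probably want to pass None as the ordering to avoid extra sorting. On the other hand, if you do want combining, having an ordering is more efficient than not having one.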

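Users should interact with this class as follows:

Instantiate an ExternalSorter.
Call the ExternalSorter's insertAll method to insert a batch of records.
Call iterator() to iterate over the sorted or aggregated records, or call writePartitionedFile() to write the sorted or aggregated records to an output file, as sort shuffle does.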

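Internally, the class works as follows:

It repeatedly fills up buffers of in-memory data, using a PartitionedAppendOnlyMap when records need to be combined by key and a PartitionedPairBuffer when they do not. Inside these buffers, elements are sorted by partition ID and then possibly also by key. To avoid calling the partitioner multiple times for each key, the partition ID is stored alongside each record.

When a buffer reaches the memory limit, it is spilled to a file. The file is sorted first by partition ID and then, if aggregation is wanted, by key or by the key's hash code. For each file, the sorter tracks how many objects were in each partition in memory, so it does not need to write the partition ID for every element.

When the user requests an iterator or file output, the spilled files are merged together with any data remaining in memory. The merge uses the sort order defined above (unless both sorting and aggregation are disabled). If aggregation by key is needed, the sorter either uses the total ordering from the ordering parameter, or reads the keys with the same hash code and compares them with each other for equality to merge their values.

Users are expected to call stop at the end to delete all intermediate files.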
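
The buffer ordering and the per-partition element counts described above can be sketched in plain Scala. This is an illustration under stated assumptions rather than Spark's code: `PartitionKeyOrderSketch`, `Record`, `partitionThenKey`, and `elementsPerPartition` are hypothetical names, and keys are assumed to be strings with their natural ordering.

```scala
object PartitionKeyOrderSketch {
  // Each record carries its partition ID alongside the key and the value.
  type Record = ((Int, String), Int)

  // Order by partition ID first, then by key within a partition.
  val partitionThenKey: Ordering[Record] =
    Ordering.by[Record, (Int, String)](_._1)

  // Number of elements per partition, indexed by partition ID. With these counts,
  // a sorted spill file does not need a partition ID next to every element.
  def elementsPerPartition(sorted: Seq[Record], numPartitions: Int): Array[Long] = {
    val counts = new Array[Long](numPartitions)
    sorted.foreach { case ((pid, _), _) => counts(pid) += 1 }
    counts
  }
}
```

Because the records are sorted by partition ID, a reader that knows the counts can assign the first n0 elements to partition 0, the next n1 to partition 1, and so on.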

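Parent class of ExternalSorter

Spillable is the parent class of ExternalSorter, and Spillable is itself a subclass of MemoryConsumer.

The Spillable class spills the contents of the in-memory collection to disk when memory usage exceeds a threshold.
The in-memory collection here is either the PartitionedAppendOnlyMap or the PartitionedPairBuffer data structure.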
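
The "exceeds the threshold and cannot acquire more memory" decision can be modeled with a short sketch. It is a simplified model, not the body of Spillable: `SpillDecisionSketch` and the `acquire` callback are hypothetical, the policy of asking for enough memory to double the current usage is an assumption of this sketch, and any other checks the real class performs are omitted.

```scala
object SpillDecisionSketch {
  // Returns (shouldSpill, newThreshold).
  // `acquire` is a hypothetical stand-in for asking the memory manager for more
  // bytes; it returns how many bytes were actually granted.
  def maybeSpill(
      currentMemory: Long,
      threshold: Long,
      acquire: Long => Long): (Boolean, Long) = {
    if (currentMemory < threshold) {
      // Still under the threshold: keep inserting, nothing to do.
      (false, threshold)
    } else {
      // Assumed policy: try to raise the threshold to twice the current usage.
      val granted = acquire(2 * currentMemory - threshold)
      val newThreshold = threshold + granted
      // Spill only if the grant was too small to cover the current usage.
      (currentMemory >= newThreshold, newThreshold)
    }
  }
}
```

If the request is fully granted, the collection keeps growing in memory under the raised threshold; if nothing is granted, the caller spills and starts a fresh collection.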

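Member variables

serializerBatchSize: the size of the object batches used when reading objects from or writing objects to serializers. Objects are written in batches, and each batch uses its own serialization stream. This cuts down on the size of the reference-tracking maps constructed when deserializing a stream. Note that setting this value too low can cause excessive copying during serialization, because some serializers grow their internal data structures by growing + copying every time the number of objects doubles.
PartitionedAppendOnlyMap and PartitionedPairBuffer: the in-memory collections that hold records in memory before they are spilled. Which one is used depends on whether aggregation is needed: with map-side aggregation, records go into a PartitionedAppendOnlyMap; otherwise they go into a PartitionedPairBuffer.
keyComparator: a comparator for keys that orders them within a partition to allow aggregation or sorting. If the ordering parameter does not provide a comparator, a default comparator that gives a partial ordering by hash code is used. A partial ordering means that equal keys have comparator.compare(k, k) = 0, but some non-equal keys do too, so a later pass is needed to find the keys that are truly equal. P.S.: keys that are equal under equals() always have the same hashCode(), but keys with the same hashCode() are not necessarily equal under equals(), so comparing hash codes yields only a partial ordering.
spills: when the in-memory collection reaches the threshold, its records are spilled to a disk file in sorted order. This ArrayBuffer[SpilledFile] instance holds the information about the spill files.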
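
The partial ordering by hash code described for keyComparator can be demonstrated with a small sketch. `HashComparatorSketch`, `byHash`, and `groupTrulyEqual` are hypothetical names for illustration; the strings "Aa" and "BB" are used because they share the same Java hash code.

```scala
import java.util.Comparator

object HashComparatorSketch {
  // Orders keys by hashCode only: a partial ordering, since distinct keys may tie.
  def byHash[K]: Comparator[K] = new Comparator[K] {
    override def compare(a: K, b: K): Int =
      Integer.compare(a.hashCode(), b.hashCode())
  }

  // Later pass: among keys that the comparator treats as equal, group the
  // truly equal ones (by equals) and count them.
  def groupTrulyEqual[K](tied: Seq[K]): Map[K, Int] =
    tied.groupBy(identity).map { case (k, ks) => k -> ks.size }
}
```

Here `byHash[String].compare("Aa", "BB")` returns 0 even though the two keys differ, which is why the equality pass over each run of tied keys is required before values can be merged.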

 // Size of object batches when reading/writing from serializers.
  //
  // Objects are written in batches, with each batch using its own serialization stream. This
  // cuts down on the size of reference-tracking maps constructed when deserializing a stream.
  //
  // NOTE: Setting t