Spark Source Code Analysis: UnsafeShuffleWriter

Overview

SortShuffleManager uses UnsafeShuffleWriter only when all of the following conditions are met; otherwise it falls back to SortShuffleWriter:

  1. The serializer supports relocation of serialized objects, i.e. it can reorder records that have already been serialized and produce the same result as sorting the records first and then serializing them. KryoSerializer supports relocation; Spark's default is JavaSerializer, and the serializer is chosen via the spark.serializer setting (see the configuration sketch below);
  2. No map-side aggregation is required, i.e. no aggregator is defined;
  3. The number of partitions does not exceed the threshold (2^24);
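
A minimal configuration sketch for condition 1, assuming a plain SparkConf-based setup (the application name is arbitrary):

import org.apache.spark.SparkConf

// Use Kryo so the serializer supports relocation of serialized objects
// (JavaSerializer, the default, does not).
val conf = new SparkConf()
  .setAppName("unsafe-shuffle-demo") // hypothetical application name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")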

UnsafeShuffleWriter serializes each record and inserts it into the sorter, sorts the already-serialized records, writes them to disk as spill files when sorting completes, and then merges the spill files into a single output file. During the merge, the most suitable merge strategy is chosen based on the number of spill files and the IO compression codec.

Source Code Analysis

ShuffleMapTask obtains the ShuffleManager

When ShuffleMapTask executes its task in runTask(), it obtains the ShuffleManager from SparkEnv.

ShuffleMapTask's runTask() method:

override def runTask(context: TaskContext): MapStatus = {
    // Deserialize the RDD using the broadcast variable.
    val deserializeStartTime = System.currentTimeMillis()
    val ser = SparkEnv.get.closureSerializer.newInstance()
    val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
    _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime
 
    metrics = Some(context.taskMetrics)
    var writer: ShuffleWriter[Any, Any] = null
    try {
      // Get the ShuffleManager from SparkEnv
      val manager = SparkEnv.get.shuffleManager
      // Ask the ShuffleManager for a writer based on the shuffle handle
      writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
      // Write this partition's records through the writer
      writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
      return writer.stop(success = true).get
    } catch {
      case e: Exception =>
        try {
          if (writer != null) {
            writer.stop(success = false)
          }
        } catch {
          case e: Exception =>
            log.debug("Could not stop writer", e)
        }
        throw e
    }
  }

 

SortShuffleManager obtains the writer

SortShuffleManager selects a ShuffleHandle depending on which conditions are satisfied; each handle corresponds to a shuffle writer as follows:

BypassMergeSortShuffleHandle → BypassMergeSortShuffleWriter
SerializedShuffleHandle → UnsafeShuffleWriter
BaseShuffleHandle → SortShuffleWriter

The registerShuffle method

SortShuffleManager.scala

/**
   * Obtains a [[ShuffleHandle]] to pass to tasks.
   */
  override def registerShuffle[K, V, C](
      shuffleId: Int,
      numMaps: Int,
      dependency: ShuffleDependency[K, V, C]): ShuffleHandle = {
    if (SortShuffleWriter.shouldBypassMergeSort(conf, dependency)) {
      // If there are fewer than spark.shuffle.sort.bypassMergeThreshold partitions and we don't
      // need map-side aggregation, then write numPartitions files directly and just concatenate
      // them at the end. This avoids doing serialization and deserialization twice to merge
      // together the spilled files, which would happen with the normal code path. The downside is
      // having multiple files open at a time and thus more memory allocated to buffers.
      new BypassMergeSortShuffleHandle[K, V](
        shuffleId, numMaps, dependency.asInstanceOf[ShuffleDependency[K, V, V]])
    } else if (SortShuffleManager.canUseSerializedShuffle(dependency)) {
      // Otherwise, try to buffer map outputs in a serialized form, since this is more efficient:
      new SerializedShuffleHandle[K, V](
        shuffleId, numMaps, dependency.asInstanceOf[ShuffleDependency[K, V, V]])
    } else {
      // Otherwise, buffer map outputs in a deserialized form:
      new BaseShuffleHandle(shuffleId, numMaps, dependency)
    }
  }

The canUseSerializedShuffle method

This method checks whether the conditions for using UnsafeShuffleWriter are met:

  1. The serializer supports relocation;
  2. No map-side aggregation is required, i.e. no aggregator is defined;
  3. The number of partitions does not exceed the threshold (2^24);

SortShuffleManager.scala

/**
   * Helper method for determining whether a shuffle should use an optimized serialized shuffle
   * path or whether it should fall back to the original path that operates on deserialized objects.
   */
  def canUseSerializedShuffle(dependency: ShuffleDependency[_, _, _]): Boolean = {
    val shufId = dependency.shuffleId
    val numPartitions = dependency.partitioner.numPartitions
    // Check whether the serializer supports relocation
    if (!dependency.serializer.supportsRelocationOfSerializedObjects) {
      log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
        s"${dependency.serializer.getClass.getName}, does not support object relocation")
      false
    // Check whether map-side aggregation is defined
    } else if (dependency.aggregator.isDefined) {
      log.debug(
        s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
      false
    // Check whether the number of partitions exceeds the threshold
    } else if (numPartitions > MAX_SHUFFLE_OUTPUT_PARTITIONS_FOR_SERIALIZED_MODE) {
      log.debug(s"Can't use serialized shuffle for shuffle $shufId because it has more than " +
        s"$MAX_SHUFFLE_OUTPUT_PARTITIONS_FOR_SERIALIZED_MODE partitions")
      false
    } else {
      log.debug(s"Can use serialized shuffle for shuffle $shufId")
      true
    }
  }

The getWriter method

If the conditions are met, the handle is a SerializedShuffleHandle and an UnsafeShuffleWriter is created to write the data.

SortShuffleManager.scala

/** Get a writer for a given partition. Called on executors by map tasks. */
  override def getWriter[K, V](
      handle: ShuffleHandle,
      mapId: Int,
      context: TaskContext): ShuffleWriter[K, V] = {
    numMapsForShuffle.putIfAbsent(
      handle.shuffleId, handle.asInstanceOf[BaseShuffleHandle[_, _, _]].numMaps)
    val env = SparkEnv.get
    handle match {
      case unsafeShuffleHandle: SerializedShuffleHandle[K @unchecked, V @unchecked] =>
        // Create an UnsafeShuffleWriter
        new UnsafeShuffleWriter(
          env.blockManager,
          shuffleBlockResolver.asInstanceOf[IndexShuffleBlockResolver],
          context.taskMemoryManager(),
          unsafeShuffleHandle,
          mapId,
          context,
          env.conf)
      case bypassMergeSortHandle: BypassMergeSortShuffleHandle[K @unchecked, V @unchecked] =>
        new BypassMergeSortShuffleWriter(
          env.blockManager,
          shuffleBlockResolver.asInstanceOf[IndexShuffleBlockResolver],
          bypassMergeSortHandle,
          mapId,
          context,
          env.conf)
      case other: BaseShuffleHandle[K @unchecked, V @unchecked, _] =>
        new SortShuffleWriter(shuffleBlockResolver, other, mapId, context)
    }
  }

UnsafeShuffleWriter

The write method

1. Partition and serialize each record, then insert it into the sorter.

2. Sort the records, write them to disk as spill files when sorting completes, then merge the spill files into a single output file.

@Override
  public void write(scala.collection.Iterator<Product2<K, V>> records) throws IOException {
    // Keep track of success so we know if we encountered an exception
    // We do this rather than a standard try/catch/re-throw to handle
    // generic throwables.
    boolean success = false;
    try {
      // Partition and serialize each record, then insert it into the sorter
      while (records.hasNext()) {
        insertRecordIntoSorter(records.next());
      }
      // Sort the records, spill them to disk, then merge the spill files into one output file
      closeAndWriteOutput();
      success = true;
    } finally {
      if (sorter != null) {
        try {
          sorter.cleanupResources();
        } catch (Exception e) {
          // Only throw this error if we won't be masking another
          // error.
          if (success) {
            throw e;
          } else {
            logger.error("In addition to a failure during writing, we failed during " +
                         "cleanup.", e);
          }
        }
      }
    }
  }

The insertRecordIntoSorter method

This method partitions and serializes a record and inserts it into the sorter. It works as follows:

  1. Call partitioner.getPartition on the record's key to determine which partition the record belongs to and obtain that partition's partitionId;
  2. Serialize the record's key and value into the buf field backing the ByteArrayOutputStream;
  3. Insert the serialized record into the ShuffleExternalSorter;

During partitioning, assuming HashPartitioner is used, getPartition takes the key's hashCode modulo numPartitions to determine which partition the record is assigned to.
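
A minimal sketch of that computation, assuming HashPartitioner's usual behavior (null keys go to partition 0 and the modulo result is kept non-negative); this is illustrative rather than Spark's exact source:

// Sketch of HashPartitioner-style partition assignment.
def getPartition(key: Any, numPartitions: Int): Int = key match {
  case null => 0
  case _ =>
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod // keep the result non-negative
}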

During serialization, assuming JavaSerializer is used, the streams are layered (bottom-up) as follows:

MyByteArrayOutputStream → ObjectOutputStream → JavaSerializationStream

MyByteArrayOutputStream is a subclass of ByteArrayOutputStream that exposes the underlying buf[] field directly;

ObjectOutputStream is the serialization stream from java.io; its writeObject method writes objects into the stream;

JavaSerializationStream is a thin wrapper around ObjectOutputStream.

In the end, the serialized record is stored in the buf field of the ByteArrayOutputStream.

In the code below, serBuffer is an instance of MyByteArrayOutputStream; calling getBuf returns the buf field of the underlying ByteArrayOutputStream.

serOutputStream is a SerializationStream instance; depending on SparkConf it is initialized as either a JavaSerializationStream or a KryoSerializationStream. The Kryo serialization process is not covered here.

Note: JavaSerializer does not actually support relocation, so JavaSerializationStream would never really be used here; it just serves as an example.
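
As a small Scala sketch of the idea behind MyByteArrayOutputStream (the real class is a private inner class of UnsafeShuffleWriter; the name below is made up for illustration):

import java.io.ByteArrayOutputStream

// Expose the protected buf field so the serialized bytes can be handed to the
// sorter without an extra copy; size() still reports how many bytes are valid.
class ExposedByteArrayOutputStream(size: Int) extends ByteArrayOutputStream(size) {
  def getBuf: Array[Byte] = buf
}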

@VisibleForTesting
  void insertRecordIntoSorter(Product2<K, V> record) throws IOException {
    assert(sorter != null);
    final K key = record._1();
    // getPartition determines which partition the record belongs to and returns that partition's partitionId
    final int partitionId = partitioner.getPartition(key);
    serBuffer.reset();
    // Serialize the key into serBuffer's underlying buf field
    serOutputStream.writeKey(key, OBJECT_CLASS_TAG);
    // Serialize the value into serBuffer's underlying buf field
    serOutputStream.writeValue(record._2(), OBJECT_CLASS_TAG);
    serOutputStream.flush();

    final int serializedRecordSize = serBuffer.size();
    assert (serializedRecordSize > 0);
    // Insert the serialized record into the sorter
    sorter.insertRecord(
      serBuffer.getBuf(), Platform.BYTE_ARRAY_OFFSET, serializedRecordSize, partitionId);
  }

 

The closeAndWriteOutput method

Sorts the records, writes them to disk as spill files when sorting completes, then merges the spill files into a single output file.

The method works as follows:

1. Sort the in-memory records, write them to disk as spill files once sorting finishes, and return the metadata of these spill files as SpillInfo[];

2. Construct the final output file; its name is "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId (with reduceId fixed to 0);

3. Append a UUID to the output file name to mark it as being written; the file is renamed when writing completes;

4. Merge the spill files into one output file, choosing the most suitable merge strategy based on the number of spill files and the IO compression codec;

5. Write each partition's offset into the index file so the reduce side can fetch its data (see the offset sketch after this list);
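
A sketch of how the index file relates to partitionLengths (the real logic lives in IndexShuffleBlockResolver.writeIndexFileAndCommit): the index file stores cumulative offsets, so reducer i reads the byte range [offsets(i), offsets(i + 1)) from the data file.

// Cumulative offsets derived from the per-partition lengths of the merged file.
def partitionOffsets(partitionLengths: Array[Long]): Array[Long] =
  partitionLengths.scanLeft(0L)(_ + _)

// Example: partitionOffsets(Array(10L, 0L, 25L)) yields Array(0, 10, 10, 35)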

@VisibleForTesting
  void closeAndWriteOutput() throws IOException {
    assert(sorter != null);
    updatePeakMemoryUsed();
    serBuffer = null;
    serOutputStream = null;
    // Sort the in-memory records, spill them to disk, and return the spill files' metadata as SpillInfo[]
    final SpillInfo[] spills = sorter.closeAndGetSpills();
    sorter = null;
    final long[] partitionLengths;
    // Construct the final output file; its name is "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId
    // (with reduceId fixed to 0).
    final File output = shuffleBlockResolver.getDataFile(shuffleId, mapId);
    // Append a UUID to the output file name to mark it as being written; it is renamed when done.
    final File tmp = Utils.tempFileWith(output);
    try {
      try {
        // Merge the spill files into one output file, choosing the best strategy based on the number of spills and the IO compression codec
        partitionLengths = mergeSpills(spills, tmp);
      } finally {
        for (SpillInfo spill : spills) {
          if (spill.file.exists() && ! spill.file.delete()) {
            logger.error("Error while deleting spill file {}", spill.file.getPath());
          }
        }
      }
      // Write each partition's offset into the index file so the reduce side can fetch its data
      shuffleBlockResolver.writeIndexFileAndCommit(shuffleId, mapId, partitionLengths, tmp);
    } finally {
      if (tmp.exists() && !tmp.delete()) {
        logger.error("Error while deleting temp file {}", tmp.getAbsolutePath());
      }
    }
    mapStatus = MapStatus$.MODULE$.apply(blockManager.shuffleServerId(), partitionLengths);
  }

The mergeSpills method

Merges the spill files into a single output file, choosing the most suitable merge strategy based on the number of spill files and the IO compression codec.

When there are multiple spill files, the strategy is chosen as follows (the related settings are sketched below):

1. Read from SparkConf whether fast merge is enabled and whether it is supported; if both hold, take the fast merge path, otherwise take the slow merge path.

2. On the fast merge path, if transferTo is enabled and encryption is not needed, use the transferTo-based fast merge; otherwise use the fileStream-based fast merge.
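
The flags involved in this decision are ordinary SparkConf settings; a sketch with what should be their default values:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.compress", "true")                // compress shuffle output (default: true)
  .set("spark.shuffle.unsafe.fastMergeEnabled", "true") // allow the fast merge path (default: true)
  .set("spark.file.transferTo", "true")                 // allow NIO transferTo during the merge (default: true)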

/**
   * Merge zero or more spill files together, choosing the fastest merging strategy based on the
   * number of spills and the IO compression codec.
   *
   * @return the partition lengths in the merged file.
   */
  private long[] mergeSpills(SpillInfo[] spills, File outputFile) throws IOException {
    // Read from SparkConf whether shuffle compression is enabled
    final boolean compressionEnabled = sparkConf.getBoolean("spark.shuffle.compress", true);
    // Create the CompressionCodec configured in SparkConf
    final CompressionCodec compressionCodec = CompressionCodec$.MODULE$.createCodec(sparkConf);
    // Read from SparkConf whether fast merge is enabled
    final boolean fastMergeEnabled =
      sparkConf.getBoolean("spark.shuffle.unsafe.fastMergeEnabled", true);
    final boolean fastMergeIsSupported = !compressionEnabled ||
      CompressionCodec$.MODULE$.supportsConcatenationOfSerializedStreams(compressionCodec);
    final boolean encryptionEnabled = blockManager.serializerManager().encryptionEnabled();
    try {
      if (spills.length == 0) {
        new FileOutputStream(outputFile).close(); // Create an empty file
        return new long[partitioner.numPartitions()];
      } else if (spills.length == 1) {
        // Here, we don't need to perform any metrics updates because the bytes written to this
        // output file would have already been counted as shuffle bytes written.
        Files.move(spills[0].file, outputFile);
        return spills[0].partitionLengths;
      } else {
        final long[] partitionLengths;
        // There are multiple spills to merge, so none of these spill files' lengths were counted
        // towards our shuffle write count or shuffle write time. If we use the slow merge path,
        // then the final output file's size won't necessarily be equal to the sum of the spill
        // files' sizes. To guard against this case, we look at the output file's actual size when
        // computing shuffle bytes written.
        //
        // We allow the individual merge methods to report their own IO times since different merge
        // strategies use different IO techniques.  We count IO during merge towards the shuffle
        // shuffle write time, which appears to be consistent with the "not bypassing merge-sort"
        // branch in ExternalSorter.
        if (fastMergeEnabled && fastMergeIsSupported) {
          // Compression is disabled or we are using an IO compression codec that supports
          // decompression of concatenated compressed streams, so we can perform a fast spill merge
          // that doesn't need to interpret the spilled bytes.
          if (transferToEnabled && !encryptionEnabled) {
            logger.debug("Using transferTo-based fast merge");
            partitionLengths = mergeSpillsWithTransferTo(spills, outputFile);
          } else {
            logger.debug("Using fileStream-based fast merge");
            partitionLengths = mergeSpillsWithFileStream(spills, outputFile, null);
          }
        } else {
          logger.debug("Using slow merge");
          partitionLengths = mergeSpillsWithFileStream(spills, outputFile, compressionCodec);
        }
        // When closing an UnsafeShuffleExternalSorter that has already spilled once but also has
        // in-memory records, we write out the in-memory records to a file but do not count that
        // final write as bytes spilled (instead, it's accounted as shuffle write). The merge needs
        // to be counted as shuffle write, but this will lead to double-counting of the final
        // SpillInfo's bytes.
        writeMetrics.decBytesWritten(spills[spills.length - 1].file.length());
        writeMetrics.incBytesWritten(outputFile.length());
        return partitionLengths;
      }
    } catch (IOException e) {
      if (outputFile.exists() && !outputFile.delete()) {
        logger.error("Unable to delete output file {}", outputFile.getPath());
      }
      throw e;
    }
  }

The mergeSpillsWithFileStream method

Merges the spill files using Java FileStreams.

This merge path is noticeably slower than the NIO (transferTo) based merge, UnsafeShuffleWriter#mergeSpillsWithTransferTo(SpillInfo[], File), so it is mainly used in the following cases:

1. The IO compression codec does not support concatenation of compressed data;

2. Encryption is enabled;

3. The user has explicitly disabled transferTo. Linux kernel 2.6.32 has a bug that affects the NIO path, so spark.file.transferTo should be set to false on that kernel;

4. The individual partitions within a spill file are small. In that case mergeSpillsWithFileStream is faster, because mergeSpillsWithTransferTo would perform many small, inefficient disk IOs; using large buffers for the input and output files reduces the number of disk IOs and makes the merge faster (the related buffer settings are sketched below).
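
The inputBufferSizeInBytes and outputBufferSizeInBytes used in the code below are read from SparkConf in the UnsafeShuffleWriter constructor; to the best of my knowledge they correspond to the following settings (the sizes shown are just the usual defaults):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.file.buffer", "32k")               // buffer for reading each spill file
  .set("spark.shuffle.unsafe.file.output.buffer", "32k") // buffer for the merged output file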

The method works as follows:

1. Create an input stream for each spill file. The input streams are wrapped (decorated) in this order:

NioBufferedFileInputStream → LimitedInputStream → CryptoInputStream → compressedInputStream

Note: compressedInputStream is not an actual class; it is shorthand for whichever of ZstdInputStream, SnappyInputStream, LZFInputStream, or LZ4BlockInputStream is in use. The same applies to compressedOutputStream below.

2. Create an output stream for the final output file outputFile. The output streams are wrapped in this order:

FileOutputStream → BufferedOutputStream → CountingOutputStream → TimeTrackingOutputStream → CloseAndFlushShieldOutputStream → CryptoOutputStream → compressedOutputStream

3. Copy all bytes from the input streams to the output stream;

/**
   * Merges spill files using Java FileStreams. This code path is typically slower than
   * the NIO-based merge, {@link UnsafeShuffleWriter#mergeSpillsWithTransferTo(SpillInfo[],
   * File)}, and it's mostly used in cases where the IO compression codec does not support
   * concatenation of compressed data, when encryption is enabled, or when users have
   * explicitly disabled use of {@code transferTo} in order to work around kernel bugs.
   * This code path might also be faster in cases where individual partition size in a spill
   * is small and UnsafeShuffleWriter#mergeSpillsWithTransferTo method performs many small
   * disk ios which is inefficient. In those case, Using large buffers for input and output
   * files helps reducing the number of disk ios, making the file merging faster.
   *
   * @param spills the spills to merge.
   * @param outputFile the file to write the merged data to.
   * @param compressionCodec the IO compression codec, or null if shuffle compression is disabled.
   * @return the partition lengths in the merged file.
   */
  private long[] mergeSpillsWithFileStream(
      SpillInfo[] spills,
      File outputFile,
      @Nullable CompressionCodec compressionCodec) throws IOException {
    assert (spills.length >= 2);
    final int numPartitions = partitioner.numPartitions();
    final long[] partitionLengths = new long[numPartitions];
    final InputStream[] spillInputStreams = new InputStream[spills.length];
    
    // Create the output stream for the final output file
    final OutputStream bos = new BufferedOutputStream(
            new FileOutputStream(outputFile),
            outputBufferSizeInBytes);
    // Use a counting output stream to avoid having to close the underlying file and ask
    // the file system for its size after each partition is written.
    final CountingOutputStream mergedFileOutputStream = new CountingOutputStream(bos);

    boolean threwException = true;
    try {
      // Create an input stream for each spill file
      for (int i = 0; i < spills.length; i++) {
        spillInputStreams[i] = new NioBufferedFileInputStream(
            spills[i].file,
            inputBufferSizeInBytes);
      }
      // Outer loop: iterate over partitions
      for (int partition = 0; partition < numPartitions; partition++) {
        final long initialFileLength = mergedFileOutputStream.getByteCount();
        // Shield the underlying output stream from close() and flush() calls, so that we can close
        // the higher level streams to make sure all data is really flushed and internal state is
        // cleaned.
        OutputStream partitionOutput = new CloseAndFlushShieldOutputStream(
          new TimeTrackingOutputStream(writeMetrics, mergedFileOutputStream));
        partitionOutput = blockManager.serializerManager().wrapForEncryption(partitionOutput);
        if (compressionCodec != null) {
          partitionOutput = compressionCodec.compressedOutputStream(partitionOutput);
        }
        // Inner loop: iterate over spill files
        for (int i = 0; i < spills.length; i++) {
          final long partitionLengthInSpill = spills[i].partitionLengths[partition];
          if (partitionLengthInSpill > 0) {
            InputStream partitionInputStream = new LimitedInputStream(spillInputStreams[i],
              partitionLengthInSpill, false);
            try {
              partitionInputStream = blockManager.serializerManager().wrapForEncryption(
                partitionInputStream);
              if (compressionCodec != null) {
                partitionInputStream = compressionCodec.compressedInputStream(partitionInputStream);
              }
              // Copy this partition's bytes from the input stream to the output stream
              ByteStreams.copy(partitionInputStream, partitionOutput);
            } finally {
              partitionInputStream.close();
            }
          }
        }
        partitionOutput.flush();
        partitionOutput.close();
        partitionLengths[partition] = (mergedFileOutputStream.getByteCount() - initialFileLength);
      }
      threwException = false;
    } finally {
      // To avoid masking exceptions that caused us to prematurely enter the finally block, only
      // throw exceptions during cleanup if threwException == false.
      for (InputStream stream : spillInputStreams) {
        Closeables.close(stream, threwException);
      }
      Closeables.close(mergedFileOutputStream, threwException);
    }
    return partitionLengths;
  }

The mergeSpillsWithTransferTo method

Merges the spill files by using NIO's transferTo to concatenate the spill partitions' bytes.

This is only safe when the IO compression codec and the serializer support concatenation of serialized streams.

The method works as follows:

1. Create an input stream for each spill file and obtain the FileChannel behind it;

2. Create an output stream for the final output file outputFile and obtain its FileChannel;

3. Call transferTo on each input FileChannel to transfer the bytes into the output FileChannel (a minimal sketch follows this list);
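
A minimal sketch of what step 3 does at the NIO level (Spark wraps this in Utils.copyFileStreamNIO; the helper below is purely illustrative):

import java.nio.channels.FileChannel

// Copy `count` bytes starting at `position` from `in` to `out` using transferTo.
// transferTo may transfer fewer bytes than requested, so loop until done.
def copyRange(in: FileChannel, out: FileChannel, position: Long, count: Long): Unit = {
  var transferred = 0L
  while (transferred < count) {
    transferred += in.transferTo(position + transferred, count - transferred, out)
  }
}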

/**
   * Merges spill files by using NIO's transferTo to concatenate spill partitions' bytes.
   * This is only safe when the IO compression codec and serializer support concatenation of
   * serialized streams.
   *
   * @return the partition lengths in the merged file.
   */
  private long[] mergeSpillsWithTransferTo(SpillInfo[] spills, File outputFile) throws IOException {
    assert (spills.length >= 2);
    final int numPartitions = partitioner.numPartitions();
    final long[] partitionLengths = new long[numPartitions];
    final FileChannel[] spillInputChannels = new FileChannel[spills.length];
    final long[] spillInputChannelPositions = new long[spills.length];
    FileChannel mergedFileOutputChannel = null;

    boolean threwException = true;
    try {
      // Create an input stream for each spill file and get its channel
      for (int i = 0; i < spills.length; i++) {
        spillInputChannels[i] = new FileInputStream(spills[i].file).getChannel();
      }
      // This file needs to opened in append mode in order to work around a Linux kernel bug that
      // affects transferTo; see SPARK-3948 for more details.
      // Create the output stream for the final output file and get its channel.
      // The output file must be opened in append mode.
      mergedFileOutputChannel = new FileOutputStream(outputFile, true).getChannel();

      long bytesWrittenToMergedFile = 0;
      // Outer loop: iterate over partitions
      for (int partition = 0; partition < numPartitions; partition++) {
        // Inner loop: iterate over spill files
        for (int i = 0; i < spills.length; i++) {
          final long partitionLengthInSpill = spills[i].partitionLengths[partition];
          final FileChannel spillInputChannel = spillInputChannels[i];
          final long writeStartTime = System.nanoTime();
          // Transfer this partition's bytes from the spill's channel to the output channel via transferTo
          Utils.copyFileStreamNIO(
            spillInputChannel,
            mergedFileOutputChannel,
            spillInputChannelPositions[i],
            partitionLengthInSpill);
          spillInputChannelPositions[i] += partitionLengthInSpill;
          writeMetrics.incWriteTime(System.nanoTime() - writeStartTime);
          bytesWrittenToMergedFile += partitionLengthInSpill;
          partitionLengths[partition] += partitionLengthInSpill;
        }
      }
      // Check the position after transferTo loop to see if it is in the right position and raise an
      // exception if it is incorrect. The position will not be increased to the expected length
      // after calling transferTo in kernel version 2.6.32. This issue is described at
      // https://bugs.openjdk.java.net/browse/JDK-7052359 and SPARK-3948.
      if (mergedFileOutputChannel.position() != bytesWrittenToMergedFile) {
        throw new IOException(
          "Current position " + mergedFileOutputChannel.position() + " does not equal expected " +
            "position " + bytesWrittenToMergedFile + " after transferTo. Please check your kernel" +
            " version to see if it is 2.6.32, as there is a kernel bug which will lead to " +
            "unexpected behavior when using transferTo. You can set spark.file.transferTo=false " +
            "to disable this NIO feature."
        );
      }
      threwException = false;
    } finally {
      // To avoid masking exceptions that caused us to prematurely enter the finally block, only
      // throw exceptions during cleanup if threwException == false.
      for (int i = 0; i < spills.length; i++) {
        assert(spillInputChannelPositions[i] == spills[i].file.length());
        Closeables.close(spillInputChannels[i], threwException);
      }
      Closeables.close(mergedFileOutputChannel, threwException);
    }
    return partitionLengths;
  }

 
