Overview
SortShuffleManager uses UnsafeShuffleWriter when all of the following conditions are met, and otherwise falls back to SortShuffleWriter:
- The serializer supports relocation. "Supports relocation" means the serializer can reorder already-serialized objects, and the result is the same as sorting the records first and then serializing them. KryoSerializer supports relocation; Spark's default, JavaSerializer, does not. The serializer is configured via the spark.serializer parameter (see the configuration sketch at the end of this overview);
- No map-side aggregation is needed, i.e. no aggregator is defined;
- The number of partitions does not exceed the threshold (2^24);
UnsafeShuffleWriter serializes each record and inserts it into the sorter, sorts the serialized records, writes them to disk as spill files once sorting completes, and finally merges the spill files into a single output file. During the merge it chooses the most suitable merge strategy based on the number of spill files and the IO compression codec.
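Since the default JavaSerializer does not support relocation, a job that wants to take the UnsafeShuffleWriter path typically switches to Kryo. A minimal sketch, assuming a plain SparkConf-based setup (spark.serializer and the KryoSerializer class name are standard Spark settings; the app name is arbitrary):

import org.apache.spark.SparkConf

// Switch the serializer to Kryo so that supportsRelocationOfSerializedObjects
// returns true and the serialized (unsafe) shuffle path becomes eligible.
val conf = new SparkConf()
  .setAppName("unsafe-shuffle-demo")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")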
Source Code Analysis
ShuffleMapTask obtains the ShuffleManager
When ShuffleMapTask runs a task via its runTask() method, it obtains the ShuffleManager from SparkEnv.
The runTask() method of ShuffleMapTask is as follows:
override def runTask(context: TaskContext): MapStatus = {
// Deserialize the RDD using the broadcast variable.
val deserializeStartTime = System.currentTimeMillis()
val ser = SparkEnv.get.closureSerializer.newInstance()
val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
_executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime
metrics = Some(context.taskMetrics)
var writer: ShuffleWriter[Any, Any] = null
try {
//Obtain the ShuffleManager from SparkEnv
val manager = SparkEnv.get.shuffleManager
//Get a writer from the ShuffleManager for this shuffle handle
writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
//Write the partition's records via the writer
writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
return writer.stop(success = true).get
} catch {
case e: Exception =>
try {
if (writer != null) {
writer.stop(success = false)
}
} catch {
case e: Exception =>
log.debug("Could not stop writer", e)
}
throw e
}
}
SortShuffleManager obtains the writer
SortShuffleManager chooses a ShuffleHandle according to which conditions are met; each ShuffleHandle maps to a shuffle writer as follows:
| ShuffleHandle | ShuffleWriter |
| --- | --- |
| BypassMergeSortShuffleHandle | BypassMergeSortShuffleWriter |
| SerializedShuffleHandle | UnsafeShuffleWriter |
| BaseShuffleHandle | SortShuffleWriter |
The registerShuffle method
SortShuffleManager.scala
/**
* Obtains a [[ShuffleHandle]] to pass to tasks.
*/
override def registerShuffle[K, V, C](
shuffleId: Int,
numMaps: Int,
dependency: ShuffleDependency[K, V, C]): ShuffleHandle = {
if (SortShuffleWriter.shouldBypassMergeSort(conf, dependency)) {
// If there are fewer than spark.shuffle.sort.bypassMergeThreshold partitions and we don't
// need map-side aggregation, then write numPartitions files directly and just concatenate
// them at the end. This avoids doing serialization and deserialization twice to merge
// together the spilled files, which would happen with the normal code path. The downside is
// having multiple files open at a time and thus more memory allocated to buffers.
new BypassMergeSortShuffleHandle[K, V](
shuffleId, numMaps, dependency.asInstanceOf[ShuffleDependency[K, V, V]])
} else if (SortShuffleManager.canUseSerializedShuffle(dependency)) {
// Otherwise, try to buffer map outputs in a serialized form, since this is more efficient:
new SerializedShuffleHandle[K, V](
shuffleId, numMaps, dependency.asInstanceOf[ShuffleDependency[K, V, V]])
} else {
// Otherwise, buffer map outputs in a deserialized form:
new BaseShuffleHandle(shuffleId, numMaps, dependency)
}
}
The canUseSerializedShuffle method
This method checks whether the conditions for using UnsafeShuffleWriter are met:
- The serializer supports relocation;
- No map-side aggregation is needed, i.e. no aggregator is defined;
- The number of partitions does not exceed the threshold (2^24; see the note after the code listing);
SortShuffleManager.scala
/**
* Helper method for determining whether a shuffle should use an optimized serialized shuffle
* path or whether it should fall back to the original path that operates on deserialized objects.
*/
def canUseSerializedShuffle(dependency: ShuffleDependency[_, _, _]): Boolean = {
val shufId = dependency.shuffleId
val numPartitions = dependency.partitioner.numPartitions
//Check whether the serializer supports relocation
if (!dependency.serializer.supportsRelocationOfSerializedObjects) {
log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
s"${dependency.serializer.getClass.getName}, does not support object relocation")
false
//Check whether a map-side aggregator is defined
} else if (dependency.aggregator.isDefined) {
log.debug(
s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
false
//Check whether the number of partitions exceeds the threshold
} else if (numPartitions > MAX_SHUFFLE_OUTPUT_PARTITIONS_FOR_SERIALIZED_MODE) {
log.debug(s"Can't use serialized shuffle for shuffle $shufId because it has more than " +
s"$MAX_SHUFFLE_OUTPUT_PARTITIONS_FOR_SERIALIZED_MODE partitions")
false
} else {
log.debug(s"Can use serialized shuffle for shuffle $shufId")
true
}
}
}
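The 2^24 threshold (MAX_SHUFFLE_OUTPUT_PARTITIONS_FOR_SERIALIZED_MODE = 16777216) comes from how the serialized shuffle packs the partition id into a 64-bit record pointer: PackedRecordPointer reserves 24 bits for the partition id. A rough sketch of that packing idea, assuming the documented 24/13/27-bit layout (the helper below is illustrative, not the actual Spark class):

// Illustrative sketch of the PackedRecordPointer layout (not the actual class):
// [24 bits partition id][13 bits memory page number][27 bits offset in page]
object PackedPointerSketch {
  val MaximumPartitionId: Int = (1 << 24) - 1 // 16777215, hence the 2^24 partition limit

  def pack(partitionId: Int, pageNumber: Int, offsetInPage: Long): Long = {
    require(partitionId <= MaximumPartitionId, "too many shuffle output partitions")
    (partitionId.toLong << 40) | (pageNumber.toLong << 27) | (offsetInPage & ((1L << 27) - 1))
  }
}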
The getWriter method
If the conditions are met, the handle is a SerializedShuffleHandle and an UnsafeShuffleWriter is created to write the data.
SortShuffleManager.scala
/** Get a writer for a given partition. Called on executors by map tasks. */
override def getWriter[K, V](
handle: ShuffleHandle,
mapId: Int,
context: TaskContext): ShuffleWriter[K, V] = {
numMapsForShuffle.putIfAbsent(
handle.shuffleId, handle.asInstanceOf[BaseShuffleHandle[_, _, _]].numMaps)
val env = SparkEnv.get
handle match {
case unsafeShuffleHandle: SerializedShuffleHandle[K @unchecked, V @unchecked] =>
//Create an UnsafeShuffleWriter
new UnsafeShuffleWriter(
env.blockManager,
shuffleBlockResolver.asInstanceOf[IndexShuffleBlockResolver],
context.taskMemoryManager(),
unsafeShuffleHandle,
mapId,
context,
env.conf)
case bypassMergeSortHandle: BypassMergeSortShuffleHandle[K @unchecked, V @unchecked] =>
new BypassMergeSortShuffleWriter(
env.blockManager,
shuffleBlockResolver.asInstanceOf[IndexShuffleBlockResolver],
bypassMergeSortHandle,
mapId,
context,
env.conf)
case other: BaseShuffleHandle[K @unchecked, V @unchecked, _] =>
new SortShuffleWriter(shuffleBlockResolver, other, mapId, context)
}
}
UnsafeShuffleWriter
The write method
1. Partition and serialize each record, then insert it into the sorter.
2. Sort the records; when sorting finishes, write them to disk as spill files, then merge the spill files into a single output file.
@Override
public void write(scala.collection.Iterator<Product2<K, V>> records) throws IOException {
// Keep track of success so we know if we encountered an exception
// We do this rather than a standard try/catch/re-throw to handle
// generic throwables.
boolean success = false;
try {
//Partition and serialize each record, then insert it into the sorter
while (records.hasNext()) {
insertRecordIntoSorter(records.next());
}
//Sort the records, write them to disk as spill files once sorting finishes, then merge the spill files into a single output file
closeAndWriteOutput();
success = true;
} finally {
if (sorter != null) {
try {
sorter.cleanupResources();
} catch (Exception e) {
// Only throw this error if we won't be masking another
// error.
if (success) {
throw e;
} else {
logger.error("In addition to a failure during writing, we failed during " +
"cleanup.", e);
}
}
}
}
}
The insertRecordIntoSorter method
This method partitions each record, serializes it, and inserts it into the sorter. It works as follows:
- Call partitioner.getPartition on the record's key to determine which partition the record goes to and obtain that partition's partitionId;
- Serialize the record's key and value into the buf field underlying the ByteArrayOutputStream;
- Insert the serialized record into the ShuffleExternalSorter;
When partitioning a record, assuming a HashPartitioner is used, getPartition takes the key's hashCode modulo numPartitions to determine the record's partition, as sketched below.
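A minimal sketch of that behavior, simplified from what HashPartitioner does (the real implementation delegates to Utils.nonNegativeMod):

// Simplified sketch of HashPartitioner.getPartition: hash the key and take a
// non-negative modulo so the result is a valid partition id in [0, numPartitions).
def getPartition(key: Any, numPartitions: Int): Int = key match {
  case null => 0
  case _ =>
    val rawMod = key.hashCode % numPartitions
    rawMod + (if (rawMod < 0) numPartitions else 0)
}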
When a record is serialized, assuming JavaSerializer is used, the streams are layered (bottom-up) as follows:
MyByteArrayOutputStream -> ObjectOutputStream -> JavaSerializationStream
Here, MyByteArrayOutputStream is a subclass of ByteArrayOutputStream that directly exposes the buf[] field;
ObjectOutputStream is the serialization stream from java.io; its writeObject method writes an object into the stream;
JavaSerializationStream is a wrapper around ObjectOutputStream.
In the end, the serialized record is stored in the buf field of the ByteArrayOutputStream.
In the code below, serBuffer is an instance of MyByteArrayOutputStream; calling getBuf returns the underlying ByteArrayOutputStream's buf field (a sketch of this class's idea follows the code listing).
serOutputStream is an instance of SerializationStream; depending on SparkConf it is initialized as either a JavaSerializationStream or a KryoSerializationStream. The Kryo serialization process is not covered here.
Note: JavaSerializer does not support relocation, so JavaSerializationStream could never actually be used on this path; it is only mentioned as an example.
@VisibleForTesting
void insertRecordIntoSorter(Product2<K, V> record) throws IOException {
assert(sorter != null);
final K key = record._1();
//getPartition determines which partition the record's key falls into and returns that partitionId
final int partitionId = partitioner.getPartition(key);
serBuffer.reset();
//Serialize the record's key into serBuffer's underlying buf field
serOutputStream.writeKey(key, OBJECT_CLASS_TAG);
//Serialize the record's value into serBuffer's underlying buf field
serOutputStream.writeValue(record._2(), OBJECT_CLASS_TAG);
serOutputStream.flush();
final int serializedRecordSize = serBuffer.size();
assert (serializedRecordSize > 0);
//Insert the serialized record into the sorter
sorter.insertRecord(
serBuffer.getBuf(), Platform.BYTE_ARRAY_OFFSET, serializedRecordSize, partitionId);
}
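The only reason for MyByteArrayOutputStream above is to expose ByteArrayOutputStream's protected buf array so the serialized bytes can be handed to sorter.insertRecord without an extra copy. A minimal Scala sketch of the same idea (the real class is a small inner Java class of UnsafeShuffleWriter):

import java.io.ByteArrayOutputStream

// Sketch: expose the protected buf field so callers can read the serialized
// bytes in place instead of copying them via toByteArray().
class ExposedByteArrayOutputStream(size: Int) extends ByteArrayOutputStream(size) {
  def getBuf: Array[Byte] = buf
}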
The closeAndWriteOutput method
Sorts the records, writes them to disk as spill files once sorting finishes, and then merges the spill files into a single output file.
The method works as follows:
1. Sort the in-memory records, write them to disk as spill files once sorting finishes, and return the metadata of these spill files as a SpillInfo[];
2. Construct the final output file; its name is (with reduceId fixed at 0): "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId;
3. Append a UUID to the output file name to mark it as being written; the file is renamed when writing finishes;
4. Merge the spill files into one output file, choosing the most suitable merge strategy based on the number of spill files and the IO compression codec;
5. Write each partition's offset into an index file so the reduce side can fetch its data (see the offset sketch after the code listing);
@VisibleForTesting
void closeAndWriteOutput() throws IOException {
assert(sorter != null);
updatePeakMemoryUsed();
serBuffer = null;
serOutputStream = null;
//Sort the in-memory records, write them to disk as spill files, and return the spill files' metadata as SpillInfo[]
final SpillInfo[] spills = sorter.closeAndGetSpills();
sorter = null;
final long[] partitionLengths;
//Construct the final output file; its name (with reduceId fixed at 0) is:
//"shuffle_" + shuffleId + "_" + mapId + "_" + reduceId
final File output = shuffleBlockResolver.getDataFile(shuffleId, mapId);
//Append a UUID to the output file name to mark it as being written; it is renamed when writing finishes
final File tmp = Utils.tempFileWith(output);
try {
try {
//Merge the spill files into one output file, choosing the most suitable merge strategy based on the number of spill files and the IO compression codec
partitionLengths = mergeSpills(spills, tmp);
} finally {
for (SpillInfo spill : spills) {
if (spill.file.exists() && ! spill.file.delete()) {
logger.error("Error while deleting spill file {}", spill.file.getPath());
}
}
}
//Write each partition's offset into the index file so the reduce side can fetch its data
shuffleBlockResolver.writeIndexFileAndCommit(shuffleId, mapId, partitionLengths, tmp);
} finally {
if (tmp.exists() && !tmp.delete()) {
logger.error("Error while deleting temp file {}", tmp.getAbsolutePath());
}
}
mapStatus = MapStatus$.MODULE$.apply(blockManager.shuffleServerId(), partitionLengths);
}
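As noted in step 5, writeIndexFileAndCommit records where each partition starts inside the single data file. A hedged sketch of the idea (not the actual IndexShuffleBlockResolver code): the index file holds numPartitions + 1 cumulative offsets, and reducer i fetches the byte range [offsets(i), offsets(i + 1)) from the data file.

// Sketch: turn per-partition lengths into cumulative offsets, as an index file does.
// For partitionLengths = [10, 0, 25] the offsets are [0, 10, 10, 35]:
// reducer 0 reads bytes [0, 10), reducer 1 reads nothing, reducer 2 reads [10, 35).
def toOffsets(partitionLengths: Array[Long]): Array[Long] =
  partitionLengths.scanLeft(0L)(_ + _)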
The mergeSpills method
Merges the spill files into a single output file, choosing the most suitable merge strategy based on the number of spill files and the IO compression codec.
When there are multiple spill files, the merge strategy is chosen as follows:
1. Read from SparkConf whether fast merge is enabled and whether it is supported; if both hold, take the fast merge path, otherwise take the slow merge path.
2. On the fast merge path, if transferTo is enabled and encryption is not required, use the transferTo-based fast merge; otherwise use the fileStream-based fast merge.
/**
* Merge zero or more spill files together, choosing the fastest merging strategy based on the
* number of spills and the IO compression codec.
*
* @return the partition lengths in the merged file.
*/
private long[] mergeSpills(SpillInfo[] spills, File outputFile) throws IOException {
//Read from SparkConf whether shuffle compression is enabled
final boolean compressionEnabled = sparkConf.getBoolean("spark.shuffle.compress", true);
//Create the CompressionCodec configured in SparkConf
final CompressionCodec compressionCodec = CompressionCodec$.MODULE$.createCodec(sparkConf);
//Read from SparkConf whether fast merge is enabled
final boolean fastMergeEnabled =
sparkConf.getBoolean("spark.shuffle.unsafe.fastMergeEnabled", true);
final boolean fastMergeIsSupported = !compressionEnabled ||
CompressionCodec$.MODULE$.supportsConcatenationOfSerializedStreams(compressionCodec);
final boolean encryptionEnabled = blockManager.serializerManager().encryptionEnabled();
try {
if (spills.length == 0) {
new FileOutputStream(outputFile).close(); // Create an empty file
return new long[partitioner.numPartitions()];
} else if (spills.length == 1) {
// Here, we don't need to perform any metrics updates because the bytes written to this
// output file would have already been counted as shuffle bytes written.
Files.move(spills[0].file, outputFile);
return spills[0].partitionLengths;
} else {
final long[] partitionLengths;
// There are multiple spills to merge, so none of these spill files' lengths were counted
// towards our shuffle write count or shuffle write time. If we use the slow merge path,
// then the final output file's size won't necessarily be equal to the sum of the spill
// files' sizes. To guard against this case, we look at the output file's actual size when
// computing shuffle bytes written.
//
// We allow the individual merge methods to report their own IO times since different merge
// strategies use different IO techniques. We count IO during merge towards the shuffle
// shuffle write time, which appears to be consistent with the "not bypassing merge-sort"
// branch in ExternalSorter.
if (fastMergeEnabled && fastMergeIsSupported) {
// Compression is disabled or we are using an IO compression codec that supports
// decompression of concatenated compressed streams, so we can perform a fast spill merge
// that doesn't need to interpret the spilled bytes.
if (transferToEnabled && !encryptionEnabled) {
logger.debug("Using transferTo-based fast merge");
partitionLengths = mergeSpillsWithTransferTo(spills, outputFile);
} else {
logger.debug("Using fileStream-based fast merge");
partitionLengths = mergeSpillsWithFileStream(spills, outputFile, null);
}
} else {
logger.debug("Using slow merge");
partitionLengths = mergeSpillsWithFileStream(spills, outputFile, compressionCodec);
}
// When closing an UnsafeShuffleExternalSorter that has already spilled once but also has
// in-memory records, we write out the in-memory records to a file but do not count that
// final write as bytes spilled (instead, it's accounted as shuffle write). The merge needs
// to be counted as shuffle write, but this will lead to double-counting of the final
// SpillInfo's bytes.
writeMetrics.decBytesWritten(spills[spills.length - 1].file.length());
writeMetrics.incBytesWritten(outputFile.length());
return partitionLengths;
}
} catch (IOException e) {
if (outputFile.exists() && !outputFile.delete()) {
logger.error("Unable to delete output file {}", outputFile.getPath());
}
throw e;
}
}
The mergeSpillsWithFileStream method
Merges spill files using Java FileStreams.
This merge path is noticeably slower than the NIO (transferTo) based merge, UnsafeShuffleWriter#mergeSpillsWithTransferTo(SpillInfo[], File), so it is mainly used in the following cases:
1. The IO compression codec does not support concatenation of compressed data;
2. Encryption is enabled;
3. The user has explicitly disabled transferTo. Linux kernel version 2.6.32 has a bug that shows up on the NIO path, in which case spark.file.transferTo should be set to false (see the sketch after this list).
4. Each partition within a spill file is small. In that case mergeSpillsWithFileStream is actually faster, because mergeSpillsWithTransferTo would perform many small disk IOs, which is inefficient. With many small IOs, using large buffers for the input and output files helps reduce the number of disk IOs and makes the merge faster.
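A minimal sketch of how item 3 would be applied, assuming a plain SparkConf-based setup (spark.file.transferTo is the flag read by UnsafeShuffleWriter, as its own error message below confirms; the app name is arbitrary):

import org.apache.spark.SparkConf

// Force the fileStream-based merge by disabling the NIO transferTo path,
// e.g. to work around the 2.6.32 kernel bug mentioned above.
val conf = new SparkConf()
  .setAppName("shuffle-merge-demo")
  .set("spark.file.transferTo", "false")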
The method works as follows:
1. Create an input stream for each spill file. The streams and their decoration order are:
NioBufferedFileInputStream -> LimitedInputStream -> CryptoInputStream -> compressedInputStream
Note: compressedInputStream is not a real class name; it is shorthand for whichever of ZstdInputStream, SnappyInputStream, LZFInputStream, or LZ4BlockInputStream is in use. The same applies to compressedOutputStream.
2. Create an output stream for the final output file (outputFile). The streams and their decoration order are:
FileOutputStream -> BufferedOutputStream -> CountingOutputStream -> TimeTrackingOutputStream -> CloseAndFlushShieldOutputStream -> CryptoOutputStream -> compressedOutputStream
3. Copy all bytes from the input streams to the output stream;
/**
* Merges spill files using Java FileStreams. This code path is typically slower than
* the NIO-based merge, {@link UnsafeShuffleWriter#mergeSpillsWithTransferTo(SpillInfo[],
* File)}, and it's mostly used in cases where the IO compression codec does not support
* concatenation of compressed data, when encryption is enabled, or when users have
* explicitly disabled use of {@code transferTo} in order to work around kernel bugs.
* This code path might also be faster in cases where individual partition size in a spill
* is small and UnsafeShuffleWriter#mergeSpillsWithTransferTo method performs many small
* disk ios which is inefficient. In those case, Using large buffers for input and output
* files helps reducing the number of disk ios, making the file merging faster.
*
* @param spills the spills to merge.
* @param outputFile the file to write the merged data to.
* @param compressionCodec the IO compression codec, or null if shuffle compression is disabled.
* @return the partition lengths in the merged file.
*/
private long[] mergeSpillsWithFileStream(
SpillInfo[] spills,
File outputFile,
@Nullable CompressionCodec compressionCodec) throws IOException {
assert (spills.length >= 2);
final int numPartitions = partitioner.numPartitions();
final long[] partitionLengths = new long[numPartitions];
final InputStream[] spillInputStreams = new InputStream[spills.length];
//Create the output stream for the final output file
final OutputStream bos = new BufferedOutputStream(
new FileOutputStream(outputFile),
outputBufferSizeInBytes);
// Use a counting output stream to avoid having to close the underlying file and ask
// the file system for its size after each partition is written.
final CountingOutputStream mergedFileOutputStream = new CountingOutputStream(bos);
boolean threwException = true;
try {
//Create an input stream for each spill file
for (int i = 0; i < spills.length; i++) {
spillInputStreams[i] = new NioBufferedFileInputStream(
spills[i].file,
inputBufferSizeInBytes);
}
//Outer loop: iterate over partitions
for (int partition = 0; partition < numPartitions; partition++) {
final long initialFileLength = mergedFileOutputStream.getByteCount();
// Shield the underlying output stream from close() and flush() calls, so that we can close
// the higher level streams to make sure all data is really flushed and internal state is
// cleaned.
OutputStream partitionOutput = new CloseAndFlushShieldOutputStream(
new TimeTrackingOutputStream(writeMetrics, mergedFileOutputStream));
partitionOutput = blockManager.serializerManager().wrapForEncryption(partitionOutput);
if (compressionCodec != null) {
partitionOutput = compressionCodec.compressedOutputStream(partitionOutput);
}
//Inner loop: iterate over spill files
for (int i = 0; i < spills.length; i++) {
final long partitionLengthInSpill = spills[i].partitionLengths[partition];
if (partitionLengthInSpill > 0) {
InputStream partitionInputStream = new LimitedInputStream(spillInputStreams[i],
partitionLengthInSpill, false);
try {
partitionInputStream = blockManager.serializerManager().wrapForEncryption(
partitionInputStream);
if (compressionCodec != null) {
partitionInputStream = compressionCodec.compressedInputStream(partitionInputStream);
}
//Copy all bytes from the input stream to the output stream
ByteStreams.copy(partitionInputStream, partitionOutput);
} finally {
partitionInputStream.close();
}
}
}
partitionOutput.flush();
partitionOutput.close();
partitionLengths[partition] = (mergedFileOutputStream.getByteCount() - initialFileLength);
}
threwException = false;
} finally {
// To avoid masking exceptions that caused us to prematurely enter the finally block, only
// throw exceptions during cleanup if threwException == false.
for (InputStream stream : spillInputStreams) {
Closeables.close(stream, threwException);
}
Closeables.close(mergedFileOutputStream, threwException);
}
return partitionLengths;
}
The mergeSpillsWithTransferTo method
Merges spill files by using NIO's transferTo to concatenate the spill partitions' bytes.
This is only safe when the IO compression codec and the serializer support concatenation of serialized streams.
The method works as follows:
1. Create an input stream for each spill file and obtain its FileChannel;
2. Create an output stream for the final output file (outputFile) and obtain its FileChannel;
3. Call transferTo on each input FileChannel to transfer its bytes to the output FileChannel;
/**
* Merges spill files by using NIO's transferTo to concatenate spill partitions' bytes.
* This is only safe when the IO compression codec and serializer support concatenation of
* serialized streams.
*
* @return the partition lengths in the merged file.
*/
private long[] mergeSpillsWithTransferTo(SpillInfo[] spills, File outputFile) throws IOException {
assert (spills.length >= 2);
final int numPartitions = partitioner.numPartitions();
final long[] partitionLengths = new long[numPartitions];
final FileChannel[] spillInputChannels = new FileChannel[spills.length];
final long[] spillInputChannelPositions = new long[spills.length];
FileChannel mergedFileOutputChannel = null;
boolean threwException = true;
try {
//Create an input stream for each spill file and obtain its channel
for (int i = 0; i < spills.length; i++) {
spillInputChannels[i] = new FileInputStream(spills[i].file).getChannel();
}
// This file needs to opened in append mode in order to work around a Linux kernel bug that
// affects transferTo; see SPARK-3948 for more details.
//Create the output stream for the final output file and obtain its channel
//The output file must be opened in append mode
mergedFileOutputChannel = new FileOutputStream(outputFile, true).getChannel();
long bytesWrittenToMergedFile = 0;
//Outer loop: iterate over partitions
for (int partition = 0; partition < numPartitions; partition++) {
//Inner loop: iterate over spill files
for (int i = 0; i < spills.length; i++) {
final long partitionLengthInSpill = spills[i].partitionLengths[partition];
final FileChannel spillInputChannel = spillInputChannels[i];
final long writeStartTime = System.nanoTime();
//Transfer the bytes from the input FileChannel to the output FileChannel via transferTo
Utils.copyFileStreamNIO(
spillInputChannel,
mergedFileOutputChannel,
spillInputChannelPositions[i],
partitionLengthInSpill);
spillInputChannelPositions[i] += partitionLengthInSpill;
writeMetrics.incWriteTime(System.nanoTime() - writeStartTime);
bytesWrittenToMergedFile += partitionLengthInSpill;
partitionLengths[partition] += partitionLengthInSpill;
}
}
// Check the position after transferTo loop to see if it is in the right position and raise an
// exception if it is incorrect. The position will not be increased to the expected length
// after calling transferTo in kernel version 2.6.32. This issue is described at
// https://bugs.openjdk.java.net/browse/JDK-7052359 and SPARK-3948.
if (mergedFileOutputChannel.position() != bytesWrittenToMergedFile) {
throw new IOException(
"Current position " + mergedFileOutputChannel.position() + " does not equal expected " +
"position " + bytesWrittenToMergedFile + " after transferTo. Please check your kernel" +
" version to see if it is 2.6.32, as there is a kernel bug which will lead to " +
"unexpected behavior when using transferTo. You can set spark.file.transferTo=false " +
"to disable this NIO feature."
);
}
threwException = false;
} finally {
// To avoid masking exceptions that caused us to prematurely enter the finally block, only
// throw exceptions during cleanup if threwException == false.
for (int i = 0; i < spills.length; i++) {
assert(spillInputChannelPositions[i] == spills[i].file.length());
Closeables.close(spillInputChannels[i], threwException);
}
Closeables.close(mergedFileOutputChannel, threwException);
}
return partitionLengths;
}