This article mainly draws on:
https://www.cnblogs.com/arachis/p/Spark_Shuffle.html
https://zhuanlan.zhihu.com/p/22024169
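Before diving into the source, here is a minimal, hypothetical sketch of the core idea behind sort-based shuffle: map-side records are sorted by their target partition id and laid out as one contiguous output, with per-partition offsets recorded so each reducer can fetch only its own contiguous region. The names (`SortShuffleSketch`, `Record`, `writeMapOutput`, `fetchPartition`) are illustrative inventions, not Spark APIs, and an in-memory `Vector` stands in for the map output file:

```scala
// Hypothetical, simplified model of the sort-based shuffle layout.
object SortShuffleSketch {
  // Toy stand-in for a (K, V) pair routed to some reduce partition.
  case class Record(partitionId: Int, value: String)

  // Sort records by partition id and build (data, offsets):
  // offsets(p) is where partition p's region starts in the "file",
  // and offsets(p + 1) is where it ends.
  def writeMapOutput(records: Seq[Record],
                     numPartitions: Int): (Vector[Record], Vector[Int]) = {
    val sorted = records.sortBy(_.partitionId).toVector
    val offsets = (0 to numPartitions).map { p =>
      sorted.indexWhere(_.partitionId >= p) match {
        case -1 => sorted.length // no records for p or any later partition
        case i  => i
      }
    }.toVector
    (sorted, offsets)
  }

  // A reducer reads only the contiguous region belonging to its partition.
  def fetchPartition(data: Vector[Record],
                     offsets: Vector[Int],
                     p: Int): Vector[Record] =
    data.slice(offsets(p), offsets(p + 1))
}
```

Because the data is sorted, each reducer's read is a single contiguous slice rather than many scattered small reads, which is the main I/O win over hash-based shuffle's one-file-per-reducer layout.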
package org.apache.spark.shuffle.sort
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark._
import org.apache.spark.internal.Logging
import org.apache.spark.shuffle._
/**
* In sort-based shuffle, incoming records are sorted according to their target partition ids, then
* written to a single map output file. Reducers fetch contiguous regions of this file in order to
* read their portion of the map output. In cases where the map output data is too large to fit in
 * memory, sorted subsets of the output can be spilled to disk and those on-disk files are merged
* to produce the final output file.
*