Spark Source Code Analysis: ShuffleMapTask In-Memory Data Spill and Merge

Spark source code analysis: ShuffleMapTask in-memory data spill and merge (detailed write-up): https://github.com/opensourceteams/spark-scala-maven/blob/master/md/ShuffleMapTaskSpillDiskFile.md

More resources:
- SPARK source code analysis tech-sharing videos (full playlist): https://www.bilibili.com/video/av37442139/
- github: https://github.com/opensourceteams/spark-scala-maven
- csdn (playlist online): https://blog.csdn.net/thinktothings/article/details/84726769

Prerequisites:
- Hadoop version: Hadoop 2.6.0-cdh5.15.0
- Spark version: SPARK 1.6.0-cdh5.15.0
- JDK 1.8.0_191
- Scala 2.10.7

Topics covered:
- How Spark ShuffleMapTask spills in-memory data to temporary files
- How data is written to the temporary files: sorted by partition ascending, then by key ascending, as (key, value) pairs
- How each temporary file records how many (key, value) pairs belong to each partition and how many batch flushes were made, keeping the size of each batch in an array
- How the temporary files are merged into a single file
- How in-memory data and temporary files are partitioned, sorted by key, and written into the merged file

Spilling in-memory data to disk:
- ShuffleMapTask reads the data of its current partition (the current HDFS partition; note that there is also a reduce partition, i.e. the ShuffleMapTask output file is already laid out by reduce partition).
- SparkEnv selects the default SortShuffleManager; getWriter() matches a BaseShuffleHandle and returns a SortShuffleWriter.
- SortShuffleWriter uses an ExternalSorter (external sorting object). The data from rdd.iterator(partition, context) is inserted into the ExternalSorter, whose in-memory map is a PartitionedAppendOnlyMap. After every (key, value) insertion, the in-memory collection is checked, and if the spill condition is met, the in-memory data is written to a SpillFile.
- The spill condition: every 32 inserted records, if the estimated size of the current collection is at least 5 MB, a check runs that tries to acquire more memory; if it cannot grow enough, all of the current in-memory data is written to the spillFile on disk.
- The spill goes through org.apache.spark.util.collection.ExternalSorter.SpillableIterator.spill().
- A WritablePartitionedIterator iterates over the in-memory data, and a DiskBlockObjectWriter writes it to disk as (key, value) pairs, without the partition.
- ExternalSorter.spillMemoryIteratorToDisk() writes the in-memory iterator (WritablePartitionedIterator) into a temporary file; the SpillFile is written with a DiskBlockObjectWriter.
- The temporary file is named temp_shuffle_ + UUID.
- While iterating over the in-memory data, the writer records how many (key, value) pairs each partition contributes to each temporary file: elementsPerPartition(partitionId) += 1.
- If the data is large, a flush() is performed every 10000 records by default, and the size of each flush is appended to the batchSizes ArrayBuffer.
- Before writing, the data is sorted: keys are hash-partitioned, then sorted by partition ascending and by key ascending within each partition. This guarantees that when a temporary file is read back, each partition's data can be separated out. If one partition in a temporary file is large, it is read back in batches of 10000 (key, value) pairs, and the size of each batch is available in batchSizes, which makes reading straightforward.

Spill and merge of in-memory data:
- insertAll() puts the data into the ExternalSorter; if the data is large, it spills to temporary files along the way.
- After the spills, the current in-memory data and the temporary files are merged into one data file that contains only (key, value) pairs, sorted by partition ascending and then by key ascending. The output file name is ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID) plus a UUID, i.e. "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId + ".data" + "." + UUID, where reduceId defaults to 0.
- There is also an index file: "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId + ".index" + "." + UUID, which stores the byte offset of each partition in order.
- The data file is written in two ways: directly from memory when nothing was spilled to disk (relatively small data, analysed separately), or with spill files on disk, which is the case this article focuses on.
- ExternalSorter.partitionedIterator() handles all the temporary files on disk plus the in-memory data and returns an iterable whose elements are the (partition, Iterator(key, value)) pairs the reduce side needs, with keys in ascending order inside each iterator.
- Concretely, ExternalSorter.mergeWithAggregation() walks the current partition's data from every temporary file and from memory; note that temporary files are read partition by partition, starting from partition 0.
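Before diving into the source, here is a minimal, hypothetical driver program (assuming Spark 1.6 with a local master; the app name and input path are made up) that triggers exactly this code path. reduceByKey enables map-side combine, while groupByKey does not, which is the distinction SortShuffleWriter.write() checks later on.

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleSpillExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shuffle-spill-example").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Any pair RDD works; the point is the shuffle it triggers.
    val pairs = sc.textFile("/tmp/words.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))

    // reduceByKey: ShuffleDependency.mapSideCombine = true, so ExternalSorter
    // gets an Aggregator and combines values in memory before spilling.
    val counts = pairs.reduceByKey(_ + _)

    // groupByKey: mapSideCombine = false, so values are only buffered,
    // partitioned and (optionally) sorted on the map side, not combined.
    val grouped = pairs.groupByKey()

    counts.count()
    grouped.count()
    sc.stop()
  }
}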

Source code analysis (spilling in-memory data to disk)

ShuffleMapTask

ShuffleMapTask.runTask() processes the data of the current HDFS partition.

SparkEnv.get.shuffleManager is called to get the SortShuffleManager.

SortShuffleManager.getWriter() returns a SortShuffleWriter.

SortShuffleWriter.write() is then called.

SparkEnv.create()

val shortShuffleMgrNames = Map(

"hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",

"sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",

"tungsten-sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")

val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")

val shuffleMgrClass = shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)

val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)
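As a usage note, the shuffle implementation can be selected explicitly through the same property that SparkEnv.create() reads above; a minimal sketch (app name and local master are placeholders, e.g. for spark-shell experiments):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("explicit-shuffle-manager")
  .setMaster("local[2]")
  // "sort" is already the default; "hash" and "tungsten-sort" are the other
  // short names in shortShuffleMgrNames, and a fully qualified class name is
  // also accepted because getOrElse falls back to the raw value.
  .set("spark.shuffle.manager", "sort")

val sc = new SparkContext(conf)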

override def runTask(context: TaskContext): MapStatus = {

// Deserialize the RDD using the broadcast variable.

val deserializeStartTime = System.currentTimeMillis()

val ser = SparkEnv.get.closureSerializer.newInstance()

val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](

ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)

_executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime

metrics = Some(context.taskMetrics)

var writer: ShuffleWriter[Any, Any] = null

try {

val manager = SparkEnv.get.shuffleManager

writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)

writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])

writer.stop(success = true).get

} catch {

case e: Exception =>

try {

if (writer != null) {

writer.stop(success = false)

}

} catch {

case e: Exception =>

log.debug("Could not stop writer", e)

}

throw e

}

}

SortShuffleWriter

SortShuffleWriter.write() decides, based on mapSideCombine in the ShuffleDependency, whether to combine on the map side. This is determined by the operator: reduceByKey sets mapSideCombine to true, groupByKey sets it to false. A new ExternalSorter (external sorting object) is created, and the records (read from the current HDFS partition) are inserted into the sorter.

/** Write a bunch of records to this task's output */
override def write(records: Iterator[Product2[K, V]]): Unit = {

sorter = if (dep.mapSideCombine) {

require(dep.aggregator.isDefined, "Map-side combine without Aggregator specified!")

new ExternalSorter[K, V, C](

context, dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer)

} else {

// In this case we pass neither an aggregator nor an ordering to the sorter, because we don't

// care whether the keys get sorted in each partition; that will be done on the reduce side

// if the operation being run is sortByKey.

new ExternalSorter[K, V, V](

context, aggregator = None, Some(dep.partitioner), ordering = None, dep.serializer)

}

sorter.insertAll(records)

// Don't bother including the time to open the merged output file in the shuffle write time,

// because it just opens a single file, so is typically too fast to measure accurately

// (see SPARK-3570).

val output = shuffleBlockResolver.getDataFile(dep.shuffleId, mapId)

val tmp = Utils.tempFileWith(output)

try {

val blockId = ShuffleBlockId(dep.shuffleId, mapId, IndexShuffleBlockResolver.NOOP_REDUCE_ID)

val partitionLengths = sorter.writePartitionedFile(blockId, tmp)

shuffleBlockResolver.writeIndexFileAndCommit(dep.shuffleId, mapId, partitionLengths, tmp)

mapStatus = MapStatus(blockManager.shuffleServerId, partitionLengths)

} finally {

if (tmp.exists() && !tmp.delete()) {

logError(s"Error while deleting temp file ${tmp.getAbsolutePath}")

}

}

}

ExternalSorter.insertAll()

This method inserts the data from the records iterator into the external sorter. Inside ExternalSorter the data is not kept sorted; it is stored in an array-backed structure whose key is (partition, key) and whose value is the result computed by the RDD chain before the shuffle. Records with the same key are merged in memory (map-side local combine); the merge function is the one defined by the operator, e.g. reduceByKey(_ + _). maybeSpillCollection() checks whether the in-memory collection should spill to a temporary file; if so, the current in-memory data is written to disk, sorted by partition ascending and then by key ascending, as (key, value) pairs without the partition, while elementsPerPartition(partitionId) records the count per partition and the size of each flush is appended to the batchSizes ArrayBuffer. The in-memory data lives in a PartitionedAppendOnlyMap; keep this object in mind, because its sorting algorithm is used again later. A toy illustration of the map-side combine follows the listing below.

@volatile private var map = new PartitionedAppendOnlyMap[K, C]

def insertAll(records: Iterator[Product2[K, V]]): Unit = {

// TODO: stop combining if we find that the reduction factor isn't high

val shouldCombine = aggregator.isDefined

if (shouldCombine) {

// Combine values in-memory first using our AppendOnlyMap

val mergeValue = aggregator.get.mergeValue

val createCombiner = aggregator.get.createCombiner

var kv: Product2[K, V] = null

val update = (hadValue: Boolean, oldValue: C) => {

if (hadValue) mergeValue(oldValue, kv._2) else createCombiner(kv._2)

}

while (records.hasNext) {

addElementsRead()

kv = records.next()

map.changeValue((getPartition(kv._1), kv._1), update)

maybeSpillCollection(usingMap = true)

}

} else {

// Stick values into our buffer

while (records.hasNext) {

addElementsRead()

val kv = records.next()

buffer.insert(getPartition(kv._1), kv._1, kv._2.asInstanceOf[C])

maybeSpillCollection(usingMap = false)

}

}

}
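The following is a self-contained toy illustration (not Spark's PartitionedAppendOnlyMap) of what the update closure above does for a reduceByKey(_ + _) style aggregator: the map is keyed by (partition, key), and values with the same key in the same partition are combined in place.

import scala.collection.mutable

object MapSideCombineToy {
  def main(args: Array[String]): Unit = {
    val numPartitions = 2
    def getPartition(key: String): Int = math.abs(key.hashCode) % numPartitions

    // Stand-in for PartitionedAppendOnlyMap: keyed by (partition, key).
    val map = mutable.HashMap.empty[(Int, String), Int]

    val createCombiner: Int => Int = v => v
    val mergeValue: (Int, Int) => Int = _ + _   // the reduceByKey(_ + _) function

    val records = Seq(("a", 1), ("b", 1), ("a", 1), ("c", 1), ("b", 1))
    for ((k, v) <- records) {
      val mapKey = (getPartition(k), k)
      val updated = map.get(mapKey) match {
        case Some(old) => mergeValue(old, v)   // same (partition, key): combine
        case None      => createCombiner(v)    // first occurrence
      }
      map(mapKey) = updated
    }

    // e.g. ((0,"a"),2), ((1,"b"),2), ... depending on the keys' hash codes
    map.toSeq.sortBy(_._1).foreach(println)
  }
}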

ExternalSorter.maybeSpillCollection()

estimatedSize is the estimated memory footprint of the current in-memory data. If maybeSpill() decides the spill condition is met, the in-memory data is written to a temporary file. This method calls ExternalSorter.maybeSpill().

/**

 * Spill the current in-memory collection to disk if needed.
 *
 * @param usingMap whether we're using a map or buffer as our current in-memory collection
 */

private def maybeSpillCollection(usingMap: Boolean): Unit = {

var estimatedSize = 0L

if (usingMap) {

estimatedSize = map.estimateSize()

if (maybeSpill(map, estimatedSize)) {

map = new PartitionedAppendOnlyMap[K, C]

}

} else {

estimatedSize = buffer.estimateSize()

if (maybeSpill(buffer, estimatedSize)) {

buffer = new PartitionedPairBuffer[K, C]

}

}

if (estimatedSize > _peakMemoryUsedBytes) {

_peakMemoryUsedBytes = estimatedSize

}

}

ExternalSorter.maybeSpill()

While iterating over the in-memory data, a check is made every 32 elements: if the estimated size of the current collection has reached the memory threshold (initially 5 MB), the sorter tries to acquire more execution memory; if it cannot grow enough, the spill condition is met and ExternalSorter.spill() is called to write the in-memory data to a temporary file. A standalone sketch of this decision logic follows the listing below.

/**

 * Spills the current in-memory collection to disk if needed. Attempts to acquire more
 * memory before spilling.
 *
 * @param collection collection to spill to disk
 * @param currentMemory estimated size of the collection in bytes
 * @return true if collection was spilled to disk; false otherwise
 */

protected def maybeSpill(collection: C, currentMemory: Long): Boolean = {

var shouldSpill = false

if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {

// Claim up to double our current memory from the shuffle memory pool

val amountToRequest = 2 * currentMemory - myMemoryThreshold

val granted = acquireOnHeapMemory(amountToRequest)

myMemoryThreshold += granted

// If we were granted too little memory to grow further (either tryToAcquire returned 0,

// or we already had more memory than myMemoryThreshold), spill the current collection

shouldSpill = currentMemory >= myMemoryThreshold

}

shouldSpill = shouldSpill || _elementsRead > numElementsForceSpillThreshold

// Actually spill

if (shouldSpill) {

_spillCount += 1

logSpillage(currentMemory)

spill(collection)

_elementsRead = 0

_memoryBytesSpilled += currentMemory

releaseMemory()

}

shouldSpill

}
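A standalone sketch of the decision above. The constants mirror the 5 MB initial threshold and the check every 32 elements described in the article; acquireMemory here is a stand-in for the shuffle memory pool, not Spark's API.

object SpillDecisionSketch {
  val initialMemoryThreshold: Long = 5 * 1024 * 1024  // 5 MB starting threshold
  var myMemoryThreshold: Long = initialMemoryThreshold

  // Stand-in for the shuffle memory pool: pretend we can only ever get 1 MB more.
  def acquireMemory(bytes: Long): Long = math.min(bytes, 1024 * 1024)

  def shouldSpill(elementsRead: Long, currentMemory: Long): Boolean = {
    var spill = false
    if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {
      // Try to roughly double our memory before giving up and spilling.
      val amountToRequest = 2 * currentMemory - myMemoryThreshold
      myMemoryThreshold += acquireMemory(amountToRequest)
      spill = currentMemory >= myMemoryThreshold
    }
    spill
  }

  def main(args: Array[String]): Unit = {
    // 32nd element with ~6 MB in memory: the threshold can only grow to 6 MB, so we spill.
    println(shouldSpill(elementsRead = 32, currentMemory = 6L * 1024 * 1024))  // true
  }
}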

ExternalSorter.spill()

Calls collection.destructiveSortedWritablePartitionedIterator(comparator) to sort the data, i.e. WritablePartitionedPairCollection.destructiveSortedWritablePartitionedIterator(), which in turn calls WritablePartitionedPairCollection.partitionedDestructiveSortedIterator(). That method has no implementation in the trait, so the subclass method PartitionedAppendOnlyMap.partitionedDestructiveSortedIterator() is used. It then calls ExternalSorter.spillMemoryIteratorToDisk() to write the in-memory data to the spillFile temporary file.

/**

 * Spill our in-memory collection to a sorted file that we can merge later.
 * We add this file into spilledFiles to find it later.
 *
 * @param collection whichever collection we're using (map or buffer)
 */

override protected[this] def spill(collection: WritablePartitionedPairCollection[K, C]): Unit = {

val inMemoryIterator = collection.destructiveSortedWritablePartitionedIterator(comparator)

val spillFile = spillMemoryIteratorToDisk(inMemoryIterator)

spills.append(spillFile)

}

PartitionedAppendOnlyMap.partitionedDestructiveSortedIterator()

Uses the comparator WritablePartitionedPairCollection.partitionKeyComparator, i.e. it sorts by partition ID ascending first, then by key ascending.

/**

 * Implementation of WritablePartitionedPairCollection that wraps a map in which the keys
 * are tuples of (partition ID, K)
 */

private[spark] class PartitionedAppendOnlyMap[K, V] extends SizeTrackingAppendOnlyMap[(Int, K), V] with WritablePartitionedPairCollection[K, V] {

def partitionedDestructiveSortedIterator(keyComparator: Option[Comparator[K]])

: Iterator[((Int, K), V)] = {

val comparator = keyComparator.map(partitionKeyComparator).getOrElse(partitionComparator)

destructiveSortedIterator(comparator)

}

def insert(partition: Int, key: K, value: V): Unit = {

update((partition, key), value)

}}

/**

 * A comparator for (Int, K) pairs that orders them both by their partition ID and a key ordering.
 */

def partitionKeyComparator[K](keyComparator: Comparator[K]): Comparator[(Int, K)] = {

new Comparator[(Int, K)] {

override def compare(a: (Int, K), b: (Int, K)): Int = {

val partitionDiff = a._1 - b._1

if (partitionDiff != 0) {

partitionDiff

} else {

keyComparator.compare(a._2, b._2)

}

}

}

}
}

ExternalSorter.spillMemoryIteratorToDisk()

Creates a blockId of the form temp_shuffle_ + UUID and spills to a disk temporary file with the same name. It iterates over the in-memory data (inMemoryIterator) and writes it to the spillFile temporary file, recording how many (key, value) pairs each partition contributed in elementsPerPartition(partitionId). If the data is large, a flush() is performed every 10000 records by default, and the size of each flush is appended to the batchSizes ArrayBuffer. The write order produced by the destructive iterator is illustrated with a small sketch after the listing below.

/**

 * Spill contents of in-memory iterator to a temporary file on disk.
 */

private[this] def spillMemoryIteratorToDisk(inMemoryIterator: WritablePartitionedIterator)

: SpilledFile = {

// Because these files may be read during shuffle, their compression must be controlled by

// spark.shuffle.compress instead of spark.shuffle.spill.compress, so we need to use

// createTempShuffleBlock here; see SPARK-3426 for more context.

val (blockId, file) = diskBlockManager.createTempShuffleBlock()

// These variables are reset after each flush

var objectsWritten: Long = 0

var spillMetrics: ShuffleWriteMetrics = null

var writer: DiskBlockObjectWriter = null

def openWriter(): Unit = {

assert (writer == null && spillMetrics == null)

spillMetrics = new ShuffleWriteMetrics

writer = blockManager.getDiskWriter(blockId, file, serInstance, fileBufferSize, spillMetrics)

}

openWriter()

// List of batch sizes (bytes) in the order they are written to disk

val batchSizes = new ArrayBuffer[Long]

// How many elements we have in each partition

val elementsPerPartition = new Array[Long](numPartitions)

// Flush the disk writer's contents to disk, and update relevant variables.

// The writer is closed at the end of this process, and cannot be reused.

def flush(): Unit = {

val w = writer

writer = null

w.commitAndClose()

_diskBytesSpilled += spillMetrics.shuffleBytesWritten

batchSizes.append(spillMetrics.shuffleBytesWritten)

spillMetrics = null

objectsWritten = 0

}

var success = false

try {

while (inMemoryIterator.hasNext) {

val partitionId = inMemoryIterator.nextPartition()

require(partitionId >= 0 && partitionId < numPartitions,

s"partition Id: ${partitionId} should be in the range [0, ${numPartitions})")

inMemoryIterator.writeNext(writer)

elementsPerPartition(partitionId) += 1

objectsWritten += 1

if (objectsWritten == serializerBatchSize) {

flush()

openWriter()

}

}

if (objectsWritten > 0) {

flush()

} else if (writer != null) {

val w = writer

writer = null

w.revertPartialWritesAndClose()

}

success = true

} finally {

if (!success) {

// This code path only happens if an exception was thrown above before we set success;

// close our stuff and let the exception be thrown further

if (writer != null) {

writer.revertPartialWritesAndClose()

}

if (file.exists()) {

if (!file.delete()) {

logWarning(s"Error deleting ${file}")

}

}

}

}

SpilledFile(file, blockId, batchSizes.toArray, elementsPerPartition)

}
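To make the write order concrete, here is a small self-contained sketch of the partition-then-key ordering that the destructive iterator hands to spillMemoryIteratorToDisk(): records are sorted so each partition forms a contiguous run, with keys ascending inside the run (the data is invented for illustration).

object SpillOrderSketch {
  def main(args: Array[String]): Unit = {
    // ((partition, key), value) entries as they might sit in PartitionedAppendOnlyMap.
    val inMemory = Seq(
      ((1, "banana"), 3), ((0, "cherry"), 1), ((1, "apple"), 2),
      ((0, "apple"), 5), ((1, "cherry"), 4))

    // Equivalent of partitionKeyComparator: partition ID ascending, then key ascending.
    val sorted = inMemory.sortBy { case ((partition, key), _) => (partition, key) }

    // Spill file layout: only (key, value) is written; the partition is not stored,
    // but elementsPerPartition remembers how many records each partition contributed.
    val elementsPerPartition = new Array[Long](2)
    sorted.foreach { case ((partition, key), value) =>
      elementsPerPartition(partition) += 1
      println(s"write ($key, $value)")   // partition 0 run first, then partition 1
    }
    println(elementsPerPartition.mkString("elementsPerPartition = [", ", ", "]"))  // [2, 3]
  }
}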

Source code analysis (spill merge of in-memory data)

After insertAll(), the in-memory data may have been spilled to temporary files; depending on the amount of data there can be several of them.

At this point the reduce-side input (partition, Iterator) has to be produced from all of the temporary files, i.e. the temporary files have to be merged into a single file, sorted by partition ascending and then by key ascending.

SortShuffleWriter.write()

shuffleBlockResolver.getDataFile returns the merged output file, named like "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId + ".data" (the temporary file used while writing appends "." + UUID), with reduceId fixed to the default 0.

The key call is sorter.writePartitionedFile on the ExternalSorter; this is the method that actually produces the merged file.

Its return value, partitionLengths, is an array holding the size of each partition's data in the data file, indexed by partition from 0 to the maximum partition; the index file is derived from it (see the offset sketch after the listing below).

/** Write a bunch of records to this task's output */
override def write(records: Iterator[Product2[K, V]]): Unit = {

sorter = if (dep.mapSideCombine) {

require(dep.aggregator.isDefined, "Map-side combine without Aggregator specified!")

new ExternalSorter[K, V, C](

context, dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer)

} else {

// In this case we pass neither an aggregator nor an ordering to the sorter, because we don't

// care whether the keys get sorted in each partition; that will be done on the reduce side

// if the operation being run is sortByKey.

new ExternalSorter[K, V, V](

context, aggregator = None, Some(dep.partitioner), ordering = None, dep.serializer)

}

sorter.insertAll(records)

// Don't bother including the time to open the merged output file in the shuffle write time,

// because it just opens a single file, so is typically too fast to measure accurately

// (see SPARK-3570).

val output = shuffleBlockResolver.getDataFile(dep.shuffleId, mapId)

val tmp = Utils.tempFileWith(output)

try {

val blockId = ShuffleBlockId(dep.shuffleId, mapId, IndexShuffleBlockResolver.NOOP_REDUCE_ID)

val partitionLengths = sorter.writePartitionedFile(blockId, tmp)

shuffleBlockResolver.writeIndexFileAndCommit(dep.shuffleId, mapId, partitionLengths, tmp)

mapStatus = MapStatus(blockManager.shuffleServerId, partitionLengths)

} finally {

if (tmp.exists() && !tmp.delete()) {

logError(s"Error while deleting temp file ${tmp.getAbsolutePath}")

}

}

}
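The index file written by writeIndexFileAndCommit stores cumulative byte offsets derived from partitionLengths, so a reducer can seek straight to its partition. A minimal sketch of that derivation (the lengths are invented; this mirrors the idea, not IndexShuffleBlockResolver's exact I/O code):

object IndexFileSketch {
  def main(args: Array[String]): Unit = {
    // partitionLengths as returned by sorter.writePartitionedFile (bytes per partition).
    val partitionLengths = Array(120L, 0L, 310L, 45L)

    // The index file conceptually stores numPartitions + 1 offsets: 0, then cumulative sums.
    val offsets = partitionLengths.scanLeft(0L)(_ + _)
    println(offsets.mkString("offsets = [", ", ", "]"))  // [0, 120, 120, 430, 475]

    // A reducer for partition p reads bytes [offsets(p), offsets(p + 1)) from the .data file.
    val p = 2
    println(s"partition $p -> bytes ${offsets(p)} until ${offsets(p + 1)}")
  }
}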

ExternalSorter.writePartitionedFile()

As the name says, this writes the data into a partitioned file. If there are no spill files, the data is sorted purely in memory by the ExternalSorter (a TimSort-based sort); that path is analysed separately and not covered in detail here. If there are spill files, which is the case we focus on, this.partitionedIterator() is called to obtain [(partition, Iterator)] pairs sorted by partition ascending and by key ascending within each (key, value) iterator, the result is written into the merged file, and the number of bytes written for each partition is returned in the lengths array, indexed by partition.

/**

 * Write all the data added into this ExternalSorter into a file in the disk store. This is
 * called by the SortShuffleWriter.
 *
 * @param blockId block ID to write to. The index file will be blockId.name + ".index".
 * @return array of lengths, in bytes, of each partition of the file (used by map output tracker)
 */

def writePartitionedFile(

blockId: BlockId,

outputFile: File): Array[Long] = {

// Track location of each range in the output file

val lengths = new Array[Long](numPartitions)

if (spills.isEmpty) {

// Case where we only have in-memory data

val collection = if (aggregator.isDefined) map else buffer

val it = collection.destructiveSortedWritablePartitionedIterator(comparator)

while (it.hasNext) {

val writer = blockManager.getDiskWriter(blockId, outputFile, serInstance, fileBufferSize,

context.taskMetrics.shuffleWriteMetrics.get)

val partitionId = it.nextPartition()

while (it.hasNext && it.nextPartition() == partitionId) {

it.writeNext(writer)

}

writer.commitAndClose()

val segment = writer.fileSegment()

lengths(partitionId) = segment.length

}

} else {

// We must perform merge-sort; get an iterator by partition and write everything directly.

for ((id, elements) <- this.partitionedIterator) {

if (elements.hasNext) {

val writer = blockManager.getDiskWriter(blockId, outputFile, serInstance, fileBufferSize,

context.taskMetrics.shuffleWriteMetrics.get)

for (elem <- elements) {

writer.write(elem._1, elem._2)

}

writer.commitAndClose()

val segment = writer.fileSegment()

lengths(id) = segment.length

}

}

}

context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)

context.taskMetrics().incDiskBytesSpilled(diskBytesSpilled)

context.internalMetricsToAccumulators(

InternalAccumulator.PEAK_EXECUTION_MEMORY).add(peakMemoryUsedBytes)

lengths

}

this.partitionedIterator()

Delegates directly to ExternalSorter.merge(), passing the temporary files (spills) and the in-memory data. The in-memory sort happens here via collection.partitionedDestructiveSortedIterator(comparator), which in practice is PartitionedAppendOnlyMap.partitionedDestructiveSortedIterator; it uses partitionKeyComparator, i.e. sorts by partition ascending, then by key ascending. A toy sketch of grouping the sorted output by partition follows the listing below.

/**

 * Return an iterator over all the data written to this object, grouped by partition and
 * aggregated by the requested aggregator. For each partition we then have an iterator over its
 * contents, and these are expected to be accessed in order (you can't "skip ahead" to one
 * partition without reading the previous one). Guaranteed to return a key-value pair for each
 * partition, in order of partition ID.
 *
 * For now, we just merge all the spilled files in one pass, but this can be modified to
 * support hierarchical merging.
 * Exposed for testing.
 */

def partitionedIterator: Iterator[(Int, Iterator[Product2[K, C]])] = {

val usingMap = aggregator.isDefined

val collection: WritablePartitionedPairCollection[K, C] = if (usingMap) map else buffer

if (spills.isEmpty) {

// Special case: if we have only in-memory data, we don't need to merge streams, and perhaps

// we don't even need to sort by anything other than partition ID

if (!ordering.isDefined) {

// The user hasn't requested sorted keys, so only sort by partition ID, not key

groupByPartition(destructiveIterator(collection.partitionedDestructiveSortedIterator(None)))

} else {

// We do need to sort by both partition ID and key

groupByPartition(destructiveIterator(

collection.partitionedDestructiveSortedIterator(Some(keyComparator))))

}

} else {

// Merge spilled and in-memory data

merge(spills, destructiveIterator(

collection.partitionedDestructiveSortedIterator(comparator)))

}

}
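For the in-memory-only branch above, groupByPartition turns a (partition, key)-sorted iterator into per-partition iterators. A self-contained toy version of that grouping (not Spark's groupByPartition/IteratorForPartition code, just the same idea on a plain sorted sequence):

object GroupByPartitionToy {
  def main(args: Array[String]): Unit = {
    // Already sorted by (partition, key), as partitionedDestructiveSortedIterator guarantees.
    val sorted = Seq(
      ((0, "apple"), 5), ((0, "cherry"), 1),
      ((1, "apple"), 2), ((1, "banana"), 3), ((1, "cherry"), 4))

    val numPartitions = 2

    // One (partition, Iterator[(key, value)]) pair per partition, in partition order,
    // with an empty iterator for partitions that received no data.
    val byPartition: Iterator[(Int, Iterator[(String, Int)])] =
      (0 until numPartitions).iterator.map { p =>
        (p, sorted.iterator.collect { case ((`p`, k), v) => (k, v) })
      }

    byPartition.foreach { case (p, it) =>
      println(s"partition $p -> ${it.toList}")
    }
  }
}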

ExternalSorter.merge()

Loops over partitions 0 until numPartitions (exclusive). For each partition p, IteratorForPartition(p, inMemBuffered) pulls partition p's data out of memory. readers reads every temporary file for each partition (because every temporary file may contain data for partition p); readers.map(_.readNextPartition()) advances each reader by exactly one partition at a time, starting from 0, so the call lines up with partition p's data. readNextPartition is SpillReader.readNextPartition(). Partition p's data is then combined with mergeWithAggregation() before being written into the merged file. A simplified k-way merge sketch follows the listing below.

/**

 * Merge a sequence of sorted files, giving an iterator over partitions and then over elements
 * inside each partition. This can be used to either write out a new file or return data to
 * the user.
 *
 * Returns an iterator over all the data written to this object, grouped by partition. For each
 * partition we then have an iterator over its contents, and these are expected to be accessed
 * in order (you can't "skip ahead" to one partition without reading the previous one).
 * Guaranteed to return a key-value pair for each partition, in order of partition ID.
 */

private def merge(spills: Seq[SpilledFile], inMemory: Iterator[((Int, K), C)])

: Iterator[(Int, Iterator[Product2[K, C]])] = {

val readers = spills.map(new SpillReader(_))

val inMemBuffered = inMemory.buffered

(0 until numPartitions).iterator.map { p =>

val inMemIterator = new IteratorForPartition(p, inMemBuffered)

val iterators = readers.map(_.readNextPartition()) ++ Seq(inMemIterator)

if (aggregator.isDefined) {

// Perform partial aggregation across partitions

(p, mergeWithAggregation(

iterators, aggregator.get.mergeCombiners, keyComparator, ordering.isDefined))

} else if (ordering.isDefined) {

// No aggregator given, but we have an ordering (e.g. used by reduce tasks in sortByKey);

// sort the elements without trying to merge them

(p, mergeSort(iterators, ordering.get))

} else {

(p, iterators.iterator.flatten)

}

}

}
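When no aggregation is involved, the per-partition iterators (one per spill file plus the in-memory one) are merge-sorted by key. A simplified, self-contained k-way merge sketch using a priority queue (this is the idea behind mergeSort, not Spark's exact code):

import scala.collection.mutable

object KWayMergeSketch {
  def main(args: Array[String]): Unit = {
    // One key-sorted iterator per spill file for the current partition, plus the in-memory one.
    val sources: Seq[Iterator[(String, Int)]] = Seq(
      Iterator(("apple", 1), ("cherry", 2)),
      Iterator(("banana", 3), ("cherry", 4)),
      Iterator(("apple", 5)))

    // Min-heap on the head key of each buffered iterator (PriorityQueue is a max-heap,
    // hence the reversed ordering).
    val heap = mutable.PriorityQueue.empty[BufferedIterator[(String, Int)]](
      Ordering.by((it: BufferedIterator[(String, Int)]) => it.head._1).reverse)
    sources.map(_.buffered).filter(_.hasNext).foreach(heap.enqueue(_))

    // Repeatedly emit the smallest head; equal keys from different sources come out adjacent,
    // which is what mergeWithAggregation relies on to combine them.
    while (heap.nonEmpty) {
      val it = heap.dequeue()
      println(it.next())           // (key, value) in global key order
      if (it.hasNext) heap.enqueue(it)
    }
  }
}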

SpillReader.readNextPartition()

readNextItem() is the method that actually reads the temporary file. deserializeStream reads one batch at a time; the batch sizes were recorded in batchSizes when the spill file was written. Normally a partition is written as one batch, but if a partition holds a lot of data it was flushed every 10000 records, so a new batch stream is opened every 10000 records until all of the partition's batches have been read. On construction, nextBatchStream() is executed; it relies on the batchSizes array that stores the size of each written batch. val batchOffsets = spill.serializerBatchSizes.scanLeft(0L)(_ + _) yields exactly the byte offset of each batch, each offset being the sum of all preceding batch sizes (see the worked example after nextBatchStream below). When the current partition's batches are exhausted, the stream returns null, meaning the partition has been read completely; when partitionId == numPartitions, finished = true and all partitions of all files have been read.

def readNextPartition(): Iterator[Product2[K, C]] = new Iterator[Product2[K, C]] {

val myPartition = nextPartitionToRead

nextPartitionToRead += 1

override def hasNext: Boolean = {

if (nextItem == null) {

nextItem = readNextItem()

if (nextItem == null) {

return false

}

}

assert(lastPartitionId >= myPartition)

// Check that we're still in the right partition; note that readNextItem will have returned

// null at EOF above so we would've returned false there

lastPartitionId == myPartition

}

override def next(): Product2[K, C] = {

if (!hasNext) {

throw new NoSuchElementException

}

val item = nextItem

nextItem = null

item

}

}

/**

* Return the next (K, C) pair from the deserialization stream and update partitionId,

* indexInPartition, indexInBatch and such to match its location.

*

* If the current batch is drained, construct a stream for the next batch and read from it.

* If no more pairs are left, return null.

*/

private def readNextItem(): (K, C) = {

if (finished || deserializeStream == null) {

return null

}

val k = deserializeStream.readKey().asInstanceOf[K]

val c = deserializeStream.readValue().asInstanceOf[C]

lastPartitionId = partitionId

// Start reading the next batch if we're done with this one

indexInBatch += 1

if (indexInBatch == serializerBatchSize) {

indexInBatch = 0

deserializeStream = nextBatchStream()

}

// Update the partition location of the element we're reading

indexInPartition += 1

skipToNextPartition()

// If we've finished reading the last partition, remember that we're done

if (partitionId == numPartitions) {

finished = true

if (deserializeStream != null) {

deserializeStream.close()

}

}

(k, c)

}

/** Construct a stream that only reads from the next batch */

def nextBatchStream(): DeserializationStream = {

// Note that batchOffsets.length = numBatches + 1 since we did a scan above; check whether

// we're still in a valid batch.

if (batchId < batchOffsets.length - 1) {

if (deserializeStream != null) {

deserializeStream.close()

fileStream.close()

deserializeStream = null

fileStream = null

}

val start = batchOffsets(batchId)

fileStream = new FileInputStream(spill.file)

fileStream.getChannel.position(start)

batchId += 1

val end = batchOffsets(batchId)

assert(end >= start, "start = " + start + ", end = " + end +

", batchOffsets = " + batchOffsets.mkString("[", ", ", "]"))

val bufferedStream = new BufferedInputStream(ByteStreams.limit(fileStream, end - start))

val sparkConf = SparkEnv.get.conf

val stream = blockManager.wrapForCompression(spill.blockId,

CryptoStreamUtils.wrapForEncryption(bufferedStream, sparkConf))

serInstance.deserializeStream(stream)

} else {

// No more batches left

cleanup()

null

}

}
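A worked example of the batchOffsets computation used above (the batch sizes are invented): scanLeft turns the per-flush byte counts recorded in batchSizes into absolute file offsets, so batch i occupies bytes [batchOffsets(i), batchOffsets(i + 1)) of the spill file.

object BatchOffsetsExample {
  def main(args: Array[String]): Unit = {
    // serializerBatchSizes as recorded by spillMemoryIteratorToDisk, one entry per flush().
    val serializerBatchSizes = Array(1200L, 950L, 2048L)

    val batchOffsets = serializerBatchSizes.scanLeft(0L)(_ + _)
    println(batchOffsets.mkString("batchOffsets = [", ", ", "]"))  // [0, 1200, 2150, 4198]

    // nextBatchStream() reads batch `batchId` as the byte range [start, end) of the spill file.
    val batchId = 1
    val (start, end) = (batchOffsets(batchId), batchOffsets(batchId + 1))
    println(s"batch $batchId -> bytes $start until $end")  // bytes 1200 until 2150
  }
}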

