This article mainly draws on:
https://www.cnblogs.com/arachis/p/Spark_Shuffle.html
https://zhuanlan.zhihu.com/p/22024169
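Before diving into the source, here is a minimal, hypothetical sketch of the core idea behind sort-based shuffle: map-side records are sorted by their target partition id and laid out as one contiguous output, with per-partition offsets recorded so each reducer can fetch only its own contiguous region. The names (`SortShuffleSketch`, `Record`, `writeMapOutput`, `fetchPartition`) are illustrative inventions, not Spark APIs, and an in-memory `Vector` stands in for the map output file:

```scala
// Hypothetical, simplified model of the sort-based shuffle layout.
object SortShuffleSketch {
  // Toy stand-in for a (K, V) pair routed to some reduce partition.
  case class Record(partitionId: Int, value: String)

  // Sort records by partition id and build (data, offsets):
  // offsets(p) is where partition p's region starts in the "file",
  // and offsets(p + 1) is where it ends.
  def writeMapOutput(records: Seq[Record],
                     numPartitions: Int): (Vector[Record], Vector[Int]) = {
    val sorted = records.sortBy(_.partitionId).toVector
    val offsets = (0 to numPartitions).map { p =>
      sorted.indexWhere(_.partitionId >= p) match {
        case -1 => sorted.length // no records for p or any later partition
        case i  => i
      }
    }.toVector
    (sorted, offsets)
  }

  // A reducer reads only the contiguous region belonging to its partition.
  def fetchPartition(data: Vector[Record],
                     offsets: Vector[Int],
                     p: Int): Vector[Record] =
    data.slice(offsets(p), offsets(p + 1))
}
```

Because the data is sorted, each reducer's read is a single contiguous slice rather than many scattered small reads, which is the main I/O win over hash-based shuffle's one-file-per-reducer layout.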
package org.apache.spark.shuffle.sort
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark._
import org.apache.spark.internal.Logging
import org.apache.spark.shuffle._
/**
* In sort-based shuffle, incoming records are sorted according to their target partition ids, then
* written to a single map output file. Reducers fetch contiguous regions of this file in order to
* read their portion of the map output. In cases where the map output data is too large to fit in
 * memory, sorted subsets of the output can be spilled to disk and those on-disk files are merged
* to produce the final output file.
*