What is repartitionAndSortWithinPartitions?
Source code
/**
 * Package path: org.apache.spark.rdd.OrderedRDDFunctions
 * Repartition the RDD according to the given partitioner and, within each resulting partition,
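This article introduces Spark's repartitionAndSortWithinPartitions operation, which repartitions and sorts in a single step and is therefore more efficient than doing the two separately. The source code shows that it sorts the records by key within each resulting partition. Use it when you need to repartition an RDD and also need the records in each partition in ascending key order. An example shows how to use it to improve performance instead of calling repartition and sortBy as separate steps.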
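The following is a minimal sketch of the call described above, not code from the original post. It assumes a local Spark installation on the classpath; the object name `RepartitionAndSortExample`, the helper `run`, and the sample data are hypothetical and chosen only for illustration. It uses `HashPartitioner` with two partitions, so integer keys are routed by `key % 2`.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical example object; names and data are illustrative only.
object RepartitionAndSortExample {

  // Returns the contents of every partition, in partition order.
  def run(sc: SparkContext): Array[Array[(Int, String)]] = {
    // A small key-value RDD with unsorted integer keys.
    val pairs = sc.parallelize(
      Seq(5 -> "e", 2 -> "b", 4 -> "d", 1 -> "a", 3 -> "c", 6 -> "f"), 3)

    // Single shuffle: records are routed to 2 partitions by key hash
    // and sorted by key (ascending, via the implicit Ordering[Int])
    // as part of that shuffle.
    val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))

    // glom() turns each partition into an array so the layout is visible.
    sorted.glom().collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("repartition-and-sort-example")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      run(sc).zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

With this data, even keys land in partition 0 and odd keys in partition 1, each in ascending order: `(2,b), (4,d), (6,f)` and `(1,a), (3,c), (5,e)`. Note that the ordering holds only inside each partition; the RDD as a whole is not globally sorted.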