Spark-partitioner

@(spark)[partitioner]

Partitioner

/**                                                                                                                                                                     
 * An object that defines how the elements in a key-value pair RDD are partitioned by key.                                                                              
 * Maps each key to a partition ID, from 0 to `numPartitions - 1`.                                                                                                      
 */                                                                                                                                                                     
abstract class Partitioner extends Serializable {                                                                                                                       
  def numPartitions: Int                                                                                                                                                
  def getPartition(key: Any): Int                                                                                                                                       
}  
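
To make the contract concrete, here is a minimal sketch of a custom partitioner. It assumes nothing beyond the two abstract members shown above. `KeyLengthPartitioner` is a hypothetical name used only for illustration, and the local `Partitioner` class is a stand-in so the snippet runs without Spark on the classpath.

```scala
// Stand-in for org.apache.spark.Partitioner so the sketch runs without Spark.
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical partitioner: string keys are routed by length, all other keys go to partition 0.
class KeyLengthPartitioner(override val numPartitions: Int) extends Partitioner {
  require(numPartitions > 0, "numPartitions must be positive")

  // The result must always fall in the range 0 to numPartitions - 1.
  override def getPartition(key: Any): Int = key match {
    case s: String => s.length % numPartitions
    case _         => 0
  }
}

object KeyLengthPartitionerDemo {
  def main(args: Array[String]): Unit = {
    val p = new KeyLengthPartitioner(4)
    Seq[Any]("a", "spark", "rdd", 42).foreach { k =>
      println(s"$k -> ${p.getPartition(k)}")
    }
  }
}
```

In a real job, a class like this would extend `org.apache.spark.Partitioner` directly and would typically be passed to an operation such as `partitionBy` on a pair RDD.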

HashPartitioner

/**                                                                                                                                                                     
 * A [[org.apache.spark.Partitioner]] that implements hash-based partitioning using                                                                                     
 * Java's `Object.hashCode`.                                                                                                                                            
 *                                                                                                                                                                      
 * Java arrays have hashCodes that are based on the arrays' identities rather than their contents,                                                                      
 * so attempting to partition an RDD[Array[_]] or RDD[(Array[_], _)] using a HashPartitioner will                                                                       
 * produce an unexpected or incorrect result.                                                                                                                           
 */                                                                                                                                                                     
class HashPartitioner(partitions: Int) extends Partitioner {   
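
The array caveat in the comment above can be checked without Spark. The sketch below is a plain-Scala approximation of hash partitioning, not the Spark source itself: `HashPartitionSketch`, `nonNegativeMod`, and the rule that `null` keys map to partition 0 are assumptions made for this illustration.

```scala
object HashPartitionSketch {
  // Maps x into [0, mod) even when x is negative (hashCode may be negative).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Hash-based partition choice; sending null keys to partition 0 is an assumption of this sketch.
  def getPartition(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val n = 4
    // Equal strings have equal hash codes, so they always share a partition.
    println(getPartition("spark", n) == getPartition("spark", n))

    val a = Array(1, 2, 3)
    val b = Array(1, 2, 3)
    // Arrays hash by identity, so two arrays with the same contents usually differ here.
    println(a.hashCode == b.hashCode)
    // Converting to a Seq gives a content-based hashCode, which is safe to use as a key.
    println(a.toSeq.hashCode == b.toSeq.hashCode)
  }
}
```

If array-like keys are unavoidable, converting them to a `Seq` (or another type with content-based `hashCode` and `equals`) before partitioning is one way to sidestep the problem.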

RangePartitioner

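In practice, this partitioner is used for sort-based partitioning:
1. Take a sample of the data to estimate the approximate key distribution.
2. For each key, determine its partition from the range boundaries derived from that sample.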
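
The two steps above can be sketched in plain Scala. This is a simplified illustration under stated assumptions, not Spark's implementation: it samples with a seeded shuffle over an in-memory `Seq[Int]`, and the names `RangePartitionSketch`, `rangeBounds`, `sampleSize`, and `seed` are hypothetical.

```scala
import scala.util.Random

object RangePartitionSketch {
  // Step 1: sample the keys and pick at most (partitions - 1) sorted upper bounds.
  def rangeBounds(keys: Seq[Int], partitions: Int, sampleSize: Int, seed: Long): Array[Int] = {
    val sample = new Random(seed).shuffle(keys).take(sampleSize).distinct.sorted
    if (sample.isEmpty || partitions <= 1) Array.empty[Int]
    else {
      val step = sample.length.toDouble / partitions
      (1 until partitions)
        .map(i => sample(math.min((i * step).toInt, sample.length - 1)))
        .distinct // a small sample yields repeated bounds, which collapse here
        .toArray
    }
  }

  // Step 2: a key belongs to the first range whose upper bound is >= the key;
  // keys larger than every bound fall into the last partition.
  def getPartition(key: Int, bounds: Array[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx < 0) bounds.length else idx
  }

  def main(args: Array[String]): Unit = {
    val keys = 1 to 100
    val bounds = rangeBounds(keys, partitions = 4, sampleSize = 20, seed = 42L)
    println(s"bounds: ${bounds.mkString(", ")}")
    println(s"effective partitions: ${bounds.length + 1}")
    Seq(1, 50, 100).foreach(k => println(s"$k -> ${getPartition(k, bounds)}"))
  }
}
```

The number of effective partitions in this sketch is `bounds.length + 1`, so a sample with fewer distinct keys than the requested partition count produces fewer partitions than asked for.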

/**                                                                                                                                                                     
 * A [[org.apache.spark.Partitioner]] that partitions sortable records by range into roughly                                                                            
 * equal ranges. The ranges are determined by sampling the content of the RDD passed in.                                                                                
 *                                                                                                                                                                      
 * Note that the actual number of partitions created by the RangePartitioner might not be the same                                                                      
 * as the `partitions` parameter, in the case where the number of sampled records is less than                                                                          
 * the value of `partitions`.                                                                                                                                           
 */                                                                                                                                                                     
class RangePartitioner[K : Ordering : ClassTag, V]( 