Number of partitions after a shuffle in Spark Streaming


When reduceByKey is used, the number of RDD partitions on the reduce side of the shuffle is described by the doc comments in the source code:

  /**
   * Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
   * merged using the associative and commutative reduce function. Hash partitioning is used to
   * generate the RDDs with Spark's default number of partitions.
   */
  def reduceByKey(func: JFunction2[V, V, V]): JavaPairDStream[K, V] =
    dstream.reduceByKey(func)

  /**
   * Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
   * merged using the supplied reduce function. Hash partitioning is used to generate the RDDs
   * with `numPartitions` partitions.
   */
  def reduceByKey(func: JFunction2[V, V, V], numPartitions: Int): JavaPairDStream[K, V] =
    dstream.reduceByKey(func, numPartitions)

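Reading the source: with the single-argument form reduceByKey(func: JFunction2[V, V, V]), hash partitioning uses Spark's default number of partitions. If the second argument numPartitions is supplied, that value is used as the number of reduce tasks in the shuffle, which is also the number of partitions in the resulting RDD.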
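To make "hash partitioning with numPartitions partitions" concrete, here is a minimal sketch in plain Scala. It is a simplified model, not Spark's implementation: the object HashPartitionSketch and its methods are hypothetical names introduced for illustration, and the whole "shuffle" happens in memory on one machine. It assumes each key is routed by its hashCode modulo numPartitions, shifted to be non-negative.

```scala
// Simplified model of hash partitioning during a reduceByKey shuffle.
// Plain Scala, not Spark code; all names here are illustrative.
object HashPartitionSketch {

  // Map a key to a partition index in [0, numPartitions).
  // hashCode may be negative, so a negative remainder is shifted up.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    val raw = if (key == null) 0 else key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Model reduceByKey: route each record to its reduce partition, then
  // merge the values of each key with func. Returns one Map per partition.
  def reduceByKey[K, V](records: Seq[(K, V)], numPartitions: Int)(
      func: (V, V) => V): Vector[Map[K, V]] = {
    val empty = Vector.fill(numPartitions)(Map.empty[K, V])
    records.foldLeft(empty) { case (parts, (k, v)) =>
      val idx = partitionFor(k, numPartitions)
      val merged = parts(idx).get(k).map(func(_, v)).getOrElse(v)
      parts.updated(idx, parts(idx).updated(k, merged))
    }
  }
}
```

Calling HashPartitionSketch.reduceByKey(records, 4)(_ + _) always yields four partitions, even if some stay empty. All records with the same key land in the same partition, which is why the reduce function can merge them locally.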

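[When it is unclear how an API function works behind the scenes, read the doc comments in its source; they explain what each overload's parameter list means.]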

For what the default number of partitions on the reduce side means, see the following article and the Spark configuration documentation:

https://blog.csdn.net/bbaiggey/article/details/51984753

http://spark.apache.org/docs/latest/configuration.html

The relevant entry in the Execution Behavior parameter table is:

Property Name: spark.default.parallelism

Default: For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager:
- Local mode: number of cores on the local machine
- Mesos fine grained mode: 8
- Others: total number of cores on all executor nodes or 2, whichever is larger

Meaning: Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by user.
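The rule in the spark.default.parallelism entry can be written down as a small decision function. The sketch below is plain Scala that models only the documented rule. DefaultPartitionsSketch, forShuffle, forParallelize and the ClusterMode cases are hypothetical names, not Spark APIs. Real Spark also checks whether a parent RDD already carries a partitioner, which this model ignores.

```scala
// Simplified model of the documented default partition count.
// Plain Scala, not Spark internals; all names here are illustrative.
object DefaultPartitionsSketch {

  // Shuffle operations such as reduceByKey and join: an explicitly set
  // spark.default.parallelism wins; otherwise the largest partition
  // count among the parent RDDs is used.
  def forShuffle(configured: Option[Int], parentPartitions: Seq[Int]): Int = {
    require(parentPartitions.nonEmpty, "a shuffle needs at least one parent RDD")
    configured.getOrElse(parentPartitions.max)
  }

  sealed trait ClusterMode
  case object Local extends ClusterMode
  case object MesosFineGrained extends ClusterMode
  case object Other extends ClusterMode

  // Operations with no parent RDD, such as parallelize: when nothing is
  // configured, the value depends on the cluster manager.
  def forParallelize(configured: Option[Int], mode: ClusterMode, cores: Int): Int =
    configured.getOrElse(mode match {
      case Local            => cores              // cores on the local machine
      case MesosFineGrained => 8
      case Other            => math.max(cores, 2) // total executor cores or 2
    })
}
```

In this model, a single-argument reduceByKey over a parent with 16 partitions and no configured value shuffles into 16 partitions. Setting spark.default.parallelism to 32 changes that to 32, and passing numPartitions explicitly bypasses the default altogether.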