Syntax
val newRdd = oldRdd1.partitionBy(new org.apache.spark.HashPartitioner(partitions))
partitions is the number of partitions in the resulting RDD.
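To make the role of the partition count concrete, here is a minimal plain-Scala sketch of how a hash partitioner maps a key to a partition index: a non-negative modulo of key.hashCode by the number of partitions, with null keys sent to partition 0. The names HashPartitionSketch, nonNegativeMod and partitionFor are made up for this illustration and are not part of the Spark API; the sketch only mirrors the behaviour of HashPartitioner as I understand it.

```scala
object HashPartitionSketch {
  // Modulo that never returns a negative number, even for negative hash codes.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Hypothetical helper: the partition index a hash partitioner would pick for a key.
  def partitionFor(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Sample Int keys; an Int's hashCode is the value itself.
    Seq(1, 2, 3, 4).foreach { k =>
      println(s"key $k -> partition ${partitionFor(k, 2)}")
    }
  }
}
```

With two partitions, the odd keys 1 and 3 map to partition 1 and the even keys 2 and 4 map to partition 0.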
Source code
def partitionBy(partitioner : org.apache.spark.Partitioner) : org.apache.spark.rdd.RDD[scala.Tuple2[K, V]] = { /* compiled code */ }
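Purpose
Redistributes the records of a key-value (K-V) RDD across partitions according to the given partitioner. This triggers a shuffle unless the RDD already uses an equal partitioner.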
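Because partitionBy takes any org.apache.spark.Partitioner, it is not limited to HashPartitioner. The following is a minimal sketch of a custom partitioner, assuming Spark is on the classpath. EvenOddPartitioner, CustomPartitionerSketch and the sample data are hypothetical names and values chosen for illustration only.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner: even Int keys go to partition 0, everything else to partition 1.
class EvenOddPartitioner extends Partitioner {
  override def numPartitions: Int = 2

  override def getPartition(key: Any): Int = key match {
    case k: Int if k % 2 == 0 => 0
    case _                    => 1
  }
}

object CustomPartitionerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("customPartitionerSketch")
    val sc = new SparkContext(conf)

    // Illustrative sample data, initially spread over 4 partitions.
    val pairs = sc.makeRDD(Seq((1, "a"), (2, "b"), (3, "c"), (4, "d")), 4)
    val repartitioned = pairs.partitionBy(new EvenOddPartitioner)

    // The partition count now comes from the partitioner's numPartitions.
    println(repartitioned.getNumPartitions)

    sc.stop()
  }
}
```

A custom partitioner only needs to define numPartitions and getPartition; getPartition must return a value between 0 and numPartitions - 1 for every key.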
Example
package com.day1

import org.apache.spark.{SparkConf, SparkContext}

object oper {
  def main(args: Array[String]): Unit = {
    val config: SparkConf = new SparkConf().setMaster("local[*]").setAppName("partitionByExample")
    // Create the Spark context
    val sc = new SparkContext(config)
    // partitionBy operator: start with 4 partitions, then repartition by key into 2
    val arrayRdd = sc.makeRDD(Array((1, "张三"), (2, "李四"), (3, "王五"), (4, "刘六")), 4)
    val partitionByRdd = arrayRdd.partitionBy(new org.apache.spark.HashPartitioner(2))
    // Print the number of partitions after repartitioning
    println(partitionByRdd.getNumPartitions)
    // Release Spark resources
    sc.stop()
  }
}
Input
(1,"张三"),(2,"李四"),(3,"王五"),(4,"刘六")
Output
2
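The partition count alone does not show where each record ended up. As a rough sketch, assuming Spark is on the classpath, glom() can be used to collect the contents of every partition. InspectPartitions and recordsPerPartition are names made up for this illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object InspectPartitions {
  // Hypothetical helper: returns the records of each partition, indexed by partition number.
  def recordsPerPartition(sc: SparkContext): Array[Array[(Int, String)]] = {
    val arrayRdd = sc.makeRDD(Array((1, "张三"), (2, "李四"), (3, "王五"), (4, "刘六")), 4)
    val partitionByRdd = arrayRdd.partitionBy(new HashPartitioner(2))
    // glom() turns each partition into an Array so its contents can be collected.
    partitionByRdd.glom().collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("inspectPartitions")
    val sc = new SparkContext(conf)

    recordsPerPartition(sc).zipWithIndex.foreach { case (records, index) =>
      println(s"partition $index: ${records.mkString(", ")}")
    }

    sc.stop()
  }
}
```

Since an Int key hashes to its own value, the records with keys 2 and 4 should land in partition 0 and those with keys 1 and 3 in partition 1. The order of records inside a partition is not guaranteed after the shuffle.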