Flink Physical Partitioning
Rebalancing (Round-robin partitioning), the default strategy
Records are sent to the downstream tasks in round-robin fashion, distributing the load evenly.
import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .rebalance
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .printToErr("test")
fsEnv.execute("FlinkWordCounts")
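Conceptually, rebalance cycles through the downstream subtasks. A minimal standalone sketch of that round-robin assignment (no Flink dependency; the parallelism and sample records are made-up values for illustration):

```scala
object RoundRobinDemo {
  def main(args: Array[String]): Unit = {
    val parallelism = 3                           // assumed downstream parallelism
    val records = Seq("a", "b", "c", "d", "e")    // made-up input records
    // record i is routed to downstream subtask i % parallelism
    records.zipWithIndex.foreach { case (r, i) =>
      println(s"$r -> subtask ${i % parallelism}")
    }
  }
}
```

Because the assignment ignores record content, rebalance gives an even distribution but destroys any key locality.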
Random partitioning
Sends records to randomly chosen downstream tasks.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .shuffle
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .printToErr("test")
fsEnv.execute("FlinkWordCounts")
Rescaling
Each upstream partition sends its records round-robin to a fixed subset of the downstream partitions, so the upstream and downstream parallelisms should be integer multiples of each other. Unlike rebalance, rescale does not shuffle across all subtasks, which keeps more traffic local.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .setParallelism(4)
  .rescale
  .map(word => (word, 1))
  .setParallelism(2)
  .print("test")
  .setParallelism(2)
fsEnv.execute("FlinkWordCounts")
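To make the integer-multiple constraint concrete, here is a standalone sketch (no Flink dependency; the parallelism values mirror the example above) of how rescale groups 4 upstream subtasks onto 2 downstream subtasks:

```scala
object RescaleDemo {
  def main(args: Array[String]): Unit = {
    val upstream = 4     // upstream parallelism, as in the example
    val downstream = 2   // downstream parallelism, as in the example
    require(upstream % downstream == 0, "parallelisms should be integer multiples")
    val groupSize = upstream / downstream
    // each downstream subtask only receives from a fixed group of upstream subtasks
    (0 until upstream).foreach { u =>
      println(s"upstream subtask $u -> downstream subtask ${u / groupSize}")
    }
  }
}
```

Here subtasks 0 and 1 feed downstream subtask 0, while 2 and 3 feed downstream subtask 1; no data crosses between the two groups.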
Broadcasting
Broadcasts every upstream record to all parallel instances of the downstream operator. Note that in this example every input line is therefore processed by each parallel flatMap instance, so word counts are multiplied by the downstream parallelism.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .broadcast
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .print("test")
fsEnv.execute("FlinkWordCounts")
Custom partitioning
Routes records to downstream partitions with a user-defined Partitioner plus a key selector.
import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .partitionCustom(new Partitioner[String] {
    override def partition(key: String, numPartitions: Int): Int = {
      // key.hashCode & Integer.MAX_VALUE guarantees a non-negative value
      (key.hashCode & Integer.MAX_VALUE) % numPartitions
    }
  }, t => t._1)
  .print("test")
fsEnv.execute("Custom Partitions")
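The masking trick in the Partitioner above can be checked in isolation. A standalone sketch (the sample words are made up) showing that the formula is deterministic and always yields an in-range, non-negative partition index:

```scala
object PartitionFormulaDemo {
  // same formula as in the custom Partitioner above
  def partition(key: String, numPartitions: Int): Int =
    (key.hashCode & Integer.MAX_VALUE) % numPartitions

  def main(args: Array[String]): Unit = {
    val words = Seq("flink", "spark", "hadoop", "storm")
    words.foreach(w => println(s"$w -> partition ${partition(w, 4)}"))
    // the same key always maps to the same partition ...
    assert(words.forall(w => partition(w, 4) == partition(w, 4)))
    // ... and the masked hash keeps the index within [0, numPartitions)
    assert(words.forall(w => partition(w, 4) >= 0 && partition(w, 4) < 4))
  }
}
```

Masking with Integer.MAX_VALUE clears the sign bit, which matters because hashCode can be negative and a negative modulus would produce an invalid partition index.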