Flink DataStream API: partitioning

Random partitioning: distributes elements randomly, according to a uniform distribution, across the downstream partitions.

dataStream.shuffle()
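To make random partitioning concrete without a cluster, here is a minimal standalone sketch in plain Java. It models the idea (every record independently picks a downstream channel at random) and is not Flink's internal partitioner code. The class and method names are hypothetical, and the fixed seed only makes the run reproducible.

```java
import java.util.Random;

// Standalone simulation (no Flink dependency) of random partitioning:
// each record independently picks a downstream channel uniformly at random.
class ShuffleSketch {

    // Picks a channel in [0, numChannels) uniformly at random.
    static int selectChannel(Random random, int numChannels) {
        return random.nextInt(numChannels);
    }

    // Routes numRecords records and returns how many landed on each channel.
    static int[] distribute(int numRecords, int numChannels, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numChannels];
        for (int r = 0; r < numRecords; r++) {
            counts[selectChannel(random, numChannels)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = distribute(10_000, 4, 42L);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Over many records the counts come out roughly, but not exactly, equal; that is the practical difference from rebalance(), which cycles through the channels in a fixed order.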

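Rebalancing: repartitions the data round-robin so that every downstream partition gets an equal load, which helps eliminate data skew.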

dataStream.rebalance()

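The key part of the source code shows that each record is sent to the next output channel in turn (round-robin, starting from a randomly chosen channel), so the data is spread evenly across all partitions.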

import org.apache.flink.annotation.Internal;
import org.apache.flink.runtime.plugable.SerializationDelegate;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

import java.util.concurrent.ThreadLocalRandom;

/**
 * Partitioner that distributes the data equally by cycling through the output
 * channels.
 *
 * @param <T> Type of the elements in the Stream being rebalanced
 */
@Internal
public class RebalancePartitioner<T> extends StreamPartitioner<T> {
	private static final long serialVersionUID = 1L;

	private final int[] returnArray = {Integer.MAX_VALUE - 1};

	@Override
	public int[] selectChannels(
			SerializationDelegate<StreamRecord<T>> record,
			int numChannels) {
		int newChannel = ++returnArray[0];
		if (newChannel >= numChannels) {
			returnArray[0] = resetValue(numChannels, newChannel);
		}
		return returnArray;
	}

	private static int resetValue(
			int numChannels,
			int newChannel) {
		if (newChannel == Integer.MAX_VALUE) {
			// Initializes the first partition, this branch is only entered when initializing.
			return ThreadLocalRandom.current().nextInt(numChannels);
		}
		return 0;
	}

	public StreamPartitioner<T> copy() {
		return this;
	}

	@Override
	public String toString() {
		return "REBALANCE";
	}
}
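The logic of RebalancePartitioner can be restated as a small standalone class: pick a random first channel, then hand out channels in a cycle. This is a hypothetical re-implementation for illustration only (the names RoundRobinSketch and withRandomStart are not Flink's), and it has no Flink dependency.

```java
import java.util.concurrent.ThreadLocalRandom;

// Standalone re-implementation (no Flink dependency) of the round-robin idea
// in RebalancePartitioner: start at a random channel, then cycle through all channels.
class RoundRobinSketch {
    private final int numChannels;
    private int nextChannel; // channel that the next record will be sent to

    RoundRobinSketch(int numChannels, int startChannel) {
        this.numChannels = numChannels;
        this.nextChannel = startChannel;
    }

    // Mirrors the "random first channel" initialization of the original code.
    static RoundRobinSketch withRandomStart(int numChannels) {
        return new RoundRobinSketch(numChannels, ThreadLocalRandom.current().nextInt(numChannels));
    }

    // Returns the current channel and advances to the next one, wrapping to 0.
    int selectChannel() {
        int channel = nextChannel;
        nextChannel = (nextChannel + 1) % numChannels;
        return channel;
    }

    public static void main(String[] args) {
        RoundRobinSketch partitioner = withRandomStart(3);
        int[] counts = new int[3];
        for (int r = 0; r < 9; r++) {
            counts[partitioner.selectChannel()]++;
        }
        // 9 records over 3 channels: every channel receives exactly 3.
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

The random start only affects which channel receives the first record; after that, any run of numChannels consecutive records touches every channel exactly once, which is where the "perfect balance" comes from.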

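Rescaling: if the upstream operator has a parallelism of 2 and the downstream operator has a parallelism of 4, the output of one upstream instance is distributed to two of the downstream instances, and the output of the other upstream instance goes to the remaining two. Conversely, if the downstream operator has a parallelism of 2 and the upstream operator has a parallelism of 4, the outputs of two upstream instances go to one downstream instance, and the outputs of the other two go to the other downstream instance.

dataStream.rescale()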
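The mapping described above can be modelled in a few lines. This is a simplified model under an explicit assumption (one parallelism is a multiple of the other), written for illustration; it is not Flink's scheduler code, and RescaleSketch and downstreamTargets are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not Flink's scheduler code) of which downstream
// instances each upstream instance feeds under rescale(), for the case where one
// parallelism is a multiple of the other.
class RescaleSketch {

    static List<Integer> downstreamTargets(int upstreamIndex, int upParallelism, int downParallelism) {
        List<Integer> targets = new ArrayList<>();
        if (upParallelism <= downParallelism) {
            // Fan-out: downstream j is fed by upstream (j * up / down).
            for (int j = 0; j < downParallelism; j++) {
                if (j * upParallelism / downParallelism == upstreamIndex) {
                    targets.add(j);
                }
            }
        } else {
            // Fan-in: several upstream instances share one downstream instance.
            targets.add(upstreamIndex * downParallelism / upParallelism);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Upstream parallelism 2 -> downstream parallelism 4
        // prints "2->4, upstream 0 -> [0, 1]" and "2->4, upstream 1 -> [2, 3]"
        for (int i = 0; i < 2; i++) {
            System.out.println("2->4, upstream " + i + " -> " + downstreamTargets(i, 2, 4));
        }
        // Upstream parallelism 4 -> downstream parallelism 2
        // upstream 0 and 1 -> [0], upstream 2 and 3 -> [1]
        for (int i = 0; i < 4; i++) {
            System.out.println("4->2, upstream " + i + " -> " + downstreamTargets(i, 4, 2));
        }
    }
}
```

Within its own subset of targets, each upstream instance then sends records round-robin, just as rebalance() does over all channels.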

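Difference between rescaling and rebalancing: rebalancing performs a full repartition, in which every upstream instance sends to every downstream instance, whereas rescaling does not: each upstream instance only sends to its own subset of downstream instances. Depending on the configuration (for example, the number of slots per TaskManager), this can mean only local data transfers instead of transfers over the network.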
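One way to quantify "full repartition" is to count upstream-to-downstream channel connections. The sketch below is an illustrative model, assuming the evenly divisible case from the rescaling example; the class and method names are hypothetical.

```java
// Rough count of upstream->downstream channel connections (an illustrative model):
// rebalance() connects every upstream instance to every downstream instance,
// while rescale() (assuming one parallelism divides the other) connects each
// instance only to its own subset.
class ConnectionCountSketch {

    static int rebalanceConnections(int up, int down) {
        return up * down; // all-to-all
    }

    static int rescaleConnections(int up, int down) {
        // Fan-out: each upstream feeds down/up instances -> down connections in total.
        // Fan-in: each upstream feeds exactly one instance -> up connections in total.
        return Math.max(up, down);
    }

    public static void main(String[] args) {
        System.out.println("2->4 rebalance: " + rebalanceConnections(2, 4) + ", rescale: " + rescaleConnections(2, 4));
        System.out.println("4->2 rebalance: " + rebalanceConnections(4, 2) + ", rescale: " + rescaleConnections(4, 2));
    }
}
```

For the 2-to-4 and 4-to-2 cases above, rebalancing needs 8 connections while rescaling needs 4, and the gap widens as parallelism grows.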

Custom partitioning: records are routed by a user-defined function (the custom partitioner must implement the Partitioner interface).

dataStream.partitionCustom(partitioner, "someKey") or dataStream.partitionCustom(partitioner, 0);
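Flink's Partitioner<K> has a single method, int partition(K key, int numPartitions), which must return an index in [0, numPartitions). The sketch below mirrors that method in a local interface so it compiles without Flink on the classpath; KeyPartitioner and ModuloPartitioner are hypothetical names. Deriving the index from numPartitions, instead of hard-coding it, keeps the partitioner valid whatever the downstream parallelism is.

```java
// Local stand-in with the same method shape as Flink's
// org.apache.flink.api.common.functions.Partitioner<K>, so this sketch
// compiles without Flink on the classpath.
interface KeyPartitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical example partitioner: spreads keys over however many partitions exist,
// using floorMod so negative keys still map into [0, numPartitions).
class ModuloPartitioner implements KeyPartitioner<Long> {

    @Override
    public int partition(Long key, int numPartitions) {
        long k = key;
        return (int) Math.floorMod(k, (long) numPartitions);
    }

    public static void main(String[] args) {
        ModuloPartitioner partitioner = new ModuloPartitioner();
        for (long key = -3; key <= 3; key++) {
            System.out.println("key " + key + " -> partition " + partitioner.partition(key, 2));
        }
    }
}
```

In a real job, the same method body would go into a class implementing org.apache.flink.api.common.functions.Partitioner<Long> and be passed to partitionCustom.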

Custom partitioning demo:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class TestCustomPartition {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Two parallel instances downstream, so MyPartition may return 0 or 1
        env.setParallelism(2);

        // Non-parallel source emitting 1, 2, 3, ... (one value per second)
        DataStream<Long> text = env.addSource(new NoParalleSource());

        // Wrap each value in a Tuple1 so it can be partitioned by field position 0
        SingleOutputStreamOperator<Tuple1<Long>> map = text.map(new MapFunction<Long, Tuple1<Long>>() {
            @Override
            public Tuple1<Long> map(Long value) throws Exception {
                return new Tuple1<>(value);
            }
        });

        // Route each record with MyPartition, using tuple field 0 as the key
        DataStream<Tuple1<Long>> stream = map.partitionCustom(new MyPartition(), 0);

        // Print the thread id to see which parallel instance received each value
        SingleOutputStreamOperator<Long> result = stream.map(new MapFunction<Tuple1<Long>, Long>() {
            @Override
            public Long map(Tuple1<Long> value) throws Exception {
                System.out.println("Current thread id: " + Thread.currentThread().getId() + ", value: " + value);
                return value.f0;
            }
        });

        result.print();
        env.execute("TestCustomPartition");
    }

    /** Sends even keys to partition 0 and odd keys to partition 1 (needs at least 2 partitions). */
    public static class MyPartition implements Partitioner<Long> {
        @Override
        public int partition(Long key, int numPartitions) {
            System.out.println("Number of partitions: " + numPartitions);
            if (key % 2 == 0) {
                return 0;
            } else {
                return 1;
            }
        }
    }

    /** Non-parallel source that emits an increasing counter once per second until cancelled. */
    public static class NoParalleSource implements SourceFunction<Long> {
        private long count = 1;
        // volatile: cancel() is called from a different thread than run()
        private volatile boolean isRun = true;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            while (isRun) {
                ctx.collect(count++);
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            isRun = false;
        }
    }
}
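Because MyPartition is a pure function of the key, its routing can be checked without starting a Flink job. The hypothetical class below copies the even/odd decision and applies it to the values NoParalleSource emits (1, 2, 3, ...), assuming two downstream partitions as in the demo.

```java
// Offline check (no Flink needed) of how the demo's MyPartition logic routes the
// values emitted by NoParalleSource (1, 2, 3, ...), assuming 2 downstream partitions.
class DemoRoutingCheck {

    // Same decision as MyPartition.partition: even keys -> 0, odd keys -> 1.
    static int route(long key) {
        return key % 2 == 0 ? 0 : 1;
    }

    // Counts how many of the values 1..n land in each of the two partitions.
    static int[] countRouted(long n) {
        int[] counts = new int[2];
        for (long value = 1; value <= n; value++) {
            counts[route(value)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countRouted(10);
        System.out.println("partition 0 (even): " + counts[0] + ", partition 1 (odd): " + counts[1]);
    }
}
```

In the running job this means all even values should be printed by one fixed map subtask (one thread id) and all odd values by the other.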