Applying Async I/O to a DataStream
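AsyncDataStream is a utility class that applies an AsyncFunction to a DataStream. The concurrent requests issued by an AsyncFunction complete in no particular order; the order depends on which request finishes first. To control the order in which result records are emitted, Flink provides two modes, each exposed as a static method of AsyncDataStream: orderedWait and unorderedWait.
AsyncDataStream.orderedWait();
AsyncDataStream.unorderedWait();
orderedWait (ordered): results are emitted in the same order in which the input records were received (watermarks included), i.e. first in, first out.
unorderedWait (unordered): what "unordered" means differs between ProcessingTime and EventTime semantics:
1) Under ProcessingTime, emission is completely unordered: whichever request returns first is emitted first (lowest latency and lowest overhead).
2) Under EventTime, watermarks act as boundaries: records that lie between two watermarks may be emitted out of order relative to each other, but a record and a watermark may never overtake one another. Order is therefore partly reintroduced into the unordered mode, which brings some of the same overhead as the ordered mode. (The details are covered in the later section on internals.)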
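The EventTime rule for unorderedWait can be illustrated with a small simulation in plain Java. This is only a sketch of the emission rule described above, not Flink's actual implementation: the class WatermarkSegmentSketch, the Element type, and the completion times are all hypothetical, and the code only computes the final emission order rather than emitting records as they complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class WatermarkSegmentSketch {

    /** Hypothetical stream element: a record with a completion time, or a watermark. */
    public static final class Element {
        final String name;
        final boolean isWatermark;
        final long completedAt; // when the async request finished; unused for watermarks

        private Element(String name, boolean isWatermark, long completedAt) {
            this.name = name;
            this.isWatermark = isWatermark;
            this.completedAt = completedAt;
        }

        public static Element record(String name, long completedAt) {
            return new Element(name, false, completedAt);
        }

        public static Element watermark(String name) {
            return new Element(name, true, 0L);
        }
    }

    /** Returns the emission order: free within a segment, fixed across watermarks. */
    public static List<String> emissionOrder(List<Element> arrivals) {
        List<String> out = new ArrayList<>();
        List<Element> segment = new ArrayList<>();
        for (Element e : arrivals) {
            if (e.isWatermark) {
                flush(segment, out); // every record before the watermark goes first
                out.add(e.name);     // then the watermark itself
            } else {
                segment.add(e);
            }
        }
        flush(segment, out);
        return out;
    }

    // Emits the records of one segment in completion order (stable for ties).
    private static void flush(List<Element> segment, List<String> out) {
        segment.sort(Comparator.comparingLong((Element e) -> e.completedAt));
        for (Element e : segment) {
            out.add(e.name);
        }
        segment.clear();
    }

    public static void main(String[] args) {
        List<Element> arrivals = Arrays.asList(
                Element.record("a", 300), Element.record("b", 100),
                Element.watermark("W1"),
                Element.record("c", 50), Element.record("d", 200),
                Element.watermark("W2"));
        System.out.println(emissionOrder(arrivals)); // [b, a, W1, c, d, W2]
    }
}
```

Record c finishes before every other request, yet it still appears after W1, because it arrived after that watermark. Within each segment the faster request wins.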
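Example
Generate 6 records: six integers counting up from 0. After a simulated asynchronous query, each result is printed with a timestamp appended.
Async function
The code is as follows (example):
Extend the RichAsyncFunction class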
package cn.itcast.io;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.util.ArrayList;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class SampleAsyncFunction extends RichAsyncFunction<Integer, String> {
    // Simulated query latency in milliseconds, indexed by the input value (0..5)
    private long[] sleep = {5000L, 1000L, 5000L, 2000L, 6000L, 100L};

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
    }

    @Override
    public void close() throws Exception {
        super.close();
    }

    @Override
    public void asyncInvoke(final Integer input, final ResultFuture<String> resultFuture) {
        System.out.println(System.currentTimeMillis() + "-input:" + input + " will sleep " + sleep[input] + " ms");
        asyncQuery(input, resultFuture);
    }

    @Override
    public void timeout(Integer input, ResultFuture<String> resultFuture) throws Exception {
        System.out.println("timeout: " + input);
        // An overridden timeout() must still complete the future;
        // here the timed-out record is dropped by emitting nothing.
        resultFuture.complete(Collections.emptyList());
    }

    // Synchronous variant (blocks the calling thread; kept for comparison, not used)
    private void query(final Integer input, final ResultFuture<String> resultFuture) {
        try {
            Thread.sleep(sleep[input]);
            resultFuture.complete(Collections.singletonList(String.valueOf(input)));
        } catch (InterruptedException e) {
            resultFuture.complete(new ArrayList<>(0));
        }
    }

    // Asynchronous variant: the simulated query runs on another thread
    private void asyncQuery(final Integer input, final ResultFuture<String> resultFuture) {
        CompletableFuture.supplyAsync(new Supplier<Integer>() {
            @Override
            public Integer get() {
                try {
                    Thread.sleep(sleep[input]);
                    return input;
                } catch (Exception e) {
                    return null;
                }
            }
        }).thenAccept((Integer dbResult) -> {
            if (dbResult == null) {
                // The simulated query failed: emit nothing instead of the string "null"
                resultFuture.complete(Collections.emptyList());
            } else {
                resultFuture.complete(Collections.singleton(String.valueOf(dbResult)));
            }
        });
    }
}
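One detail of asyncQuery is worth spelling out: CompletableFuture.supplyAsync without an explicit executor normally runs on the common ForkJoinPool, and because the simulated query blocks in Thread.sleep, the number of requests that are truly in flight at once is bounded by that pool's size. Below is a hedged sketch of the same callback pattern in plain Java with a dedicated thread pool. The names DedicatedPoolQuery and lookup and the pool size of 6 are illustrative assumptions, and a Consumer<Collection<String>> stands in for Flink's ResultFuture so the snippet runs without Flink on the classpath.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class DedicatedPoolQuery {
    private final ExecutorService pool;

    public DedicatedPoolQuery(int threads) {
        // One thread per expected in-flight request (assumption for this demo)
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /**
     * Runs a simulated blocking query on the dedicated pool and always invokes
     * the callback exactly once: with one element on success, with none on failure.
     */
    public CompletableFuture<Void> lookup(int input, long delayMillis,
                                          Consumer<Collection<String>> callback) {
        CompletableFuture<String> query = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("query interrupted", e);
            }
            if (input < 0) {
                throw new IllegalArgumentException("negative key"); // simulated failure
            }
            return String.valueOf(input);
        }, pool);

        return query.handle((value, error) -> {
            if (error == null) {
                callback.accept(Collections.singletonList(value));
            } else {
                callback.accept(Collections.emptyList());
            }
            return null;
        });
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        DedicatedPoolQuery client = new DedicatedPoolQuery(6);
        client.lookup(3, 20L, r -> System.out.println("result: " + r)).join();  // result: [3]
        client.lookup(-1, 20L, r -> System.out.println("result: " + r)).join(); // result: []
        client.shutdown();
    }
}
```

Using handle rather than thenAccept means the callback also fires when the query throws, so no request is left uncompleted.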
package cn.itcast.io;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.concurrent.TimeUnit;

public class AsyncIODemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        final int maxCount = 6;
        final int taskNum = 1;
        final long timeout = 40000;

        DataStream<Integer> inputStream = env.addSource(new SimpleSource(maxCount));
        AsyncFunction<Integer, String> function = new SampleAsyncFunction();

        // Results may be emitted in a different order than the requests
        DataStream<String> result = AsyncDataStream.unorderedWait(
                inputStream,
                function,
                timeout,
                TimeUnit.MILLISECONDS,
                10).setParallelism(taskNum);

        // Results must be emitted in the same order as the requests
        DataStream<String> result2 = AsyncDataStream.orderedWait(
                inputStream,
                function,
                timeout,
                TimeUnit.MILLISECONDS,
                10).setParallelism(taskNum);

        // Append the emission timestamp to each result and print it
        result2.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) throws Exception {
                return value + "," + System.currentTimeMillis();
            }
        }).print("map: ");

        env.execute("Async IO Demo");
    }

    // Emits the integers 0..maxNum-1, one roughly every 10 ms
    private static class SimpleSource implements SourceFunction<Integer> {
        private volatile boolean isRunning = true;
        private int counter = 0;
        private int start = 0;

        public SimpleSource(int maxNum) {
            this.counter = maxNum;
        }

        @Override
        public void run(SourceContext<Integer> ctx) throws Exception {
            while ((start < counter || counter == -1) && isRunning) {
                synchronized (ctx.getCheckpointLock()) {
                    System.out.println("send data:" + start);
                    ctx.collect(start);
                    ++start;
                }
                Thread.sleep(10L);
            }
        }

        @Override
        public void cancel() {
            isRunning = false;
        }
    }
}
Allowing the result order to differ from the request order
// Results may be emitted in a different order than the requests
DataStream<String> result1 = AsyncDataStream.unorderedWait(
        inputStream,
        function,
        timeout,
        TimeUnit.MILLISECONDS,
        10).setParallelism(taskNum);
result1.map(new MapFunction<String, String>() {
    @Override
    public String map(String value) throws Exception {
        return value + "," + System.currentTimeMillis();
    }
}).print("map: ");
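With the delays configured in SampleAsyncFunction and one record emitted roughly every 10 ms by SimpleSource, the emission order to expect from unorderedWait can be estimated by sorting the requests by completion time. The following is a back-of-the-envelope sketch, not measured output: it assumes all six requests really run concurrently (enough threads in the pool) and ignores scheduling jitter, and the class name UnorderedOrderEstimate is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class UnorderedOrderEstimate {

    /** Returns input indices sorted by estimated completion time (start offset + delay). */
    public static List<Integer> completionOrder(long[] delays, long emitIntervalMillis) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < delays.length; i++) {
            order.add(i);
        }
        // Record i is assumed to start at i * emitIntervalMillis
        order.sort(Comparator.comparingLong((Integer i) -> i * emitIntervalMillis + delays[i]));
        return order;
    }

    public static void main(String[] args) {
        long[] sleep = {5000L, 1000L, 5000L, 2000L, 6000L, 100L};
        System.out.println(completionOrder(sleep, 10L)); // [5, 1, 3, 0, 2, 4]
    }
}
```

Input 5 is sent last but has the shortest delay, so under these assumptions it is the first result to appear.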
Requiring the result order to match the request order
// Results must be emitted in the same order as the requests
DataStream<String> result2 = AsyncDataStream.orderedWait(
        inputStream,
        function,
        timeout,
        TimeUnit.MILLISECONDS,
        10).setParallelism(taskNum);
result2.map(new MapFunction<String, String>() {
    @Override
    public String map(String value) throws Exception {
        return value + "," + System.currentTimeMillis();
    }
}).print("map: ");
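Under orderedWait a result cannot be emitted until every earlier record has been emitted, so a slow request holds back the faster ones queued behind it. Assuming again that all requests run concurrently and that the source emits one record about every 10 ms, the earliest emission time of each record is the running maximum of the completion times. The sketch below is an estimate under those assumptions, with the hypothetical class name OrderedEmitEstimate; times are milliseconds relative to the first record.

```java
import java.util.Arrays;

public class OrderedEmitEstimate {

    /** Returns, per input index, the earliest time its result can be emitted in order. */
    public static long[] emitTimes(long[] delays, long emitIntervalMillis) {
        long[] emit = new long[delays.length];
        long latest = 0L;
        for (int i = 0; i < delays.length; i++) {
            long completedAt = i * emitIntervalMillis + delays[i];
            latest = Math.max(latest, completedAt); // must wait for all earlier records
            emit[i] = latest;
        }
        return emit;
    }

    public static void main(String[] args) {
        long[] sleep = {5000L, 1000L, 5000L, 2000L, 6000L, 100L};
        System.out.println(Arrays.toString(emitTimes(sleep, 10L)));
        // [5000, 5000, 5020, 5020, 6040, 6040]
    }
}
```

Input 5 finishes after roughly 150 ms but is not emitted until about 6040 ms, which is the latency cost of preserving order.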