Flink Async I/O in Practice (Java)

Applying Async I/O to a DataStream

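AsyncDataStream is a utility class for applying an AsyncFunction to a DataStream. The concurrent requests issued by an AsyncFunction complete in no particular order; left alone, results would follow whichever request finishes first. To control the order in which result records are emitted, Flink provides two modes, exposed as two static methods on AsyncDataStream: orderedWait and unorderedWait.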

AsyncDataStream.orderedWait();

AsyncDataStream.unorderedWait();

orderedWait (ordered): results are emitted in the same order in which the input records arrived (watermarks included), that is, first in, first out.
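The first-in, first-out behaviour of ordered mode can be illustrated without Flink. The following is a minimal plain-JDK sketch, not Flink's implementation; the class name OrderedEmitSketch and the method emitInRequestOrder are hypothetical names chosen for this illustration. Every simulated request starts concurrently, and results are then collected by waiting on the futures in submission order, so a fast later request is held back until the slower earlier ones have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of ordered emission; not Flink's internal code.
class OrderedEmitSketch {

  // Starts one simulated request per delay, then emits results in request order.
  static List<Integer> emitInRequestOrder(long[] delaysMs) {
    // One thread per request, so that all requests really run concurrently.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, delaysMs.length));
    try {
      List<CompletableFuture<Integer>> pending = new ArrayList<>();
      for (int i = 0; i < delaysMs.length; i++) {
        final int input = i;
        pending.add(CompletableFuture.supplyAsync(() -> {
          sleep(delaysMs[input]);
          return input;
        }, pool));
      }
      List<Integer> emitted = new ArrayList<>();
      // Joining in submission order buffers a fast later result until every
      // earlier request has completed (first in, first out).
      for (CompletableFuture<Integer> future : pending) {
        emitted.add(future.join());
      }
      return emitted;
    } finally {
      pool.shutdown();
    }
  }

  private static void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    // Request 1 finishes first, yet it is emitted after request 0.
    System.out.println(emitInRequestOrder(new long[] {300L, 10L, 100L})); // prints [0, 1, 2]
  }
}
```

The cost of this mode is visible in the loop: the total wait is dominated by the slowest request ahead of each record, which is why ordered mode adds latency compared with emitting results as they arrive.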

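unorderedWait (unordered): what "unordered" means differs between processing-time and event-time semantics.

1) With processing time, emission is completely unordered: whichever request returns first is emitted first (lowest latency and lowest overhead).

2) With event time, watermarks act as boundaries: records between two watermarks may be emitted out of order, but a record and a watermark may never overtake each other. Some ordering is thus reintroduced into the unordered mode, which brings some of the same overhead as ordered mode. (The details are covered later, in the discussion of the internals.)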
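The watermark rule in point 2 can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Flink's queue implementation: the class WatermarkSegmentSketch, its Element type and the completion times are all hypothetical, and the model only computes the emission order rather than running real requests. Records are grouped into segments delimited by watermarks; within a segment they are emitted in completion order, and a watermark is emitted only after every record before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of unordered emission under event time; not Flink's internal code.
class WatermarkSegmentSketch {

  // A stream element: a record with a simulated completion time, or a watermark.
  static final class Element {
    final String name;
    final long completesAtMs; // unused for watermarks
    final boolean watermark;

    private Element(String name, long completesAtMs, boolean watermark) {
      this.name = name;
      this.completesAtMs = completesAtMs;
      this.watermark = watermark;
    }

    static Element record(String name, long completesAtMs) {
      return new Element(name, completesAtMs, false);
    }

    static Element watermark(String name) {
      return new Element(name, 0L, true);
    }
  }

  // Returns the names of the elements in the order they would be emitted.
  static List<String> emissionOrder(List<Element> input) {
    List<String> out = new ArrayList<>();
    List<Element> segment = new ArrayList<>();
    for (Element e : input) {
      if (e.watermark) {
        flush(segment, out); // all records before the watermark go first
        out.add(e.name);     // then the watermark itself
      } else {
        segment.add(e);
      }
    }
    flush(segment, out);
    return out;
  }

  // Emits the records of one segment in completion order (ties keep arrival order).
  private static void flush(List<Element> segment, List<String> out) {
    segment.sort(Comparator.comparingLong((Element el) -> el.completesAtMs));
    for (Element el : segment) {
      out.add(el.name);
    }
    segment.clear();
  }

  public static void main(String[] args) {
    List<Element> input = Arrays.asList(
        Element.record("a", 300L),
        Element.record("b", 100L),
        Element.watermark("W1"),
        Element.record("c", 50L),
        Element.record("d", 200L));
    // "c" completes first overall, but it cannot overtake watermark W1.
    System.out.println(emissionOrder(input)); // prints [b, a, W1, c, d]
  }
}
```

In this model the more frequent the watermarks, the smaller the segments and the closer the behaviour gets to ordered mode.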

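Example
Generate six records, the integers 0 through 5. After a simulated asynchronous query, each result is printed with a timestamp appended.

The async function
The code is as follows (example):
Extend the RichAsyncFunction class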

package cn.itcast.io;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.util.ArrayList;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class SampleAsyncFunction extends RichAsyncFunction<Integer, String> {
  private long[] sleep = {5000L, 1000L, 5000L, 2000L, 6000L, 100L};

  @Override
  public void open(Configuration parameters) throws Exception {
    super.open(parameters);
  }

  @Override
  public void close() throws Exception {
    super.close();
  }

  @Override
  public void asyncInvoke(final Integer input, final ResultFuture<String> resultFuture) {
    System.out.println(System.currentTimeMillis() + "-input:" + input + " will sleep " + sleep[input] + " ms");

    asyncQuery(input, resultFuture);
  }

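  @Override
  public void timeout(Integer input, ResultFuture<String> resultFuture) throws Exception {
    System.out.println("timeout: " + input);
    // An overridden timeout() must still complete the future; otherwise the
    // operator keeps waiting for this record. Here no result is emitted.
    resultFuture.complete(Collections.emptyList());
  }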

  // Synchronous (blocking) variant, shown for comparison; not called in this demo
  private void query(final Integer input, final ResultFuture<String> resultFuture) {
    try {
      Thread.sleep(sleep[input]);
      resultFuture.complete(Collections.singletonList(String.valueOf(input)));
    } catch (InterruptedException e) {
      resultFuture.complete(new ArrayList<>(0));
    }
  }
  // Asynchronous variant: the sleep runs on another thread and completes the future when done
  private void asyncQuery(final Integer input, final ResultFuture<String> resultFuture) {
    CompletableFuture.supplyAsync(new Supplier<Integer>() {

      @Override
      public Integer get() {
        try {
          Thread.sleep(sleep[input]);
          return input;
        } catch (Exception e) {
          return null;
        }
      }
    }).thenAccept((Integer dbResult) -> {
      resultFuture.complete(Collections.singleton(String.valueOf(dbResult)));
    });
  }
}

package cn.itcast.io;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.concurrent.TimeUnit;

public class AsyncIODemo {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    final int maxCount = 6;
    final int taskNum = 1;
    final long timeout = 40000;

    DataStream<Integer> inputStream = env.addSource(new SimpleSource(maxCount));
    AsyncFunction<Integer, String> function = new SampleAsyncFunction();

    // Unordered: result order may differ from request order
    DataStream<String> result = AsyncDataStream.unorderedWait(
      inputStream,
      function,
      timeout,
      TimeUnit.MILLISECONDS,
      10).setParallelism(taskNum);
    // Ordered: result order must match request order
    DataStream<String> result2 = AsyncDataStream.orderedWait(
      inputStream,
      function,
      timeout,
      TimeUnit.MILLISECONDS,
      10).setParallelism(taskNum);

    result2.map(new MapFunction<String, String>() {
      @Override
      public String map(String value) throws Exception {
        return value + "," + System.currentTimeMillis();
      }
    }).print("map: ");

    env.execute("Async IO Demo");
  }

  private static class SimpleSource implements SourceFunction<Integer> {
    private volatile boolean isRunning = true;
    private int counter = 0;
    private int start = 0;

    public SimpleSource(int maxNum) {
      this.counter = maxNum;
    }

    @Override
    public void run(SourceContext<Integer> ctx) throws Exception {
      while ((start < counter || counter == -1) && isRunning) {
        synchronized (ctx.getCheckpointLock()) {
          System.out.println("send data:" + start);
          ctx.collect(start);
          ++start;
        }
        Thread.sleep(10L);
      }
    }

    @Override
    public void cancel() {
      isRunning = false;
    }
  }
}
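In the demo above, the last three arguments passed to unorderedWait and orderedWait are the timeout (40000 here), its time unit, and the capacity (10 here). The capacity bounds how many requests may be in flight at once; when it is exhausted, the operator stops accepting input until a request completes. The following plain-JDK sketch illustrates that idea with a Semaphore. It is an analogy under stated assumptions, not how Flink implements the limit; CapacitySketch and peakInFlight are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a capacity limit on in-flight requests.
class CapacitySketch {

  // Issues `requests` simulated calls and returns the highest number in flight at once.
  static int peakInFlight(int capacity, int requests, long delayMs) {
    Semaphore permits = new Semaphore(capacity);
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      List<CompletableFuture<Void>> all = new ArrayList<>();
      for (int i = 0; i < requests; i++) {
        // Blocks the producer while `capacity` requests are pending (backpressure).
        permits.acquireUninterruptibly();
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        all.add(CompletableFuture.runAsync(() -> sleep(delayMs), pool)
            .whenComplete((ignored, error) -> {
              // Decrement before releasing so the counter never exceeds capacity.
              inFlight.decrementAndGet();
              permits.release();
            }));
      }
      CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
      return peak.get();
    } finally {
      pool.shutdown();
    }
  }

  private static void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    // With capacity 3, at most 3 of the 12 requests are ever pending together.
    System.out.println("peak in flight: " + peakInFlight(3, 12, 20L));
  }
}
```

Because the demo sends only six records and the capacity is 10, the limit is never reached there; it matters once the source outpaces the external system.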

Unordered: result order may differ from request order

    // Unordered: result order may differ from request order
    DataStream<String> result1 = AsyncDataStream.unorderedWait(
      inputStream,
      function,
      timeout,
      TimeUnit.MILLISECONDS,
      10).setParallelism(taskNum);
    result1.map(new MapFunction<String, String>() {
      @Override
      public String map(String value) throws Exception {
        return value + "," + System.currentTimeMillis();
      }
    }).print("map: ");

Ordered: result order must match request order

    // Ordered: result order must match request order
    DataStream<String> result2 = AsyncDataStream.orderedWait(
      inputStream,
      function,
      timeout,
      TimeUnit.MILLISECONDS,
      10).setParallelism(taskNum);

    result2.map(new MapFunction<String, String>() {
      @Override
      public String map(String value) throws Exception {
        return value + "," + System.currentTimeMillis();
      }
    }).print("map: ");
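For the delays defined in SampleAsyncFunction, the output of both modes can be predicted by hand. The sketch below does the arithmetic under simplifying assumptions: record i is sent at roughly i * 10 ms (the source sleeps 10 ms between records), every request runs on its own thread, and scheduling overhead is ignored. ExpectedOrderSketch and its methods are hypothetical names. Real runs can differ, since CompletableFuture.supplyAsync uses the common ForkJoinPool, whose size depends on the number of CPU cores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical back-of-the-envelope model of the demo's output order.
class ExpectedOrderSketch {
  // Delays copied from SampleAsyncFunction.
  static final long[] SLEEP_MS = {5000L, 1000L, 5000L, 2000L, 6000L, 100L};
  // Assumed gap between two records leaving the source.
  static final long SEND_INTERVAL_MS = 10L;

  // Time (relative to the first send) at which each request completes.
  static long[] completionTimes() {
    long[] done = new long[SLEEP_MS.length];
    for (int i = 0; i < done.length; i++) {
      done[i] = i * SEND_INTERVAL_MS + SLEEP_MS[i];
    }
    return done;
  }

  // Unordered mode with processing time: inputs sorted by completion time.
  static List<Integer> unorderedEmitOrder() {
    long[] done = completionTimes();
    List<Integer> inputs = new ArrayList<>();
    for (int i = 0; i < done.length; i++) {
      inputs.add(i);
    }
    inputs.sort(Comparator.comparingLong((Integer i) -> done[i]));
    return inputs;
  }

  // Ordered mode: a record is emitted once it and all earlier records are done.
  static long[] orderedEmitTimes() {
    long[] done = completionTimes();
    long[] emit = new long[done.length];
    long latest = 0L;
    for (int i = 0; i < done.length; i++) {
      latest = Math.max(latest, done[i]);
      emit[i] = latest;
    }
    return emit;
  }

  public static void main(String[] args) {
    System.out.println(unorderedEmitOrder());                // prints [5, 1, 3, 0, 2, 4]
    System.out.println(Arrays.toString(orderedEmitTimes())); // prints [5000, 5000, 5020, 5020, 6040, 6040]
  }
}
```

Under these assumptions, unordered mode prints input 5 after about 150 ms, whereas ordered mode prints nothing until input 0 completes at about 5 s and then releases the records in bursts.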


