First, an example of how an async I/O function might be written:
import com.alibaba.druid.pool.DruidDataSource;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;

public class AsyncFunctionExample extends RichAsyncFunction<String, String> {
    // Declared as DruidDataSource so the setters used in open() are available.
    private transient DruidDataSource dataSource = null;

    @Override
    public void open(Configuration parameters) throws Exception {
        dataSource = new DruidDataSource();
        dataSource.setDriverClassName("com.mysql.jdbc.Driver");
        dataSource.setUsername("root");
        dataSource.setPassword("123456");
        dataSource.setUrl("jdbc:mysql://localhost:3306/day01?characterEncoding=utf8");
    }

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) throws Exception {
        String sql = "SELECT id, name FROM orde WHERE id = ?";
        String result = null;
        Connection connection = null;
        PreparedStatement stmt = null;
        ResultSet rs = null;
        try {
            connection = dataSource.getConnection();
            stmt = connection.prepareStatement(sql);
            // Bind the incoming element as the query parameter.
            stmt.setString(1, input);
            rs = stmt.executeQuery();
            while (rs.next()) {
                result = rs.getString("name");
            }
        } finally {
            if (rs != null) {
                rs.close();
            }
            if (stmt != null) {
                stmt.close();
            }
            if (connection != null) {
                connection.close();
            }
        }
        resultFuture.complete(Collections.singleton(result));
    }

    @Override
    public void timeout(String input, ResultFuture<String> resultFuture) throws Exception {
        // No timeout handling in this example.
    }
}
Reading the source
In the source, AsyncWaitOperator is the operator that runs the user-defined AsyncFunction.
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    // add element first to the queue
    // 1
    final ResultFuture<OUT> entry = addToWorkQueue(element);
    final ResultHandler resultHandler = new ResultHandler(element, entry);
    // register a timeout for the entry if timeout is configured
    // 2
    if (timeout > 0L) {
        final long timeoutTimestamp = timeout + getProcessingTimeService().getCurrentProcessingTime();
        final ScheduledFuture<?> timeoutTimer = getProcessingTimeService().registerTimer(
            timeoutTimestamp,
            timestamp -> userFunction.timeout(element.getValue(), resultHandler));
        resultHandler.setTimeoutTimer(timeoutTimer);
    }
    // 3: userFunction is the user-defined AsyncFunction
    userFunction.asyncInvoke(element.getValue(), resultHandler);
}
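The excerpt does three things:
1 Adds the incoming element to a queue
2 Registers a timeout timer (not the focus here)
3 Calls the user-written logic, AsyncFunction.asyncInvoke
Let's go through them one at a time.
1. addToWorkQueue
AsyncWaitOperator#addToWorkQueue
This part is synchronous. The operator has a queue member whose implementation is either OrderedStreamElementQueue or UnorderedStreamElementQueue. The two differ internally, and the difference comes down to whether elements leave the queue in the same order in which they entered it. This walkthrough follows UnorderedStreamElementQueue.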
private ResultFuture<OUT> addToWorkQueue(StreamElement streamElement) throws InterruptedException {
    Optional<ResultFuture<OUT>> queueEntry;
    while (!(queueEntry = queue.tryPut(streamElement)).isPresent()) {
        mailboxExecutor.yield();
    }
    return queueEntry.get();
}
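UnorderedStreamElementQueue#tryPut: if the queue is not yet full (its size is below capacity), the element is added; if it is full, an empty Optional is returned and the caller has to try again later.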
public Optional<ResultFuture<OUT>> tryPut(StreamElement streamElement) {
    if (size() < capacity) {
        StreamElementQueueEntry<OUT> queueEntry;
        if (streamElement.isRecord()) {
            queueEntry = addRecord((StreamRecord<?>) streamElement);
        }
        ...
        numberOfEntries++;
        ...
    } else {
        ...
    }
}
private StreamElementQueueEntry<OUT> addRecord(StreamRecord<?> record) {
    // ensure that there is at least one segment
    ..
    StreamElementQueueEntry<OUT> queueEntry = new SegmentedStreamRecordQueueEntry<>(record, lastSegment);
    lastSegment.add(queueEntry);
    return queueEntry;
}
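The retry loop in addToWorkQueue and the capacity check in tryPut can be modelled without Flink. The following is only a toy sketch: BoundedQueueSketch, scheduleEmit and the Runnable queue standing in for the mailbox are all made-up names, and unlike the real mailboxExecutor.yield(), which waits for a mail to arrive, this version simply fails when nothing is pending.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/** Toy model of a bounded queue whose tryPut never blocks. All names are illustrative. */
class BoundedQueueSketch {
    private final int capacity;
    private final Queue<String> entries = new ArrayDeque<>();
    /** Stand-in for the mailbox: pending actions that may free a slot. */
    private final Queue<Runnable> mailbox = new ArrayDeque<>();

    BoundedQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the stored entry, or Optional.empty() when the queue is full. */
    Optional<String> tryPut(String element) {
        if (entries.size() < capacity) {
            entries.add(element);
            return Optional.of(element);
        }
        return Optional.empty();
    }

    /** Queues an action that removes the oldest entry, as emitting downstream would. */
    void scheduleEmit() {
        mailbox.add(() -> entries.poll());
    }

    /** Retries tryPut, running one pending mailbox action between attempts. */
    String addToWorkQueue(String element) {
        Optional<String> entry;
        while (!(entry = tryPut(element)).isPresent()) {
            Runnable mail = mailbox.poll();
            if (mail == null) {
                // Assumption of this sketch only: give up instead of waiting.
                throw new IllegalStateException("queue full and nothing left to run");
            }
            mail.run();
        }
        return entry.get();
    }

    int size() {
        return entries.size();
    }
}
```

The point of the loop is back-pressure: a new element is only accepted once an earlier one has left the queue, so the number of in-flight requests never exceeds capacity.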
In the end the element is added to Segment.incompleteElements:
static class Segment<OUT> {
    /** Unfinished input elements. */
    private final Set<StreamElementQueueEntry<OUT>> incompleteElements;
    /** Undrained finished elements. */
    private final Queue<StreamElementQueueEntry<OUT>> completedElements;
    void add(StreamElementQueueEntry<OUT> queueEntry) {
        if (queueEntry.isDone()) {
            completedElements.add(queueEntry);
        } else {
            incompleteElements.add(queueEntry);
        }
    }
}
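Once this is done, a ResultFuture is returned; it serves as the callback hook for when the user logic finishes.
A ResultHandler (which implements ResultFuture) is then created and passed to AsyncFunction#asyncInvoke.
2. Step 2 is skipped here; readers who are interested can dig into it on their own.
3. Calling AsyncFunction#asyncInvoke
This is the user logic. When it finishes, it calls ResultHandler#complete to signal that the element queued in step 1 can now be sent downstream.
ResultHandler holds a resultFuture member, which is the ResultFuture produced in step 1.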
private class ResultHandler implements ResultFuture<OUT> {
    /**
     * The handle received from the queue to update the entry. Should only be used to inject the result;
     * exceptions are handled here.
     */
    private final ResultFuture<OUT> resultFuture;
    public void complete(Collection<OUT> results) {
        Preconditions.checkNotNull(results, "Results must not be null, use empty collection to emit nothing");
        // already completed (exceptionally or with previous complete call from ill-written AsyncFunction), so
        // ignore additional result
        if (!completed.compareAndSet(false, true)) {
            return;
        }
        processInMailbox(results);
    }
}
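The compareAndSet guard in complete is a standard way to make a callback take effect only once, even when it is called from several threads. A minimal sketch of the idea, using a made-up OnceOnlyHandlerSketch class rather than the real ResultHandler, and returning a boolean purely so the outcome is visible:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative stand-in for a handler whose complete() only takes effect once. */
class OnceOnlyHandlerSketch<T> {
    private final AtomicBoolean completed = new AtomicBoolean(false);
    private final List<T> delivered = new ArrayList<>();

    /** Returns true only for the call that wins the race; later calls are ignored. */
    boolean complete(Collection<T> results) {
        // Atomically flip false -> true; a failed flip means someone completed earlier.
        if (!completed.compareAndSet(false, true)) {
            return false;
        }
        delivered.addAll(results);
        return true;
    }

    List<T> delivered() {
        return delivered;
    }
}
```

With this guard, a late result that arrives after a timeout (or a second complete call from a buggy function) is dropped instead of being emitted twice.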
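processInMailbox(results)
The logic before it needs little comment: it checks whether this ResultFuture has already been completed, and if so the result is not processed a second time. The interesting part is processInMailbox(results):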
private void processInMailbox(Collection<OUT> results) {
    // move further processing into the mailbox thread
    mailboxExecutor.execute(
        () -> processResults(results),
        "Result in AsyncWaitOperator of input %s", results);
}
private void processResults(Collection<OUT> results) {
    // Cancel the timer once we've completed the stream record buffer entry. This will remove the registered
    // timer task
    if (timeoutTimer != null) {
        // canceling in mailbox thread avoids https://issues.apache.org/jira/browse/FLINK-13635
        timeoutTimer.cancel(true);
    }
    // update the queue entry with the result
    resultFuture.complete(results);
    // now output all elements from the queue that have been completed (in the correct order)
    outputCompletedElement();
}
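Two lines matter here:
resultFuture.complete(results)
stores the value produced by the user logic in the queue entry for that StreamRecord.
outputCompletedElement()
sends the completed data downstream.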
private void outputCompletedElement() {
    if (queue.hasCompletedElements()) {
        // emit only one element to not block the mailbox thread unnecessarily
        queue.emitCompletedElement(timestampedCollector);
        // if there are more completed elements, emit them with subsequent mails
        if (queue.hasCompletedElements()) {
            mailboxExecutor.execute(this::outputCompletedElement, "AsyncWaitOperator#outputCompletedElement");
        }
    }
}
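Emitting one element per mail and re-enqueuing itself lets other mails run in between. A rough model of that scheduling pattern, where OneAtATimeEmitterSketch and its plain Runnable queue are illustrative stand-ins and not Flink's mailbox implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Illustrative model: emit one completed element per mail, then re-enqueue. */
class OneAtATimeEmitterSketch {
    final Queue<String> completed = new ArrayDeque<>();
    final Queue<Runnable> mailbox = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    void outputCompletedElement() {
        if (!completed.isEmpty()) {
            // Emit exactly one element so this mail stays short.
            log.add("emit " + completed.poll());
            // More work left: schedule it behind whatever is already queued.
            if (!completed.isEmpty()) {
                mailbox.add(this::outputCompletedElement);
            }
        }
    }

    /** Runs queued mails in FIFO order until none are left. */
    void drainMailbox() {
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
        }
    }
}
```

If another mail is already waiting when the first element is emitted, it runs before the second element goes out, which is the fairness the comment in the source refers to.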
UnorderedStreamElementQueue#emitCompletedElement, which delegates to Segment#emitCompleted:
public void emitCompletedElement(TimestampedCollector<OUT> output) {
    if (segments.isEmpty()) {
        return;
    }
    final Segment currentSegment = segments.getFirst();
    numberOfEntries -= currentSegment.emitCompleted(output);
    // remove any segment if there are further segments, if not leave it as an optimization even if empty
    if (segments.size() > 1 && currentSegment.isEmpty()) {
        segments.pop();
    }
}
# class Segment
void completed(StreamElementQueueEntry<OUT> elementQueueEntry) {
    // adding only to completed queue if not completed before
    // there may be a real result coming after a timeout result, which is updated in the queue entry but
    // the entry is not re-added to the complete queue
    if (incompleteElements.remove(elementQueueEntry)) {
        completedElements.add(elementQueueEntry);
    }
}
int emitCompleted(TimestampedCollector<OUT> output) {
    final StreamElementQueueEntry<OUT> completedEntry = completedElements.poll();
    if (completedEntry == null) {
        return 0;
    }
    completedEntry.emitResult(output);
    return 1;
}
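completed removes the entry from the set of unfinished elements, Segment.incompleteElements, and adds it to the completed queue.
emitCompleted takes a finished entry from the Segment.completedElements queue and sends it out through output.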
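The interplay of the two collections is what makes the unordered queue unordered: entries are emitted in the order they complete, not the order they were added. A simplified sketch with plain strings as entries; SegmentSketch is a made-up class, and the real Segment also handles entries that are already done when added:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Simplified model of a segment: a set of pending entries plus a FIFO of finished ones. */
class SegmentSketch {
    private final Set<String> incomplete = new HashSet<>();
    private final Queue<String> completed = new ArrayDeque<>();

    void add(String entry) {
        incomplete.add(entry);
    }

    /** Moves an entry to the completed queue; a repeated call for the same entry is a no-op. */
    void completed(String entry) {
        if (incomplete.remove(entry)) {
            completed.add(entry);
        }
    }

    /** Emits at most one finished entry and reports how many were emitted (0 or 1). */
    int emitCompleted(List<String> output) {
        String entry = completed.poll();
        if (entry == null) {
            return 0;
        }
        output.add(entry);
        return 1;
    }
}
```

Adding a, b, c and then completing c before a means c is emitted first, while b stays pending until its own result arrives.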
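Summary
What the async I/O operator does here comes down to:
- putting the element into a queue
- running the user-written logic
- having the user logic call resultFuture.complete once it is done
- emitting the result downstream
The asynchronous part is that the user logic can register a callback, call complete once the result comes back, and only then is the data sent downstream.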
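What "registering a callback" can look like is easier to see in a Flink-free sketch. Everything below is an assumption made for illustration: slowLookup stands in for any slow external call, a plain Consumer stands in for ResultFuture#complete, and CompletableFuture with a thread pool is just one possible way to get a callback when the client library offers no async API of its own.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/** Contrasts a blocking invoke with a callback-based one. All names are illustrative. */
class CallbackSketch {
    /** Hypothetical stand-in for a slow lookup such as a database query. */
    static String slowLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "name-of-" + key;
    }

    /** Blocking style: the caller's thread waits for the lookup, then calls complete itself. */
    static void blockingInvoke(String input, Consumer<Collection<String>> complete) {
        complete.accept(Collections.singleton(slowLookup(input)));
    }

    /** Callback style: returns right away; complete is called later on a pool thread. */
    static CompletableFuture<Void> callbackInvoke(
            String input, Consumer<Collection<String>> complete, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> slowLookup(input), pool)
                .thenAcceptAsync(name -> complete.accept(Collections.singleton(name)), pool);
    }
}
```

In the blocking variant the calling thread can do nothing else while the lookup runs; in the callback variant it is free to accept the next element while earlier lookups are still in flight.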
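The hard question
So here is the question: what is wrong with the example code at the top? It uses async I/O, but is it actually asynchronous? And in which scenarios, or with which engines, can Flink's async I/O really be put to use?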