Flink source code: task execution

Execution
The smallest unit of execution in a Flink job is the task.

Example
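import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // one parallel instance per operator keeps the task layout easy to follow
        env.setParallelism(1);

        // read lines from a local socket (port = first argument), split them into
        // (word, 1) pairs and sum the counts per word in 5-second windows.
        // keyBy(0) and timeWindow(...) are the API of this Flink version; later releases deprecate them.
        DataStream<Tuple2<String, Integer>> dataStream = env
                .socketTextStream("localhost", Integer.parseInt(args[0]))
                .flatMap(new Splitter())
                .keyBy(0)
                .timeWindow(Time.seconds(5))
                .sum(1);

        dataStream.print();

        // getExecutionPlan() returns the plan as a JSON string, so print it
        System.out.println(env.getExecutionPlan());

        env.execute("Window WordCount");
    }

    public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) throws Exception {
            for (String word : sentence.split(" ")) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }
}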
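To make the job's semantics concrete, here is a dependency-free sketch of what the pipeline computes: each line is split on spaces and counts are summed per word inside 5-second tumbling windows. It assumes each line already carries a timestamp in milliseconds (the job sets no time characteristic, so in Flink versions of this era the windows are based on processing time). The class WindowedWordCountSketch and its input shape are hypothetical; the window start ts - ts % size is the tumbling-window assignment with zero offset for non-negative timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Flink-free sketch of what the WordCount job computes.
class WindowedWordCountSketch {
    static final long WINDOW_SIZE_MS = 5_000L; // Time.seconds(5)

    // Start of the tumbling window that contains ts (zero offset, ts >= 0).
    static long windowStart(long ts) {
        return ts - ts % WINDOW_SIZE_MS;
    }

    /**
     * @param timestamps arrival time (ms) of each line
     * @param lines      text lines as read from the socket
     * @return window start -> (word -> count), mirroring flatMap + keyBy(0) + timeWindow + sum(1)
     */
    static Map<Long, Map<String, Integer>> count(long[] timestamps, String[] lines) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            Map<String, Integer> perWindow =
                    result.computeIfAbsent(windowStart(timestamps[i]), k -> new TreeMap<>());
            for (String word : lines[i].split(" ")) {   // same split as Splitter
                perWindow.merge(word, 1, Integer::sum);  // sum(1) over the (word, 1) tuples
            }
        }
        return result;
    }
}
```

For example, lines arriving at 1000 ms and 4999 ms land in the window starting at 0, while a line at 5000 ms opens the next window.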


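Entry class
org.apache.flink.runtime.taskexecutor.TaskExecutor
Core method: submitTask wraps the task deployment descriptor sent by the JobMaster into an org.apache.flink.runtime.taskmanager.Task and starts that task's own thread. The thread then loads the invokable class and calls invoke(); those two calls live in Task, not in TaskExecutor itself.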

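// TaskExecutor (abridged): wrap the deployment descriptor into a Task and start its thread
public CompletableFuture<Acknowledge> submitTask(
        TaskDeploymentDescriptor tdd, JobMasterId jobMasterId, Time timeout) {
    // ... create the Task from tdd and register it in the task slot table ...
    task.startTaskThread();
    // ...
}

// org.apache.flink.runtime.taskmanager.Task (abridged): runs in the task's own thread
private void doRun() {
    // ...
    // now load and instantiate the task's invokable code
    invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env);
    // run the invokable
    invokable.invoke();
}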

Source part
For the source, the invokable here is
org.apache.flink.streaming.runtime.tasks.SourceStreamTask, and it is its invoke method that gets called.

Task base class
org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable is the abstract base class that every task extends.
Every task is executed by calling its invoke() method.
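The value of a common base class is that the runtime never needs to know which concrete task it is running: it only holds an AbstractInvokable and calls invoke() in the task's own thread. The sketch below models that idea with hypothetical stand-ins (MiniInvokable, MiniSourceTask, MiniOneInputTask and MiniTaskRunner are not Flink classes); it is an illustration of the polymorphic entry point, not Flink's threading code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the AbstractInvokable contract: one entry point, invoke().
abstract class MiniInvokable {
    abstract void invoke() throws Exception;
}

class MiniSourceTask extends MiniInvokable {
    final List<String> log;
    MiniSourceTask(List<String> log) { this.log = log; }
    @Override void invoke() { log.add("source task running"); }
}

class MiniOneInputTask extends MiniInvokable {
    final List<String> log;
    MiniOneInputTask(List<String> log) { this.log = log; }
    @Override void invoke() { log.add("one-input task running"); }
}

class MiniTaskRunner {
    // Runs every task in its own thread, using only the base type.
    static void runAll(List<MiniInvokable> tasks) {
        List<Thread> threads = new ArrayList<>();
        for (MiniInvokable task : tasks) {
            Thread t = new Thread(() -> {
                try {
                    task.invoke();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
    }
}
```

The log passed to the tasks should be thread-safe (for example Collections.synchronizedList), since the tasks run concurrently.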

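Now focus on how SourceStreamTask is instantiated.
It is created by reflection: the class is loaded by name, its constructor taking an Environment is looked up, and newInstance is called on that constructor.
The Environment object carries the task's execution information (configuration, inputs, outputs and so on).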

private static AbstractInvokable loadAndInstantiateInvokable(
        ClassLoader classLoader,
        String className,
        Environment environment) throws Throwable {

        final Class<? extends AbstractInvokable> invokableClass;
        try {
            invokableClass = Class.forName(className, true, classLoader)
                .asSubclass(AbstractInvokable.class);
        } catch (Throwable t) {
            throw new Exception("Could not load the task's invokable class.", t);
        }

        Constructor<? extends AbstractInvokable> statelessCtor;

        try {
            statelessCtor = invokableClass.getConstructor(Environment.class);
        } catch (NoSuchMethodException ee) {
            throw new FlinkException("Task misses proper constructor", ee);
        }

        // instantiate the class
        try {
            //noinspection ConstantConditions  --> cannot happen
            // ---------------- core call ----------------
            return statelessCtor.newInstance(environment);
        } catch (InvocationTargetException e) {
            // directly forward exceptions from the eager initialization
            throw e.getTargetException();
        } catch (Exception e) {
            throw new FlinkException("Could not instantiate the task's invokable class.", e);
        }
    }
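The same three reflective steps (load the class by name, look up the constructor that takes the environment, call newInstance) can be reproduced in plain Java. In this sketch DemoEnvironment, DemoInvokable, DemoSourceTask and InvokableLoader are hypothetical stand-ins for Environment, AbstractInvokable, SourceStreamTask and Task.loadAndInstantiateInvokable; failures are wrapped in IllegalStateException instead of Flink's FlinkException.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

// Hypothetical stand-ins for Environment and AbstractInvokable.
class DemoEnvironment {
    final String taskName;
    DemoEnvironment(String taskName) { this.taskName = taskName; }
}

abstract class DemoInvokable {
    final DemoEnvironment env;
    protected DemoInvokable(DemoEnvironment env) { this.env = env; }
    abstract String invoke();
}

class DemoSourceTask extends DemoInvokable {
    // must be public so that getConstructor(...) can find it
    public DemoSourceTask(DemoEnvironment env) { super(env); }
    @Override String invoke() { return "running " + env.taskName; }
}

class InvokableLoader {
    static DemoInvokable load(ClassLoader cl, String className, DemoEnvironment env) {
        try {
            // 1. load the class by name and make sure it is a DemoInvokable
            Class<? extends DemoInvokable> clazz =
                    Class.forName(className, true, cl).asSubclass(DemoInvokable.class);
            // 2. look up the (DemoEnvironment) constructor
            Constructor<? extends DemoInvokable> ctor = clazz.getConstructor(DemoEnvironment.class);
            // 3. instantiate it with the environment
            return ctor.newInstance(env);
        } catch (InvocationTargetException e) {
            // the constructor itself threw: surface the original cause
            throw new IllegalStateException("Task constructor failed", e.getTargetException());
        } catch (ReflectiveOperationException e) {
            // class not found, no matching constructor, abstract class, ...
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }
}
```

Usage: InvokableLoader.load(InvokableLoader.class.getClassLoader(), DemoSourceTask.class.getName(), new DemoEnvironment("Source: Socket Stream")).invoke() returns "running Source: Socket Stream".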

Back to the invoke method

The implementation of invoke lives in org.apache.flink.streaming.runtime.tasks.StreamTask,
the common parent class of SourceStreamTask and OneInputStreamTask:

@Override
    public final void invoke() throws Exception {

        boolean disposed = false;
        try {
            // ... initialization elided (state backend, checkpoint storage, timer service) ...
            operatorChain = new OperatorChain<>(this, recordWriters);
            headOperator = operatorChain.getHeadOperator();

            // task specific initialization
            init();

            // save the work of reloading state, etc, if the task is already canceled
            if (canceled) {
                throw new CancelTaskException();
            }

            // -------- Invoke --------
            LOG.debug("Invoking {}", getName());

            // we need to make sure that any triggers scheduled in open() cannot be
            // executed before all operators are opened
            synchronized (lock) {

                // both the following operations are protected by the lock
                // so that we avoid race conditions in the case that initializeState()
                // registers a timer, that fires before the open() is called.

                initializeState();
                openAllOperators();
            }

            // final check to exit early before starting to run
            if (canceled) {
                throw new CancelTaskException();
            }

            // start running the operators
            isRunning = true;
            // ------------------------ the key method ------------------------
            run();

            // if this left the run() method cleanly despite the fact that this was canceled,
            // make sure the "clean shutdown" is not attempted
            if (canceled) {
                throw new CancelTaskException();
            }

            LOG.debug("Finished task {}", getName());

            // make sure no further checkpoint and notification actions happen.
            // we make sure that no other thread is currently in the locked scope before
            // we close the operators by trying to acquire the checkpoint scope lock
            // we also need to make sure that no triggers fire concurrently with the close logic
            // at the same time, this makes sure that during any "regular" exit where still
            synchronized (lock) {
                // this is part of the main logic, so if this fails, the task is considered failed
                closeAllOperators();

                // make sure no new timers can come
                timerService.quiesce();

                // only set the StreamTask to not running after all operators have been closed!
                // See FLINK-7430
                isRunning = false;
            }

            // make sure all timers finish
            timerService.awaitPendingAfterQuiesce();

            LOG.debug("Closed operators for task {}", getName());

            // make sure all buffered data is flushed
            operatorChain.flushOutputs();

            // make an attempt to dispose the operators such that failures in the dispose call
            // still let the computation fail
            tryDisposeAllOperators();
            disposed = true;
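        } finally {
            // ... cleanup elided; among other things it disposes the operators
            // when the regular path above did not reach disposed = true ...
        }
    }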
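Stripped of Flink specifics, invoke() is a template method with a fixed order: build the operator chain, init(), initializeState and openAllOperators under the lock, run(), closeAllOperators under the lock, flush, dispose, with cancellation checks in between. The hypothetical MiniStreamTask below records only that order; the bodies are placeholders, cancellation is a plain volatile flag, and IllegalStateException stands in for Flink's CancelTaskException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton of StreamTask.invoke(): only the call order is modeled.
abstract class MiniStreamTask {
    final List<String> steps = new ArrayList<>();
    final Object lock = new Object();   // plays the role of the checkpoint lock
    volatile boolean canceled;
    volatile boolean isRunning;

    final void invoke() {
        boolean disposed = false;
        try {
            steps.add("create operator chain");
            init();                                   // subclass hook (source vs. one-input)
            if (canceled) throw new IllegalStateException("canceled");
            synchronized (lock) {                     // no timer may fire before open() is done
                steps.add("initializeState");
                steps.add("openAllOperators");
            }
            if (canceled) throw new IllegalStateException("canceled");
            isRunning = true;
            run();                                    // main processing until input is exhausted
            if (canceled) throw new IllegalStateException("canceled");
            synchronized (lock) {
                steps.add("closeAllOperators");
                isRunning = false;                    // only after all operators are closed
            }
            steps.add("flushOutputs");
            steps.add("dispose");
            disposed = true;
        } finally {
            if (!disposed) {
                steps.add("dispose after failure");
            }
        }
    }

    abstract void init();
    abstract void run();
}
```

A subclass only supplies init() and run(), which is exactly how SourceStreamTask and OneInputStreamTask differ.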

run() is the task's main loop and calls performDefaultAction(), which org.apache.flink.streaming.runtime.tasks.SourceStreamTask overrides:

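    @Override
    protected void performDefaultAction(ActionContext context) throws Exception {
        // the legacy SourceFunction runs as a blocking loop, so it gets its own thread
        sourceThread.start();
        // ... remaining logic elided ...
    }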

This starts sourceThread, a LegacySourceFunctionThread (an inner class of SourceStreamTask). Following it further:

private class LegacySourceFunctionThread extends Thread {

        private Throwable sourceExecutionThrowable;

        LegacySourceFunctionThread() {
            this.sourceExecutionThrowable = null;
        }

        // headOperator is the head of the operator chain; in a source task it is the StreamSource wrapping the SourceFunction
        @Override
        public void run() {
            try {
                headOperator.run(getCheckpointLock(), getStreamStatusMaintainer(), operatorChain);
            } catch (Throwable t) {
                sourceExecutionThrowable = t;
            } finally {
                mailbox.clearAndPut(SOURCE_POISON_LETTER);
            }
        }

        void checkThrowSourceExecutionException() throws Exception {
            if (sourceExecutionThrowable != null) {
                throw new Exception(sourceExecutionThrowable);
            }
        }
    }
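The interesting design choice here is that a legacy SourceFunction.run() is a blocking loop, so it runs in a separate thread, emits under the checkpoint lock, and the task thread only learns that it has finished through the mailbox: the finally block replaces pending mail with SOURCE_POISON_LETTER. The hypothetical SourceThreadSketch models that hand-off with a LinkedBlockingQueue as the mailbox; note that clear() followed by add() is not atomic the way Flink's clearAndPut is meant to be, which is acceptable here only because a single thread posts mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of SourceStreamTask's source thread / mailbox hand-off.
class SourceThreadSketch {
    static final Runnable SOURCE_POISON_LETTER = () -> { };

    final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    final Object checkpointLock = new Object();
    final List<String> emitted = new ArrayList<>();
    volatile Throwable sourceExecutionThrowable;

    /** Runs the blocking "source" in its own thread and waits for the poison letter. */
    List<String> runSource(List<String> input) {
        Thread sourceThread = new Thread(() -> {
            try {
                for (String record : input) {
                    synchronized (checkpointLock) {   // emit under the checkpoint lock
                        emitted.add(record);
                    }
                }
            } catch (Throwable t) {
                sourceExecutionThrowable = t;         // remembered, rethrown by the task thread
            } finally {
                mailbox.clear();                      // roughly clearAndPut(SOURCE_POISON_LETTER)
                mailbox.add(SOURCE_POISON_LETTER);
            }
        });
        sourceThread.start();

        // Task thread: process mail until the source signals that it is done.
        try {
            while (true) {
                Runnable letter = mailbox.take();
                if (letter == SOURCE_POISON_LETTER) {
                    break;
                }
                letter.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for mail", e);
        }
        if (sourceExecutionThrowable != null) {       // like checkThrowSourceExecutionException()
            throw new IllegalStateException("source failed", sourceExecutionThrowable);
        }
        synchronized (checkpointLock) {
            return new ArrayList<>(emitted);
        }
    }
}
```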

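Finally the call reaches
org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction
This is the source function behind env.socketTextStream(...) in our job, i.e. the code that actually reads the data: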

public void run(SourceContext<String> ctx) throws Exception {
        final StringBuilder buffer = new StringBuilder();
        long attempt = 0;

        while (isRunning) {

            try (Socket socket = new Socket()) {
                currentSocket = socket;

                LOG.info("Connecting to server socket " + hostname + ':' + port);
                socket.connect(new InetSocketAddress(hostname, port), CONNECTION_TIMEOUT_TIME);
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {

                    char[] cbuf = new char[8192];
                    int bytesRead;
                    while (isRunning && (bytesRead = reader.read(cbuf)) != -1) {
                        buffer.append(cbuf, 0, bytesRead);
                        int delimPos;
                        while (buffer.length() >= delimiter.length() && (delimPos = buffer.indexOf(delimiter)) != -1) {
                            String record = buffer.substring(0, delimPos);
                            // truncate trailing carriage return
                            if (delimiter.equals("\n") && record.endsWith("\r")) {
                                record = record.substring(0, record.length() - 1);
                            }
                            ctx.collect(record);
                            buffer.delete(0, delimPos + delimiter.length());
                        }
                    }
                }
            }

            // if we dropped out of this loop due to an EOF, sleep and retry
            if (isRunning) {
                attempt++;
                if (maxNumRetries == -1 || attempt < maxNumRetries) {
                    LOG.warn("Lost connection to server socket. Retrying in " + delayBetweenRetries + " msecs...");
                    Thread.sleep(delayBetweenRetries);
                }
                else {
                    // this should probably be here, but some examples expect simple exists of the stream source
                    // throw new EOFException("Reached end of stream and reconnects are not enabled.");
                    break;
                }
            }
        }

        // collect trailing data
        if (buffer.length() > 0) {
            ctx.collect(buffer.toString());
        }
    }
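The core of SocketTextStreamFunction is its buffer handling: each read is appended to a StringBuilder, every complete record up to the delimiter is emitted (a trailing '\r' is dropped when the delimiter is "\n"), leftovers stay in the buffer, and whatever remains at the very end is emitted as one last record. The hypothetical DelimiterSplitter pulls that loop out into pure functions so it can be exercised without a socket; the chunks passed to readAll stand in for successive reader.read(cbuf) results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extraction of the delimiter loop from SocketTextStreamFunction.run().
class DelimiterSplitter {
    /**
     * Moves every complete record from buffer into out.
     * Incomplete trailing data stays in buffer for the next read.
     */
    static void drainRecords(StringBuilder buffer, String delimiter, List<String> out) {
        int delimPos;
        while (buffer.length() >= delimiter.length()
                && (delimPos = buffer.indexOf(delimiter)) != -1) {
            String record = buffer.substring(0, delimPos);
            // Windows-style line endings: drop the '\r' in front of '\n'
            if (delimiter.equals("\n") && record.endsWith("\r")) {
                record = record.substring(0, record.length() - 1);
            }
            out.add(record);
            buffer.delete(0, delimPos + delimiter.length());
        }
    }

    /** Simulates several socket reads followed by end of stream. */
    static List<String> readAll(String[] chunks, String delimiter) {
        StringBuilder buffer = new StringBuilder();
        List<String> out = new ArrayList<>();
        for (String chunk : chunks) {
            buffer.append(chunk);                   // like buffer.append(cbuf, 0, bytesRead)
            drainRecords(buffer, delimiter, out);
        }
        if (buffer.length() > 0) {                  // "collect trailing data"
            out.add(buffer.toString());
        }
        return out;
    }
}
```

A record split across two reads ("hello wor" + "ld\r\n") is only emitted once the delimiter arrives, which is why the buffer survives between reads.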

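Window functions
Below we look at the part downstream of the source, which includes the window function.
The task class for this part is
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask
We start directly from run().
Here performDefaultAction takes a different path than in SourceStreamTask.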

/**
     * Runs the stream-tasks main processing loop.
     */
    private void run() throws Exception {
        final ActionContext actionContext = new ActionContext();
        while (true) {
            if (mailbox.hasMail()) {
                Optional<Runnable> maybeLetter;
                while ((maybeLetter = mailbox.tryTakeMail()).isPresent()) {
                    Runnable letter = maybeLetter.get();
                    if (letter == POISON_LETTER) {
                        return;
                    }
                    letter.run();
                }
            }

            performDefaultAction(actionContext);
        }
    }

The default action of OneInputStreamTask looks like this:

protected void performDefaultAction(ActionContext context) throws Exception {
        if (!inputProcessor.processInput()) {
            context.allActionsCompleted();
        }
    }
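The loop in run() gives mail (checkpoints, timers, control actions) priority over normal work: all pending letters are drained first, and only then is one unit of default work done. When processInput() reports that the input is exhausted, the task signals completion through the ActionContext. The hypothetical MailboxLoopSketch below is single-threaded, and its allActionsCompleted step simply enqueues a poison letter; that is an assumption made so the loop terminates the same way the excerpt's POISON_LETTER check does, not a claim about Flink's exact implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Hypothetical single-threaded model of the StreamTask mailbox loop.
class MailboxLoopSketch {
    static final Runnable POISON_LETTER = () -> { };

    final Queue<Runnable> mailbox = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();
    final Iterator<String> input;

    MailboxLoopSketch(List<String> records) {
        this.input = records.iterator();
    }

    // Stand-in for inputProcessor.processInput(): false once the input is exhausted.
    boolean processInput() {
        if (!input.hasNext()) {
            return false;
        }
        log.add("process " + input.next());
        return true;
    }

    // Stand-in for OneInputStreamTask.performDefaultAction().
    void performDefaultAction() {
        if (!processInput()) {
            mailbox.add(POISON_LETTER);   // assumed equivalent of context.allActionsCompleted()
        }
    }

    void run() {
        while (true) {
            Runnable letter;
            while ((letter = mailbox.poll()) != null) {   // drain all mail first
                if (letter == POISON_LETTER) {
                    return;
                }
                letter.run();
            }
            performDefaultAction();                      // then one step of record processing
        }
    }
}
```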
Continuing, we arrive at
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor

@Override
    public boolean processInput() throws Exception {
        initializeNumRecordsIn();

        StreamElement recordOrMark = input.pollNextNullable();
        if (recordOrMark == null) {
            input.isAvailable().get();
            return !checkFinished();
        }
        int channel = input.getLastChannel();
        checkState(channel != StreamTaskInput.UNSPECIFIED);

        processElement(recordOrMark, channel);
        return true;
    }

Then processElement, where we are nearly at the bottom of the call stack:

private void processElement(StreamElement recordOrMark, int channel) throws Exception {
        if (recordOrMark.isRecord()) {
            // now we can do the actual processing
            StreamRecord<IN> record = recordOrMark.asRecord();
            synchronized (lock) {
                numRecordsIn.inc();
                streamOperator.setKeyContextElement1(record);
                streamOperator.processElement(record);
            }
        }
        else if (recordOrMark.isWatermark()) {
            // handle watermark
            statusWatermarkValve.inputWatermark(recordOrMark.asWatermark(), channel);
        } else if (recordOrMark.isStreamStatus()) {
            // handle stream status
            statusWatermarkValve.inputStreamStatus(recordOrMark.asStreamStatus(), channel);
        } else if (recordOrMark.isLatencyMarker()) {
            // handle latency marker
            synchronized (lock) {
                streamOperator.processLatencyMarker(recordOrMark.asLatencyMarker());
            }
        } else {
            throw new UnsupportedOperationException("Unknown type of StreamElement");
        }
    }
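processElement is a type dispatch over StreamElement: records go to the operator under the lock, watermarks and stream-status changes go to the StatusWatermarkValve, which aligns them across input channels, and latency markers go to the operator. The hypothetical ElementDispatchSketch uses its own element types, and its valve is reduced to "forward a watermark only when the minimum over all channels advances", which is the core idea of watermark alignment but omits idle-channel (stream status) handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature of StreamOneInputProcessor.processElement().
class ElementDispatchSketch {
    interface Element { }
    static final class RecordElement implements Element {
        final String value;
        RecordElement(String value) { this.value = value; }
    }
    static final class WatermarkElement implements Element {
        final long timestamp;
        WatermarkElement(long timestamp) { this.timestamp = timestamp; }
    }
    static final class LatencyMarkerElement implements Element { }

    final List<String> operatorCalls = new ArrayList<>();
    final Object lock = new Object();
    final long[] channelWatermarks;
    long lastEmittedWatermark = Long.MIN_VALUE;

    ElementDispatchSketch(int numChannels) {
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    void processElement(Element e, int channel) {
        if (e instanceof RecordElement) {
            synchronized (lock) {
                operatorCalls.add("processElement(" + ((RecordElement) e).value + ")");
            }
        } else if (e instanceof WatermarkElement) {
            inputWatermark(((WatermarkElement) e).timestamp, channel);
        } else if (e instanceof LatencyMarkerElement) {
            synchronized (lock) {
                operatorCalls.add("processLatencyMarker");
            }
        } else {
            throw new UnsupportedOperationException("Unknown type of element");
        }
    }

    // Simplified valve: emit a watermark only when the minimum across channels advances.
    private void inputWatermark(long ts, int channel) {
        if (ts > channelWatermarks[channel]) {
            channelWatermarks[channel] = ts;
            long min = Long.MAX_VALUE;
            for (long w : channelWatermarks) {
                min = Math.min(min, w);
            }
            if (min > lastEmittedWatermark) {
                lastEmittedWatermark = min;
                synchronized (lock) {
                    operatorCalls.add("processWatermark(" + min + ")");
                }
            }
        }
    }
}
```

With two channels, a watermark of 10 on channel 0 is held back until channel 1 also reports one; a subsequent 5 on channel 1 forwards watermark 5.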

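Next comes
org.apache.flink.streaming.api.operators.StreamFlatMap
Its processElement finally calls userFunction.flatMap, i.e. the flatMap of our Splitter (the FlatMapFunction interface is defined in flink-core). StreamFlatMap serves here as an example of a one-input operator: with parallelism 1 and default operator chaining, the flatMap in this job is typically chained with the source and runs inside SourceStreamTask, while the OneInputStreamTask starts with the window operator. The operator's processElement is the same either way; only the caller differs (ChainingOutput instead of StreamOneInputProcessor).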

@Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        collector.setTimestamp(element);
        userFunction.flatMap(element.getValue(), collector);
    }
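StreamFlatMap itself is thin: it stamps its collector with the incoming record's timestamp and hands the value to the user's FlatMapFunction, so every output emitted by Splitter inherits the input's timestamp. The hypothetical FlatMapOperatorSketch reduces that wrapper to a minimum; its TimestampedCollector and FlatMapFn are stand-ins, not the Flink types, and outputs are plain strings rather than (word, 1) tuples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduction of StreamFlatMap and its timestamped collector.
class FlatMapOperatorSketch {
    // Stand-in for a collector that attaches the current timestamp to every output.
    static final class TimestampedCollector<T> {
        final List<String> emitted = new ArrayList<>();
        private long timestamp;

        void setTimestamp(long ts) { this.timestamp = ts; }

        void collect(T value) { emitted.add(value + "@" + timestamp); }
    }

    // Stand-in for FlatMapFunction.
    interface FlatMapFn<I, O> {
        void flatMap(I value, TimestampedCollector<O> out);
    }

    // Mirrors StreamFlatMap.processElement: set the timestamp, then call the user function.
    static <I, O> void processElement(FlatMapFn<I, O> userFunction, I value, long timestamp,
                                      TimestampedCollector<O> collector) {
        collector.setTimestamp(timestamp);
        userFunction.flatMap(value, collector);
    }
}
```

Reusing one collector and only re-stamping it per record avoids allocating a collector for every element.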
 
