How Flink CDC Reads from Oracle: The Internals

Today's post walks through how Flink CDC reads Oracle CDC data under the hood.

First up is the main user-facing code for reading Oracle with Flink CDC:

import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OracleExample {
    public static void main(String[] args) throws Exception {
        SourceFunction<String> sourceFunction = OracleSource.<String>builder()
                //.url("jdbc:oracle:thin:@{hostname}:{port}:{database}")
                .hostname("162.14.97.42")
                .port(1521)
                .database("helowin") // monitor the helowin database
                .schemaList("HR") // monitor the HR schema
                .tableList("HR.EMPLOYEES") // monitor the EMPLOYEES table
                .username("system")
                .password("system")
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(sourceFunction)
                .print().setParallelism(1); // use parallelism 1 for the sink to keep message ordering

        env.execute();
    }
}
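
As an aside, the builder can also forward raw Debezium connector options to the embedded Debezium engine we will meet below. A minimal sketch, assuming the builder's debeziumProperties setter and the standard Debezium Oracle LogMiner option log.mining.strategy (verify both against your connector and Debezium versions):

import java.util.Properties;

import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OracleExampleTuned {
    public static SourceFunction<String> buildTunedSource() {
        Properties dbzProps = new Properties();
        // read schema from the online catalog instead of rebuilding the LogMiner
        // dictionary for every mining session (a common Debezium Oracle tweak)
        dbzProps.setProperty("log.mining.strategy", "online_catalog");

        return OracleSource.<String>builder()
                .hostname("162.14.97.42")
                .port(1521)
                .database("helowin")
                .schemaList("HR")
                .tableList("HR.EMPLOYEES")
                .username("system")
                .password("system")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .debeziumProperties(dbzProps) // forwarded to the embedded engine
                .build();
    }
}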

The central class here is OracleSource. Stepping into it, we find little more than builder-style configuration properties, all of which end up on a DebeziumSourceFunction. Following build() into DebeziumSourceFunction, we see that it extends Flink's RichSourceFunction (and through it implements the SourceFunction interface). Its key method is run, whose code is shown below:
    public void run(SourceContext<T> sourceContext) throws Exception {
        properties.setProperty("name", "engine");
        properties.setProperty("offset.storage", FlinkOffsetBackingStore.class.getCanonicalName());
        if (restoredOffsetState != null) {
            // restored from state
            properties.setProperty(FlinkOffsetBackingStore.OFFSET_STATE_VALUE, restoredOffsetState);
        }
        // DO NOT include schema change, e.g. DDL
        properties.setProperty("include.schema.changes", "false");
        // disable the offset flush totally
        properties.setProperty("offset.flush.interval.ms", String.valueOf(Long.MAX_VALUE));
        // disable tombstones
        properties.setProperty("tombstones.on.delete", "false");
        if (engineInstanceName == null) {
            // not restore from recovery
            engineInstanceName = UUID.randomUUID().toString();
        }
        // history instance name to initialize FlinkDatabaseHistory
        properties.setProperty(
                FlinkDatabaseHistory.DATABASE_HISTORY_INSTANCE_NAME, engineInstanceName);
        // we have to use a persisted DatabaseHistory implementation, otherwise, recovery can't
        // continue to read binlog
        // see
        // https://stackoverflow.com/questions/57147584/debezium-error-schema-isnt-know-to-this-connector
        // and https://debezium.io/blog/2018/03/16/note-on-database-history-topic-configuration/
        properties.setProperty("database.history", determineDatabase().getCanonicalName());

        // we have to filter out the heartbeat events, otherwise the deserializer will fail
        String dbzHeartbeatPrefix =
                properties.getProperty(
                        Heartbeat.HEARTBEAT_TOPICS_PREFIX.name(),
                        Heartbeat.HEARTBEAT_TOPICS_PREFIX.defaultValueAsString());

        // this side plays the consumer role
        this.debeziumChangeFetcher =
                new DebeziumChangeFetcher<>(
                        sourceContext,
                        deserializer,
                        restoredOffsetState == null, // DB snapshot phase if restore state is null
                        dbzHeartbeatPrefix,
                        handover);

        // this side plays the producer role
        this.engine =
                DebeziumEngine.create(Connect.class)
                        .using(properties)
                        .notifying(changeConsumer)
                        .using(OffsetCommitPolicy.always())
                        .using(
                                (success, message, error) -> {
                                    if (success) {
                                        // Close the handover and prepare to exit.
                                        handover.close();
                                    } else {
                                        handover.reportError(error);
                                    }
                                })
                        .build();

        // run the engine asynchronously
        executor.execute(engine);
        debeziumStarted = true;

        // initialize metrics
        // make RuntimeContext#getMetricGroup compatible between Flink 1.13 and Flink 1.14
        final Method getMetricGroupMethod =
                getRuntimeContext().getClass().getMethod("getMetricGroup");
        getMetricGroupMethod.setAccessible(true);
        final MetricGroup metricGroup =
                (MetricGroup) getMetricGroupMethod.invoke(getRuntimeContext());

        metricGroup.gauge(
                "currentFetchEventTimeLag",
                (Gauge<Long>) () -> debeziumChangeFetcher.getFetchDelay());
        metricGroup.gauge(
                "currentEmitEventTimeLag",
                (Gauge<Long>) () -> debeziumChangeFetcher.getEmitDelay());
        metricGroup.gauge(
                "sourceIdleTime", (Gauge<Long>) () -> debeziumChangeFetcher.getIdleTime());

        // start the real debezium consumer
        debeziumChangeFetcher.runFetchLoop();
    }

Two fields do the real work in this method: engine and debeziumChangeFetcher. They form a producer-consumer pair for reading Oracle CDC data: engine plays the producer role and debeziumChangeFetcher the consumer role. The producer reads change events and deposits them into a Handover, from which the consumer then pulls them.

Stepping into the Handover class, we find two important methods, shown below.

produce publishes a batch of data and hands it to the consumer (in Java threading terms: it fills the slot and wakes the consumer thread):

    public void produce(final List<ChangeEvent<SourceRecord, SourceRecord>> element)
            throws InterruptedException {

        checkNotNull(element);

        synchronized (lock) {
            while (next != null && !wakeupProducer) {
                lock.wait();
            }

            wakeupProducer = false;

            // an error marks this as closed for the producer
            if (error != null) {
                ExceptionUtils.rethrow(error, error.getMessage());
            } else {
                // if there is no error, then this is open and can accept this element
                next = element;
                lock.notifyAll();
            }
        }
    }
pollNext consumes the batch and, once the slot is drained, wakes the producer so it can keep producing:
    public List<ChangeEvent<SourceRecord, SourceRecord>> pollNext() throws Exception {
        synchronized (lock) {
            while (next == null && error == null) {
                lock.wait();
            }
            List<ChangeEvent<SourceRecord, SourceRecord>> n = next;
            if (n != null) {
                next = null;
                lock.notifyAll();
                return n;
            } else {
                ExceptionUtils.rethrowException(error, error.getMessage());

                // this statement cannot be reached since the above method always throws an
                // exception this is only here to silence the compiler and any warnings
                return Collections.emptyList();
            }
        }
    }
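
To make the thread interplay concrete, here is a minimal, self-contained sketch of the same single-slot handover pattern (simplified: String batches instead of ChangeEvent lists, and error propagation left out; reportError is shown further below):

import java.util.List;

public class MiniHandover {
    private final Object lock = new Object();
    private List<String> next; // the single hand-off slot

    // producer side: wait while the slot is still full, then fill it and wake the consumer
    public void produce(List<String> batch) throws InterruptedException {
        synchronized (lock) {
            while (next != null) {
                lock.wait();
            }
            next = batch;
            lock.notifyAll();
        }
    }

    // consumer side: wait while the slot is empty, then drain it and wake the producer
    public List<String> pollNext() throws InterruptedException {
        synchronized (lock) {
            while (next == null) {
                lock.wait();
            }
            List<String> batch = next;
            next = null;
            lock.notifyAll();
            return batch;
        }
    }

    public static void main(String[] args) throws Exception {
        MiniHandover handover = new MiniHandover();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    handover.produce(List.of("batch-" + i));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        // the main thread plays the consumer role here
        for (int i = 0; i < 3; i++) {
            System.out.println(handover.pollNext()); // [batch-0], [batch-1], [batch-2]
        }
        producer.join();
    }
}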

The produce side is driven by the Debezium engine, which calls DebeziumChangeConsumer#handleBatch with each batch of change events; handleBatch simply forwards them into the handover:

    public void handleBatch(
            List<ChangeEvent<SourceRecord, SourceRecord>> events,
            RecordCommitter<ChangeEvent<SourceRecord, SourceRecord>> recordCommitter) {
        try {
            currentCommitter = recordCommitter;
            handover.produce(events);
        } catch (Throwable e) {
            // Hold this exception in handover and trigger the fetcher to exit
            handover.reportError(e);
        }
    }
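
Note the catch block: a failure on the producer side is not thrown across threads; it is parked in the handover via reportError, which wakes the consumer blocked in pollNext so that the error surfaces there. Reconstructed roughly (a sketch, not the verbatim source):

    public void reportError(Throwable t) {
        checkNotNull(t);
        synchronized (lock) {
            // keep the first error; a later one would only mask the root cause
            if (error == null) {
                error = t;
            }
            // wake the consumer blocked in pollNext() so it can rethrow
            lock.notifyAll();
        }
    }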

pollNext is called from DebeziumChangeFetcher#runFetchLoop, which consumes the batches and emits the records to Flink for downstream processing:

public void runFetchLoop() throws Exception {
        try {
            // begin snapshot database phase
            if (isInDbSnapshotPhase) {
                List<ChangeEvent<SourceRecord, SourceRecord>> events = handover.pollNext();

                synchronized (checkpointLock) {
                    LOG.info(
                            "Database snapshot phase can't perform checkpoint, acquired Checkpoint lock.");
                    handleBatch(events);
                    while (isRunning && isInDbSnapshotPhase) {
                        handleBatch(handover.pollNext());
                    }
                }
                LOG.info("Received record from streaming binlog phase, released checkpoint lock.");
            }

            // begin streaming binlog phase
            while (isRunning) {
                // If the handover is closed or has errors, exit.
                // If there is no streaming phase, the handover will be closed by the engine.
                handleBatch(handover.pollNext());
            }
        } catch (Handover.ClosedException e) {
            // ignore
        }
    }
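
The handleBatch invoked inside runFetchLoop is a private method of the fetcher itself, not the consumer's handleBatch shown earlier. Roughly, it deserializes each record and emits it under the checkpoint lock. A hedged reconstruction, with helper names (isHeartbeatEvent, isSnapshotRecord, emitRecordsUnderCheckpointLock, debeziumCollector) approximated for illustration:

    private void handleBatch(List<ChangeEvent<SourceRecord, SourceRecord>> changeEvents)
            throws Exception {
        for (ChangeEvent<SourceRecord, SourceRecord> event : changeEvents) {
            SourceRecord record = event.value();

            // heartbeat events carry no table payload and would break the
            // deserializer -- this is why run() captured dbzHeartbeatPrefix above
            if (isHeartbeatEvent(record)) {
                continue;
            }

            // turn the Debezium SourceRecord into the user's type T
            // (a JSON String with JsonDebeziumDeserializationSchema)
            deserializer.deserialize(record, debeziumCollector);

            // the first non-snapshot record marks the end of the snapshot phase
            if (!isSnapshotRecord(record)) {
                isInDbSnapshotPhase = false;
            }

            // emit the records and remember the offset under the checkpoint lock,
            // so a checkpoint never separates emitted data from its offset
            emitRecordsUnderCheckpointLock(
                    debeziumCollector.getRecords(),
                    record.sourcePartition(),
                    record.sourceOffset());
        }
    }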

Summary: the architecture borrows the producer-consumer idea familiar from Kafka to let Flink CDC consume Oracle CDC data. Two threads are started, each playing one role and responsible only for its own task, with the Handover acting as the buffer that carries data from producer to consumer. Because the two threads never talk to each other directly, error reporting also relies on the handover: when the engine fails, it reports the error to the handover through its DebeziumEngine.CompletionCallback, which wakes the consumer up to check for the error.
