How Flink CDC Reads Oracle

曾经的梅

Tags: oracle, database, flink, big data

Article link: https://blog.csdn.net/qq_42488390/article/details/131444924

1. Today I'm sharing how Flink CDC reads change data (CDC) from Oracle.

First, here is the main code for reading Oracle with Flink CDC:

// Flink CDC 2.x package names
import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OracleExample {
    public static void main(String[] args) throws Exception {
        SourceFunction<String> sourceFunction = OracleSource.<String>builder()
                //.url("jdbc:oracle:thin:@{hostname}:{port}:{database}")
                .hostname("162.14.97.42")
                .port(1521)
                .database("helowin") // Oracle database to monitor
                .schemaList("HR") // schema to monitor
                .tableList("HR.EMPLOYEES") // table to monitor, as schema.table
                .username("system")
                .password("system")
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env
                .addSource(sourceFunction)
                .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering

        env.execute();
    }
}

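As you can see, the most important class is OracleSource. Inside it there are mostly configuration properties, and the last method of its builder, build(), turns them into a DebeziumSourceFunction. DebeziumSourceFunction extends Flink's RichSourceFunction (which itself implements the SourceFunction interface), and its main method is run():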

    public void run(SourceContext<T> sourceContext) throws Exception {
        properties.setProperty("name", "engine");
        properties.setProperty("offset.storage", FlinkOffsetBackingStore.class.getCanonicalName());
        if (restoredOffsetState != null) {
            // restored from state
            properties.setProperty(FlinkOffsetBackingStore.OFFSET_STATE_VALUE, restoredOffsetState);
        }
        // DO NOT include schema change, e.g. DDL
        properties.setProperty("include.schema.changes", "false");
        // disable the offset flush totally
        properties.setProperty("offset.flush.interval.ms", String.valueOf(Long.MAX_VALUE));
        // disable tombstones
        properties.setProperty("tombstones.on.delete", "false");
        if (engineInstanceName == null) {
            // not restore from recovery
            engineInstanceName = UUID.randomUUID().toString();
        }
        // history instance name to initialize FlinkDatabaseHistory
        properties.setProperty(
                FlinkDatabaseHistory.DATABASE_HISTORY_INSTANCE_NAME, engineInstanceName);
        // we have to use a persisted DatabaseHistory implementation, otherwise, recovery can't
        // continue to read binlog
        // see
        // https://stackoverflow.com/questions/57147584/debezium-error-schema-isnt-know-to-this-connector
        // and https://debezium.io/blog/2018/03/16/note-on-database-history-topic-configuration/
        properties.setProperty("database.history", determineDatabase().getCanonicalName());

        // we have to filter out the heartbeat events, otherwise the deserializer will fail
        String dbzHeartbeatPrefix =
                properties.getProperty(
                        Heartbeat.HEARTBEAT_TOPICS_PREFIX.name(),
                        Heartbeat.HEARTBEAT_TOPICS_PREFIX.defaultValueAsString());
						
		//扮演消费者的角色
        this.debeziumChangeFetcher =
                new DebeziumChangeFetcher<>(
                        sourceContext,
                        deserializer,
                        restoredOffsetState == null, // DB snapshot phase if restore state is null
                        dbzHeartbeatPrefix,
                        handover);

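        // acts as the producer: the Debezium engine reads change events from Oracle and hands
        // each batch to changeConsumer (a DebeziumChangeConsumer wrapping the same handover,
        // created in open())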
        this.engine =
                DebeziumEngine.create(Connect.class)
                        .using(properties)
                        .notifying(changeConsumer)
                        .using(OffsetCommitPolicy.always())
                        .using(
                                (success, message, error) -> {
                                    if (success) {
                                        // Close the handover and prepare to exit.
                                        handover.close();
                                    } else {
                                        handover.reportError(error);
                                    }
                                })
                        .build();

        // run the engine asynchronously
        executor.execute(engine);
        debeziumStarted = true;

        // initialize metrics
        // make RuntimeContext#getMetricGroup compatible between Flink 1.13 and Flink 1.14
        final Method getMetricGroupMethod =
                getRuntimeContext().getClass().getMethod("getMetricGroup");
        getMetricGroupMethod.setAccessible(true);
        final MetricGroup metricGroup =
                (MetricGroup) getMetricGroupMethod.invoke(getRuntimeContext());

        metricGroup.gauge(
                "currentFetchEventTimeLag",
                (Gauge<Long>) () -> debeziumChangeFetcher.getFetchDelay());
        metricGroup.gauge(
                "currentEmitEventTimeLag",
                (Gauge<Long>) () -> debeziumChangeFetcher.getEmitDelay());
        metricGroup.gauge(
                "sourceIdleTime", (Gauge<Long>) () -> debeziumChangeFetcher.getIdleTime());

        // start the real debezium consumer
        debeziumChangeFetcher.runFetchLoop();
    }
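Flink drives DebeziumSourceFunction through the generic SourceFunction contract: a task thread calls run(), which is expected to block and keep emitting records until cancel() is called from another thread. That is why run() above ends with the blocking debeziumChangeFetcher.runFetchLoop() call. The JDK-only sketch below mimics that contract; MiniSourceFunction, CounterSource, and SourceLifecycleDemo are hypothetical names, not Flink's API (the real method is run(SourceContext<T>)). It uses a volatile isRunning flag like the one runFetchLoop() checks.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for Flink's SourceFunction contract.
interface MiniSourceFunction<T> {
    void run(Consumer<T> ctx) throws Exception; // blocks, emitting records, until cancelled
    void cancel();                              // called from another thread
}

// Emits 0, 1, 2, ... until cancelled, using a volatile isRunning flag.
class CounterSource implements MiniSourceFunction<Long> {
    private volatile boolean isRunning = true;

    @Override
    public void run(Consumer<Long> ctx) throws Exception {
        long next = 0;
        while (isRunning) {
            ctx.accept(next++);
            Thread.sleep(1); // stands in for waiting on new change data
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}

class SourceLifecycleDemo {
    // Runs the source on a separate "task" thread for roughly `millis` ms,
    // then cancels it and returns everything it emitted.
    static List<Long> runFor(long millis) throws InterruptedException {
        CounterSource source = new CounterSource();
        List<Long> out = new CopyOnWriteArrayList<>();
        Thread task = new Thread(() -> {
            try {
                source.run(out::add);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        task.start();
        Thread.sleep(millis);
        source.cancel(); // run() sees the flag and returns
        task.join();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("emitted " + runFor(50).size() + " records before cancel");
    }
}
```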

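In run(), the two key fields are engine and debeziumChangeFetcher, which read Oracle CDC data using a producer-consumer pattern: engine is the producer and debeziumChangeFetcher is the consumer. The engine runs asynchronously on a separate thread (executor.execute(engine)), reads the change data, and places it into a Handover object. debeziumChangeFetcher.runFetchLoop() then takes the data out of the Handover on the thread that called run().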
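The thread layout described above can be sketched with the JDK alone. This is an illustration under simplifying assumptions, not Flink CDC code. A capacity-1 ArrayBlockingQueue stands in for Handover, since both hold at most one pending batch. A plain Runnable stands in for the Debezium engine. A private sentinel batch replaces the engine's completion callback for signalling the end of the stream. TwoThreadPipelineDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TwoThreadPipelineDemo {
    // Private sentinel batch that marks the end of the stream (compared by identity).
    private static final List<String> END = new ArrayList<>();

    static List<String> run(List<List<String>> batches) throws InterruptedException {
        // Capacity 1: like Handover, at most one batch is waiting to be consumed.
        BlockingQueue<List<String>> handover = new ArrayBlockingQueue<>(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Producer on a background thread, analogous to executor.execute(engine).
            executor.execute(() -> {
                try {
                    for (List<String> batch : batches) {
                        handover.put(batch); // blocks while the previous batch is still queued
                    }
                    handover.put(END);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Consumer on the calling thread, analogous to debeziumChangeFetcher.runFetchLoop().
            List<String> emitted = new ArrayList<>();
            while (true) {
                List<String> batch = handover.take();
                if (batch == END) {
                    return emitted;
                }
                emitted.addAll(batch); // stands in for emitting each record to Flink
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(List.of("r1", "r2"), List.of("r3")))); // prints [r1, r2, r3]
    }
}
```

A plain blocking queue is enough for this sketch; Flink CDC's Handover additionally carries an error and a close signal between the two threads, which a queue of batches alone does not.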

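The Handover class has two important methods.

produce() hands a batch to the consumer and notifies it (in Java threading terms, notifyAll() wakes up the waiting consumer thread). If the previous batch has not been consumed yet, the producer waits first: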

    public void produce(final List<ChangeEvent<SourceRecord, SourceRecord>> element)
            throws InterruptedException {

        checkNotNull(element);

        synchronized (lock) {
            while (next != null && !wakeupProducer) {
                lock.wait();
            }

            wakeupProducer = false;

            // an error marks this as closed for the producer
            if (error != null) {
                ExceptionUtils.rethrow(error, error.getMessage());
            } else {
                // if there is no error, then this is open and can accept this element
                next = element;
                lock.notifyAll();
            }
        }
    }

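pollNext() consumes data: it waits until a batch or an error is available, takes the batch, and then wakes the producer so it can hand over the next one. If an error was reported instead, it rethrows it: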

    public List<ChangeEvent<SourceRecord, SourceRecord>> pollNext() throws Exception {
        synchronized (lock) {
            while (next == null && error == null) {
                lock.wait();
            }
            List<ChangeEvent<SourceRecord, SourceRecord>> n = next;
            if (n != null) {
                next = null;
                lock.notifyAll();
                return n;
            } else {
                ExceptionUtils.rethrowException(error, error.getMessage());

                // this statement cannot be reached since the above method always throws an
                // exception this is only here to silence the compiler and any warnings
                return Collections.emptyList();
            }
        }
    }
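To see the wait()/notifyAll() exchange in isolation, here is a stripped-down, hypothetical SlotHandover that keeps only the single next slot and the two wait loops from produce() and pollNext(). Error handling, close(), and wakeupProducer are deliberately left out. SlotHandoverDemo, also hypothetical, runs the producer on its own thread, as the Debezium engine would, and consumes on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified single-slot handover modelled on produce()/pollNext() above.
class SlotHandover<E> {
    private final Object lock = new Object();
    private E next; // element waiting to be consumed, or null if the slot is empty

    // Producer side: wait until the slot is empty, fill it, wake the consumer.
    void produce(E element) throws InterruptedException {
        if (element == null) {
            throw new NullPointerException("element");
        }
        synchronized (lock) {
            while (next != null) {
                lock.wait();
            }
            next = element;
            lock.notifyAll();
        }
    }

    // Consumer side: wait until the slot is full, empty it, wake the producer.
    E pollNext() throws InterruptedException {
        synchronized (lock) {
            while (next == null) {
                lock.wait();
            }
            E n = next;
            next = null;
            lock.notifyAll();
            return n;
        }
    }
}

class SlotHandoverDemo {
    // Pushes `count` batches through the handover from a separate producer thread
    // and returns them in the order the consumer received them.
    static List<List<Integer>> transfer(int count) throws InterruptedException {
        SlotHandover<List<Integer>> handover = new SlotHandover<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    handover.produce(List.of(i, i + 1)); // one "batch" per iteration
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<List<Integer>> received = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            received.add(handover.pollNext());
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(3)); // prints [[0, 1], [1, 2], [2, 3]]
    }
}
```

Because there is only one slot, the producer can get at most one batch ahead of the consumer. This gives natural backpressure: if emitting to Flink is slow, the engine thread simply blocks inside produce().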

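produce() is called from DebeziumChangeConsumer.handleBatch(). Despite its name, DebeziumChangeConsumer works on the producer side: it is the change consumer that the Debezium engine notifies (via notifying(changeConsumer) in run()), and it pushes each batch of events into the handover. If produce() throws, the exception is stored in the handover so that the fetcher exits: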

    public void handleBatch(
            List<ChangeEvent<SourceRecord, SourceRecord>> events,
            RecordCommitter<ChangeEvent<SourceRecord, SourceRecord>> recordCommitter) {
        try {
            currentCommitter = recordCommitter;
            handover.produce(events);
        } catch (Throwable e) {
            // Hold this exception in handover and trigger the fetcher to exit
            handover.reportError(e);
        }
    }

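pollNext() is called by DebeziumChangeFetcher.runFetchLoop(), which consumes the data and emits it to Flink for further processing. During the initial database snapshot phase it holds the checkpoint lock for the whole phase, so no checkpoint can be taken until the snapshot is finished:

    public void runFetchLoop() throws Exception {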
        try {
            // begin snapshot database phase
            if (isInDbSnapshotPhase) {
                List<ChangeEvent<SourceRecord, SourceRecord>> events = handover.pollNext();

                synchronized (checkpointLock) {
                    LOG.info(
                            "Database snapshot phase can't perform checkpoint, acquired Checkpoint lock.");
                    handleBatch(events);
                    while (isRunning && isInDbSnapshotPhase) {
                        handleBatch(handover.pollNext());
                    }
                }
                LOG.info("Received record from streaming binlog phase, released checkpoint lock.");
            }

            // begin streaming binlog phase
            while (isRunning) {
                // If the handover is closed or has errors, exit.
                // If there is no streaming phase, the handover will be closed by the engine.
                handleBatch(handover.pollNext());
            }
        } catch (Handover.ClosedException e) {
            // ignore
        }
    }
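The effect of holding checkpointLock for the whole snapshot phase can be shown with a small simulation. This is a sketch under assumptions: SnapshotLockDemo is hypothetical, the "checkpoint" thread stands in for Flink taking a checkpoint (which needs the same lock), and the streaming phase is modelled as taking the lock only briefly around each emitted record. Whatever the thread timing, the checkpoint is recorded only after the last snapshot record.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

class SnapshotLockDemo {
    // Simulates runFetchLoop(): the snapshot phase holds checkpointLock for its
    // whole duration; the streaming phase takes it only around each record.
    // A concurrent "checkpoint" thread needs the same lock.
    static List<String> run(int snapshotRecords, int streamRecords) throws InterruptedException {
        Object checkpointLock = new Object();
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch snapshotStarted = new CountDownLatch(1);

        Thread fetcher = new Thread(() -> {
            synchronized (checkpointLock) {
                snapshotStarted.countDown(); // the lock is now held for the whole snapshot
                for (int i = 0; i < snapshotRecords; i++) {
                    log.add("snapshot-" + i);
                }
            }
            for (int i = 0; i < streamRecords; i++) {
                synchronized (checkpointLock) {
                    log.add("stream-" + i);
                }
            }
        });
        Thread checkpointer = new Thread(() -> {
            synchronized (checkpointLock) {
                log.add("checkpoint");
            }
        });

        fetcher.start();
        snapshotStarted.await(); // ensure the snapshot phase owns the lock first
        checkpointer.start();    // cannot get the lock until the snapshot phase releases it
        fetcher.join();
        checkpointer.join();
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        // "checkpoint" always comes after snapshot-2; its position among the
        // stream-* records varies from run to run.
        System.out.println(run(3, 2));
    }
}
```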

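Summary: the architecture applies the producer-consumer idea familiar from Kafka to consume Oracle CDC data in Flink CDC. Two threads play different roles, and each only has to do its own job: the Debezium engine thread produces, and the fetcher on Flink's source thread consumes. Handover serves as the buffer through which the producer passes data to the consumer. Because the two threads do not communicate directly, error reporting also goes through the handover: when the engine fails, it reports the error to the handover through DebeziumEngine.CompletionCallback and wakes up the consumer so it can check for the error. When the engine completes normally, the same callback closes the handover, and runFetchLoop() exits on Handover.ClosedException.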
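The error and shutdown paths can be sketched as well. In this JDK-only illustration, CompletionCallback is a hypothetical interface with the same (success, message, error) shape as the lambda that run() passes to the DebeziumEngine builder. ClosableHandover and CallbackDemo are hypothetical simplifications of Handover and the fetch loop. A normal completion closes the handover, so the consumer exits on ClosedException after draining any pending batch. A failure is stored in the handover and rethrown to the consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback shaped like the (success, message, error) lambda in run().
interface CompletionCallback {
    void handle(boolean success, String message, Throwable error);
}

// Hypothetical handover with just enough state to show close and error reporting.
class ClosableHandover {
    static class ClosedException extends Exception {}
    private final Object lock = new Object();
    private List<String> next; // pending batch, or null
    private Throwable error;   // first error reported by the engine side
    private boolean closed;    // set when the engine finishes normally

    void produce(List<String> batch) throws InterruptedException {
        synchronized (lock) {
            while (next != null && error == null && !closed) {
                lock.wait();
            }
            next = batch;
            lock.notifyAll();
        }
    }

    List<String> pollNext() throws Exception {
        synchronized (lock) {
            while (next == null && error == null && !closed) {
                lock.wait();
            }
            if (next != null) { // hand out pending data before reporting close/error
                List<String> n = next;
                next = null;
                lock.notifyAll();
                return n;
            }
            if (error != null) {
                throw new Exception("engine failed", error);
            }
            throw new ClosedException();
        }
    }

    void reportError(Throwable t) {
        synchronized (lock) {
            if (error == null) error = t; // keep only the first error
            lock.notifyAll();             // wake a consumer blocked in pollNext()
        }
    }

    void close() {
        synchronized (lock) {
            closed = true;
            lock.notifyAll(); // wake the consumer so it can exit
        }
    }
}

class CallbackDemo {
    // Fake engine thread produces the batches, then completes or throws failWith.
    static List<String> run(List<List<String>> batches, RuntimeException failWith) {
        ClosableHandover handover = new ClosableHandover();
        CompletionCallback callback = (success, message, error) -> {
            if (success) {
                handover.close();            // normal end: let the fetch loop exit
            } else {
                handover.reportError(error); // failure: wake the consumer with the error
            }
        };
        new Thread(() -> {
            try {
                for (List<String> b : batches) {
                    handover.produce(b);
                }
                if (failWith != null) {
                    throw failWith;
                }
                callback.handle(true, "done", null);
            } catch (Throwable t) {
                callback.handle(false, t.getMessage(), t);
            }
        }).start();

        List<String> received = new ArrayList<>();
        try {
            while (true) {
                received.addAll(handover.pollNext());
            }
        } catch (ClosableHandover.ClosedException e) {
            // normal shutdown, mirroring runFetchLoop() ignoring Handover.ClosedException
        } catch (Exception e) {
            Throwable cause = e.getCause() != null ? e.getCause() : e;
            received.add("ERROR: " + cause.getMessage());
        }
        return received;
    }
}
```

For example, CallbackDemo.run(List.of(List.of("a"), List.of("b")), null) returns [a, b], while CallbackDemo.run(List.of(List.of("a")), new RuntimeException("connection lost")) returns [a, ERROR: connection lost]: the batch produced before the failure is still delivered, and the error surfaces on the next poll.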