Flink CDC: Reading MongoDB Data



For reading MySQL data with Flink CDC, see https://blog.csdn.net/penngo/article/details/124916196

1. Introduction

The MongoDB CDC connector works by posing as a replica member of a MongoDB replica set: leveraging MongoDB's high-availability mechanism, this disguised replica can obtain the complete oplog (operation log) event stream from the primary node.

Flink CDC project: https://github.com/ververica/flink-cdc-connectors
MongoDB CDC connector docs: https://github.com/ververica/flink-cdc-connectors/blob/master/docs/content/connectors/mongodb-cdc.md
MongoDB notes: https://blog.csdn.net/penngo/article/details/124232016

The full project source code is linked at the end of this article.

2. Prerequisites

  • MongoDB version
    MongoDB version >= 3.6

  • Cluster deployment
    Replica set or sharded cluster

  • Storage engine
    WiredTiger storage engine

  • Replica set protocol version
    Replica set protocol version 1 (pv1)
    Starting with version 4.0, MongoDB supports only pv1. pv1 is the default for all new replica sets created with MongoDB 3.2 or later.

  • Required privileges
    The MongoDB Kafka Connector, on which the CDC connector is built, requires the changeStream and read privileges.
    You can use the following example for a simple grant;
    for more fine-grained authorization, refer to the MongoDB documentation on database user roles.

    use admin;
    db.createUser({
      user: "flinkuser",
      pwd: "flinkpw",
      roles: [
        { role: "read", db: "admin" }, //read role includes changeStream privilege 
        { role: "readAnyDatabase", db: "admin" } //for snapshot reading
      ]
    });
    

3. Configuring a MongoDB Replica Set

Create mongo1.conf, mongo2.conf, and mongo3.conf:

# mongo1.conf
dbpath=/data/mongodb-4.4.13/data1
logpath=/data/mongodb-4.4.13/mongo1.log
logappend=true
port=27017
replSet=replicaSet_penngo  # replica set name
oplogSize=200

# mongo2.conf
dbpath=/data/mongodb-4.4.13/data2
logpath=/data/mongodb-4.4.13/mongo2.log
logappend=true
port=27018
replSet=replicaSet_penngo  # replica set name
oplogSize=200

# mongo3.conf
dbpath=/data/mongodb-4.4.13/data3
logpath=/data/mongodb-4.4.13/mongo3.log
logappend=true
port=27019
replSet=replicaSet_penngo  # replica set name
oplogSize=200
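
mongod will not start if the dbpath directories are missing, so create them first (a quick sketch, assuming the paths from the config files above):

> mkdir -p /data/mongodb-4.4.13/data1 /data/mongodb-4.4.13/data2 /data/mongodb-4.4.13/data3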

Start the MongoDB servers by running each of the following commands in a separate terminal:

> mongod --config ../mongo1.conf
> mongod --config ../mongo2.conf
> mongod --config ../mongo3.conf

Connect to MongoDB and configure the replica set using the mongo shell:

> mongo --port 27017

# Run the following command in the mongo shell to initialize the replica set

> rsconf = {
    _id: "replicaSet_penngo",
    members: [
      {_id: 0, host: "localhost:27017"},
      {_id: 1, host: "localhost:27018"},
      {_id: 2, host: "localhost:27019"}
    ]
  }
> rs.initiate(rsconf)
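
To confirm the replica set came up correctly, check its status; once the election completes, one member should report stateStr: "PRIMARY":

> rs.status()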

In the mongo shell, create the database penngo_db and the collection coll, then insert 1000 documents; db.coll.count() should return 1000:

> use penngo_db
> for (i=0; i<1000; i++) {db.coll.insert({user: "penngo" + i})}
> db.coll.count()


Create a new user in the mongo shell for Flink MongoDB CDC to use:

> use admin;
> db.createUser({
  user: "flinkuser",
  pwd: "flinkpw",
  roles: [
    { role: "read", db: "admin" }, //read role includes changeStream privilege 
    { role: "readAnyDatabase", db: "admin" } //for snapshot reading
  ]
});
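
To verify the new account, you can reconnect as flinkuser (a quick check, assuming the mongo shell from the same installation):

> mongo --port 27017 -u flinkuser -p flinkpw --authenticationDatabase admin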

4. Creating the Maven Project

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.penngo.flinkcdc</groupId>
  <artifactId>FlickCDC</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>FlickCDC_TEST</name>
  <url>https://21doc.net/</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <flink-version>1.13.3</flink-version>
    <flink-cdc-version>2.1.1</flink-cdc-version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink-version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-base</artifactId>
      <version>${flink-version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.12</artifactId>
      <version>${flink-version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.12</artifactId>
      <version>${flink-version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-common</artifactId>
      <version>${flink-version}</version>
    </dependency>

    <dependency>
      <groupId>com.ververica</groupId>
      <artifactId>flink-connector-mysql-cdc</artifactId>
      <version>${flink-cdc-version}</version>
    </dependency>
    <dependency>
      <groupId>com.ververica</groupId>
      <artifactId>flink-connector-mongodb-cdc</artifactId>
      <version>${flink-cdc-version}</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
          <source>${maven.compiler.source}</source>
          <target>${maven.compiler.target}</target>
          <encoding>${project.build.sourceEncoding}</encoding>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <repositories>
    <repository>
      <id>alimaven</id>
      <name>Maven Aliyun Mirror</name>
      <url>https://maven.aliyun.com/repository/central</url>
    </repository>
  </repositories>
</project>
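
With the POM in place, the project builds with a standard Maven invocation (or you can simply run the main class from an IDE):

> mvn clean compile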

MongoDBExample.java

package com.penngo.flinkcdc;

import com.ververica.cdc.connectors.mongodb.MongoDBSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

public class MongoDBExample {
    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Build the MongoDB CDC source via Flink CDC
        SourceFunction<String> mongoDBSourceFunction = MongoDBSource.<String>builder()
                .hosts("127.0.0.1:27017")
                .username("flinkuser")
                .password("flinkpw")
                .database("penngo_db")
                .collection("coll")
//                .databaseList("penngo_db")
//                .collectionList("coll")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        DataStreamSource<String> dataStreamSource = env.addSource(mongoDBSourceFunction);

        // Each element is a change event serialized as a JSON string
        SingleOutputStreamOperator<Object> singleOutputStreamOperator = dataStreamSource.process(new ProcessFunction<String, Object>() {
            @Override
            public void processElement(String value, ProcessFunction<String, Object>.Context ctx, Collector<Object> out) {
                try {
                    System.out.println("processElement=====" + value);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });

        dataStreamSource.print("raw stream");
        env.execute("Mongo");
    }
}

When the job runs, it first prints a record for each existing document in the collection (the initial snapshot), then prints live change events as documents are inserted, updated, or deleted.
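
With JsonDebeziumDeserializationSchema, each record is a change-stream event serialized as a JSON string. An insert event looks roughly like the following (an abbreviated illustration, not verbatim output):

{
  "_id": "...resume token...",
  "operationType": "insert",
  "fullDocument": "{\"_id\": {\"$oid\": \"...\"}, \"user\": \"penngo0\"}",
  "ns": {"db": "penngo_db", "coll": "coll"},
  "documentKey": "{\"_id\": {\"$oid\": \"...\"}}"
}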

5. Project Source Code

The project source code is available as an attachment.
