kafka-jdbc-connector-sink: syncing data from Kafka to MySQL

  • Here is a tutorial on syncing data through MySQL master-slave replication: https://zixuephp.net/article-438.html

  • What follows instead describes a setup where MySQL data goes through Kafka and is then written into other MySQL databases in near real time.

  • Why bother with this instead of plain master-slave replication? Because Kafka Connect makes it possible to do simple data filtering along the way.

  • Since the Kafka connector can only do simple filtering, the synchronization may later be moved to a MySQL + Kafka + Flink pipeline.

  • Kafka captures the MySQL data with the Debezium (dbz) connector. The details of that setup are not the focus of this post and will be covered in a follow-up article on running Debezium against MySQL.

Syncing data from Kafka to MySQL

Requirements

1. Sync the Debezium-produced data in Kafka to MySQL and rename the target tables.
2. Route the data of a single topic into different tables, each with a different combination of fields and a different primary key.
* Automatic table creation and delete operations are not supported for now, since production environments generally use soft deletes only.

Implementation

Requirement 1:
    The Debezium records use a before/after envelope; the "after" part has to be extracted and written to MySQL (sketched below).
    Topics are matched with a regular expression, and each topic name is given a prefix to form the MySQL table name.
Requirement 2:
    The value of the topic must not be null, and every field used for filtering must exist in the record's struct.
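
For reference, a simplified, JSON-style sketch of a Debezium change record and of what is left once ExtractField$Value pulls out the after field (field names and values are taken from the sample record further below; everything else is illustrative):

    # Debezium envelope (simplified):
    {
      "before": { "id": 240, "cn_name": "紧致", ... },
      "after":  { "id": 240, "cn_name": "紧致", ... },
      "source": { "db": "commodity", "table": "commodity_effect", ... },
      "op": "u",
      "ts_ms": 1574145235978
    }

    # After "transforms.ExtractField.field": "after", only the row image remains,
    # and this is what the JDBC sink writes to MySQL:
    { "id": 240, "cn_name": "紧致", ... }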

Tools

kafka-jdbc-connector-sink.jar
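
To confirm that the Connect worker actually picked up the jar, the installed plugins can be listed through the Connect REST API (the worker address localhost:8083 is the one used in the examples below):

    curl -s http://localhost:8083/connector-plugins | grep -i jdbc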

Configuration

  • Read the Debezium topics by regex, rewrite the table names, and write to MySQL (one connector handles multiple topics, one topic per table). The # comments in the JSON below are explanations only and must be removed before the request is actually sent.
curl -H "Content-Type:application/json" -X PUT -d '{

    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:mysql://mysql-url:3306/db-name",
    "connection.user": "username",
    "connection.password": "password",

    "tasks.max": "1",

    "topics.regex": "dbz_alpha_test.commodity.(.*)",

    "auto.create": false, # 是否自动创建表
    "auto.evolve": false, # 是否支持alert语句
    "insert.mode": "upsert", # 导入数据的模式(insert upsert update)
    "batch.size": 3000, # 批量操作数据
    "delete.enabled": false, # 是否支持delete操作, 如果支持delete操作pk.mode必须为record_key
    "pk.mode": "record_key", 

    "transforms": "dropPrefix, AddPrefix, ExtractField",
    # 提取某个字段作为struct传输到下游
    "transforms.ExtractField.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.ExtractField.field": "after",
    # 过滤掉topic名字的前缀
    "transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dropPrefix.regex":"dbz_alpha_test.commodity.(.*)",
    "transforms.dropPrefix.replacement":"$1",
    # 为topic名字添加前缀
    "transforms.AddPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.AddPrefix.regex" : ".*",
    "transforms.AddPrefix.replacement" : "new-pre_$0"

}' http://localhost:8083/connectors/jdbc-sink-mysql-name/config
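With this chain, a topic such as dbz_alpha_test.commodity.commodity_effect is first routed to commodity_effect by dropPrefix and then to new-pre_commodity_effect by AddPrefix, which becomes the MySQL table name. After submitting the config, the connector and task state can be checked through the Connect REST API (same connector name and worker address as above):

    curl -s http://localhost:8083/connectors/jdbc-sink-mysql-name/status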
  • One topic mapped to multiple tables, split by column sets (one connector per table). As before, the # comments must be removed before the JSON is submitted.
curl -H "Content-Type:application/json" -X PUT -d '{

    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:mysql://mysql-url:3306/alpha_part",
    "connection.user": "username",
    "connection.password": "password",

    "tasks.max": "1",

    "topics": "dbz_alpha_test.commodity.commodity_txgjdbc",
    "table.name.format": "table-name", # 指定表的名字

    "auto.create": false,
    "auto.evolve": false,
    "insert.mode": "upsert",
    "batch.size": 3000,
    "delete.enabled": false,
    "pk.mode": "record_key",

    "transforms": "ExtractField, ValueToKey, ReplaceField, RenameField",
    
    "transforms.ExtractField.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.ExtractField.field": "after",
    # take one or more value fields and use them as the record key
    "transforms.ValueToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.ValueToKey.fields": "age",
    # keep only the needed value fields (whitelist / blacklist)
    "transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.ReplaceField.whitelist": "name",
    # rename fields in the key (each transform alias may appear only once, hence two aliases)
    "transforms.RenameKey.type": "org.apache.kafka.connect.transforms.ReplaceField$Key",
    "transforms.RenameKey.renames": "age:id",
    # rename fields in the value
    "transforms.RenameValue.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.RenameValue.renames": "name:names"
    
}' http://localhost:8083/connectors/jdbc-sink-mysql_name/config
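To make the transform chain easier to follow, here is a rough sketch of how a single record is reshaped, assuming the Debezium after struct contains at least the fields name and age (the sample values are made up):

    # after ExtractField (field = after):      value = { name: "foo", age: 18, ... }
    # ValueToKey (fields = age):               key   = { age: 18 }, value unchanged
    # ReplaceField$Value (whitelist = name):   value = { name: "foo" }
    # ReplaceField$Key (renames age:id):       key   = { id: 18 }
    # ReplaceField$Value (renames name:names): value = { names: "foo" }
    # => upserted into "table-name" with key column id (pk.mode = record_key)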

Issues (notes)

  • 使用"transforms": "tombstoneHandlerExample报找不到io.confluent.connect.transforms.TombstoneHandler的jar包
  • 使用"transforms":"dropPrefix"之后自动创建表的功能失效了
  • The ${topic}-Values struct nested under the after field of the original record was gone after the record was dumped to the console with "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector":
Struct{
    before=Struct{id=240,create_time=Sun Oct 20 23:11:50 CST 2019,update_time=Sun Oct 20 23:11:50 CST 2019,is_deleted=0,origin_effect_id=0,cn_name=紧致,platform=3,effect_additional_fields=
    },
    after=Struct{id=240,create_time=Sun Oct 20 23:11:50 CST 2019,update_time=Sun Oct 20 23:11:50 CST 2019,is_deleted=0,origin_effect_id=45,cn_name=紧致,platform=3,effect_additional_fields=
    },
    source=Struct{version=0.10.0.CR1,connector=mysql,name=dbz_alpha_test,ts_ms=1574145235000,snapshot=false,db=commodity,table=commodity_effect,server_id=106507,gtid=d4832372-720c-11e9-92fa-6c0b84d610b3:8022222,file=mysql-bin.000071,pos=107643337,row=0,thread=65745185
    },op=u,ts_ms=1574145235978
}
  • When data is inserted, the key of the Kafka record overwrites the corresponding field in the value (when "pk.mode": "record_key").
  • With insert.mode = upsert, a primary key must be configured, i.e. pk.mode must be set (the default none will not work).
  • If the transforms key appears more than once in the config JSON, as in the snippet below, the later definition overrides the earlier one, so the transforms are not chained and the result of the first one never reaches the second. To chain transforms, list all aliases in a single transforms property; they are then applied in order, each receiving the output of the previous one (see the corrected fragment after the snippet).
    "transforms": "ExtractField", # [1]
    "transforms.ExtractField.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.ExtractField.field": "after",
    "transforms": "ValueToKey", # [2]
    "transforms.ValueToKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.ValueToKey.fields":"age",