Kafka Connect

I. Introduction

Kafka Connect is a tool for scalably and reliably streaming data between Kafka and other systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka. Kafka Connect can ingest an entire database, or collect metrics from all of your application servers, into Kafka topics, making the data available for low-latency stream processing. Export jobs can deliver data from Kafka topics into secondary storage and query systems, or into batch systems for offline analysis.

II. Setting up a Kafka Connect cluster

1. Downloading the connector

Currently, one of the best connector implementations comes from Confluent.
Confluent is a startup founded by several of the engineers who originally wrote Kafka after they left LinkedIn; Confluent Platform is the company's main product, and the platform is built primarily on Kafka.

Official website: https://www.confluent.io
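
The connector plugins themselves can be pulled from Confluent Hub. A minimal sketch, assuming the confluent-hub client is installed on the worker node (the JDBC connector shown here is the one used in the examples later in this article):

# install the JDBC source/sink connector plugin on a worker node
confluent-hub install confluentinc/kafka-connect-jdbc:latest

Alternatively, download the connector zip from Confluent Hub manually and unpack it into a directory listed in plugin.path (see the configuration below).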

2. Configuration files

For the mapping between connector versions and compatible Kafka versions, see:
https://docs.confluent.io/current/installation/versions-interoperability.html

1) Distributed-mode configuration file

bootstrap.servers=heng-042:9092,heng-043:9092,heng-044:9092
 
# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster
 
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
 
# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
#offset.storage.partitions=25
 
# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
config.storage.topic=connect-configs
config.storage.replication.factor=1
 
# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
status.storage.topic=connect-status
status.storage.replication.factor=1
#status.storage.partitions=5
 
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000
 
# These are provided to inform the user about the presence of the REST host and port configs
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.
#rest.host.name=
rest.port=8083
 
# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
# In my testing, these two values were kept consistent across all nodes; in distributed mode, requests are forwarded between workers automatically.
rest.advertised.host.name=heng-042
rest.advertised.port=8083
 
# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
#plugin.path=

Note: apply this configuration file on every worker node in the cluster.
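
If specific topic configurations are needed, the three internal topics can also be created manually before starting the workers. A minimal sketch, assuming Kafka 2.2+ (where kafka-topics.sh accepts --bootstrap-server) and the partition counts from the commented defaults above; in production you would raise --replication-factor to 3 or more, matching the comments in the configuration file:

# offsets topic: many partitions, compacted
kafka-topics.sh --create --bootstrap-server heng-042:9092 --topic connect-offsets --partitions 25 --replication-factor 1 --config cleanup.policy=compact
# config topic: must be a single partition, compacted
kafka-topics.sh --create --bootstrap-server heng-042:9092 --topic connect-configs --partitions 1 --replication-factor 1 --config cleanup.policy=compact
# status topic: a few partitions, compacted
kafka-topics.sh --create --bootstrap-server heng-042:9092 --topic connect-status --partitions 5 --replication-factor 1 --config cleanup.policy=compact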

3. Starting the cluster service

1) Start command
connect-distributed.sh -daemon /data/kafka/config/connect-distributed.properties
Note: Kafka Connect must be started on every worker node.
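
To verify that each worker came up, you can query its REST API (a quick check, assuming the rest.port of 8083 configured above):

# returns the worker version, commit id, and Kafka cluster id
curl http://heng-042:8083/
# lists the connectors currently deployed (an empty list right after startup)
curl http://heng-042:8083/connectors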

4. Creating connectors via the REST API

1) Create-connector command
curl -X POST -H "Content-Type: application/json" \
  --data '{
    "name": "mysql-source-person",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://172.16.93.130:3306/kafka_connect",
      "connection.user": "root",
      "connection.password": "Wtdig1991,",
      "value.converter.schemas.enable": "true",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "topic.prefix": "mysqltest.",
      "tasks.max": "1",
      "table.whitelist": "connect_test_url",
      "mode": "incrementing",
      "incrementing.column.name": "id"
    }
  }' \
  http://172.16.93.130:8083/connectors

Note: only in distributed (cluster) mode can connectors be created with curl calls against the REST API like this; in standalone mode, connector configurations are instead passed as properties files when the worker starts.
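
Once a connector is created, the same REST API manages its lifecycle. A few standard follow-up calls, using the connector name from the example above:

# check the status of the connector and its tasks
curl http://172.16.93.130:8083/connectors/mysql-source-person/status
# view the stored configuration
curl http://172.16.93.130:8083/connectors/mysql-source-person/config
# delete the connector
curl -X DELETE http://172.16.93.130:8083/connectors/mysql-source-person

To verify that rows are actually flowing, consume the generated topic (named topic.prefix + table name) with the console consumer shipped with Kafka:

kafka-console-consumer.sh --bootstrap-server heng-042:9092 --topic mysqltest.connect_test_url --from-beginning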
