Preface
Most write-ups run Kafka Connect in standalone mode; this one uses distributed mode instead. We use kafka-connect-oracle to capture Oracle changes in real time and write them into a Kafka topic.
1. Enable archive logging in Oracle
Run the following commands in order:
[root@ums02 ~]# su - oracle
[oracle@ums02 ~]$ sqlplus / as sysdba
SQL>shutdown immediate
SQL>startup mount
SQL>alter database archivelog;
SQL>alter database open;
SQL>alter database add supplemental log data (all) columns;
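To confirm that archive logging and supplemental logging actually took effect, you can query the standard v$database view in the same sqlplus session:

```sql
-- Verify archive log mode and supplemental logging (run as sysdba)
archive log list;
select log_mode, supplemental_log_data_all from v$database;
-- Expected: LOG_MODE = ARCHIVELOG and SUPPLEMENTAL_LOG_DATA_ALL = YES
```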
# In order to execute the connector successfully, it must be started with a privileged Oracle user. If the given user already has the DBA role, this step can be skipped. Otherwise, run the following to create a privileged user:
# In other words, the user handed to the Kafka connector must have the DBA role.
# Create a user with DBA privileges
SQL> create user kafka identified by kafka;
SQL> grant connect,resource,dba to kafka;
# Connect as the kafka user and create a test table
SQL> conn kafka/kafka;
SQL> create table student(id int,name char(25));
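If you want to double-check that the grants landed, the standard dba_role_privs view shows the roles assigned to the new user:

```sql
-- Confirm the kafka user actually received the DBA role
select granted_role from dba_role_privs where grantee = 'KAFKA';
```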
2. Configure the connector
1. Download kafka-connect-oracle from its GitHub repository
2. Build the jar with IDEA (not covered here)
3. Go to /usr/software/kafka/kafka_2.12-2.4.0/config
vi connect-distributed.properties
# connect-distributed.properties is configured as follows
bootstrap.servers=node1:9092,node2:9092,node3:9092
group.id=oracle-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=2
offset.storage.partitions=12
config.storage.topic=connect-configs
config.storage.replication.factor=2
status.storage.topic=connect-status
status.storage.replication.factor=2
status.storage.partitions=6
offset.flush.interval.ms=10000
rest.port=18083
plugin.path=/usr/software/kafka/connectors
# Create the three internal topics referenced above
kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --topic connect-offsets --replication-factor 2 --partitions 12
kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --topic connect-configs --replication-factor 2 --partitions 1
kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --topic connect-status --replication-factor 2 --partitions 6
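Once the three topics exist, it is worth confirming their partition counts and replication factors match the properties file (this of course needs the running cluster):

```
# Describe the internal topics to confirm partitions/replication
kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic connect-offsets
kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic connect-configs
kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic connect-status
```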
4. Go to /usr/software/kafka/connectors/kafka-connect-oracle
Upload the built kafka-connect-oracle-1.0.71.jar, together with ojdbc7.jar from the project's lib directory, into this directory.
Note: do not also copy these jars into $KAFKA_HOME/libs, or the connector will fail on startup.
3. Start the connector
cd /usr/software/kafka/kafka_2.12-2.4.0/config/
connect-distributed.sh ./connect-distributed.properties
# Register the connector from the command line
curl -s -X POST -H "Content-Type: application/json" --data '{
"name": "oracle-kafka-connector",
"config": {
"connector.class": "com.ecer.kafka.connect.oracle.OracleSourceConnector",
"tasks.max": "1",
"db.name.alias": "test",
"topic": "",
"db.name": "orcl",
"db.hostname": "192.168.2.45",
"db.port": "1521",
"db.user": "kafka",
"db.user.password": "kafka",
"db.fetch.size": "1",
"table.whitelist": "KAFKA.*",
"table.blacklist": "",
"parse.dml.data": "true",
"reset.offset": "false",
"multitenant": "false"
}
}' http://node1:18083/connectors | jq
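Beyond creating a connector, the same Kafka Connect REST API (standard in Kafka 2.4) covers day-to-day management against the same port:

```
# List all registered connectors
curl -s http://node1:18083/connectors
# Restart the connector after a failure
curl -s -X POST http://node1:18083/connectors/oracle-kafka-connector/restart
# Pause / resume without losing offsets
curl -s -X PUT http://node1:18083/connectors/oracle-kafka-connector/pause
curl -s -X PUT http://node1:18083/connectors/oracle-kafka-connector/resume
# Remove the connector entirely
curl -s -X DELETE http://node1:18083/connectors/oracle-kafka-connector
```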
# If the jq command is not available:
# install the EPEL repository
yum install epel-release
# check that the jq package exists
yum list jq
# install jq
yum install jq
# Check the connector status
[root@node1 config]# curl -s node1:18083/connectors/oracle-kafka-connector/status | jq
{
"name": "oracle-kafka-connector",
"connector": {
"state": "RUNNING",
"worker_id": "192.168.2.170:18083"
},
"tasks": [
{
"id": 0,
"state": "RUNNING",
"worker_id": "192.168.2.170:18083"
}
],
"type": "source"
}
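In a monitoring script you may not want to eyeball this JSON. A minimal sketch that pulls out the connector state with plain grep/cut (here fed the sample response above inline; in practice assign `status=$(curl -s node1:18083/connectors/oracle-kafka-connector/status)` instead):

```shell
#!/bin/sh
# Sample status response, normally fetched with curl from the Connect REST API
status='{"name":"oracle-kafka-connector","connector":{"state":"RUNNING","worker_id":"192.168.2.170:18083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"192.168.2.170:18083"}],"type":"source"}'
# Extract the first "state" field from the JSON
state=$(printf '%s' "$status" | grep -o '"state": *"[A-Z]*"' | head -n 1 | cut -d'"' -f4)
echo "$state"   # prints RUNNING
```

With jq installed, `echo "$status" | jq -r '.connector.state'` does the same thing more robustly.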
4. Sync Oracle data to Kafka
# The connector is now running, and a new topic named TEST.KAFKA.STUDENT appears in Kafka (db.name.alias, schema, and table name, uppercased)
kafka-topics.sh --list --bootstrap-server node1:9092,node2:9092,node3:9092
# Consume the topic
kafka-console-consumer.sh --topic TEST.KAFKA.STUDENT --bootstrap-server node1:9092,node2:9092,node3:9092 --from-beginning | jq
# Insert a row into the Oracle table
SQL> insert into student values(1,'zhangsan');
1 row created.
# The change is only captured after the transaction is committed
SQL> commit;
Commit complete.
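For reference, the record that arrives on TEST.KAFKA.STUDENT is roughly of the following shape. The field names are taken from the kafka-connect-oracle README and the values below are illustrative only, so verify against your own consumer output:

```json
{
  "SCN": 1234567,
  "SEG_OWNER": "KAFKA",
  "TABLE_NAME": "STUDENT",
  "SQL_REDO": "insert into \"KAFKA\".\"STUDENT\"(\"ID\",\"NAME\") values (1,'zhangsan')",
  "OPERATION": "INSERT",
  "data": { "ID": 1, "NAME": "zhangsan" },
  "before": null
}
```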
Questions and feedback are welcome.