Incremental data sync from MySQL 8.0 to Elasticsearch 7 with canal (local Windows environment)
As of this writing (2021-10-29), the latest canal release is 1.1.6.
This walkthrough uses v1.1.5, which supports MySQL 8.0 and Elasticsearch 7.x.
If you are on MySQL 5.x / Elasticsearch 6.x, v1.1.4 is recommended instead.
The setup steps follow.
1. Install MySQL and Elasticsearch
Installing and configuring Elasticsearch is covered in detail in an earlier post of mine (see that post).
Installing and configuring MySQL 8.0 (see that post).
One thing to note: after installing MySQL 8.0, binlog should be enabled by default; just confirm that binlog_format is ROW:
mysql> show variables like 'binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set, 1 warning (0.00 sec)
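If binlog_format comes back as STATEMENT or MIXED instead, you can force row-based logging in my.ini and restart MySQL. A minimal sketch (the server-id value is arbitrary; it just has to differ from the slaveId canal uses later):

```ini
[mysqld]
# canal needs row-based binlog events
log-bin=mysql-bin
binlog-format=ROW
# unique server id, must not clash with canal.instance.mysql.slaveId
server-id=1
```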
2. Download and install canal
Download page:
https://github.com/alibaba/canal/releases
When grabbing v1.1.5, pay attention to which build you download — the first time around I did not take the alpha build, and the es jar under the plugin directory was unusable.
The download can be slow, so be patient.
On Windows, simply unzip the two archives (canal.deployer and canal.adapter) and the installation is done.
3. Prepare the ES index and the MySQL table
First, set up the data structures this walkthrough needs.
The ES index mapping:
PUT /rabbit-statis/
{
  "mappings": {
    "properties": {
      "project_name": {
        "type": "text"
      },
      "class_name": {
        "type": "keyword"
      },
      "show_num": {
        "type": "integer"
      },
      "refactor_path": {
        "type": "text"
      }
    }
  }
}
Paste the request above into the Kibana Dev Tools console and run it.
Next, the MySQL side: create a database named rabbit-statis2, then run this SQL script:
-- ----------------------------
-- Table structure for statis
-- ----------------------------
DROP TABLE IF EXISTS `statis`;
CREATE TABLE `statis` (
  `id` int(0) NOT NULL AUTO_INCREMENT,
  `project_name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL,
  `class_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `show_num` int(0) NULL DEFAULT NULL,
  `refactor_path` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
SET FOREIGN_KEY_CHECKS = 1;
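A couple of throwaway rows make it easy to confirm the sync later; the values below are made up for illustration:

```sql
INSERT INTO statis (project_name, class_name, show_num, refactor_path)
VALUES ('rabbit', 'UserController', 3, '/src/main/java/UserController.java'),
       ('rabbit', 'OrderService', 7, '/src/main/java/OrderService.java');
```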
4. Configure canal-deployer
The point of configuring canal is to let it tail the database's binlog and ship those changes to ES.
Start with the deployer: open /deployer/conf/canal.properties and change the bind address:
# tcp bind ip
canal.ip = 127.0.0.1
Then open /deployer/conf/example/instance.properties and change just these entries:
# canal poses as a MySQL replica, so slaveId must not clash with any real replica's id
canal.instance.mysql.slaveId=1234
# ip:port, username and password of the source database
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
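The settings above assume a MySQL account canal/canal with replication privileges already exists. If it does not, this is the usual setup; the mysql_native_password line is a precaution, since the JDBC driver bundled with some canal builds may not handle MySQL 8.0's default caching_sha2_password plugin:

```sql
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- fall back to the old auth plugin if canal fails to connect
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'canal';
FLUSH PRIVILEGES;
```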
Then, on Windows, double-click /deployer/bin/startup.bat. canal is now running.
How do you confirm that canal-deployer started correctly?
Check /deployer/logs/canal/canal.log:
2021-10-29 12:33:25.300 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[127.0.0.1(127.0.0.1):11111]
2021-10-29 12:33:26.977 [main] INFO com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......
And check /deployer/logs/example/example.log:
2021-10-29 11:11:40.568 [main] INFO c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start CannalInstance for 1-example
2021-10-29 11:11:40.576 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
2021-10-29 11:11:40.576 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter : ^mysql\.slave_.*$
2021-10-29 11:11:40.626 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2021-10-29 11:11:40.638 [main] INFO c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....
If both logs look like this, the startup succeeded.
5. Configure canal-adapter
Open /adapter/conf/application.yml; after editing, the file looks like this:
server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
    # kafka consumer
    kafka.bootstrap.servers: 127.0.0.1:9092
    kafka.enable.auto.commit: false
    kafka.auto.commit.interval.ms: 1000
    kafka.auto.offset.reset: latest
    kafka.request.timeout.ms: 40000
    kafka.session.timeout.ms: 30000
    kafka.isolation.level: read_committed
    kafka.max.poll.records: 1000
    # rocketMQ consumer
    rocketmq.namespace:
    rocketmq.namesrv.addr: 127.0.0.1:9876
    rocketmq.batch.size: 1000
    rocketmq.enable.message.trace: false
    rocketmq.customized.trace.topic:
    rocketmq.access.channel:
    rocketmq.subscribe.filter:
    # rabbitMQ consumer
    rabbitmq.host:
    rabbitmq.virtual.host:
    rabbitmq.username:
    rabbitmq.password:
    rabbitmq.resource.ownerId:

  srcDataSources:
    defaultDS:
      url: jdbc:mysql://localhost:3306/rabbit-statis2?serverTimezone=GMT%2B8
      username: canal
      password: canal
  canalAdapters:
  - instance: example # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: es7
        key: exampleKey
        hosts: 127.0.0.1:9300
        properties:
          mode: transport
          # security.auth: test:123456 # only used for rest mode
          cluster.name: my-application
#      - name: kudu
#        key: kudu
#        properties:
#          kudu.master.address: 127.0.0.1 # ',' split multi address
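Note that transport mode talks to ES on the binary port 9300 and requires cluster.name to match your cluster exactly (you can check it by opening http://127.0.0.1:9200 in a browser). If you would rather go through the HTTP port, a sketch of the rest-mode variant of the es7 adapter block looks like this; the host and credentials are assumptions to adapt to your setup:

```yaml
      - name: es7
        key: exampleKey
        hosts: http://127.0.0.1:9200
        properties:
          mode: rest
          # security.auth: test:123456 # only needed if ES security is enabled
          cluster.name: my-application
```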
Then open /adapter/conf/es7/mytest_user and edit it as follows:
dataSourceKey: defaultDS
outerAdapterKey: exampleKey # must match the key of the es7 adapter in application.yml
destination: example
groupId: g1
esMapping:
  _index: rabbit-statis
  _id: _id
  upsert: true
  sql: "select id as _id,project_name,class_name,show_num,refactor_path from statis"
  commitBatch: 3000
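With the deployer and the adapter (/adapter/bin/startup.bat) both running, you can verify the whole pipeline end to end: insert a row on the MySQL side, watch the adapter's logger output print the change event, then look for the document in Kibana. The row values below are arbitrary test data:

```sql
INSERT INTO statis (project_name, class_name, show_num, refactor_path)
VALUES ('rabbit', 'SyncSmokeTest', 1, '/tmp/SyncSmokeTest.java');
```

Then, in the Kibana console:

```
GET /rabbit-statis/_search
{
  "query": { "match": { "class_name": "SyncSmokeTest" } }
}
```

If the hit comes back with the freshly inserted row, the incremental sync is working.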