1. MySQL configuration
Canal works by reading the MySQL binlog, so binlog writing must be enabled on the source MySQL server; the ROW binlog format is recommended.
(1) Edit the MySQL configuration file (my.cnf):
#enable binlog
log-bin=mysql-bin
#use ROW format
binlog-format=ROW
#server_id is required for MySQL replication and must not collide with canal's slaveId
server_id=1
(2) Canal works by presenting itself to MySQL as a slave, so the account it connects with needs the corresponding replication privileges:
CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
2. Configure canal
Canal releases can be downloaded from: https://github.com/alibaba/canal/releases
(1) Download and unpack
This guide uses canal 1.1.4:
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
mkdir -p /data/canal; cd /data/canal
tar xf canal.deployer-1.1.4.tar.gz -C ./
(2) Edit the configuration
The per-instance settings live in conf/example/instance.properties:
[root@localhost test]# grep '^[a-zA-Z]' instance.properties
#database address + port
canal.instance.master.address=192.168.1.11:3306
#username canal connects to MySQL with
canal.instance.dbUsername=canal
#password for that user
canal.instance.dbPassword=canal
#default database for the MySQL connection
canal.instance.defaultDatabaseName=test
#charset used when decoding the binlog
canal.instance.connectionCharset=UTF-8
#regex of tables to capture
canal.instance.filter.regex=test.t1
#Kafka topic to write to
canal.mq.topic=canal
#number of partitions used for hash partitioning
canal.mq.partitionsNum=0
#hash rule (db.table:pk, e.g. mytest.person:id)
canal.mq.partitionHash=test.t1:id

Note on canal.instance.filter.regex: separate multiple regexes with commas (,); the escape character is a double backslash (\\).
Common examples:
1. All tables: .* or .*\\..*
2. All tables in the canal schema: canal\\..*
3. Tables in the canal schema whose names start with canal: canal\\.canal.*
4. A single table in the canal schema: canal.test1
5. Multiple rules combined: canal\\..*,mysql.test1,mysql.test2 (comma-separated)
Caveat: this filter only works on ROW-format data (mixed/statement events are not parsed as SQL, so the table name cannot be extracted reliably for filtering).
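The canal.mq.partitionHash rule above routes each change by hashing its primary key, so all events for the same id land in the same partition and keep their relative order. A minimal sketch of the idea (the function name and hash below are illustrative; canal's actual hash implementation may differ):

```python
# Sketch of primary-key-based partition routing, as configured by
# canal.mq.partitionHash=test.t1:id. Illustrative only; canal's real
# hash function may differ.

def partition_for(pk_value: str, partitions_num: int) -> int:
    """Stable hash of the primary-key value modulo the partition count."""
    h = 0
    for ch in pk_value:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF  # Java-style 31x rolling hash, kept non-negative
    return h % partitions_num

# The same id always maps to the same partition, preserving per-key ordering.
print(partition_for("42", 3))
```

Because the mapping is deterministic, consumers can rely on seeing all changes for a given row in order within one partition.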
The server-wide settings live in conf/canal.properties:
[root@localhost test]# grep '^[a-zA-Z]' canal.properties
#ZooKeeper connection string for canal-server
canal.zkServers = localhost:2181
#how often canal flushes its state to ZooKeeper, in ms
canal.zookeeper.flush.period = 1000
#canal-server output mode
canal.serverMode = kafka
#directory for file-based persistence (defaults to the instance.properties directory)
canal.file.data.dir = ${canal.conf.dir}
#flush interval for file-based persistence, in ms
canal.file.flush.period = 1000
#number of records the in-memory store can buffer; must be a power of 2
canal.instance.memory.buffer.size = 16384
#size of one memory unit, default 1KB; together with buffer.size this bounds total memory use
canal.instance.memory.buffer.memunit = 1024
canal.instance.memory.batch.mode = MEMSIZE
#whether to enable heartbeat checks
canal.instance.detecting.enable = false
#heartbeat check SQL
canal.instance.detecting.sql = select 1
#heartbeat interval, in seconds
canal.instance.detecting.interval.time = 3
#retries before a heartbeat check is considered failed
canal.instance.detecting.retry.threshold = 3
#whether to fail over automatically after heartbeats fail (if true, canal switches to the MySQL standby to fetch binlog once the failure threshold is crossed)
canal.instance.detecting.heartbeatHaEnable = false
canal.instance.filter.druid.ddl = true
#whether to ignore DCL statements, e.g. grant/create user
canal.instance.filter.query.dcl = false
#whether to ignore DML query statements
canal.instance.filter.query.dml = false
#whether to ignore DDL query statements
canal.instance.filter.query.ddl = false
#supported binlog formats
canal.instance.binlog.format = ROW,STATEMENT,MIXED
#tsdb username
canal.instance.tsdb.dbUsername = canal
#tsdb password
canal.instance.tsdb.dbPassword = canal
#enabled instances (i.e. the directory names under conf, here "example")
canal.destinations = example
canal.conf.dir = ../conf
#Kafka broker address
canal.mq.servers = 192.168.1.12:6667
#retries on send failure
canal.mq.retries = 3
#should be no larger than batch.size in Kafka's config/producer.properties
canal.mq.batchSize = 16384
#send flat JSON text instead of binary; binary works too but is hard to inspect
canal.mq.flatMessage = true

Note: canal.instance.memory.batch.mode selects how the in-memory store is bounded:
1. ITEMSIZE: bounded by buffer.size alone, i.e. only the record count is limited
2. MEMSIZE: bounded by buffer.size * buffer.memunit, i.e. the total cached bytes are limited
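Under MEMSIZE mode the in-memory store is capped by buffer.size * buffer.memunit bytes rather than by record count; with the values above that works out as follows (a quick arithmetic check, not canal code):

```python
# How the two store settings above combine under batch.mode = MEMSIZE:
buffer_size = 16384    # canal.instance.memory.buffer.size, number of slots
buffer_memunit = 1024  # canal.instance.memory.buffer.memunit, bytes per unit

# buffer.size must be a power of two for canal's ring buffer
assert buffer_size & (buffer_size - 1) == 0

cache_bytes = buffer_size * buffer_memunit
print(cache_bytes // (1024 * 1024), "MiB")  # 16 MiB cap on the in-memory store
```

Raising either setting increases memory usage linearly, so tune them together.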
Start canal:
bin/startup.sh
Check the log to confirm the server came up:
cat logs/canal/canal.log
2013-02-05 22:45:27.967 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## start the canal server.
2013-02-05 22:45:28.113 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[10.1.29.120:11111]
2013-02-05 22:45:28.210 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## the canal server is running now ......
Output like the above means the server started successfully.
Test consumption:
First start a console consumer against Kafka:
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.12:6667 --from-beginning --topic canal
Then generate some changes on the database side (make sure canal.instance.filter.regex covers this table, e.g. test\\..* or test.omneo, or its events will be filtered out):
USE test;
CREATE TABLE IF NOT EXISTS test.omneo(
  pid int(11) NOT NULL AUTO_INCREMENT,
  uuid varchar(100) NOT NULL,
  firstname varchar(20) CHARACTER SET utf8 DEFAULT NULL,
  lastname varchar(20) CHARACTER SET utf8 DEFAULT NULL,
  birthdate varchar(20),
  postalcode varchar(20),
  city varchar(20),
  sexe varchar(20),
  status varchar(20),
  commenttime timestamp NOT NULL DEFAULT current_timestamp,
  PRIMARY KEY (pid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- insert 4 test rows
insert into omneo values(1,"0049683542a7-5bdb-d564-3133-276ae3ce","Maurice","Samuel","01/11/1977","H2M2V5","Ballancourt","male","en couple","2020-05-09 11:01:54");
insert into omneo values(2,"8338623542a7-5bdb-d564-3133-276ae3ce","Gauthier","Garbuet","23/05/1965","05274","Cocagne","female","maried","2020-05-09 11:01:54");
insert into omneo values(3,"3374573542a7-5bdb-d564-3133-276ae3ce","Maurice","Samuel","01/11/1977","H0H0H0","Ottawa","male","en couple","2020-05-09 11:01:54");
insert into omneo values(4,"5494133542a7-5bdb-d564-3133-276ae3ce","Nicole","Garbuet","01/11/1977","H0H0H0","Maugio","unknown","single","2020-05-09 11:01:54");

-- update test rows
update omneo set firstname = "world", commenttime = "2020-12-20 15:55:10" where pid in (2,4);

-- delete a test row
delete from omneo where pid = 1;
The consumer should now print the events, starting with the CREATE TABLE statement; seeing them confirms the pipeline is consuming successfully.
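Because canal.mq.flatMessage = true, each Kafka record arrives as a JSON document. The sketch below parses one such message; the field names follow canal's FlatMessage format, but the payload is hand-written for illustration, not captured output:

```python
import json

# Hand-written sample in the shape of canal's FlatMessage JSON
# (field names per canal's flat-message format; not captured output).
raw = '''{
  "database": "test",
  "table": "omneo",
  "type": "INSERT",
  "isDdl": false,
  "pkNames": ["pid"],
  "data": [{"pid": "1", "firstname": "Maurice", "lastname": "Samuel"}],
  "old": null
}'''

msg = json.loads(raw)
if not msg["isDdl"]:
    # each element of "data" is one affected row, column values as strings
    for row in msg["data"]:
        print(f'{msg["type"]} on {msg["database"]}.{msg["table"]}: pid={row["pid"]}')
```

For UPDATE events the "old" field carries the previous values of the changed columns, which is what makes the stream usable for incremental sync.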