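Introduction to canal
canal overview
I have recently been studying big-data technologies and used the canal component along the way. This post records what I learned:
What is canal? What can it do? How does it work? What are its main components? How do you set up a canal environment?
1. What is canal?
According to the official introduction, canal means "waterway/pipe/channel". Its main purpose is to parse MySQL's incremental log (the binlog) and provide incremental data subscription and consumption.
Typical use cases built on log-based incremental subscription and consumption include:
- Database mirroring
- Real-time database backup
- Index building and real-time maintenance (split heterogeneous indexes, inverted indexes, etc.)
- Business cache refresh
- Incremental data processing with business logic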
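The "business cache refresh" case can be made concrete with a minimal sketch. It assumes a hypothetical `ChangeEvent` type that a consumer has already extracted from canal entries (table, event type, primary key, after-image columns). This is not canal's API. It only shows the idea of applying row changes to a local cache keyed by primary key: upsert on INSERT/UPDATE, evict on DELETE.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache refresher driven by row-change events (not part of canal). */
public class CacheRefresher {

    /** Hypothetical row-change event: the fields a canal consumer would extract from an Entry. */
    public static final class ChangeEvent {
        final String table;
        final String type;                // "INSERT", "UPDATE" or "DELETE"
        final String key;                 // primary-key value of the changed row
        final Map<String, String> after;  // after-image columns (empty for DELETE)

        public ChangeEvent(String table, String type, String key, Map<String, String> after) {
            this.table = table;
            this.type = type;
            this.key = key;
            this.after = after;
        }
    }

    // Local cache: "table:key" -> latest column values of that row
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    /** Apply one change: upsert on INSERT/UPDATE, evict on DELETE, ignore everything else. */
    public void apply(ChangeEvent e) {
        String cacheKey = e.table + ":" + e.key;
        switch (e.type) {
            case "INSERT":
            case "UPDATE":
                cache.put(cacheKey, new HashMap<>(e.after));
                break;
            case "DELETE":
                cache.remove(cacheKey);
                break;
            default:
                // DDL and other event types do not touch row-level cache entries
                break;
        }
    }

    public Map<String, String> get(String table, String key) {
        return cache.get(table + ":" + key);
    }

    public static void main(String[] args) {
        CacheRefresher refresher = new CacheRefresher();
        Map<String, String> row = new HashMap<>();
        row.put("id", "1");
        row.put("name", "lixiang");
        refresher.apply(new ChangeEvent("t_user", "INSERT", "1", row));
        System.out.println(refresher.get("t_user", "1")); // the cached row for id 1
        refresher.apply(new ChangeEvent("t_user", "DELETE", "1", new HashMap<>()));
        System.out.println(refresher.get("t_user", "1")); // prints null after eviction
    }
}
```

Copying `after` into a new map keeps the cache independent of whatever object the consumer reuses for the next event.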
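2. How does canal work?
- canal implements the MySQL slave interaction protocol, presents itself as a MySQL slave, and sends a dump request to the MySQL master
- The MySQL master receives the dump request and starts pushing the binary log to the slave (i.e., canal)
- canal parses the binary log objects (originally a raw byte stream)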
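The dump request in the first step is the MySQL replication command COM_BINLOG_DUMP. The sketch below only builds the bytes of that packet, following the MySQL client/server protocol layout:

- 3-byte payload length and 1-byte sequence id
- command byte 0x12
- 4-byte binlog position and 2-byte flags
- 4-byte server id
- the binlog file name

It is a hypothetical illustration, not canal's implementation. A real slave must first open a connection and authenticate, and that part is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Builds a COM_BINLOG_DUMP packet as a replica would send it (illustrative sketch, not canal code). */
public class BinlogDumpPacket {

    static final int COM_BINLOG_DUMP = 0x12;

    /**
     * @param binlogPos  position to start streaming from (4 = first event after the file's magic header)
     * @param binlogFile binlog file name, e.g. "binlog.000004"
     * @param serverId   the replica's server id; must differ from the master's server_id
     * @param sequenceId packet sequence id (0 for the first packet of a new command)
     */
    public static byte[] build(long binlogPos, String binlogFile, long serverId, int sequenceId) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        payload.write(COM_BINLOG_DUMP);
        writeIntLE(payload, binlogPos, 4); // binlog-pos, 4 bytes little-endian
        writeIntLE(payload, 0, 2);         // flags, 2 bytes (0 = default)
        writeIntLE(payload, serverId, 4);  // server-id, 4 bytes little-endian
        byte[] name = binlogFile.getBytes(StandardCharsets.UTF_8);
        payload.write(name, 0, name.length); // file name runs to the end of the packet

        byte[] body = payload.toByteArray();
        ByteArrayOutputStream packet = new ByteArrayOutputStream();
        writeIntLE(packet, body.length, 3); // packet header: payload length, 3 bytes
        packet.write(sequenceId & 0xFF);    // packet header: sequence id, 1 byte
        packet.write(body, 0, body.length);
        return packet.toByteArray();
    }

    /** Writes the low {@code bytes} bytes of {@code value} in little-endian order. */
    private static void writeIntLE(ByteArrayOutputStream out, long value, int bytes) {
        for (int i = 0; i < bytes; i++) {
            out.write((int) ((value >>> (8 * i)) & 0xFF));
        }
    }

    public static void main(String[] args) {
        byte[] packet = build(4, "binlog.000004", 1234, 0);
        System.out.println("packet length = " + packet.length);
    }
}
```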
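3. What are canal's main components?
Notes:
- server represents one running canal process, corresponding to one JVM
- instance corresponds to one data queue (one server maps to 1..n instances)
instance modules:
- eventParser (data source access: implements the slave protocol to interact with the master, and parses the protocol)
- eventSink (the link between Parser and Store: filters, processes and dispatches data)
- eventStore (data storage)
- metaManager (manages incremental subscription & consumption information)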
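The way the four modules cooperate can be pictured as a toy pipeline. Every type and method below (`Event`, `parse`, `sink`, `get`, `ack`) is hypothetical and far simpler than canal's real classes:

- the parser hands over decoded events
- the sink keeps only tables matching a `schema.table` regex
- the store buffers the kept events
- the meta manager remembers the last acknowledged binlog position

The get/ack pair loosely mirrors the `getWithoutAck`/`ack` calls in the client code of this post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

/** Toy model of a canal instance: parser -> sink -> store, plus a meta manager (all hypothetical). */
public class ToyInstance {

    /** A parsed binlog event, reduced to what this sketch needs. */
    public static final class Event {
        final long position; // binlog offset of the event
        final String table;  // "schema.table"
        final String type;   // e.g. "INSERT"

        public Event(long position, String table, String type) {
            this.position = position;
            this.table = table;
            this.type = type;
        }
    }

    private final Pattern filter;                          // eventSink: which tables to keep
    private final Deque<Event> store = new ArrayDeque<>(); // eventStore: buffered events, oldest first
    private long ackedPosition = -1;                       // metaManager: last consumed position

    public ToyInstance(String filterRegex) {
        this.filter = Pattern.compile(filterRegex);
    }

    /** eventParser stand-in: canal decodes the binlog byte stream; here decoded events are passed in. */
    public void parse(List<Event> decoded) {
        for (Event e : decoded) {
            sink(e);
        }
    }

    /** eventSink: filter by table name, then forward to the store. */
    private void sink(Event e) {
        if (filter.matcher(e.table).matches()) {
            store.addLast(e);
        }
    }

    /** Client side: peek at up to n events without removing them (nothing is acknowledged yet). */
    public List<Event> get(int n) {
        List<Event> batch = new ArrayList<>();
        for (Event e : store) {
            if (batch.size() == n) {
                break;
            }
            batch.add(e);
        }
        return batch;
    }

    /** Acknowledge a batch taken from the head of the store; assumes batches are acked in order. */
    public void ack(List<Event> batch) {
        for (Event e : batch) {
            store.pollFirst();
            ackedPosition = e.position;
        }
    }

    public long ackedPosition() {
        return ackedPosition;
    }

    public int buffered() {
        return store.size();
    }
}
```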
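MySQL binlog
1. What is the MySQL binlog?
The canal server masquerades as a MySQL replica (slave) and receives the binlog from the MySQL master. The binlog files record, in real time, the data-changing operations performed on the database, so by parsing the binlog canal knows which operations MySQL has executed.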
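Parsing starts from the event header. In the binlog v4 format (MySQL 5.x and 8.0), every event begins with a fixed 19-byte little-endian header with these fields:

- timestamp (4 bytes)
- type code (1 byte)
- server id (4 bytes)
- event length (4 bytes)
- next position (4 bytes)
- flags (2 bytes)

The sketch below decodes only that header with `java.nio.ByteBuffer`. The class and field names are my own. Decoding the event body, which is where canal does the real work, is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Decodes the 19-byte common header of a binlog v4 event (illustrative sketch). */
public class BinlogEventHeader {
    public static final int HEADER_LENGTH = 19;

    public final long timestamp;    // seconds since the Unix epoch
    public final int typeCode;      // event type, e.g. 30 = WRITE_ROWS_EVENT
    public final long serverId;     // server_id of the server that wrote the event
    public final long eventLength;  // whole event size in bytes (includes the checksum, if enabled)
    public final long nextPosition; // offset of the next event in the binlog file
    public final int flags;

    private BinlogEventHeader(long timestamp, int typeCode, long serverId,
                              long eventLength, long nextPosition, int flags) {
        this.timestamp = timestamp;
        this.typeCode = typeCode;
        this.serverId = serverId;
        this.eventLength = eventLength;
        this.nextPosition = nextPosition;
        this.flags = flags;
    }

    /** Reads the header starting at {@code offset}; unsigned fields are widened to long/int. */
    public static BinlogEventHeader parse(byte[] data, int offset) {
        if (offset < 0 || data.length - offset < HEADER_LENGTH) {
            throw new IllegalArgumentException("a binlog v4 event header needs 19 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(data, offset, HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        int typeCode = buf.get() & 0xFF;
        long serverId = buf.getInt() & 0xFFFFFFFFL;
        long eventLength = buf.getInt() & 0xFFFFFFFFL;
        long nextPosition = buf.getInt() & 0xFFFFFFFFL;
        int flags = buf.getShort() & 0xFFFF;
        return new BinlogEventHeader(timestamp, typeCode, serverId, eventLength, nextPosition, flags);
    }

    @Override
    public String toString() {
        return String.format("type=%d serverId=%d length=%d next=%d",
                typeCode, serverId, eventLength, nextPosition);
    }
}
```

`nextPosition` is the value that shows up as the binlog offset in positions such as `binlog.000004:2558`.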
canal standalone setup
0. Environment preparation
- CentOS 7
- MySQL 8.0.20
- canal 1.1.4
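1. For installing MySQL, see my blog post: mysql8.0.20
2. canal installation package:
2.1 Download from the official site
2.2 Baidu Netdisk download:
Link: https://pan.baidu.com/s/1FVkeyY8M6KFr3bVARWjz_g
Extraction code: odqh
3. MySQL configuration: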
vim /etc/my.cnf
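Add the following settings to this file (under the [mysqld] section), then restart MySQL so they take effect:
log-bin=mysql-bin # enable the binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not clash with canal's slaveId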
4. Log in to MySQL and check the binlog settings:
**# Is the binlog enabled? ON means enabled, OFF means disabled**
mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin | ON |
+---------------+-------+
1 row in set (0.00 sec)
**# Current binlog file and position**
mysql> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000004 | 1797 | | | |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
**# MySQL binlog format**
mysql> show variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW |
+---------------+-------+
1 row in set (0.00 sec)
**# List the binlog files**
mysql> show binary logs;
+---------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000001 | 2335 | No |
| binlog.000002 | 566 | No |
| binlog.000003 | 4332 | No |
| binlog.000004 | 1797 | No |
+---------------+-----------+-----------+
4 rows in set (0.11 sec)
**# The binlog file currently being written (vertical output)**
mysql> show master status\G
*************************** 1. row ***************************
File: binlog.000004
Position: 1797
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
**# Show the events in a given binlog file**
mysql> show binlog events in 'binlog.000004';
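These checks can also be scripted. Below is a minimal JDBC sketch. It assumes MySQL Connector/J is on the classpath and reuses the root/root login and the 192.168.43.110 host that appear in this post. The class name `BinlogPrecheck` is hypothetical. The rules themselves live in the pure `problems` method, so they can be checked without a database:

- binlog enabled
- ROW format
- non-zero server_id, since a source with server_id 0 refuses replica connections

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Checks that a MySQL server is configured the way canal expects (hypothetical helper). */
public class BinlogPrecheck {

    /** Returns human-readable problems; an empty list means the settings look fine for canal. */
    public static List<String> problems(Map<String, String> vars) {
        List<String> out = new ArrayList<>();
        if (!"ON".equalsIgnoreCase(vars.get("log_bin"))) {
            out.add("log_bin is " + vars.get("log_bin") + ", expected ON");
        }
        if (!"ROW".equalsIgnoreCase(vars.get("binlog_format"))) {
            out.add("binlog_format is " + vars.get("binlog_format") + ", expected ROW");
        }
        if ("0".equals(vars.get("server_id"))) {
            out.add("server_id is 0; a source with server_id 0 refuses replica connections");
        }
        return out;
    }

    /** Reads the relevant server variables with SHOW VARIABLES over JDBC. */
    static Map<String, String> readVariables(String url, String user, String password) throws SQLException {
        Map<String, String> vars = new HashMap<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'server_id')")) {
            while (rs.next()) {
                vars.put(rs.getString(1), rs.getString(2));
            }
        }
        return vars;
    }

    public static void main(String[] args) throws SQLException {
        // Assumed URL and credentials; adjust them to your own MySQL
        Map<String, String> vars = readVariables("jdbc:mysql://192.168.43.110:3306/", "root", "root");
        List<String> issues = problems(vars);
        System.out.println(issues.isEmpty() ? "binlog settings look OK for canal" : String.join("\n", issues));
    }
}
```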
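[root@master ~]# mysql -uroot -proot # my MySQL login user and password are both root
**# Create a MySQL account for canal (GRANT ALL works; canal's quick start grants only SELECT, REPLICATION SLAVE, REPLICATION CLIENT, which is enough)**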
mysql> CREATE USER canal IDENTIFIED BY 'canal';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
mysql> FLUSH PRIVILEGES;
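Setting up canal
1. Upload the downloaded canal package to the CentOS machine;
2. Create a directory for canal and extract the package into it (files under /tmp are deleted automatically after a reboot, so use a persistent directory such as /opt/canal for a long-lived install):
[root@master example]# mkdir /tmp/canal
[root@master example]# tar zxvf canal.deployer-1.1.4.tar.gz -C /tmp/canal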
3. Modify the configuration (conf/example/instance.properties):
# table meta tsdb info (tsdb database account and password)
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal
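Besides the tsdb block, conf/example/instance.properties is where the MySQL address and the canal account created earlier are configured. The keys used below come from canal's default instance.properties:

- `canal.instance.master.address`
- `canal.instance.dbUsername`
- `canal.instance.dbPassword`
- `canal.instance.filter.regex`

The Java sketch is a hypothetical helper. It loads such a file with `java.util.Properties` and reports any of those keys that are missing or empty. Note that `\\` in a properties file is an escape, so `.*\\..*` is read back as the regex `.*\..*`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Loads instance.properties text and checks the keys this post relies on (hypothetical helper). */
public class InstanceConfigCheck {

    static final String[] REQUIRED = {
        "canal.instance.master.address",
        "canal.instance.dbUsername",
        "canal.instance.dbPassword",
        "canal.instance.filter.regex"
    };

    /** Parses properties-file text; Properties.load also handles comments and backslash escapes. */
    public static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory StringReader
        }
        return p;
    }

    /** Returns the required keys that are missing or blank. */
    public static List<String> missingKeys(Properties p) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = p.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example values matching this post's setup: MySQL at 192.168.43.110, account canal/canal
        String text = String.join("\n",
                "canal.instance.master.address=192.168.43.110:3306",
                "canal.instance.dbUsername=canal",
                "canal.instance.dbPassword=canal",
                "canal.instance.filter.regex=.*\\\\..*",
                "canal.instance.tsdb.enable=true");
        Properties p = load(text);
        System.out.println("missing: " + missingKeys(p)); // prints: missing: []
        System.out.println("filter regex: " + p.getProperty("canal.instance.filter.regex"));
    }
}
```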
4. Start canal
Go to the bin directory and run:
[root@master bin]# ./startup.sh
Then check the instance log to confirm that it started:
[root@master example]# cat /opt/canal/canal.deployer-1.1.4/logs/example/example.log
Setting up canal-client
1. Add the following dependencies to the Maven pom.xml
<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.4</version>
</dependency>
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.6.1</version>
</dependency>
<dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-netty</artifactId>
    <version>1.10.0</version>
</dependency>
2. Client code, CanalClient.java:
package com.lixiang.canal;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry.Column;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EntryType;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange;
import com.alibaba.otter.canal.protocol.CanalEntry.RowData;
import com.alibaba.otter.canal.protocol.Message;

import java.net.InetSocketAddress;
import java.util.List;

public class CanalClient {

    public static void main(String[] args) {
        // Connect directly to one canal server (host:port), instance "example", empty username/password
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("192.168.43.110", 11111), "example", "", "");
        int batchSize = 1;  // max number of entries fetched per call
        int emptyCount = 0; // consecutive polls that returned no data
        try {
            connector.connect();
            // Subscribe to every table in every schema (regex "schema\\.table")
            connector.subscribe(".*\\..*");
            // Roll back any batch left un-acked by a previous session so it is redelivered
            connector.rollback();
            while (true) {
                // Fetch up to batchSize entries without acknowledging them yet
                Message message = connector.getWithoutAck(batchSize);
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    emptyCount++;
                    System.out.println("empty count : " + emptyCount);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // keep the interrupt flag
                        break;                              // stop polling; finally{} disconnects
                    }
                } else {
                    emptyCount = 0;
                    printEntry(message.getEntries());
                }
                connector.ack(batchId); // confirm the batch was processed
                // connector.rollback(batchId); // on failure, roll the batch back so it is redelivered
            }
        } finally {
            connector.disconnect();
        }
    }

    private static void printEntry(List<Entry> entries) {
        for (Entry entry : entries) {
            // Skip transaction begin/end markers; only row-change entries are printed
            if (entry.getEntryType() == EntryType.TRANSACTIONBEGIN
                    || entry.getEntryType() == EntryType.TRANSACTIONEND) {
                continue;
            }
            RowChange rowChange;
            try {
                // The entry payload is a serialized RowChange protobuf message
                rowChange = RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry, e);
            }
            EventType eventType = rowChange.getEventType();
            System.out.println(String.format("================> binlog[%s:%s] , name[%s,%s] , eventType : %s",
                    entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
                    entry.getHeader().getSchemaName(), entry.getHeader().getTableName(),
                    eventType));
            for (RowData rowData : rowChange.getRowDatasList()) {
                if (eventType == EventType.DELETE) {
                    printColumn(rowData.getBeforeColumnsList()); // a delete only has the old row image
                } else if (eventType == EventType.INSERT) {
                    printColumn(rowData.getAfterColumnsList());  // an insert only has the new row image
                } else {
                    System.out.println("-------> before");
                    printColumn(rowData.getBeforeColumnsList());
                    System.out.println("-------> after");
                    printColumn(rowData.getAfterColumnsList());
                }
            }
        }
    }

    private static void printColumn(List<Column> columns) {
        for (Column column : columns) {
            System.out.println(column.getName() + " : " + column.getValue() + " update=" + column.getUpdated());
        }
    }
}
3. Start the canal server, then run CanalClient: [root@master bin]# ./startup.sh
4. Create a database and a table, then insert a record
# Create the database
create database employee;
use employee;
# Create the table
create table t_user (
id int(4) primary key not null auto_increment,
name varchar(10) not null);
# Insert a record
insert into t_user (name) values('lixiang');
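The client prints the following in IDEA. ERASE is canal's event type for DROP TABLE, presumably from dropping an earlier t_user; CREATE is the CREATE TABLE statement; INSERT carries the new row: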
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
================> binlog[binlog.000004:1874] , name[employee,t_user] , eventType : ERASE
================> binlog[binlog.000004:2092] , name[employee,t_user] , eventType : CREATE
empty count : 1
empty count : 2
empty count : 3
empty count : 4
================> binlog[binlog.000004:2510] , name[employee,t_user] , eventType : INSERT
id : 1 update=true
name : lixiang update=true
empty count : 1
empty count : 2
empty count : 3
empty count : 4
empty count : 5
empty count : 6
canal HA setup
1. Machines:
MySQL: 192.168.43.110:3306
canal server: 192.168.43.110:11111, 192.168.43.111:11111
ZooKeeper: 192.168.43.110:2181, 192.168.43.111:2181, 192.168.43.112:2181
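2. Following the standalone deployment and configuration above, complete the setup on each machine separately; the instance name used in this demo is example.
① On host 192.168.43.110 (and likewise on 192.168.43.111), edit canal.properties: add the ZooKeeper setting and select default-instance.xml as the spring configuration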
canal.zkServers =192.168.43.110:2181
canal.instance.global.spring.xml = classpath:spring/default-instance.xml
② In the example directory, edit instance.properties
canal.instance.mysql.slaveId=1234 ## use 1235 on the other machine; the slaveIds only need to be unique
canal.instance.master.address=192.168.43.110:3306
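Note: the instance directory names on the two machines must be exactly the same, because HA mode manages instances by instance name; both machines must also use the default-instance.xml configuration.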
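3. Start canal on both machines
After startup, check logs/example/example.log: the startup-success log appears on only one of the two machines.
The node data in ZooKeeper also shows that the currently active node is 192.168.43.110:11111: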
[zk: 192.168.43.110:2181(CONNECTED) 13] get /otter/canal/destinations/example/running
{"active":true,"address":"192.168.43.110:11111"}
cZxid = 0x900000011
ctime = Wed Jun 24 01:22:33 CST 2020
mZxid = 0x900000011
mtime = Wed Jun 24 01:22:33 CST 2020
pZxid = 0x900000011
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x10006b8a4ff0000
dataLength = 48
numChildren = 0
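The "only one machine runs the instance" behaviour follows from the non-zero `ephemeralOwner` above: the running node is an ephemeral ZooKeeper node. Whichever server creates /otter/canal/destinations/example/running first becomes active. When its session ends, the node disappears and a standby can take over.

The sketch below simulates that rule in memory. `Registry` stands in for ZooKeeper and is hypothetical, not canal's or ZooKeeper's API. Server addresses double as session ids for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory simulation of canal HA via an ephemeral "running" node (hypothetical, no ZooKeeper). */
public class HaElectionSim {

    /** Stand-in for ZooKeeper: path -> {owner session, data}; nodes vanish when their session expires. */
    public static final class Registry {
        private final Map<String, String[]> nodes = new ConcurrentHashMap<>();

        /** Like creating an ephemeral node: succeeds only if the path does not exist yet. */
        public boolean createEphemeral(String path, String session, String data) {
            return nodes.putIfAbsent(path, new String[] {session, data}) == null;
        }

        public String get(String path) {
            String[] node = nodes.get(path);
            return node == null ? null : node[1];
        }

        /** Session expiry removes every ephemeral node owned by that session. */
        public void expire(String session) {
            nodes.values().removeIf(node -> node[0].equals(session));
        }
    }

    public static final String RUNNING = "/otter/canal/destinations/example/running";

    /** A canal server tries to become the active server for the instance; true if it won. */
    public static boolean tryStart(Registry zk, String address) {
        String data = "{\"active\":true,\"address\":\"" + address + "\"}";
        return zk.createEphemeral(RUNNING, address, data);
    }

    public static void main(String[] args) {
        Registry zk = new Registry();
        boolean a = tryStart(zk, "192.168.43.110:11111");
        boolean b = tryStart(zk, "192.168.43.111:11111");
        System.out.println("110 active=" + a + ", 111 active=" + b); // prints 110 active=true, 111 active=false
        zk.expire("192.168.43.110:11111"); // the active server stops; its session ends
        // In the real system a ZooKeeper watch would trigger this retry on the standby
        boolean b2 = tryStart(zk, "192.168.43.111:11111");
        System.out.println("standby took over=" + b2 + ", running=" + zk.get(RUNNING));
    }
}
```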
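4. Connect a client and consume data
You can specify the ZooKeeper address and the instance name directly. The canal client reads the running node in ZooKeeper to find the currently active server and then connects to it.
// Create the connection (the active server is discovered through ZooKeeper)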
CanalConnector connector =
CanalConnectors.newClusterConnector("192.168.43.110:2181,192.168.43.111:2181", "example", "", "");
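The cluster connector does the discovery step for you. Conceptually, it turns the running node's JSON into a socket address. The hypothetical helper below shows just that step. It uses a regular expression instead of a JSON library to stay dependency-free, so it is not how canal's client is written.

```java
import java.net.InetSocketAddress;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts host:port from the data of /otter/canal/destinations/{instance}/running (hypothetical). */
public class RunningNodeParser {

    // Matches "address":"host:port" inside the node's JSON payload
    private static final Pattern ADDRESS =
            Pattern.compile("\"address\"\\s*:\\s*\"([^\":]+):(\\d+)\"");

    public static InetSocketAddress parse(String json) {
        Matcher m = ADDRESS.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no address in running node: " + json);
        }
        // createUnresolved avoids a DNS lookup here; the address is resolved when connecting
        return InetSocketAddress.createUnresolved(m.group(1), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        InetSocketAddress addr = parse("{\"active\":true,\"address\":\"192.168.43.110:11111\"}");
        System.out.println(addr.getHostString() + " " + addr.getPort()); // prints 192.168.43.110 11111
    }
}
```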
Once connected, the canal server records the currently active canal client in ZooKeeper, e.g. the client IP and port:
[zk: 192.168.43.110:2181(CONNECTED) 16] get /otter/canal/destinations/example/1001/running
{"active":true,"address":"192.168.43.95:56148","clientId":1001}
cZxid = 0x900000045
ctime = Wed Jun 24 02:48:17 CST 2020
mZxid = 0x900000046
mtime = Wed Jun 24 02:48:17 CST 2020
pZxid = 0x900000045
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x20006b87dc80000
dataLength = 63
numChildren = 0
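After data is consumed successfully, the canal server records in ZooKeeper the binlog position of the last successful consumption (the next time you restart the client, it resumes from that position):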
[zk: 192.168.43.110:2181(CONNECTED) 17] get /otter/canal/destinations/example/1001/cursor
{"@type":"com.alibaba.otter.canal.protocol.position.LogPosition","identity":{"slaveId":-1,"sourceAddress":{"address":"master","port":3306}},"postion":{"gtid":"","included":false,"journalName":"binlog.000004","position":2558,"serverId":1,"timestamp":1592926713000}}
cZxid = 0x900000023
ctime = Wed Jun 24 02:33:48 CST 2020
mZxid = 0x90000003a
mtime = Wed Jun 24 02:45:53 CST 2020
pZxid = 0x900000023
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 264
numChildren = 0
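The cursor node is plain JSON, so the resume position can be read back, for example for monitoring. Below is a hypothetical regex-based sketch that extracts `journalName` and `position`. The stored JSON spells the outer key `postion`, as shown above. The sketch therefore matches only the inner keys, and the quoted `"position"` pattern cannot accidentally match `"postion"`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads the binlog file and offset from a canal cursor node's JSON (hypothetical helper). */
public class CursorReader {

    private static final Pattern JOURNAL = Pattern.compile("\"journalName\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern POSITION = Pattern.compile("\"position\"\\s*:\\s*(\\d+)");

    public static String journalName(String json) {
        return firstGroup(JOURNAL, json);
    }

    public static long position(String json) {
        return Long.parseLong(firstGroup(POSITION, json));
    }

    private static String firstGroup(Pattern pattern, String json) {
        Matcher m = pattern.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("pattern " + pattern + " not found in " + json);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String cursor = "{\"postion\":{\"gtid\":\"\",\"included\":false,\"journalName\":\"binlog.000004\","
                + "\"position\":2558,\"serverId\":1,\"timestamp\":1592926713000}}";
        System.out.println(journalName(cursor) + ":" + position(cursor)); // prints binlog.000004:2558
    }
}
```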
5. Stop the active canal server on 192.168.43.110
192.168.43.111 then immediately starts the example instance and takes over serving data:
[zk: 192.168.43.110:2181(CONNECTED) 18] get /otter/canal/destinations/example/running
{"active":true,"address":"192.168.43.111:11111"}
cZxid = 0x900000030
ctime = Wed Jun 24 02:45:35 CST 2020
mZxid = 0x900000030
mtime = Wed Jun 24 02:45:35 CST 2020
pZxid = 0x900000030
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x10006b8a4ff0002
dataLength = 48
numChildren = 0
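Meanwhile, the client follows the server switch: it fetches the latest address from ZooKeeper, connects to the new canal server and continues consuming. The whole process is automatic.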