canal Overview

Official wiki: https://github.com/alibaba/canal/wiki

************************

Introduction

canal is middleware that parses MySQL incremental logs (the binary log) to provide incremental data subscription and consumption; currently it mainly supports MySQL.

Main applications: data replication, and keeping caches consistent with the database (cache double-write consistency).

               

******************

How It Works

canal emulates the MySQL replication protocol: it masquerades as a MySQL slave and sends a dump request to the master;

when the master receives the dump request, it pushes the binary log to canal;

canal parses and stores the binary log, making it available for client subscription.

                

******************

MySQL Master-Slave Replication

The master enables the binary log (disabled by default before MySQL 8.0, enabled by default since 8.0) and records binary log events;

the slave sends a dump request to the master, and the master pushes the data to the slave;

the slave writes the data to its relay log, then parses the relay log and replays the events locally.

                         

Viewing binary log events:

mysql> show binary logs;
+-------------------------+-----------+-----------+
| Log_name                | File_size | Encrypted |
+-------------------------+-----------+-----------+
| 13b18152a242-bin.000001 |       156 | No        |
| 13b18152a242-bin.000002 |       436 | No        |
+-------------------------+-----------+-----------+
2 rows in set (0.00 sec)


# by default, shows events from the first binary log
mysql> show binlog events;
+-------------------------+-----+----------------+-----------+-------------+-----------------------------------+
| Log_name                | Pos | Event_type     | Server_id | End_log_pos | Info                              |
+-------------------------+-----+----------------+-----------+-------------+-----------------------------------+
| 13b18152a242-bin.000001 |   4 | Format_desc    |         1 |         125 | Server ver: 8.0.25, Binlog ver: 4 |
| 13b18152a242-bin.000001 | 125 | Previous_gtids |         1 |         156 |                                   |
+-------------------------+-----+----------------+-----------+-------------+-----------------------------------+
2 rows in set (0.00 sec)


# show events from a specified binary log
mysql> show binlog events in "13b18152a242-bin.000001";
+-------------------------+-----+----------------+-----------+-------------+-----------------------------------+
| Log_name                | Pos | Event_type     | Server_id | End_log_pos | Info                              |
+-------------------------+-----+----------------+-----------+-------------+-----------------------------------+
| 13b18152a242-bin.000001 |   4 | Format_desc    |         1 |         125 | Server ver: 8.0.25, Binlog ver: 4 |
| 13b18152a242-bin.000001 | 125 | Previous_gtids |         1 |         156 |                                   |
+-------------------------+-----+----------------+-----------+-------------+-----------------------------------+
2 rows in set (0.00 sec)

mysql> show binlog events in "13b18152a242-bin.000002";
+-------------------------+-----+----------------+-----------+-------------+--------------------------------------+
| Log_name                | Pos | Event_type     | Server_id | End_log_pos | Info                                 |
+-------------------------+-----+----------------+-----------+-------------+--------------------------------------+
| 13b18152a242-bin.000002 |   4 | Format_desc    |         1 |         125 | Server ver: 8.0.25, Binlog ver: 4    |
| 13b18152a242-bin.000002 | 125 | Previous_gtids |         1 |         156 |                                      |
| 13b18152a242-bin.000002 | 156 | Anonymous_Gtid |         1 |         235 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS' |
| 13b18152a242-bin.000002 | 235 | Query          |         1 |         310 | BEGIN                                |
| 13b18152a242-bin.000002 | 310 | Table_map      |         1 |         361 | table_id: 88 (test.test)             |
| 13b18152a242-bin.000002 | 361 | Write_rows     |         1 |         405 | table_id: 88 flags: STMT_END_F       |
| 13b18152a242-bin.000002 | 405 | Xid            |         1 |         436 | COMMIT /* xid=15 */                  |
+-------------------------+-----+----------------+-----------+-------------+--------------------------------------+
7 rows in set (0.00 sec)

               

                  

************************

canal Architecture

A server corresponds to one canal deployment, i.e. one JVM process;

an instance corresponds to one data queue: it connects to MySQL, processes and transforms the data, and serves it to subscribing clients;

one server can contain multiple instance queues, handling multiple MySQL data sources simultaneously.

                   

******************

instance Components

eventParser: data-source access; emulates a slave, interacts with MySQL, and parses the binlog

eventSink: data transformation and processing

eventStore: data storage (currently in memory; local file storage is planned)

metaManager: metadata manager; stores subscription information and other metadata

                     

******************

eventParser Design

When canal server connects to MySQL, it fetches the position where parsing last stopped; on first startup, it uses the configured initial position or the master's current position.

canal server then sends the binlog_dump command:

0. write command number
1. write 4 bytes bin-log position to start at
2. write 2 bytes bin-log flags
3. write 4 bytes server id of the slave
4. write bin-log file name
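The five steps above describe the payload of the MySQL COM_BINLOG_DUMP command. A minimal sketch of serializing that payload (simplified illustration only; a real client additionally wraps this payload in a length-prefixed MySQL packet header, omitted here):

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Field order follows the list above: command byte, 4-byte start position,
// 2-byte flags, 4-byte slave server id, then the binlog file name.
// Multi-byte integers are little-endian in the MySQL protocol.
public class BinlogDumpPacket {
    static final byte COM_BINLOG_DUMP = 0x12;   // MySQL command number 18

    static byte[] build(long binlogPosition, int flags, long slaveServerId, String binlogFileName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(COM_BINLOG_DUMP);                      // 0. command number
        ByteBuffer buf = ByteBuffer.allocate(10).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt((int) binlogPosition);                // 1. 4 bytes: start position
        buf.putShort((short) flags);                     // 2. 2 bytes: binlog flags
        buf.putInt((int) slaveServerId);                 // 3. 4 bytes: slave server id
        out.write(buf.array(), 0, 10);
        byte[] name = binlogFileName.getBytes(StandardCharsets.US_ASCII);
        out.write(name, 0, name.length);                 // 4. binlog file name
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packet = build(4L, 0, 1234L, "mysql-bin.000001");
        System.out.println(packet.length);       // 1 + 4 + 2 + 4 + 16 = 27
        System.out.println(packet[0] & 0xff);    // 18 (0x12)
    }
}
```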

                     

MySQL pushes the binlog data to canal server;

canal server parses the binlog with binlog_parser, enriches it with additional information, and pushes it to eventSink;

eventSink processes and transforms the data and hands it to eventStore (storing is a blocking operation);

after a successful store, the binary log position is recorded periodically.

                     

******************

eventSink Design

Data filtering: supports wildcard filters to select specific databases, tables, and so on

Routing and distribution: one MySQL instance can host multiple databases; data can be routed and stored grouped by database

Data merging: when business data is split across shards (multiple databases/tables), canal can consolidate the scattered data into one store

Data processing: additional processing before storage, for example:

  sql: "select a.id as _id, a.name as _name, a.role_id as _role_id, b.role_name as _role_name,
        a.c_time as _c_time, c.labels as _labels from user a
        left join role b on b.id=a.role_id
        left join (select user_id, group_concat(label order by id desc separator ';') as labels from label
        group by user_id) c on c.user_id=a.id"

                     

******************

eventStore Design

eventStore currently supports only in-memory storage; local file storage and mixed storage are planned.

Data lives in a fixed-size in-memory ring buffer whose size is a power of two (2^n).

put pointer: position of the last write

get pointer: position of the client's last fetch

ack pointer: position of the client's last successfully consumed (committed) batch

Note: when the client gets data, the slot index is computed with a bitwise AND (cursor & (size - 1)); this is why a ring size of 2^n is efficient.

If the ring buffer is "straightened out", the put, get, and ack pointers satisfy:

ack <= get <= put

put - ack <= ringBuffer.size

Note: the put, get, and ack pointer positions are recorded as monotonically increasing long values.
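The pointer scheme can be illustrated with a minimal ring-buffer sketch (a hypothetical simplification of MemoryEventStoreWithBuffer; the real class adds blocking puts, batch accounting, and memory-size limits):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal 2^n ring buffer with monotonically increasing long cursors;
// the slot index is derived with cursor & (size - 1).
public class RingStore {
    private final Object[] buffer;
    private final int indexMask;
    private long putCursor = 0, getCursor = 0, ackCursor = 0;

    RingStore(int sizePowerOfTwo) {
        buffer = new Object[sizePowerOfTwo];
        indexMask = sizePowerOfTwo - 1;   // works only because size is 2^n
    }

    boolean put(Object event) {
        if (putCursor - ackCursor >= buffer.length) return false; // full: put - ack <= size
        buffer[(int) (putCursor & indexMask)] = event;
        putCursor++;
        return true;
    }

    List<Object> get(int batchSize) {
        List<Object> batch = new ArrayList<>();
        while (batch.size() < batchSize && getCursor < putCursor) {
            batch.add(buffer[(int) (getCursor & indexMask)]);
            getCursor++;
        }
        return batch;                      // ack <= get <= put holds throughout
    }

    void ack(long cursor) { ackCursor = Math.min(cursor, getCursor); }
    void rollback()       { getCursor = ackCursor; }  // re-deliver unacked events
}
```

Note that acking frees ring slots for new puts, which is exactly the `put - ack <= ringBuffer.size` relation above.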

                     

******************

instance Configuration Management

manager mode: configure canal parameters through a web UI

spring mode: configure canal parameters via local spring XML and properties files
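In spring mode, the `${canal.instance.*}` placeholders that appear in the XML files below are usually supplied by a per-instance properties file; a minimal hypothetical example (key names taken from the placeholders in the XML, values purely illustrative):

```properties
# MySQL master whose binlog will be parsed
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset=UTF-8

# initial parse position; leave empty to start from the master's current position
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=

# subscribe to all databases and tables
canal.instance.filter.regex=.*\\..*
```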

                     

                   

                     

instance.xml's role: create the CanalInstanceWithSpring instance object

memory-instance.xml: stores the parser log position and cursor in memory

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tx="http://www.springframework.org/schema/tx"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:lang="http://www.springframework.org/schema/lang"
	xmlns:context="http://www.springframework.org/schema/context"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
           http://www.springframework.org/schema/lang http://www.springframework.org/schema/lang/spring-lang-2.0.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd"
	default-autowire="byName">

	<import resource="classpath:spring/base-instance.xml" />


	<!-- create the CanalInstanceWithSpring instance -->
	<bean id="instance" class="com.alibaba.otter.canal.instance.spring.CanalInstanceWithSpring">
		<property name="destination" value="${canal.instance.destination}" />
		<property name="eventParser">
			<ref bean="eventParser" />
		</property>
		<property name="eventSink">
			<ref bean="eventSink" />
		</property>
		<property name="eventStore">
			<ref bean="eventStore" />
		</property>
		<property name="metaManager">
			<ref bean="metaManager" />
		</property>
		<property name="alarmHandler">
			<ref bean="alarmHandler" />
		</property>
        <property name="mqConfig">
            <ref bean="mqConfig" />
        </property>
	</bean>

	<!-- alarm handler -->
	<bean id="alarmHandler" class="com.alibaba.otter.canal.common.alarm.LogAlarmHandler" />


	<!-- metadata manager backed by memory -->
	<bean id="metaManager" class="com.alibaba.otter.canal.meta.MemoryMetaManager" />

	<!-- eventStore keeps data in memory -->
	<bean id="eventStore" class="com.alibaba.otter.canal.store.memory.MemoryEventStoreWithBuffer">
		<property name="bufferSize" value="${canal.instance.memory.buffer.size:16384}" />
		<property name="bufferMemUnit" value="${canal.instance.memory.buffer.memunit:1024}" />
		<property name="batchMode" value="${canal.instance.memory.batch.mode:MEMSIZE}" />
		<property name="ddlIsolation" value="${canal.instance.get.ddl.isolation:false}" />
		<property name="raw" value="${canal.instance.memory.rawEntry:true}" />
	</bean>

	<bean id="eventSink" class="com.alibaba.otter.canal.sink.entry.EntryEventSink">
		<property name="eventStore" ref="eventStore" />
		<property name="filterTransactionEntry" value="${canal.instance.filter.transaction.entry:false}"/>
	</bean>

	<bean id="eventParser" parent="baseEventParser">
		<property name="destination" value="${canal.instance.destination}" />
		<property name="slaveId" value="${canal.instance.mysql.slaveId:0}" />
		<!-- heartbeat settings -->
		<property name="detectingEnable" value="${canal.instance.detecting.enable:false}" />
		<property name="detectingSQL" value="${canal.instance.detecting.sql}" />
		<property name="detectingIntervalInSeconds" value="${canal.instance.detecting.interval.time:5}" />
		<property name="haController">
			<bean class="com.alibaba.otter.canal.parse.ha.HeartBeatHAController">
				<property name="detectingRetryTimes" value="${canal.instance.detecting.retry.threshold:3}" />
				<property name="switchEnable" value="${canal.instance.detecting.heartbeatHaEnable:false}" />
			</bean>
		</property>

		<property name="alarmHandler" ref="alarmHandler" />

		<!-- parse-time filtering -->
		<property name="eventFilter">
			<bean class="com.alibaba.otter.canal.filter.aviater.AviaterRegexFilter" >
				<constructor-arg index="0" value="${canal.instance.filter.regex:.*\..*}" />
			</bean>
		</property>

		<property name="eventBlackFilter">
			<bean class="com.alibaba.otter.canal.filter.aviater.AviaterRegexFilter" >
				<constructor-arg index="0" value="${canal.instance.filter.black.regex:}" />
				<constructor-arg index="1" value="false" />
			</bean>
		</property>
		
		<property name="fieldFilter" value="${canal.instance.filter.field}" />
		<property name="fieldBlackFilter" value="${canal.instance.filter.black.field}" />
		
		<!-- max transaction size for parsing; larger transactions are split into multiple deliveries -->
		<property name="transactionSize" value="${canal.instance.transaction.size:1024}" />

		<!-- network connection parameters -->
		<property name="receiveBufferSize" value="${canal.instance.network.receiveBufferSize:16384}" />
		<property name="sendBufferSize" value="${canal.instance.network.sendBufferSize:16384}" />
		<property name="defaultConnectionTimeoutInSeconds" value="${canal.instance.network.soTimeout:30}" />

		<!-- parsing charset -->
		<!-- property name="connectionCharsetNumber" value="${canal.instance.connectionCharsetNumber:33}" /-->
		<property name="connectionCharset" value="${canal.instance.connectionCharset:UTF-8}" />


		<!-- parse position tracking: except on first startup, parsing resumes
		     from the last position recorded in memory -->
		<property name="logPositionManager">
			<bean class="com.alibaba.otter.canal.parse.index.MemoryLogPositionManager" />
		</property>


		<!-- how far (in seconds) to rewind on failover -->
		<property name="fallbackIntervalInSeconds" value="${canal.instance.fallbackIntervalInSeconds:60}" />

		<!-- source database connection info -->
		<property name="masterInfo">
			<bean class="com.alibaba.otter.canal.parse.support.AuthenticationInfo" init-method="initPwd">
				<property name="address" value="${canal.instance.master.address}" />
				<property name="username" value="${canal.instance.dbUsername:retl}" />
				<property name="password" value="${canal.instance.dbPassword:retl}" />
				<property name="pwdPublicKey" value="${canal.instance.pwdPublicKey:retl}" />
				<property name="enableDruid" value="${canal.instance.enableDruid:false}" />
				<property name="defaultDatabaseName" value="${canal.instance.defaultDatabaseName:}" />
			</bean>
		</property>
		<property name="standbyInfo">
			<bean class="com.alibaba.otter.canal.parse.support.AuthenticationInfo" init-method="initPwd">
				<property name="address" value="${canal.instance.standby.address}" />
				<property name="username" value="${canal.instance.dbUsername:retl}" />
				<property name="password" value="${canal.instance.dbPassword:retl}" />
				<property name="pwdPublicKey" value="${canal.instance.pwdPublicKey:retl}" />
				<property name="enableDruid" value="${canal.instance.enableDruid:false}" />
				<property name="defaultDatabaseName" value="${canal.instance.defaultDatabaseName:}" />
			</bean>
		</property>


		<!-- initial parse position: on first startup, eventParser starts from
		     the position given by these configuration parameters -->
		<property name="masterPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.master.journal.name}" />
				<property name="position" value="${canal.instance.master.position}" />
				<property name="timestamp" value="${canal.instance.master.timestamp}" />
				<property name="gtid" value="${canal.instance.master.gtid}" />
			</bean>
		</property>
		<property name="standbyPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.standby.journal.name}" />
				<property name="position" value="${canal.instance.standby.position}" />
				<property name="timestamp" value="${canal.instance.standby.timestamp}" />
				<property name="gtid" value="${canal.instance.standby.gtid}" />
			</bean>
		</property>


		<property name="filterQueryDml" value="${canal.instance.filter.query.dml:false}" />
		<property name="filterQueryDcl" value="${canal.instance.filter.query.dcl:false}" />
		<property name="filterQueryDdl" value="${canal.instance.filter.query.ddl:false}" />
		<property name="useDruidDdlFilter" value="${canal.instance.filter.druid.ddl:true}" />
		<property name="filterDmlInsert" value="${canal.instance.filter.dml.insert:false}" />
		<property name="filterDmlUpdate" value="${canal.instance.filter.dml.update:false}" />
		<property name="filterDmlDelete" value="${canal.instance.filter.dml.delete:false}" />
		<property name="filterRows" value="${canal.instance.filter.rows:false}" />
		<property name="filterTableError" value="${canal.instance.filter.table.error:false}" />
		<property name="supportBinlogFormats" value="${canal.instance.binlog.format}" />
		<property name="supportBinlogImages" value="${canal.instance.binlog.image}" />

		<!-- table schema (tsdb) settings -->
		<property name="enableTsdb" value="${canal.instance.tsdb.enable:false}"/>
		<property name="tsdbSpringXml" value="${canal.instance.tsdb.spring.xml:}"/>
		<property name="tsdbSnapshotInterval" value="${canal.instance.tsdb.snapshot.interval:24}" />
		<property name="tsdbSnapshotExpire" value="${canal.instance.tsdb.snapshot.expire:360}" />

		<!-- whether to enable GTID mode -->
		<property name="isGTIDMode" value="${canal.instance.gtidon:false}"/>

		<!-- parallel parser -->
		<property name="parallel" value="${canal.instance.parser.parallel:true}" />
		<property name="parallelThreadSize" value="${canal.instance.parser.parallelThreadSize}" />
		<property name="parallelBufferSize" value="${canal.instance.parser.parallelBufferSize:256}" />

		<property name="autoResetLatestPosMode" value="${canal.auto.reset.latest.pos.mode:false}" />
	</bean>

	<bean id="mqConfig" class="com.alibaba.otter.canal.instance.core.CanalMQConfig">
		<property name="topic" value="${canal.mq.topic}" />
		<property name="dynamicTopic" value="${canal.mq.dynamicTopic}" />
		<property name="partition" value="${canal.mq.partition}" />
		<property name="partitionsNum" value="${canal.mq.partitionsNum}" />
		<property name="partitionHash" value="${canal.mq.partitionHash}" />
		<property name="dynamicTopicPartitionNum" value="${canal.mq.dynamicTopicPartitionNum}" />
	</bean>
</beans>

                     

file-instance.xml: stores the parser log position and cursor using a mix of memory and local files

<!-- rest omitted -->

	<!-- metaManager persisted to local files -->
	<bean id="metaManager" class="com.alibaba.otter.canal.meta.FileMixedMetaManager">
		<property name="dataDir" value="${canal.file.data.dir:../conf}" />
		<property name="period" value="${canal.file.flush.period:1000}" />
	</bean>

	<!-- eventStore keeps data in memory -->
	<bean id="eventStore" class="com.alibaba.otter.canal.store.memory.MemoryEventStoreWithBuffer">
		<property name="bufferSize" value="${canal.instance.memory.buffer.size:16384}" />
		<property name="bufferMemUnit" value="${canal.instance.memory.buffer.memunit:1024}" />
		<property name="batchMode" value="${canal.instance.memory.batch.mode:MEMSIZE}" />
		<property name="ddlIsolation" value="${canal.instance.get.ddl.isolation:false}" />
		<property name="raw" value="${canal.instance.memory.rawEntry:true}" />
	</bean>

	<bean id="eventSink" class="com.alibaba.otter.canal.sink.entry.EntryEventSink">
		<property name="eventStore" ref="eventStore" />
		<property name="filterTransactionEntry" value="${canal.instance.filter.transaction.entry:false}"/>
	</bean>


	<!-- the eventParser bean -->
	<bean id="eventParser" parent="baseEventParser">

		<!-- parse position tracking: look up the position in MemoryLogPositionManager
		     first; if not found, fall back to the position kept by FileMixedMetaManager -->
		<property name="logPositionManager">
			<bean class="com.alibaba.otter.canal.parse.index.FailbackLogPositionManager">
				<constructor-arg>
					<bean class="com.alibaba.otter.canal.parse.index.MemoryLogPositionManager" />
				</constructor-arg>
				<constructor-arg>
					<bean class="com.alibaba.otter.canal.parse.index.MetaLogPositionManager">
						<constructor-arg ref="metaManager"/>
					</bean>
				</constructor-arg>
			</bean>
		</property>

		<!-- initial parse position used on first startup -->
		<property name="masterPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.master.journal.name}" />
				<property name="position" value="${canal.instance.master.position}" />
				<property name="timestamp" value="${canal.instance.master.timestamp}" />
				<property name="gtid" value="${canal.instance.master.gtid}" />
			</bean>
		</property>
		<property name="standbyPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.standby.journal.name}" />
				<property name="position" value="${canal.instance.standby.position}" />
				<property name="timestamp" value="${canal.instance.standby.timestamp}" />
				<property name="gtid" value="${canal.instance.standby.gtid}" />
			</bean>
		</property>
	</bean>

                     

default-instance.xml: stores the log position and cursor using a mix of memory and ZooKeeper

<!-- rest omitted -->

	<!-- ZooKeeper client -->
	<bean id="zkClientx" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean" >
		<property name="targetClass" value="com.alibaba.otter.canal.common.zookeeper.ZkClientx" />
		<property name="targetMethod" value="getZkClient" />
		<property name="arguments">
			<list>
				<value>${canal.zkServers:127.0.0.1:2181}</value>
			</list>
		</property>
	</bean>


	<!-- metadata stored in ZooKeeper -->
	<bean id="metaManager" class="com.alibaba.otter.canal.meta.PeriodMixedMetaManager">
		<property name="zooKeeperMetaManager">
			<bean class="com.alibaba.otter.canal.meta.ZooKeeperMetaManager">
				<property name="zkClientx" ref="zkClientx" />
			</bean>
		</property>
		<property name="period" value="${canal.zookeeper.flush.period:1000}" />
	</bean>


	<!-- eventStore keeps data in memory -->
	<bean id="eventStore" class="com.alibaba.otter.canal.store.memory.MemoryEventStoreWithBuffer">
		<property name="bufferSize" value="${canal.instance.memory.buffer.size:16384}" />
		<property name="bufferMemUnit" value="${canal.instance.memory.buffer.memunit:1024}" />
		<property name="batchMode" value="${canal.instance.memory.batch.mode:MEMSIZE}" />
		<property name="ddlIsolation" value="${canal.instance.get.ddl.isolation:false}" />
		<property name="raw" value="${canal.instance.memory.rawEntry:true}" />
	</bean>

	<bean id="eventSink" class="com.alibaba.otter.canal.sink.entry.EntryEventSink">
		<property name="eventStore" ref="eventStore" />
		<property name="filterTransactionEntry" value="${canal.instance.filter.transaction.entry:false}"/>
	</bean>


	<!-- the eventParser bean -->
	<bean id="eventParser" parent="baseEventParser" >

		<!-- parse position tracking: look up the position in memory first;
		     if not found, fall back to ZooKeeper -->
		<property name="logPositionManager">
			<bean class="com.alibaba.otter.canal.parse.index.FailbackLogPositionManager">
				<constructor-arg>
					<bean class="com.alibaba.otter.canal.parse.index.MemoryLogPositionManager" />
				</constructor-arg>
				<constructor-arg>
					<bean class="com.alibaba.otter.canal.parse.index.MetaLogPositionManager">
						<constructor-arg ref="metaManager"/>
					</bean>
				</constructor-arg>
			</bean>
		</property>

		<!-- initial parse position used on first startup -->
		<property name="masterPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.master.journal.name}" />
				<property name="position" value="${canal.instance.master.position}" />
				<property name="timestamp" value="${canal.instance.master.timestamp}" />
				<property name="gtid" value="${canal.instance.master.gtid}" />
			</bean>
		</property>
		<property name="standbyPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.standby.journal.name}" />
				<property name="position" value="${canal.instance.standby.position}" />
				<property name="timestamp" value="${canal.instance.standby.timestamp}" />
				<property name="gtid" value="${canal.instance.standby.gtid}" />
			</bean>
		</property>

	</bean>

                     

group-instance.xml: stores metadata in memory and combines multiple parsers into a single parser (n -> 1)

<!-- rest omitted -->

	<!-- metadata stored in memory -->
	<bean id="metaManager" class="com.alibaba.otter.canal.meta.MemoryMetaManager" />


	<!-- eventStore keeps data in memory -->
	<bean id="eventStore" class="com.alibaba.otter.canal.store.memory.MemoryEventStoreWithBuffer">
		<property name="bufferSize" value="${canal.instance.memory.buffer.size:16384}" />
		<property name="bufferMemUnit" value="${canal.instance.memory.buffer.memunit:1024}" />
		<property name="batchMode" value="${canal.instance.memory.batch.mode:MEMSIZE}" />
		<property name="ddlIsolation" value="${canal.instance.get.ddl.isolation:false}" />
		<property name="raw" value="${canal.instance.memory.rawEntry:true}" />
	</bean>


	<bean id="eventSink" class="com.alibaba.otter.canal.sink.entry.EntryEventSink">
		<property name="eventStore" ref="eventStore" />
		<property name="filterTransactionEntry" value="${canal.instance.filter.transaction.entry:false}"/>
	</bean>


	<!-- combine eventParser1 and eventParser2 into one parser -->
	<bean id="eventParser" class="com.alibaba.otter.canal.parse.inbound.group.GroupEventParser">
		<property name="eventParsers">
			<list>
				<ref bean="eventParser1" />
				<ref bean="eventParser2" />
			</list>
		</property>
	</bean>

	<!-- eventParser1 -->
	<bean id="eventParser1" parent="baseEventParser">

		<!-- parse position tracking, kept in memory -->
		<property name="logPositionManager">
			<bean class="com.alibaba.otter.canal.parse.index.MemoryLogPositionManager" />
		</property>

		<!-- initial parse position used on first startup -->
		<property name="masterPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.master1.journal.name}" />
				<property name="position" value="${canal.instance.master1.position}" />
				<property name="timestamp" value="${canal.instance.master1.timestamp}" />
				<property name="gtid" value="${canal.instance.master1.gtid}" />
			</bean>
		</property>
		<property name="standbyPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.standby1.journal.name}" />
				<property name="position" value="${canal.instance.standby1.position}" />
				<property name="timestamp" value="${canal.instance.standby1.timestamp}" />
				<property name="gtid" value="${canal.instance.standby1.gtid}" />
			</bean>
		</property>
	</bean>

	<!-- eventParser2 -->
	<bean id="eventParser2" parent="baseEventParser">

		<!-- parse position tracking, kept in memory -->
		<property name="logPositionManager">
			<bean class="com.alibaba.otter.canal.parse.index.MemoryLogPositionManager" />
		</property>

		<!-- initial parse position used on first startup -->
		<property name="masterPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.master2.journal.name}" />
				<property name="position" value="${canal.instance.master2.position}" />
				<property name="timestamp" value="${canal.instance.master2.timestamp}" />
				<property name="gtid" value="${canal.instance.master2.gtid}" />
			</bean>
		</property>
		<property name="standbyPosition">
			<bean class="com.alibaba.otter.canal.protocol.position.EntryPosition">
				<property name="journalName" value="${canal.instance.standby2.journal.name}" />
				<property name="position" value="${canal.instance.standby2.position}" />
				<property name="timestamp" value="${canal.instance.standby2.timestamp}" />
				<property name="gtid" value="${canal.instance.standby2.gtid}" />
			</bean>
		</property>

	</bean>

                     

******************

Related Classes and Interfaces

MemoryMetaManager: stores cursor metadata (put, get, and ack cursors) in memory

public class MemoryMetaManager extends AbstractCanalLifeCycle implements CanalMetaManager {
    protected Map<String, List<ClientIdentity>> destinations;
    protected Map<ClientIdentity, MemoryMetaManager.MemoryClientIdentityBatch> batches;
    protected Map<ClientIdentity, Position> cursors;

    public MemoryMetaManager() {
    }

                     

MemoryLogPositionManager: stores the log positions parsed by eventParser in memory

public class MemoryLogPositionManager extends AbstractLogPositionManager {
    private Map<String, LogPosition> positions;

    public MemoryLogPositionManager() {
    }

                     

CanalInstanceWithSpring

public class CanalInstanceWithSpring extends AbstractCanalInstance {
    private static final Logger logger = LoggerFactory.getLogger(CanalInstanceWithSpring.class);

    public CanalInstanceWithSpring() {
    }

    public void start() {
        logger.info("start CannalInstance for {}-{} ", new Object[]{1, this.destination});
        super.start();
    }

    public void setEventParser(CanalEventParser eventParser) {
    public void setEventSink(CanalEventSink<List<Entry>> eventSink) {
    public void setEventStore(CanalEventStore<Event> eventStore) {

    public void setDestination(String destination) {
    public void setMetaManager(CanalMetaManager metaManager) {
    public void setAlarmHandler(CanalAlarmHandler alarmHandler) {

    public void setMqConfig(CanalMQConfig mqConfig) {

                     

AbstractCanalInstance

public class AbstractCanalInstance extends AbstractCanalLifeCycle implements CanalInstance {
    private static final Logger logger = LoggerFactory.getLogger(AbstractCanalInstance.class);
    protected Long canalId;
    protected String destination;
    protected CanalEventStore<Event> eventStore;
    protected CanalEventParser eventParser;
    protected CanalEventSink<List<Entry>> eventSink;
    protected CanalMetaManager metaManager;
    protected CanalAlarmHandler alarmHandler;
    protected CanalMQConfig mqConfig;

    public AbstractCanalInstance() {
    }

                     

                     

************************

canal Client Subscription and Consumption

get: acks the batch first, then returns the data

public class SimpleCanalConnector implements CanalConnector {

    // get: the client acks first, then the data is returned
    public Message get(int batchSize) throws CanalClientException {
        return this.get(batchSize, (Long)null, (TimeUnit)null);
    }

    public Message get(int batchSize, Long timeout, TimeUnit unit) throws CanalClientException {
        Message message = this.getWithoutAck(batchSize, timeout, unit);
        this.ack(message.getId());   // ack first, then return the message
        return message;
    }


    // ClusterCanalConnector delegates to SimpleCanalConnector to fetch data

                     

getWithoutAck: returns data immediately; the ack happens asynchronously

rollback(batchId): rolls the batch back so it can be fetched again

ack(batchId): the client consumed the batch successfully; tells the server it may discard the data
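Put together, consumption with the asynchronous API follows a fetch / process / ack-or-rollback cycle. The sketch below is self-contained: FakeConnector is a hypothetical stand-in that mimics the call shape of canal's connector (getWithoutAck / ack / rollback) so the pattern can run without a server:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the asynchronous consumption cycle: getWithoutAck -> process ->
// ack on success, rollback on failure (the batch is then redelivered).
public class ConsumeLoop {
    static class FakeConnector {
        private final List<String> events;
        private int acked = 0;      // index of the first unacked event
        private int delivered = 0;  // index after the last delivered event

        FakeConnector(List<String> events) { this.events = events; }

        // deliver up to batchSize events after the last delivered one;
        // the "batchId" in this sketch is simply the batch's end index
        List<String> getWithoutAck(int batchSize) {
            int end = Math.min(delivered + batchSize, events.size());
            List<String> batch = new ArrayList<>(events.subList(delivered, end));
            delivered = end;
            return batch;
        }

        void ack(long batchId) { acked = (int) batchId; }
        void rollback()        { delivered = acked; }  // redeliver unacked events
    }

    // consume all events, rolling back a batch if processing fails
    static List<String> consume(FakeConnector conn, int batchSize) {
        List<String> processed = new ArrayList<>();
        while (true) {
            List<String> batch = conn.getWithoutAck(batchSize);
            if (batch.isEmpty()) break;
            try {
                processed.addAll(batch);        // "process" the batch
                conn.ack(processed.size());     // consumed: server may discard it
            } catch (RuntimeException e) {
                conn.rollback();                // failed: fetch the batch again
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        FakeConnector conn = new FakeConnector(Arrays.asList("e1", "e2", "e3"));
        System.out.println(consume(conn, 2)); // [e1, e2, e3]
    }
}
```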

                     

******************

Benefits of Asynchronous get/ack

Removes the network round-trip latency a synchronous ack would add;

the client can keep polling get in a loop, improving pipelining and parallelism.

                     

******************

Cursor Design

get: each get generates a mark; marks increase monotonically, guaranteeing global uniqueness.

Each get pulls data starting after the previous get's mark; if no such mark exists, data is pulled starting after the last ack mark.

ack: acks must be processed in mark order and cannot skip ahead.

On ack, the last ack mark is advanced to the current mark, and that mark is removed.

rollback: removes all get marks; the next get pulls data starting from the last ack mark.
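The mark bookkeeping described above can be sketched as follows (a hypothetical simplification; in canal the equivalent state lives in the server-side metaManager, keyed per client):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Mark-based cursor bookkeeping: every get produces an increasing mark;
// acks must consume marks in order; rollback discards all get marks.
public class CursorMarks {
    private long nextMark = 1;                 // monotonically increasing mark
    private long lastAckMark = 0;              // position of the last committed batch
    private final Deque<Long> getMarks = new ArrayDeque<>();

    long get() {                               // called on each client get
        long mark = nextMark++;
        getMarks.addLast(mark);
        return mark;
    }

    void ack(long mark) {                      // must follow mark order
        Long head = getMarks.peekFirst();
        if (head == null || head != mark)
            throw new IllegalStateException("acks cannot skip ahead");
        getMarks.removeFirst();
        lastAckMark = mark;                    // advance the committed position
    }

    void rollback() {                          // next get restarts after lastAckMark
        getMarks.clear();
    }

    long lastAck()    { return lastAckMark; }
    int outstanding() { return getMarks.size(); }
}
```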

                     

******************

Related Classes and Interfaces

                     

CanalConnector

public interface CanalConnector {

    void connect() throws CanalClientException;
    void disconnect() throws CanalClientException;

    boolean checkValid() throws CanalClientException;

    void subscribe(String var1) throws CanalClientException;
    void subscribe() throws CanalClientException;
    void unsubscribe() throws CanalClientException;

    Message get(int var1) throws CanalClientException;
    Message get(int var1, Long var2, TimeUnit var3) throws CanalClientException;

    Message getWithoutAck(int var1) throws CanalClientException;
    Message getWithoutAck(int var1, Long var2, TimeUnit var3) throws CanalClientException;

    void ack(long var1) throws CanalClientException;

    void rollback(long var1) throws CanalClientException;
    void rollback() throws CanalClientException;
}

                     

CanalConnectors: factory methods for creating client connections

public class CanalConnectors {
    public CanalConnectors() {
    }

    public static CanalConnector newSingleConnector(SocketAddress address, String destination, String username, String password) {
                                 // single-server connection
        SimpleCanalConnector canalConnector = new SimpleCanalConnector(address, username, password, destination);
        canalConnector.setSoTimeout(60000);
        canalConnector.setIdleTimeout(3600000);
        return canalConnector;
    }

    public static CanalConnector newClusterConnector(List<? extends SocketAddress> addresses, String destination, String username, String password) {
                                 // connect directly to a canal server cluster; supports canal server failover
        ClusterCanalConnector canalConnector = new ClusterCanalConnector(username, password, destination, new SimpleNodeAccessStrategy(addresses));
        canalConnector.setSoTimeout(60000);
        canalConnector.setIdleTimeout(3600000);
        return canalConnector;
    }

    public static CanalConnector newClusterConnector(String zkServers, String destination, String username, String password) {
                                  // connect to ZooKeeper to obtain canal server addresses;
                                  // supports both canal server and canal client failover
        ClusterCanalConnector canalConnector = new ClusterCanalConnector(username, password, destination, new ClusterNodeAccessStrategy(destination, ZkClientx.getZkClient(zkServers)));
        canalConnector.setSoTimeout(60000);
        canalConnector.setIdleTimeout(3600000);
        return canalConnector;
    }
}

                     

SimpleCanalConnector: single-server connection

public class SimpleCanalConnector implements CanalConnector {

    // get: the client acks first, then the data is returned
    public Message get(int batchSize) throws CanalClientException {
        return this.get(batchSize, (Long)null, (TimeUnit)null);
    }

    public Message get(int batchSize, Long timeout, TimeUnit unit) throws CanalClientException {
        Message message = this.getWithoutAck(batchSize, timeout, unit);
        this.ack(message.getId());   // ack first, then return the data
        return message;
    }


# getWithoutAck: return the data directly, without acking
    public Message getWithoutAck(int batchSize) throws CanalClientException {
    public Message getWithoutAck(int batchSize, Long timeout, TimeUnit unit) throws CanalClientException {

# ack a fetched batch
    public void ack(long batchId) throws CanalClientException {

# rollback: roll back fetched but unacked data
    public void rollback(long batchId) throws CanalClientException {
    public void rollback() throws CanalClientException {
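
The getWithoutAck/ack/rollback trio above implements an at-least-once batch contract: a batch is re-delivered until the client acks it. The sketch below illustrates that contract with a hypothetical in-memory stub (`BatchSource`, `StubSource` are illustrative names, not the real canal API):

```java
// Hypothetical in-memory sketch of the getWithoutAck/ack/rollback contract
// (at-least-once delivery). Names are illustrative, not the real canal client.
public class AckLoopSketch {

    interface BatchSource {
        long getWithoutAck(int batchSize); // returns a batch id, -1 when no data
        void ack(long batchId);            // confirm the batch; position advances
        void rollback(long batchId);       // un-ack: the batch is delivered again
    }

    static class StubSource implements BatchSource {
        private long produced = 3;          // pretend three batches are available
        private long cursor = 0;            // next batch id to hand out
        private long committed = 0;         // highest acked batch

        public long getWithoutAck(int batchSize) {
            return cursor < produced ? ++cursor : -1;
        }
        public void ack(long batchId) { committed = batchId; }
        public void rollback(long batchId) { cursor = batchId - 1; } // rewind
        long committed() { return committed; }
    }

    static long consumeAll(BatchSource src) {
        long last = -1;
        long batchId;
        while ((batchId = src.getWithoutAck(100)) != -1) {
            try {
                // process the batch here; ack only after processing succeeds
                src.ack(batchId);
                last = batchId;
            } catch (RuntimeException e) {
                src.rollback(batchId); // failed: the batch comes back next round
            }
        }
        return last;
    }

    public static void main(String[] args) {
        StubSource src = new StubSource();
        System.out.println(consumeAll(src)); // prints 3
        System.out.println(src.committed()); // prints 3
    }
}
```

Acking only after successful processing is what makes the position in canal durable: an un-acked batch is replayed after a client restart.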

                     

ClusterCanalConnector: cluster connection

public class ClusterCanalConnector implements CanalConnector {
    private final Logger logger = LoggerFactory.getLogger(this.getClass());
    private String username;
    private String password;
    private int soTimeout = 60000;
    private int idleTimeout = 3600000;
    private int retryTimes = 3;
    private int retryInterval = 5000;
    private CanalNodeAccessStrategy accessStrategy;
    private SimpleCanalConnector currentConnector;       // internally delegates to a SimpleCanalConnector
    private String destination;
    private String filter;

    public ClusterCanalConnector(String username, String password, String destination, CanalNodeAccessStrategy accessStrategy) {
        this.username = username;
        this.password = password;
        this.destination = destination;
        this.accessStrategy = accessStrategy;
    }

    public void connect() throws CanalClientException {
        while(this.currentConnector == null) {
            int times = 0;

            while(true) {
                try {
                    this.currentConnector = new SimpleCanalConnector((SocketAddress)null, this.username, this.password, this.destination) {
                        public SocketAddress getNextAddress() {
                            return ClusterCanalConnector.this.accessStrategy.nextNode();
                        }
                    };
                    this.currentConnector.setSoTimeout(this.soTimeout);
                    this.currentConnector.setIdleTimeout(this.idleTimeout);
                    if (this.filter != null) {
                        this.currentConnector.setFilter(this.filter);
                    }

                    if (this.accessStrategy instanceof ClusterNodeAccessStrategy) {
                        this.currentConnector.setZkClientx(((ClusterNodeAccessStrategy)this.accessStrategy).getZkClient());
                    }

                    this.currentConnector.connect();
                    break;
                } catch (Exception var5) {
                    this.logger.warn("failed to connect to:{} after retry {} times", this.accessStrategy.currentNode(), times);
                    this.currentConnector.disconnect();
                    this.currentConnector = null;
                    ++times;
                    if (times >= this.retryTimes) {
                        throw new CanalClientException(var5);
                    }

                    try {
                        Thread.sleep((long)this.retryInterval);
                    } catch (InterruptedException var4) {
                        throw new CanalClientException(var4);
                    }
                }
            }
        }

    }


    public Message get(int batchSize) throws CanalClientException {
        int times = 0;

        while(times < this.retryTimes) {
            try {
                Message msg = this.currentConnector.get(batchSize);
                              // delegates to SimpleCanalConnector.get, which acks the batch
                              // first and then returns the data
                return msg;
            } catch (Throwable var4) {
                this.logger.warn(String.format("something goes wrong when getting data from server:%s", this.currentConnector != null ? this.currentConnector.getAddress() : "null"), var4);
                ++times;
                this.restart();
                this.logger.info("restart the connector for next round retry.");
            }
        }

        throw new CanalClientException("failed to fetch the data after " + times + " times retry");
    }

    public Message get(int batchSize, Long timeout, TimeUnit unit) throws CanalClientException {
        int times = 0;

        while(times < this.retryTimes) {
            try {
                Message msg = this.currentConnector.get(batchSize, timeout, unit);
                return msg;
            } catch (Throwable var6) {
                this.logger.warn(String.format("something goes wrong when getting data from server:%s", this.currentConnector != null ? this.currentConnector.getAddress() : "null"), var6);
                ++times;
                this.restart();
                this.logger.info("restart the connector for next round retry.");
            }
        }

        throw new CanalClientException("failed to fetch the data after " + times + " times retry");
    }

    public Message getWithoutAck(int batchSize) throws CanalClientException {
        int times = 0;

        while(times < this.retryTimes) {
            try {
                Message msg = this.currentConnector.getWithoutAck(batchSize);
                              // delegates to SimpleCanalConnector.getWithoutAck, which returns
                              // the data directly; the ack happens separately
                return msg;
            } catch (Throwable var4) {
                this.logger.warn(String.format("something goes wrong when getWithoutAck data from server:%s", this.currentConnector != null ? this.currentConnector.getAddress() : "null"), var4);
                ++times;
                this.restart();
                this.logger.info("restart the connector for next round retry.");
            }
        }

        throw new CanalClientException("failed to fetch the data after " + times + " times retry");
    }

    public Message getWithoutAck(int batchSize, Long timeout, TimeUnit unit) throws CanalClientException {
        int times = 0;

        while(times < this.retryTimes) {
            try {
                Message msg = this.currentConnector.getWithoutAck(batchSize, timeout, unit);
                return msg;
            } catch (Throwable var6) {
                this.logger.warn(String.format("something goes wrong when getWithoutAck data from server:%s", this.currentConnector != null ? this.currentConnector.getAddress() : "null"), var6);
                ++times;
                this.restart();
                this.logger.info("restart the connector for next round retry.");
            }
        }

        throw new CanalClientException("failed to fetch the data after " + times + " times retry");
    }

    public void rollback(long batchId) throws CanalClientException {
    public void rollback() throws CanalClientException {

    public void ack(long batchId) throws CanalClientException {
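
All four get/getWithoutAck overloads in ClusterCanalConnector wrap the delegate call in the same retry-and-restart pattern: on any failure, restart the connector (which picks the next node from the access strategy) and retry up to retryTimes. A self-contained sketch of that pattern, with `withRetry` and the restart hook as illustrative stand-ins for `this.restart()`:

```java
import java.util.function.Supplier;

// Generic retry wrapper mirroring the retry/restart pattern of
// ClusterCanalConnector.get/getWithoutAck. Illustrative sketch only.
public class RetrySketch {
    static <T> T withRetry(int retryTimes, Supplier<T> call, Runnable restart) {
        int times = 0;
        while (times < retryTimes) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                ++times;
                restart.run(); // reconnect (e.g. to the next node) before retrying
            }
        }
        throw new RuntimeException("failed to fetch the data after " + times + " times retry");
    }

    public static void main(String[] args) {
        int[] failures = {2};  // fail twice, then succeed
        String result = withRetry(3,
            () -> {
                if (failures[0]-- > 0) throw new RuntimeException("node down");
                return "batch";
            },
            () -> System.out.println("restart the connector for next round retry."));
        System.out.println(result); // prints "batch"
    }
}
```

Because the cluster connector restarts against a node-access strategy, a single node failure costs one retry rather than failing the client.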

                     

Message: the data batch fetched by the client

public class Message implements Serializable {
    private static final long serialVersionUID = 1234034768477580009L;
    private long id;                  // unique batch id for this message
    private List<Entry> entries = new ArrayList();
    private boolean raw = true;       // entries are stored as ByteString by default
    private List<ByteString> rawEntries = new ArrayList();

    public Message(long id, List<Entry> entries) {
        this.id = id;
        this.entries = (List)(entries == null ? new ArrayList() : entries);
        this.raw = false;
    }

    public Message(long id, boolean raw, List entries) {
        this.id = id;
        if (raw) {
            this.rawEntries = (List)(entries == null ? new ArrayList() : entries);
        } else {
            this.entries = (List)(entries == null ? new ArrayList() : entries);
        }

        this.raw = raw;
    }

                     

                     

************************

High Availability

                     

canal high availability has two sides: canal server HA and canal client HA

canal server: instances on different canal servers request data from the same mysql. To reduce dump requests against mysql, only one instance may be running at any time; the others stay in standby

canal client: to preserve ordering, an instance may be consumed by only one canal client at a time; the other clients stay in standby

                     

******************

How canal server HA works

                     

                      

On startup, a canal server tries to create an ephemeral node in zookeeper;

if the node is created successfully the server runs, otherwise it stays in standby

zookeeper records which canal server is currently running

                     

When the running canal server fails, its ephemeral node disappears once the session times out;

the standby servers are then notified and try to create the ephemeral node; whichever succeeds becomes the running server

                     

On startup, the canal client looks up the running canal server in zookeeper and connects to it

If that canal server becomes unavailable and the connection drops, the client retries the lookup and reconnects
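
The election above rests on a single primitive: creating a node that only one contender can own, and that vanishes when its owner's session dies. canal uses a ZooKeeper ephemeral node for this; the toy below substitutes a compare-and-set reference purely to illustrate the protocol (`ElectionSketch` and its methods are hypothetical names):

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy simulation of the running-node election. canal uses a ZooKeeper
// ephemeral node; here an AtomicReference stands in for it: whoever sets
// it first is running, everyone else stays in standby.
public class ElectionSketch {
    // null means the "ephemeral node" is absent (no server is running)
    private final AtomicReference<String> runningNode = new AtomicReference<>();

    boolean tryBecomeRunning(String serverId) {
        // create the "ephemeral node": succeeds only if it does not exist yet
        return runningNode.compareAndSet(null, serverId);
    }

    void sessionExpired(String serverId) {
        // session expiry: the failed server's ephemeral node disappears
        runningNode.compareAndSet(serverId, null);
    }

    public static void main(String[] args) {
        ElectionSketch zk = new ElectionSketch();
        System.out.println(zk.tryBecomeRunning("server-A")); // true: A runs
        System.out.println(zk.tryBecomeRunning("server-B")); // false: B standby
        zk.sessionExpired("server-A");                       // A's session expires
        System.out.println(zk.tryBecomeRunning("server-B")); // true: failover to B
    }
}
```

In the real system the standby servers do not poll; they set a watch on the running node and contend only when notified of its deletion.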

                     

******************

Scenarios that trigger failover

                     

Normal shutdown: the canal server releases its instance resources and deletes its running node in zookeeper

Graceful switchover: the running canal server's active flag in zookeeper is set to false; on receiving the change, the canal server voluntarily releases its instance resources but does not exit the JVM

                     

canal server JVM failure: instance resources are released once the zookeeper session expires; the session timeout defaults to 40s

canal server network partition (false-dead state): zookeeper judges the canal server to be down and triggers failover

                     

Handling transient loss of the running node (false-dead state)

Goal: avoid redistributing instances because the running node was lost only momentarily

Approach:
when a canal server sees the running node released, it waits for a delay before contending for it;
the previous owner of the running node is exempt from the delay and can reclaim it first,
so that in a false-dead state resources are not released needlessly.

Delay setting: defaults to 5 seconds, i.e. the running node has a 5-second protection window against false-dead states
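
The protection window reduces to one rule: the previous owner may re-contend immediately, everyone else must wait out the delay. A minimal sketch of that rule (`takeoverDelay` is an illustrative name, not canal's actual method):

```java
// Sketch of the false-dead protection window: after the running node is
// released, the previous owner reclaims it with no delay, while other
// servers wait the protection period (5s by default) before contending.
public class DelayedTakeoverSketch {
    static final long PROTECT_MILLIS = 5000; // default protection window

    static long takeoverDelay(String serverId, String previousOwner) {
        // the previous owner skips the delay, so a briefly "dead" server
        // that is actually alive wins the node back and nothing moves
        return serverId.equals(previousOwner) ? 0 : PROTECT_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(takeoverDelay("server-A", "server-A")); // prints 0
        System.out.println(takeoverDelay("server-B", "server-A")); // prints 5000
    }
}
```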

                     

************************

TableMeta TSDB design

                     

Background: DDL statements change table structures. Early canal versions handled DDL by using the table structure currently held in memory, which leads to several problems

  

             

Possible problems: the binlog being parsed was written at time T0, while the current time is T1, with T0 < T1

Between T0 and T1, a column is added or dropped (alter table add column / drop column).
Parsing the T0 binlog with the T1 table structure then mismatches the column count, reported as:
column size is not match for table: xx , 12 vs 13

Between T0 and T1, one column is added and another dropped, so the column count is unchanged, but values are applied to the wrong columns

Between T0 and T1, a drop table occurs, so the table can no longer be found (not found [xx] in db) and binlog processing blocks

                        

Solution: build a dynamic table structure on top of druid's DDL-handling capability

         

Each CREATE TABLE statement is passed to druid's SchemaRepository.console() to build an initial druid schema

Every subsequent DDL change (alter table add/drop column and the like) is passed to druid,
which parses the DDL and executes the add/drop column in memory, maintaining the current table structure

The in-memory druid schema is checkpointed periodically;
a later position rollback then rebuilds the table structure at any point in time as checkpoint + incremental DDL replay

                       

Schema snapshot storage

Local storage (h2): snapshots are kept in the embedded h2 database on local disk (the default)
Central storage (MySQL): snapshots are kept in a mysql instance shared by the canal servers; used for clustered canal server deployments


**************
Local storage configuration (h2)

#canal.properties
canal.instance.tsdb.spring.xml=classpath:spring/tsdb/h2-tsdb.xml

#instance.properties
canal.instance.tsdb.enable=true
canal.instance.tsdb.dir=${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url=jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername=canal
canal.instance.tsdb.dbPassword=canal


**************
Central storage configuration (mysql)

#canal.properties
canal.instance.tsdb.spring.xml=classpath:spring/tsdb/mysql-tsdb.xml

#instance.properties
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://******:3306/canal_tsdb
canal.instance.tsdb.dbUsername=canal
canal.instance.tsdb.dbPassword=canal

               

          
