For deploying canal high availability and automatic MySQL node switchover, you can refer to my earlier article.
1. Preparation
1.1 Checking the active MySQL
Since we deployed automatic MySQL node switchover, we first need to check which MySQL server canal is currently syncing from.
- First, check which server the example_db1 instance is running on
[zk: canal1:2181,canal2:2181,canal3:2181(CONNECTED) 4]
[zk: canal1:2181,canal2:2181,canal3:2181(CONNECTED) 4] get /otter/canal/destinations/example_db1/running
{"active":true,"address":"192.168.23.31:11111"}
[zk: canal1:2181,canal2:2181,canal3:2181(CONNECTED) 5]
We can see that the example_db1 instance is running on canal1.
- Next, check the example_db1 log on canal1
[root@canal1 example_db1]#
[root@canal1 example_db1]# pwd
/root/canal.deployer-1.1.5/logs/example_db1
[root@canal1 example_db1]#
[root@canal1 example_db1]# tail example_db1.log
...... output truncated ......
2021-07-29 04:16:13.461 [destination = example_db1 , address = canal2/192.168.23.32:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position mysql-bin.000026:4:1627458457000
2021-07-29 04:16:14.278 [destination = example_db1 , address = canal2/192.168.23.32:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.000026,position=4,serverId=2,gtid=<null>,timestamp=1627458457000] cost : 877ms , the next step is binlog dump
[root@canal1 example_db1]#
We can see that the currently active MySQL is canal2.
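Besides zkCli, the same running node can be read programmatically. A minimal sketch using Apache Curator (the org.apache.curator:curator-framework dependency and the RunningNodeCheck object are assumptions for illustration, not part of this project):
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
object RunningNodeCheck {
  def main(args: Array[String]): Unit = {
    val client = CuratorFrameworkFactory.newClient(
      "canal1:2181,canal2:2181,canal3:2181",
      new ExponentialBackoffRetry(1000, 3))
    client.start()
    // reads the JSON payload shown above, e.g. {"active":true,"address":"192.168.23.31:11111"}
    val data = client.getData.forPath("/otter/canal/destinations/example_db1/running")
    println(new String(data, "UTF-8"))
    client.close()
  }
}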
1.2 MySQL data overview
[root@canal2 ~]# mysql -u root -pRoot_123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 8.0.25 MySQL Community Server - GPL
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| db1 |
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.01 sec)
mysql> use db1;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------+
| Tables_in_db1 |
+---------------+
| tb1_1 |
| tb1_2 |
+---------------+
2 rows in set (0.00 sec)
mysql> select * from tb1_1;
+------+-------+--------+
| idA | nameA | scoreA |
+------+-------+--------+
| 1 | 1 | 1.10 |
| 2 | 2 | 2.20 |
| 3 | 3 | 3.30 |
+------+-------+--------+
3 rows in set (0.00 sec)
mysql> select * from tb1_2;
+------+-------+------+
| idA | nameA | ageA |
+------+-------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
+------+-------+------+
3 rows in set (0.00 sec)
mysql>
1.3 Full data export
Here we use canal3 as a temporary database server. The export/load script is as follows:
[root@canal3 ~]#
[root@canal3 ~]# pwd
/root
[root@canal3 ~]#
[root@canal3 ~]# cat mysql_dump_load.sh
#!/usr/bin/env bash
# script parameters
mysql_source_host=192.168.23.32
mysql_source_port=3306
mysql_source_user=root
mysql_source_password=Root_123
mysql_source_db=db1
mysql_target_host=192.168.23.33
mysql_target_port=3306
mysql_target_user=root
mysql_target_password=Root_123
mysql_target_db=db1
mysql_dump_path=/root/db1.sql
# parameters to exclude views from the dump
ignore_tables="--ignore-table=${mysql_source_db}.tb1 --ignore-table=${mysql_source_db}.tb2"
# dump the source database
mysqldump -h ${mysql_source_host} -P ${mysql_source_port} -u ${mysql_source_user} -p${mysql_source_password} --databases ${mysql_source_db} --flush-logs --lock-all-tables --master-data=1 --column_statistics=0 ${ignore_tables} > ${mysql_dump_path}
# extract the binlog file name and position, and comment out the CHANGE MASTER line in the dump file
mysql_binlog_filename=`head -10000 ${mysql_dump_path} | grep 'CHANGE MASTER TO MASTER_LOG_FILE' | sed -r 's/CHANGE MASTER TO MASTER_LOG_FILE.*(mysql-bin\.[0-9]+).*MASTER_LOG_POS.*/\1/g'`
mysql_binlog_position=`head -10000 ${mysql_dump_path} | grep 'CHANGE MASTER TO MASTER_LOG_FILE' | sed -r 's/CHANGE MASTER TO MASTER_LOG_FILE.*MASTER_LOG_POS=([0-9]+).*/\1/g'`
sed -ri 's/(CHANGE MASTER TO MASTER_LOG_FILE.*MASTER_LOG_POS.*;)/-- \1/g' ${mysql_dump_path}
# recreate the target database and load the dump
mysql -h ${mysql_target_host} -P ${mysql_target_port} -u ${mysql_target_user} -p${mysql_target_password} -e "drop database if exists ${mysql_target_db}"
mysql -h ${mysql_target_host} -P ${mysql_target_port} -u ${mysql_target_user} -p${mysql_target_password} -e "source ${mysql_dump_path}"
echo ${mysql_binlog_filename}
echo ${mysql_binlog_position}
[root@canal3 ~]#
Run the data export script:
[root@canal3 ~]#
[root@canal3 ~]# sh mysql_dump_load.sh
mysqldump: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql-bin.000026
156
[root@canal3 ~]#
At this point we can see that the latest binlog file after the flush is mysql-bin.000026, at position 156.
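To cross-check this position independently of the dump file (assuming no writes happened between the flush and the check), you can run show master status on the active master. A minimal JDBC sketch, reusing the connection details from above (the ShowMasterStatus object is just an illustration, not part of the project):
import java.sql.DriverManager
object ShowMasterStatus {
  def main(args: Array[String]): Unit = {
    Class.forName("com.mysql.cj.jdbc.Driver")
    val conn = DriverManager.getConnection(
      "jdbc:mysql://192.168.23.32:3306/db1?serverTimezone=UTC", "root", "Root_123")
    val rs = conn.createStatement().executeQuery("show master status")
    if (rs.next()) {
      // File and Position should match what mysql_dump_load.sh printed
      println(s"${rs.getString("File")} : ${rs.getLong("Position")}")
    }
    conn.close()
  }
}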
1.4 Updating data in the database
Next, we make some changes to the data in the db1 database.
[root@canal2 ~]# mysql -u root -pRoot_123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 8.0.25 MySQL Community Server - GPL
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> insert into db1.tb1_1 values(4,'4',4.4),(5,'5',5.5);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> delete from db1.tb1_2 where idA = 2;
Query OK, 1 row affected (0.00 sec)
mysql> delete from db1.tb1_2 where idA = 3;
Query OK, 1 row affected (0.00 sec)
mysql>
2. Modifying the canal.deployer configuration
Modify the configuration on all 3 servers.
2.1 canal.properties
The changed parts:
canal.destinations =
canal.serverMode = tcp
Here we drop the default example and switch to auto scan: canal scans the directory names under conf (everything except metrics and spring) and treats each one as an instance; as a rule, one instance handles the synchronization of one database.
Then restart the canal service.
2.2 Deleting conf/example and creating a new instance
Prepare an example_db1 instance under /root on canal1, as follows:
[root@canal1 conf]# ls
canal_local.properties canal.properties example logback.xml metrics spring
[root@canal1 conf]#
[root@canal1 conf]# pwd
/root/canal.deployer-1.1.5/conf
[root@canal1 conf]#
[root@canal1 conf]# cp -r example/ ~
[root@canal1 conf]#
[root@canal1 conf]# cd ~
[root@canal1 ~]#
[root@canal1 ~]# mv example/ example_db1
[root@canal1 ~]#
[root@canal1 ~]# ll example_db1/
total 164
-rw-r--r--. 1 root root 163840 Jun 25 11:01 h2.mv.db
-rwxr-xr-x. 1 root root 2621 Jun 25 11:01 instance.properties
[root@canal1 ~]#
[root@canal1 ~]# rm example_db1/h2.mv.db -rf
[root@canal1 ~]#
example_db1/instance.properties sets the binlog read start position and restricts syncing to the db1 database. Its contents are:
[root@canal1 ~]#
[root@canal1 ~]# cat example_db1/instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen
# use 1232 on canal2 and 1233 on canal3
canal.instance.mysql.slaveId=1231
# enable gtid use true/false
canal.instance.gtidon=false
# position info
canal.instance.master.address=canal2:3306
canal.instance.master.journal.name=mysql-bin.000026
canal.instance.master.position=156
canal.instance.master.timestamp=
canal.instance.master.gtid=
# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=
# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal
canal.instance.standby.address = canal1:3306
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=
# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=Canal_123
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
# table regex
canal.instance.filter.regex=db1\\.tb\\d_\\d
# table black regex
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#################################################
# heartbeat detection must be enabled
canal.instance.detecting.enable = true
# heartbeat detection SQL
canal.instance.detecting.sql = select 1
# heartbeat detection interval, in seconds
canal.instance.detecting.interval.time = 3
# failure-count threshold; once exceeded, the MySQL connection is switched over, e.g. to the standby server, where binlog consumption continues
canal.instance.detecting.retry.threshold = 3
# whether to enable master/standby switching once the heartbeat failure threshold is exceeded
canal.instance.detecting.heartbeatHaEnable = true
[root@canal1 ~]#
Delete the conf/example directory on all 3 servers, place example_db1 into the conf directory of all 3 servers, and adjust the canal.instance.mysql.slaveId parameter in instance.properties on canal2 and canal3.
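Before distributing the instance, it can be worth sanity-checking that canal.instance.filter.regex matches exactly the tables you expect. Canal's filter is Perl-style regex, and for a simple pattern like this one plain Java regex semantics behave the same, so a quick sketch is enough (FilterRegexCheck is just an illustration):
object FilterRegexCheck {
  def main(args: Array[String]): Unit = {
    val pattern = "db1\\.tb\\d_\\d"
    // tb1_1 and tb1_2 should match; anything outside db1 (or misnamed) should not
    Seq("db1.tb1_1", "db1.tb1_2", "db1.other", "db2.tb1_1")
      .foreach(t => println(s"$t -> ${t.matches(pattern)}"))
  }
}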
3. Developing the sync code
3.1 A script for repeated testing
During testing we sometimes delete the client's consume position from ZooKeeper so that the canal server consumes again from the position set in instance.properties; this requires restarting the canal servers, so we script it:
[root@canal1 ~]#
[root@canal1 ~]# pwd
/root
[root@canal1 ~]#
[root@canal1 ~]# cat example_db_repeat_test.sh
#!/usr/bin/env bash
# script parameters
zk_cli=/root/apache-zookeeper-3.6.3-bin/bin/zkCli.sh
zk_url=canal1:2181,canal2:2181,canal3:2181
destinations=example_db1
canal_servers=('canal1' 'canal2' 'canal3')
canal_user=root
canal_restart_sh=/root/canal.deployer-1.1.5/bin/restart.sh
# delete the canal client's position from ZooKeeper
${zk_cli} -server ${zk_url} delete /otter/canal/destinations/${destinations}/1001/cursor
# restart the canal server on all 3 nodes
for canal_server in ${canal_servers[@]}
do
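# run the restart script on each node through an ssh heredoc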
ssh ${canal_user}@${canal_server} << begin2end
source /root/.bashrc
sh ${canal_restart_sh}
exit
begin2end
done
[root@canal1 ~]#
3.2 Scala sync code
The project structure and the purpose of each file are as follows:
- db1_tables.txt: the list of table names to sync
- etl.properties: configuration for the batch (full) data sync
- launch.properties: configuration for the incremental binlog sync
- logback.xml: logging configuration for the project
- MysqlConnectPool: MySQL connection pool
- Postgres_etl: batch (full) data sync program
- Mysql_postgres_datatype: MySQL-to-PostgreSQL column type mapping
- PostgresConnectPool: PostgreSQL connection pool
- Sync_launch: incremental data sync program
3.2.1 db1_tables.txt
tb1_1
tb1_2
3.2.2 etl.properties
source_db=db1
target_db=db1
target_db_schema=db1.public
mysql_host=192.168.23.33
mysql_port=3306
mysql_user=root
mysql_password=Root_123
postgres_host=192.168.23.33
postgres_port=5432
postgres_user=postgres
postgres_password=postgres123
tables_filename=db1_tables.txt
max_threads=8
query_batch_size=100000
3.2.3 launch.properties
destination=example_db1
subscribe=db1\\.tb\\d_\\d
source_db=db1
target_db=db1
target_db_schema=db1.public
zk_url=192.168.23.31:2181,192.168.23.32:2181,192.168.23.33:2181
postgres_host=192.168.23.33
postgres_port=5432
postgres_user=postgres
postgres_password=postgres123
tables_filename=db1_tables.txt
3.2.4 logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<!-- Output format: %d date, %thread thread name, %-5level level left-aligned in 5 chars, %logger logger name, %msg log message, %n newline -->
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} --- %msg%n</pattern>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="STDOUT" />
</root>
</configuration>
3.2.5 MysqlConnectPool
package com.mq.canal_postgres_client.batch
import com.alibaba.druid.pool.DruidDataSource
import java.io.FileInputStream
import java.util.Properties
class MysqlConnectPool(properties_filename: String) {
val prop = new Properties()
val input_stream = new FileInputStream(properties_filename)
prop.load(input_stream)
// data source configuration
val dataSource = new DruidDataSource()
dataSource.setUrl(s"jdbc:mysql://${prop.getProperty("mysql_host")}:${prop.getProperty("mysql_port")}/${prop.getProperty("source_db")}?serverTimezone=UTC")
dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver")
dataSource.setUsername(prop.getProperty("mysql_user"))
dataSource.setPassword(prop.getProperty("mysql_password"))
// the settings below are optional
dataSource.setInitialSize(10) // initial number of connections
dataSource.setMaxActive(30) // maximum number of connections
dataSource.setMinIdle(10) // minimum number of idle connections
dataSource.setMaxWait(5000) // max wait for a connection, in milliseconds
dataSource.setPoolPreparedStatements(false) // cache PreparedStatements
dataSource.setMaxOpenPreparedStatements(-1) // max number of cached PreparedStatements, default -1 (no caching)
dataSource.setValidationQuery("select 1")
dataSource.setTestWhileIdle(true)
dataSource.setTestOnBorrow(false)
dataSource.setTestOnReturn(false)
def getConnection() = {
dataSource.getConnection()
}
}
3.2.6 Postgres_etl.scala
package com.mq.canal_postgres_client.batch
import com.mq.canal_postgres_client.Mysql_postgres_datatype_map.mysql_postgres_datatype_map
import com.mq.canal_postgres_client.PostgresConnectPool
import org.apache.commons.io.{Charsets, IOUtils}
import java.io.FileInputStream
import java.sql.{Connection, SQLException, Statement}
import java.util.Properties
import java.util.concurrent.{ExecutorService, Executors}
import scala.collection.mutable.ArrayBuffer
import scala.jdk.CollectionConverters.CollectionHasAsScala
/*
${JAVA_HOME8}/bin/java \
-classpath /root/canal_postgres_client-1.0-SNAPSHOT.jar \
com.mq.canal_postgres_client.batch.Postgres_etl
*/
class Query_insert_data(mysql_connect_pool: MysqlConnectPool,
table_name: String,
prop: Properties,
postgres_connect_pool: PostgresConnectPool,
query_offset: Long,
query_batch_size: Long) extends Runnable {
override def run(): Unit = {
val mysql_conn = mysql_connect_pool.getConnection()
val mysql_stmt = mysql_conn.createStatement()
val postgres_conn = postgres_connect_pool.getConnection()
val mysql_query_sql = s"select * from ${table_name} limit ${query_offset}, ${query_batch_size}"
println(s">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>${mysql_query_sql}>>>>>>>>>>>>>>>>>>>>>>>>>")
val rs = mysql_stmt.executeQuery(mysql_query_sql)
// get all column names and type names, and build the PreparedStatement SQL
val rmd = rs.getMetaData
val column_name_types = ArrayBuffer[(String, String)]()
var columns_str = ""
var values_str = ""
for (number <- 1 to rmd.getColumnCount) {
val column_name = rmd.getColumnName(number)
columns_str = columns_str + s"${column_name},"
values_str = values_str + "?,"
// ArrayBuffer((colA,int), (colB,varchar), (colC,decimal))
column_name_types += ((column_name, rmd.getColumnTypeName(number).toLowerCase))
}
columns_str = columns_str.dropRight(1)
values_str = values_str.dropRight(1)
val insert_sql = s"insert into ${prop.getProperty("target_db_schema")}.${table_name}(${columns_str}) values(${values_str})"
val postgres_pst = postgres_conn.prepareStatement(insert_sql)
postgres_conn.setAutoCommit(false)
// counter
var increment_batch_count = 0L
// set the values in the PreparedStatement
while (rs.next()) {
increment_batch_count += 1
for (column_name_type_index <- column_name_types.zipWithIndex) {
val column_name = column_name_type_index._1._1
val column_type = column_name_type_index._1._2
val jdbc_parameter_index = column_name_type_index._2 + 1
if (column_type == "bit" || column_type == "tinyint" || column_type.startsWith("int") || column_type.startsWith("bigint")) {
postgres_pst.setLong(jdbc_parameter_index, rs.getLong(column_name))
} else if (column_type == "float" || column_type == "double" || column_type == "decimal") {
postgres_pst.setBigDecimal(jdbc_parameter_index, rs.getBigDecimal(column_name))
} else if (column_type == "char" || column_type == "varchar" || column_type == "mediumtext" || column_type == "text") {
postgres_pst.setString(jdbc_parameter_index, rs.getString(column_name))
} else if (column_type == "date") {
try {
postgres_pst.setDate(jdbc_parameter_index, rs.getDate(column_name))
} catch {
case e: SQLException => {
postgres_pst.setDate(jdbc_parameter_index, null)
}
}
} else if (column_type == "time") {
postgres_pst.setTime(jdbc_parameter_index, rs.getTime(column_name))
} else if (column_type == "datetime" || column_type == "timestamp") {
try {
postgres_pst.setTimestamp(jdbc_parameter_index, rs.getTimestamp(column_name))
} catch {
case e: SQLException => {
postgres_pst.setTimestamp(jdbc_parameter_index, null)
}
}
} else if (column_type == "longblob") {
val longblob_value = rs.getBlob(column_name)
if (longblob_value == null) {
postgres_pst.setBinaryStream(jdbc_parameter_index, null)
} else {
postgres_pst.setBinaryStream(jdbc_parameter_index, longblob_value.getBinaryStream)
}
}
}
postgres_pst.addBatch()
// flush to postgres in batches
if (increment_batch_count % 5000 == 0) {
postgres_pst.executeBatch()
postgres_pst.clearBatch()
postgres_conn.commit()
}
}
postgres_pst.executeBatch()
postgres_pst.clearBatch()
postgres_conn.commit()
postgres_conn.setAutoCommit(true)
mysql_stmt.close()
mysql_conn.close()
postgres_pst.close()
postgres_conn.close()
}
}
object Postgres_etl {
def load_etl_properties() = {
val prop = new Properties()
val input_stream = new FileInputStream(s"${System.getProperty("user.dir")}/conf/etl.properties")
prop.load(input_stream)
prop
}
// get all table names in the database
def get_mysql_table_names(mysql_stmt: Statement, prop: Properties) = {
val query_sql = s"select table_name from information_schema.tables where table_schema = '${prop.getProperty("source_db")}'"
val rs = mysql_stmt.executeQuery(query_sql)
val table_names = ArrayBuffer[String]()
while (rs.next()) {
table_names += rs.getString(1)
}
table_names
}
// drop the table first, then recreate it
def create_target_table(mysql_stmt: Statement, postgres_stmt: Statement, table_name: String, prop: Properties) = {
val mysql_query_sql = s"select column_name, data_type, column_key from information_schema.columns where table_schema = '${prop.getProperty("source_db")}' and table_name = '${table_name}' order by ordinal_position asc"
val rs = mysql_stmt.executeQuery(mysql_query_sql)
// build the CREATE TABLE statement for the target table
var target_create_table_sql = s"create table if not exists ${prop.getProperty("target_db_schema")}.${table_name}("
val column_type_sqls: ArrayBuffer[String] = ArrayBuffer()
while (rs.next()) {
var data_type = rs.getString(2)
// map the data type if it differs between MySQL and PostgreSQL
if (mysql_postgres_datatype_map.contains(data_type)) {
data_type = mysql_postgres_datatype_map(data_type)
}
// put primary key columns first
if (rs.getString(3) == "PRI") {
column_type_sqls.insert(0, s"${rs.getString(1)} ${data_type}")
} else {
column_type_sqls += s"${rs.getString(1)} ${data_type}"
}
}
target_create_table_sql = target_create_table_sql + column_type_sqls.mkString(",") + ")"
postgres_stmt.executeUpdate(s"drop table if exists ${prop.getProperty("target_db_schema")}.${table_name}")
postgres_stmt.executeUpdate(target_create_table_sql)
}
def query_insert_data(mysql_stmt: Statement,
table_name: String,
prop: Properties,
postgres_conn: Connection,
executorService: ExecutorService,
mysql_connect_pool: MysqlConnectPool,
postgres_connect_pool: PostgresConnectPool) = {
val count_query_sql = s"select count(*) from ${table_name}"
val count_rs = mysql_stmt.executeQuery(count_query_sql)
val table_count = if (count_rs.next()) count_rs.getLong(1) else 0L
val query_batch_size = prop.getProperty("query_batch_size").toLong
val batch_num = math.ceil(table_count.toDouble / query_batch_size).toInt
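// e.g. table_count = 250000 with query_batch_size = 100000 gives batch_num = 3 (offsets 0, 100000, 200000)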
for (batch_no <- 1 to batch_num) {
val query_offset = (batch_no - 1) * query_batch_size
executorService.submit(new Query_insert_data(mysql_connect_pool, table_name, prop, postgres_connect_pool, query_offset, query_batch_size))
}
}
def get_required_tables(prop: Properties) = {
val input_stream = new FileInputStream(s"${System.getProperty("user.dir")}/conf/${prop.getProperty("tables_filename")}")
val require_tables = IOUtils.readLines(input_stream, Charsets.toCharset("UTF-8"))
.asScala.toSeq
require_tables
}
def main(args: Array[String]): Unit = {
val mysql_connect_pool = new MysqlConnectPool(s"${System.getProperty("user.dir")}/conf/etl.properties")
val postgres_connect_pool = new PostgresConnectPool(s"${System.getProperty("user.dir")}/conf/etl.properties")
var mysql_conn: Connection = null
var mysql_stmt: Statement = null
var postgres_conn: Connection = null
var postgres_stmt: Statement = null
var executorService: ExecutorService = null
try {
val prop = load_etl_properties()
Class.forName("com.mysql.cj.jdbc.Driver")
mysql_conn = mysql_connect_pool.getConnection()
mysql_stmt = mysql_conn.createStatement()
postgres_conn = postgres_connect_pool.getConnection()
postgres_stmt = postgres_conn.createStatement()
val required_tables = get_required_tables(prop)
val table_names = get_mysql_table_names(mysql_stmt, prop)
executorService = Executors.newFixedThreadPool(prop.getProperty("max_threads").toInt)
// process each table in turn
for (table_name <- table_names) {
if (required_tables.exists(_ == table_name)) {
create_target_table(mysql_stmt, postgres_stmt, table_name, prop)
Postgres_etl.query_insert_data(mysql_stmt, table_name, prop,
postgres_conn, executorService,
mysql_connect_pool, postgres_connect_pool)
}
}
} catch {
case e: Exception => {
e.printStackTrace()
throw e
}
} finally {
if (mysql_stmt != null) mysql_stmt.close()
if (mysql_conn != null) mysql_conn.close()
if (postgres_stmt != null) postgres_stmt.close()
if (postgres_conn != null) postgres_conn.close()
if (executorService != null) executorService.shutdown()
}
}
}
3.2.7 Mysql_postgres_datatype
package com.mq.canal_postgres_client
object Mysql_postgres_datatype_map {
val mysql_postgres_datatype_map = Map(
"datetime" -> "timestamp",
"longblob" -> "bytea",
"double" -> "double precision",
"char" -> "varchar",
"tinyint" -> "smallint",
"mediumtext" -> "text"
)
}
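Types present in the map are rewritten when the target table is created; anything not listed (int, varchar, text, ...) passes through unchanged. A quick illustration, assuming the map is imported as in Postgres_etl:
// the same contains/apply lookup that create_target_table performs, written with getOrElse
val datetime_type = mysql_postgres_datatype_map.getOrElse("datetime", "datetime") // "timestamp"
val varchar_type = mysql_postgres_datatype_map.getOrElse("varchar", "varchar") // unmapped, passes through as "varchar"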
3.2.8 PostgresConnectPool
package com.mq.canal_postgres_client
import com.alibaba.druid.pool.DruidDataSource
import java.io.FileInputStream
import java.util.Properties
class PostgresConnectPool(properties_filename: String) {
val prop = new Properties()
val input_stream = new FileInputStream(properties_filename)
prop.load(input_stream)
// data source configuration
val dataSource = new DruidDataSource()
dataSource.setUrl(s"jdbc:postgresql://${prop.getProperty("postgres_host")}:${prop.getProperty("postgres_port")}/${prop.getProperty("target_db")}?serverTimezone=UTC")
// dataSource.setDriverClassName("") // the driver is auto-detected from the URL
dataSource.setUsername(prop.getProperty("postgres_user"))
dataSource.setPassword(prop.getProperty("postgres_password"))
// the settings below are optional
dataSource.setInitialSize(10) // initial number of connections
dataSource.setMaxActive(30) // maximum number of connections
dataSource.setMinIdle(10) // minimum number of idle connections
dataSource.setMaxWait(5000) // max wait for a connection, in milliseconds
dataSource.setPoolPreparedStatements(false) // cache PreparedStatements
dataSource.setMaxOpenPreparedStatements(-1) // max number of cached PreparedStatements, default -1 (no caching)
dataSource.setValidationQuery("select 1")
dataSource.setTestWhileIdle(true)
dataSource.setTestOnBorrow(false)
dataSource.setTestOnReturn(false)
def getConnection() = {
dataSource.getConnection()
}
}
3.2.9 Sync_launch.scala
package com.mq.canal_postgres_client
import com.alibaba.otter.canal.client.{CanalConnector, CanalConnectors}
import com.alibaba.otter.canal.protocol.CanalEntry
import com.mq.canal_postgres_client.Mysql_postgres_datatype_map.mysql_postgres_datatype_map
import org.apache.commons.io.{Charsets, IOUtils}
import org.apache.commons.lang3.time.FastDateFormat
import java.io.FileInputStream
import java.sql.{Connection, PreparedStatement, Statement}
import java.util.Properties
import scala.collection.mutable.ArrayBuffer
import scala.jdk.CollectionConverters._
import scala.util.control.Breaks._
/* Command to run the jar:
nohup ${JAVA_HOME8}/bin/java \
-classpath /root/canal_postgres_client-1.0-SNAPSHOT.jar \
com.mq.canal_postgres_client.Sync_launch >> /root/sync_launch.log 2>&1 &
**/
object Sync_launch {
def load_launch_properties() = {
val prop = new Properties()
val input_stream = new FileInputStream(s"${System.getProperty("user.dir")}/conf/launch.properties")
prop.load(input_stream)
prop
}
def get_entries(prop: Properties, postgresConnectPool: PostgresConnectPool) = {
val connector = CanalConnectors.newClusterConnector(prop.getProperty("zk_url"), prop.getProperty("destination"), "", "")
connector.connect()
connector.subscribe(prop.getProperty("subscribe"))
connector.rollback() // roll back any batch fetched but not acked last time
while (true) {
val message = connector.getWithoutAck(1000)
val batch_id = message.getId()
val batch_size = message.getEntries().size()
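// a batch_id of -1 together with an empty batch means there is no new data this round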
if (!(batch_id == -1 || batch_size == 0)) {
var postgres_conn: Connection = null
var postgres_stmt: Statement = null
try {
val entries = message.getEntries()
postgres_conn = postgresConnectPool.getConnection()
postgres_stmt = postgres_conn.createStatement()
deal_entrys(entries, postgres_stmt, prop, postgres_conn) // the Statement is passed by reference
connector.ack(batch_id)
} catch {
case e: Exception => {
connector.rollback(batch_id)
e.printStackTrace()
throw e
}
} finally {
postgres_stmt.close()
postgres_conn.close()
}
}
Thread.sleep(1000L) // wait 1 second
}
connector
}
def print_column(columns: java.util.List[CanalEntry.Column]) {
// loop over the columns
for (column <- columns.asScala) {
// column.getUpdated(): when EventType=UPDATE, marks whether this column's value was modified
println(s"column[${column.getName()}:${column.getMysqlType}:${column.getValue()}], is_key: ${column.getIsKey}, is_null: ${column.getIsNull}, update: ${column.getUpdated()}")
}
}
// set the PreparedStatement parameter values
def set_jdbc_preparestatement_value(postgres_pst: PreparedStatement, column_value_types: ArrayBuffer[(String, String)], add_parameter_index: Int) = {
val date_fdf = FastDateFormat.getInstance("yyyy-MM-dd")
val time_fdf = FastDateFormat.getInstance("HH:mm:ss")
val datetime_fdf = FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss")
// set each value
for (column_value_type_index <- column_value_types.zipWithIndex) {
val column_value = column_value_type_index._1._1
val column_type = column_value_type_index._1._2
val jdbc_parameter_index = column_value_type_index._2 + 1 + add_parameter_index
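// add_parameter_index is 0 for INSERT/DELETE; for UPDATE it equals the number of SET parameters, so the WHERE parameters continue the numbering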
if (column_type.startsWith("tinyint") || column_type.startsWith("int") || column_type.startsWith("bigint")) {
if (column_value == "") {
postgres_pst.setLong(jdbc_parameter_index, 0L)
} else {
postgres_pst.setLong(jdbc_parameter_index, column_value.toLong)
}
} else if (column_type.startsWith("float") || column_type.startsWith("double") || column_type.startsWith("decimal")) {
if (column_value == "") {
postgres_pst.setBigDecimal(jdbc_parameter_index, null)
} else {
postgres_pst.setBigDecimal(jdbc_parameter_index, new java.math.BigDecimal(column_value))
}
} else if (column_type.startsWith("char") || column_type.startsWith("varchar") || column_type == "text") {
postgres_pst.setString(jdbc_parameter_index, column_value)
} else if (column_type == "date") {
if (column_value == "") {
postgres_pst.setDate(jdbc_parameter_index, null)
} else {
postgres_pst.setDate(jdbc_parameter_index, new java.sql.Date(date_fdf.parse(column_value).getTime))
}
} else if (column_type == "time") {
if (column_value == null || column_value == "") {
postgres_pst.setTime(jdbc_parameter_index, null)
} else {
postgres_pst.setTime(jdbc_parameter_index, new java.sql.Time(time_fdf.parse(column_value).getTime))
}
} else if (column_type == "datetime" || column_type == "timestamp") {
if (column_value == "") {
postgres_pst.setTimestamp(jdbc_parameter_index, null)
} else {
postgres_pst.setTimestamp(jdbc_parameter_index, new java.sql.Timestamp(datetime_fdf.parse(column_value).getTime))
}
} else if (column_type == "longblob") {
// see the canal source: com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert, line 839
postgres_pst.setBytes(jdbc_parameter_index, column_value.getBytes("ISO-8859-1"))
}
}
}
def alter_operate(row_change: CanalEntry.RowChange,
postgres_stmt: Statement,
schema_name: String,
prop: Properties
) = {
if (row_change.getIsDdl()) {
val ddl_sql = row_change.getSql() // row_change.getDdlSchemaName(): the source database where the DDL ran
println(ddl_sql)
var target_ddl_sql = ""
// handle the MODIFY COLUMN and ADD COLUMN cases
if (ddl_sql.contains("MODIFY COLUMN") || ddl_sql.contains("ADD COLUMN")) {
// sample: ALTER TABLE `db`.`tb` MODIFY COLUMN `col2` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'column 2' AFTER `col1`
// sample: ALTER TABLE `db`.`tb` ADD COLUMN `col3` int(5) NULL COMMENT 'column 3' AFTER `col2`
val elements = ddl_sql.split(" ").take(7)
val alter_table__element = elements(0)
val table_element = elements(1)
val db_tb_element = prop.getProperty("target_db_schema") + "." + elements(2).replaceAll("`", "").split("\\.")(1)
val alter_column_element = if (ddl_sql.contains("MODIFY COLUMN")) "alter" else elements(3)
val column_element = elements(4)
val col_element = elements(5).replaceAll("`", "")
val type_element = if (ddl_sql.contains("MODIFY COLUMN")) "type " else ""
val target_type_element = {
var target_type = elements(6).split("\\(")(0)
if (mysql_postgres_datatype_map.contains(target_type)) target_type = mysql_postgres_datatype_map(target_type)
target_type
}
target_ddl_sql = s"${alter_table__element} ${table_element} ${db_tb_element} ${alter_column_element} ${column_element} ${col_element} ${type_element}${target_type_element}"
println(target_ddl_sql)
// handle the DROP COLUMN case
} else if (ddl_sql.contains("DROP COLUMN")) {
// sample: ALTER TABLE `db`.`tb` DROP COLUMN `col3`
val elements = ddl_sql.split(" ")
val alter_table__element = elements(0)
val table_element = elements(1)
val db_tb_element = prop.getProperty("target_db_schema") + "." + elements(2).replaceAll("`", "").split("\\.")(1)
val alter_column_element = elements(3)
val column_element = elements(4)
val col_element = elements(5).replaceAll("`", "")
target_ddl_sql = s"${alter_table__element} ${table_element} ${db_tb_element} ${alter_column_element} ${column_element} ${col_element}"
println(target_ddl_sql)
// other cases are not handled for now; deal with them as they arise
} else {
target_ddl_sql = ddl_sql
}
postgres_stmt.execute(target_ddl_sql)
}
}
def create_operate(row_change: CanalEntry.RowChange,
postgres_stmt: Statement
) = {
val ddl_sql = row_change.getSql()
println(ddl_sql) // row_change.getDdlSchemaName(): the source database where the DDL ran
postgres_stmt.execute(ddl_sql)
}
def rename_operate(row_change: CanalEntry.RowChange,
postgres_stmt: Statement
) = {
val ddl_sql = row_change.getSql()
println(ddl_sql) // row_change.getDdlSchemaName(): the source database where the DDL ran
postgres_stmt.execute(ddl_sql)
}
def truncate_operate(row_change: CanalEntry.RowChange,
postgres_stmt: Statement
) = {
val ddl_sql = row_change.getSql()
println(ddl_sql) // row_change.getDdlSchemaName(): the source database where the DDL ran
postgres_stmt.execute(ddl_sql)
}
def erase_operate(row_change: CanalEntry.RowChange,
postgres_stmt: Statement
) = {
val ddl_sql = row_change.getSql()
println(ddl_sql) // row_change.getDdlSchemaName(): the source database where the DDL ran
postgres_stmt.execute(ddl_sql)
}
// sync inserted rows
def insert_data(columns: java.util.List[CanalEntry.Column], postgres_conn: Connection, table_name: String, prop: Properties) = {
var columns_str = ""
var values_str = ""
val column_value_types = ArrayBuffer[(String, String)]()
// build the columns and values strings, and collect the (value, type) pairs
for (column <- columns.asScala) {
columns_str = columns_str + s"${column.getName},"
values_str = values_str + "?,"
column_value_types += ((column.getValue, column.getMysqlType))
}
columns_str = columns_str.dropRight(1)
values_str = values_str.dropRight(1)
val insert_sql = s"insert into ${prop.getProperty("target_db_schema")}.${table_name}(${columns_str}) values(${values_str})"
val postgres_pst = postgres_conn.prepareStatement(insert_sql)
set_jdbc_preparestatement_value(postgres_pst, column_value_types, 0)
postgres_pst.executeUpdate()
postgres_pst.close()
}
// sync deleted rows
def delete_data(columns: java.util.List[CanalEntry.Column], postgres_conn: Connection, table_name: String, prop: Properties) = {
var column_values_str = ""
val column_value_types = ArrayBuffer[(String, String)]()
// build the WHERE clause and collect the (value, type) pairs
for (column <- columns.asScala) {
// delete by primary key only
if (column.getIsKey) {
column_values_str = column_values_str + s"${column.getName} = ? and "
column_value_types += ((column.getValue, column.getMysqlType))
}
}
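// dropRight(5) strips the trailing " and " (5 characters)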
column_values_str = column_values_str.dropRight(5)
val delete_sql = s"delete from ${prop.getProperty("target_db_schema")}.${table_name} where ${column_values_str}"
val postgres_pst = postgres_conn.prepareStatement(delete_sql)
set_jdbc_preparestatement_value(postgres_pst, column_value_types, 0)
postgres_pst.executeUpdate()
postgres_pst.close()
}
// sync updated rows
def update_data(columns: java.util.List[CanalEntry.Column], postgres_conn: Connection, table_name: String, prop: Properties) = {
var set_columns_str = ""
val set_value_types = ArrayBuffer[(String, String)]()
var condition_columns_str = ""
val condition_value_types = ArrayBuffer[(String, String)]()
// build the SET and WHERE strings, and collect their (value, type) arrays
for (column <- columns.asScala) {
if (column.getUpdated()) {
set_columns_str = set_columns_str + s"${column.getName} = ?,"
set_value_types += ((column.getValue, column.getMysqlType))
} else {
condition_columns_str = condition_columns_str + s"${column.getName()} = ? and "
condition_value_types += ((column.getValue, column.getMysqlType))
}
}
val set_value_type_size = set_value_types.size
set_columns_str = set_columns_str.dropRight(1)
condition_columns_str = condition_columns_str.dropRight(5)
val update_sql = s"update ${prop.getProperty("target_db_schema")}.${table_name} set ${set_columns_str} where ${condition_columns_str}"
val postgres_pst = postgres_conn.prepareStatement(update_sql)
set_jdbc_preparestatement_value(postgres_pst, set_value_types, 0)
set_jdbc_preparestatement_value(postgres_pst, condition_value_types, set_value_type_size)
postgres_pst.executeUpdate()
postgres_pst.close()
}
def get_required_tables(prop: Properties) = {
val input_stream = new FileInputStream(s"${System.getProperty("user.dir")}/conf/${prop.getProperty("tables_filename")}")
val require_tables = IOUtils.readLines(input_stream, Charsets.toCharset("UTF-8"))
.asScala.toSeq
require_tables
}
def deal_entrys(entries: java.util.List[CanalEntry.Entry], postgres_stmt: Statement, prop: Properties, postgres_conn: Connection) = {
val fdf = FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss")
val required_tables = get_required_tables(prop)
// loop over the entries
for (entry <- entries.asScala) {
val table_name = entry.getHeader().getTableName()
// breakable sits inside the for loop, so break acts like continue
breakable {
// skip tables that are not listed in the txt file
if (!(required_tables.exists(_ == table_name)) || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
break
}
val schema_name = entry.getHeader().getSchemaName()
val event_type = entry.getHeader().getEventType
if (schema_name != "" || table_name != "") { // don't print records that have no db/table
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>一条SQL语句处理>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
println(s"canal_server_id: ${entry.getHeader().getServerId()}, binlog_file[${entry.getHeader().getLogfileName()}:${entry.getHeader().getLogfileOffset()}], db_tb[${schema_name}.${table_name}], event_type: ${event_type}, execute_time: ${fdf.format(entry.getHeader().getExecuteTime)}")
}
val row_change = CanalEntry.RowChange.parseFrom(entry.getStoreValue())
try {
// sync table structure changes
if (event_type == CanalEntry.EventType.ALTER) {
alter_operate(row_change, postgres_stmt, schema_name, prop)
} else if (event_type == CanalEntry.EventType.CREATE) {
create_operate(row_change, postgres_stmt)
} else if (event_type == CanalEntry.EventType.RENAME) {
rename_operate(row_change, postgres_stmt)
} else if (event_type == CanalEntry.EventType.TRUNCATE) {
truncate_operate(row_change, postgres_stmt)
} else if (event_type == CanalEntry.EventType.ERASE) {
erase_operate(row_change, postgres_stmt)
}
// loop over each row of data; e.g. a single INSERT statement can insert multiple rows
for (row_data <- row_change.getRowDatasList().asScala) {
println(">>>>>>>>>>>>>>>>>>>>>>>>> processing one row change >>>>>>>>>>>>>>>>>>>")
if (event_type == CanalEntry.EventType.INSERT) {
print_column(row_data.getAfterColumnsList())
insert_data(row_data.getAfterColumnsList(), postgres_conn, table_name, prop)
} else if (event_type == CanalEntry.EventType.DELETE) {
print_column(row_data.getBeforeColumnsList())
delete_data(row_data.getBeforeColumnsList(), postgres_conn, table_name, prop)
} else if (event_type == CanalEntry.EventType.UPDATE) {
println("===========update before===========")
print_column(row_data.getBeforeColumnsList())
println("===========update after===========")
print_column(row_data.getAfterColumnsList())
update_data(row_data.getAfterColumnsList(), postgres_conn, table_name, prop)
}
}
} catch {
case e: Exception => {
// exceptions from SQL parsing or data insertion are simply ignored
e.printStackTrace()
}
}
}
}
}
def main(args: Array[String]): Unit = {
var connector: CanalConnector = null
try {
val prop = load_launch_properties()
val postgresConnectPool = new PostgresConnectPool(s"${System.getProperty("user.dir")}/conf/launch.properties")
connector = get_entries(prop, postgresConnectPool)
} catch {
case e: Exception => {
e.printStackTrace()
throw e
}
} finally {
if (connector != null) connector.disconnect()
}
}
}
3.3 Launch order
- Run Postgres_etl.scala
- Run example_db_repeat_test.sh
- Run Sync_launch.scala
The output looks like this:
...... output truncated ......
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing one SQL statement >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
canal_server_id: 2, binlog_file[mysql-bin.000026:370], db_tb[db1.tb1_1], event_type: INSERT, execute_time: 2021-07-28 15:50:28
>>>>>>>>>>>>>>>>>>>>>>>>> processing one row change >>>>>>>>>>>>>>>>>>>
column[idA:int:4], is_key: false, is_null: false, update: true
column[nameA:varchar(32):4], is_key: false, is_null: false, update: true
column[scoreA:decimal(5,2):4.40], is_key: false, is_null: false, update: true
insert into db1.public.tb1_1 (idA, nameA, scoreA) values(4, '4', 4.40)
>>>>>>>>>>>>>>>>>>>>>>>>> processing one row change >>>>>>>>>>>>>>>>>>>
column[idA:int:5], is_key: false, is_null: false, update: true
column[nameA:varchar(32):5], is_key: false, is_null: false, update: true
column[scoreA:decimal(5,2):5.50], is_key: false, is_null: false, update: true
insert into db1.public.tb1_1 (idA, nameA, scoreA) values(5, '5', 5.50)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing one SQL statement >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
canal_server_id: 2, binlog_file[mysql-bin.000026:668], db_tb[db1.tb1_2], event_type: DELETE, execute_time: 2021-07-28 15:51:14
>>>>>>>>>>>>>>>>>>>>>>>>> processing one row change >>>>>>>>>>>>>>>>>>>
column[idA:int:2], is_key: false, is_null: false, update: false
column[nameA:varchar(32):2], is_key: false, is_null: false, update: false
column[ageA:int:2], is_key: false, is_null: false, update: false
delete from db1.public.tb1_2 where idA = 2 and nameA = '2' and ageA = 2
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing one SQL statement >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
canal_server_id: 2, binlog_file[mysql-bin.000026:957], db_tb[db1.tb1_2], event_type: DELETE, execute_time: 2021-07-28 15:51:19
>>>>>>>>>>>>>>>>>>>>>>>>> processing one row change >>>>>>>>>>>>>>>>>>>
column[idA:int:3], is_key: false, is_null: false, update: false
column[nameA:varchar(32):3], is_key: false, is_null: false, update: false
column[ageA:int:3], is_key: false, is_null: false, update: false
delete from db1.public.tb1_2 where idA = 3 and nameA = '3' and ageA = 3
...... output truncated ......
3.4 PostgreSQL database results
postgres=#
postgres=# \c db1
You are now connected to database "db1" as user "postgres".
db1=#
db1=# select * from tb1_1;
ida | namea | scorea
-----+-------+--------
1 | 1 | 1.10
2 | 2 | 2.20
3 | 3 | 3.30
4 | 4 | 4.40
5 | 5 | 5.50
(5 rows)
db1=#
db1=# select * from tb1_2;
ida | namea | agea
-----+-------+------
1 | 1 | 1
(1 row)
db1=#