TiDB Binlog: Initial Setup

Author: 与数据交流的路上

Last modified: 2023-05-06 15:03:11

First published: 2021-03-25 16:15:38

Original URL: https://blog.csdn.net/line_on_database/article/details/115179581

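I. Introduction

1. TiDB Binlog is a commercial tool that collects TiDB binlog and provides near-real-time backup and replication.

1.1 Use cases

Data replication: replicate TiDB cluster data to other databases.

Real-time backup and recovery: back up TiDB cluster data; the backup can also be used to recover the cluster after a failure.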

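2. Pump

Pump records the binlog produced by TiDB in real time, sorts it by transaction commit time, and provides it to Drainer for consumption.

3. Drainer

Drainer collects binlog from all Pumps, merges it, converts it into SQL or a specified data format, and finally replicates it to the downstream.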
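The ordering described above can be illustrated with a small Python sketch: each Pump yields binlog already sorted by commit timestamp, and Drainer performs a k-way merge so the downstream sees one globally ordered stream. The `Binlog` record, the `merge_pumps` helper and the sample values are hypothetical; the real Drainer also waits until every Pump has advanced past a timestamp before emitting it, which this sketch leaves out.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Binlog:
    commit_ts: int  # commit timestamp of the transaction (a TSO in real TiDB)
    payload: str    # hypothetical stand-in for the row changes


def merge_pumps(*pump_streams: Iterable[Binlog]) -> Iterator[Binlog]:
    """K-way merge of per-Pump streams that are each sorted by commit_ts."""
    return heapq.merge(*pump_streams, key=lambda b: b.commit_ts)


pump_a = [Binlog(100, "INSERT a1"), Binlog(130, "UPDATE a2")]
pump_b = [Binlog(110, "INSERT b1"), Binlog(120, "DELETE b2")]

for binlog in merge_pumps(pump_a, pump_b):
    print(binlog.commit_ts, binlog.payload)
# 100 INSERT a1
# 110 INSERT b1
# 120 DELETE b2
# 130 UPDATE a2
```

`heapq.merge` is lazy and only compares the current head of each input, which is why every Pump must already hand out its binlog in commit order.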

4. The binlogctl tool

binlogctl is an operations tool that comes with TiDB Binlog. It can:

Get the current TSO of the TiDB cluster

View Pump/Drainer status

Modify Pump/Drainer status

Pause Pump/Drainer or take them offline

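Key features

Multiple Pumps form a cluster that can scale out horizontally.

TiDB distributes binlog to the Pumps through its built-in Pump Client.

Pump stores binlog and provides it to Drainer in order.

Drainer reads binlog from every Pump, merge-sorts it, and sends it to the downstream.

Drainer supports relay log, which keeps the downstream cluster in a consistent state.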

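Notes

TiDB v2.0.8-binlog, v2.1.0-rc.5, or later is required; earlier versions are not compatible with this version of TiDB Binlog.

Drainer can replicate binlog to MySQL, TiDB, Kafka, or local files. To replicate binlog to a system Drainer does not support, configure Drainer to send binlog to Kafka and then process it yourself according to the binlog consumer protocol; see the Binlog Consumer Client user documentation.

If TiDB Binlog is used for incremental recovery, set db-type="file". Drainer then converts binlog into data in the specified protocol buffer format and writes it to local files, from which Reparo can restore the incremental data.

Regarding the value of db-type:

If the TiDB version is < 2.1.9, use db-type="pb".

If the TiDB version is >= 2.1.9, use db-type="file" or db-type="pb".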
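The version rule for db-type can be written as a small helper. This is a hypothetical sketch: it assumes version strings like `v2.1.9` or `v2.0.8-binlog` and ignores any suffix after `-`, so a release candidate of 2.1.9 would be treated as 2.1.9.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """'v2.1.9' -> (2, 1, 9); text after '-' (e.g. '-binlog', '-rc.5') is ignored."""
    core = version.strip().lstrip("vV").split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad 'v3.0' to (3, 0, 0)
    return parts[0], parts[1], parts[2]


def incremental_backup_db_types(tidb_version: str) -> List[str]:
    """db-type values allowed when Drainer writes binlog to local files for Reparo."""
    if parse_version(tidb_version) < (2, 1, 9):
        return ["pb"]
    return ["file", "pb"]


print(incremental_backup_db_types("v2.1.8"))  # ['pb']
print(incremental_backup_db_types("v4.0.0"))  # ['file', 'pb']
```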

If the downstream is MySQL/TiDB, you can use sync-diff-inspector to verify the data after replication.

II. Scaling out TiDB Binlog

1. Create the following scale-out configuration file

vim scale-out.yaml

pump_servers:
  - host: 10.0.1.1
    ssh_port: 22
    port: 8250
    deploy_dir: "/data/tidb/tidb-deploy/pump-8250/deploy"
    data_dir: "/data/tidb/tidb-data/pump-8250/data"
    # The following configs are used to overwrite the `server_configs.pump` values.
    config:
      gc: 7
  - host: 10.0.1.2
    ssh_port: 22
    port: 8250
    deploy_dir: "/data/tidb/tidb-deploy/pump-8250/deploy"
    data_dir: "/data/tidb/tidb-data/pump-8250/data"
    # The following configs are used to overwrite the `server_configs.pump` values.
    config:
      gc: 7
  - host: 10.0.1.3
    ssh_port: 22
    port: 8250
    deploy_dir: "/data/tidb/tidb-deploy/pump-8250/deploy"
    data_dir: "/data/tidb/tidb-data/pump-8250/data"
    # The following configs are used to overwrite the `server_configs.pump` values.
    config:
      gc: 7
drainer_servers:
  - host: 10.0.1.12
    port: 8249
    data_dir: "/data/tidb/tidb-data/drainer-8249/data"
    # If drainer doesn't have a checkpoint, use initial commitTS as the initial checkpoint.
    # Will get a latest timestamp from pd if commit_ts is set to -1 (the default value).
    commit_ts: -1
    deploy_dir: "/data/tidb/tidb-deploy/drainer-8249/deploy"
    # The following configs are used to overwrite the `server_configs.drainer` values.
    config:
      syncer.db-type: "tidb"
      syncer.to.host: "10.0.1.12"
      syncer.to.user: "root"
      syncer.to.password: ""
      syncer.to.port: 4000
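The `config:` entries above use dotted keys such as `syncer.to.host`; when TiUP renders drainer.toml, each dot level becomes a nested TOML table, so these settings land under `[syncer]` and `[syncer.to]`. The helper below is a hypothetical illustration of that mapping, not TiUP's own code.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'syncer.to.host': 'x'} into {'syncer': {'to': {'host': 'x'}}}."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate tables on demand
        node[leaf] = value
    return nested


drainer_config = {
    "syncer.db-type": "tidb",
    "syncer.to.host": "10.0.1.12",
    "syncer.to.user": "root",
    "syncer.to.password": "",
    "syncer.to.port": 4000,
}
print(expand_dotted(drainer_config))
# {'syncer': {'db-type': 'tidb', 'to': {'host': '10.0.1.12', 'user': 'root', 'password': '', 'port': 4000}}}
```

The same rule explains the pitfall noted in the Kafka example later in this post: `replicate-do-db` without the `syncer.` prefix ends up at the top level instead of inside `[syncer]`.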

2. Scale out

tiup cluster scale-out cluster_name scale-out.yaml

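III. Basic usage

1. Edit the cluster configuration

tiup cluster edit-config cluster_name

2. Enable binlog for TiDB in the configuration (the original post showed this step as a screenshot; it typically means setting `binlog.enable: true` and `binlog.ignore-error: true` under `server_configs.tidb`)

3. Reload the configuration into the cluster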

tiup cluster reload cluster_name -R tidb

4. Check Pump/Drainer status (log in to TiDB)

show drainer status;
show pump status;

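5. Check whether binlog is enabled (log in to TiDB)

show variables like 'log_bin';  -- a value of 1 means binlog is enabled

If log_bin is still 0 at this point, check whether an earlier Pump/Drainer installation failed without being fully cleaned up. If so, force-remove the leftover nodes with the --force option (for example, tiup cluster scale-in cluster_name -N <node> --force) and then restart the cluster.

Replication can now proceed normally. Because data written before binlog was enabled has no binlog, first import the existing data in full; after that, all subsequent changes, including newly created databases and tables, are replicated incrementally.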

6. Check how far binlog replication has progressed (log in to the downstream database)

use tidb_binlog;
select * from checkpoint;
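The checkpoint stored there typically includes a commit timestamp (commitTS) in TiDB's TSO format, the same format used by commit_ts and ignore-txn-commit-ts in the Drainer configuration. A TSO packs physical time in milliseconds into its high bits and an 18-bit logical counter into its low bits, so a shift and a mask turn it into a wall-clock time. A minimal sketch; the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

LOGICAL_BITS = 18  # low 18 bits of a TSO are the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_tso(tso: int) -> Tuple[datetime, int]:
    """Split a TSO into (UTC wall-clock time, logical counter)."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    return EPOCH + timedelta(milliseconds=physical_ms), logical


def encode_tso(physical_ms: int, logical: int = 0) -> int:
    """Build a TSO from Unix milliseconds and a logical counter."""
    return (physical_ms << LOGICAL_BITS) | logical


tso = encode_tso(1616660138000, 5)
when, logical = decode_tso(tso)
print(when.isoformat(), logical)  # 2021-03-25T08:15:38+00:00 5
```

Paste the commitTS value from the checkpoint into `decode_tso` to see roughly how far behind the downstream is.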

7. Using binlogctl

tiup ctl binlog --help

The help output is as follows:

Usage of binlogctl:
  -V
        Print binlogctl version information
  -cmd string
        Command mode: "generate_meta" (deprecated), "pumps", "drainers", "update-pump", "update-drainer", "pause-pump", "pause-drainer", "offline-pump", "offline-drainer", "encrypt"
  -data-dir string
        Path of the file that stores Drainer's checkpoint (default "binlog_position") (deprecated)
  -node-id string
        ID of the Pump/Drainer
  -pd-urls string
        PD address; separate multiple addresses with "," (default "http://127.0.0.1:2379")
  -ssl-ca string
        Path of the SSL CA file
  -ssl-cert string
        Path of the X509 certificate file in PEM format
  -ssl-key string
        Path of the X509 key file in PEM format
  -time-zone string
        If set, "generate_meta" mode also prints the time corresponding to the obtained TSO, e.g. "Asia/Shanghai" for CST or "Local" for the local time zone
  -show-offline-nodes
        Used with `-cmd pumps` or `-cmd drainers`; these commands hide offline nodes by default and show them only when `-show-offline-nodes` is specified
  -state string
        Set a node's state: online, pausing, paused, closing, or offline
  -text string
        Text to encrypt when using the encrypt command

7.1 Check status

tiup ctl binlog -cmd drainers
tiup ctl binlog -cmd pumps

binlogctl provides the following commands to pause services or take them offline:

cmd	Description	Example
pause-pump	Pause Pump	`bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd pause-pump -node-id ip-127-0-0-1:8250`
pause-drainer	Pause Drainer	`bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd pause-drainer -node-id ip-127-0-0-1:8249`
offline-pump	Take Pump offline	`bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd offline-pump -node-id ip-127-0-0-1:8250`
offline-drainer	Take Drainer offline	`bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd offline-drainer -node-id ip-127-0-0-1:8249`

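Note: to move a Pump/Drainer that is already paused into the offline state, the best approach is to start it again and then run offline-pump/offline-drainer.

While services run normally, and when they are paused or taken offline through the proper procedure, Pump/Drainer maintain their state correctly. In some abnormal situations, however, Pump/Drainer cannot maintain their own state correctly, which may affect replication; in that case, use binlogctl to repair the state information.

Set cmd to update-pump or update-drainer to update the state of a Pump or Drainer. The state can be set to paused or offline. For example: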

bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd update-pump -node-id ip-127-0-0-1:8250 -state paused
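When these commands are scripted, it helps to build the argument list in one place and validate it before anything touches PD. A hypothetical Python helper that reproduces the command lines shown above; the flags come from the help output, while the helper names, defaults and validation rules (update-* limited to paused/offline, as described above) are assumptions of this sketch.

```python
import shlex
import subprocess
from typing import List, Optional

NODE_COMMANDS = {
    "pause-pump", "pause-drainer",
    "offline-pump", "offline-drainer",
    "update-pump", "update-drainer",
}
UPDATE_STATES = {"paused", "offline"}  # states update-pump/update-drainer are meant to set


def binlogctl_args(cmd: str, node_id: str,
                   pd_urls: str = "http://127.0.0.1:2379",
                   state: Optional[str] = None,
                   binary: str = "bin/binlogctl") -> List[str]:
    """Build the argv for a binlogctl node command, validating cmd and state."""
    if cmd not in NODE_COMMANDS:
        raise ValueError(f"unsupported cmd: {cmd}")
    args = [binary, f"-pd-urls={pd_urls}", "-cmd", cmd, "-node-id", node_id]
    if cmd.startswith("update-"):
        if state not in UPDATE_STATES:
            raise ValueError(f"{cmd} needs -state paused or offline, got {state!r}")
        args += ["-state", state]
    elif state is not None:
        raise ValueError("-state only applies to update-pump/update-drainer")
    return args


def run_binlogctl(args: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)


args = binlogctl_args("update-pump", "ip-127-0-0-1:8250", state="paused")
print(shlex.join(args))
# bin/binlogctl -pd-urls=http://127.0.0.1:2379 -cmd update-pump -node-id ip-127-0-0-1:8250 -state paused
```

Passing a list to `subprocess.run` avoids shell quoting issues; `shlex.join` is used only to print a copy-pasteable version.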

IV. Extras

1. Pump configuration file

# Pump Configuration

# Address Pump binds to
addr = "192.168.0.11:8250"

# Address Pump advertises to other components
advertise-addr = "192.168.0.11:8250"

# Number of days of data Pump retains (default 7)
gc = 7

# Path where Pump stores its data
data-dir = "data.pump"

# Interval at which Pump sends heartbeats to PD (in seconds)
heartbeat-interval = 2

# Addresses of the PD cluster nodes (comma-separated, no spaces)
pd-urls = "http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379"

# [security]
# Usually commented out unless you have specific security requirements
# Path of the file containing the list of trusted SSL CAs for connecting to the cluster
# ssl-ca = "/path/to/ca.pem"
# Path of the PEM-format X509 certificate for connecting to the cluster
# ssl-cert = "/path/to/pump.pem"
# Path of the PEM-format X509 key for connecting to the cluster
# ssl-key = "/path/to/pump-key.pem"

# [storage]
# Set to true (the default) for reliability, ensuring binlog data is flushed to disk
# sync-log = true

# Pump stops writing data when available disk space falls below this value
# 42 MB -> 42000000, 42 mib -> 44040192
# default: 10 gib
# stop-write-at-available-space = "10 gib"

# Settings for Pump's embedded LSM DB; leave commented out unless you understand them well
# [storage.kv]
# block-cache-capacity = 8388608
# block-restart-interval = 16
# block-size = 4096
# compaction-L0-trigger = 8
# compaction-table-size = 67108864
# compaction-total-size = 536870912
# compaction-total-size-multiplier = 8.0
# write-buffer = 67108864
# write-L0-pause-trigger = 24
# write-L0-slowdown-trigger = 17
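The comment on stop-write-at-available-space distinguishes decimal units (MB = 10^6 bytes) from binary ones (MiB = 2^20 bytes). A small Python parser that reproduces the conversions given in that comment, plus a check for the stop-write condition; it is an illustrative re-implementation, not Pump's own parsing code, and the KB/GB/TB and KiB/TiB entries in the unit table are assumptions that follow the same pattern.

```python
import re
from decimal import Decimal

UNITS = {
    "": 1, "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}


def parse_size(text: str) -> int:
    """Parse '42 MB' or '10 gib' into a byte count (units are case-insensitive)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if m is None or m.group(2).lower() not in UNITS:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = m.groups()
    return int(Decimal(number) * UNITS[unit.lower()])


def should_stop_write(available_bytes: int, threshold: str = "10 gib") -> bool:
    """Pump stops writing once available space drops below the threshold."""
    return available_bytes < parse_size(threshold)


print(parse_size("42 MB"))   # 42000000
print(parse_size("42 mib"))  # 44040192
print(parse_size("10 gib"))  # 10737418240
```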

2. Drainer configuration file

# Drainer Configuration.

# Address Drainer listens on ("192.168.0.13:8249")
addr = "192.168.0.13:8249"

# Address Drainer advertises to other components
advertise-addr = "192.168.0.13:8249"

# Interval for querying PD for online Pumps (default 10, in seconds)
detect-interval = 10

# Path where Drainer stores its data (default "data.drainer")
data-dir = "data.drainer"

# Addresses of the PD cluster nodes (comma-separated, no spaces)
pd-urls = "http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379"

# Log file path
log-file = "drainer.log"

# Compression used when Drainer fetches binlog from Pump; can be "gzip"; no compression if unset
# compressor = "gzip"

# [security]
# Usually commented out unless you have specific security requirements
# Path of the file containing the list of trusted SSL CAs for connecting to the cluster
# ssl-ca = "/path/to/ca.pem"
# Path of the PEM-format X509 certificate for connecting to the cluster
# ssl-cert = "/path/to/drainer.pem"
# Path of the PEM-format X509 key for connecting to the cluster
# ssl-key = "/path/to/drainer-key.pem"

# Syncer Configuration
[syncer]
# If set, DDL statements are parsed with this sql-mode; if the downstream is MySQL or TiDB,
# the downstream sql-mode is also set to this value
# sql-mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"

# Number of SQL statements per transaction written to the downstream (default 20)
txn-batch = 20

# Replication concurrency to the downstream; higher values give better throughput (default 16)
worker-count = 16

# Whether to disable splitting the SQL of a single binlog. If true, binlogs are restored one by one,
# in order, each as a single transaction (set this to false when the downstream is MySQL)
disable-dispatch = false

# safe mode makes writes to the downstream MySQL/TiDB safe to repeat:
# INSERT is replaced with REPLACE, and UPDATE with DELETE + REPLACE
safe-mode = false

# Drainer downstream type (default mysql)
# Valid values: "mysql", "tidb", "file", "kafka"
db-type = "mysql"

# Transactions whose commit ts is in this list are filtered out and not replicated downstream
ignore-txn-commit-ts = []

# Databases to ignore (default "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,test");
# RENAME DDL on tables in ignored schemas is not supported
ignore-schemas = "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql"

# replicate-do-db takes priority over replicate-do-table when both name the same database.
# Regular expressions are supported: a value starting with '~' is treated as a regular expression

# replicate-do-db = ["~^b.*","s1"]

# [syncer.relay]
# Directory for relay log; an empty value disables relay log.
# Only takes effect when the downstream is TiDB or MySQL.
# log-dir = ""
# Maximum size of each file
# max-file-size = 10485760

# [[syncer.replicate-do-table]]
# db-name = "test"
# tbl-name = "log"

# [[syncer.replicate-do-table]]
# db-name = "test"
# tbl-name = "~^a.*"

# Ignore replication of specific tables
# [[syncer.ignore-table]]
# db-name = "test"
# tbl-name = "log"

# Downstream database server parameters when db-type is mysql
[syncer.to]
host = "192.168.0.13"
user = "root"
password = ""
# Password encrypted with `./binlogctl -cmd encrypt -text string`;
# when encrypted_password is non-empty, password is ignored
encrypted_password = ""
port = 3306

[syncer.to.checkpoint]
# When the checkpoint type is mysql or tidb, this option changes the database that stores the checkpoint
# schema = "tidb_binlog"
# Currently only the mysql and tidb types are supported. Uncomment to control where the checkpoint is saved.
# Default checkpoint storage for each db-type:
# mysql/tidb -> the corresponding downstream mysql/tidb
# file/kafka -> file in `data-dir`
# type = "mysql"
# host = "127.0.0.1"
# user = "root"
# password = ""
# Password encrypted with `./binlogctl -cmd encrypt -text string`;
# when encrypted_password is non-empty, password is ignored
# encrypted_password = ""
# port = 3306

# Directory for binlog files when db-type is file
# [syncer.to]
# dir = "data.drainer"

# Kafka settings when db-type is kafka
# [syncer.to]
# Only one of kafka-addrs and zookeeper-addrs is needed; if both are set, the Kafka address obtained from ZooKeeper takes precedence
# zookeeper-addrs = "127.0.0.1:2181"
# kafka-addrs = "127.0.0.1:9092"
# kafka-version = "0.8.2.0"
# kafka-max-messages = 1024

# Name of the Kafka topic that stores binlog data; the default is <cluster-id>_obinlog
# If multiple Drainers replicate to the same Kafka cluster, each Drainer needs a different topic-name
# topic-name = ""
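The safe-mode comment above describes a rewrite that makes replay idempotent: applying the same binlog twice leaves the downstream unchanged, because REPLACE overwrites any existing row with the same key and DELETE of a missing row is a no-op. A simplified Python sketch of that rule; the `to_sql` helper, the dict row format, the default `id` key column and the naive literal quoting are all hypothetical, and Drainer's real SQL generation (placeholders, unique-key handling) is more involved.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def _lit(value: object) -> str:
    """Naive SQL literal for the demo only; real code should use placeholders."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def _where(row: Row, key: Sequence[str]) -> str:
    return " AND ".join(f"`{k}` = {_lit(row[k])}" for k in key)


def _replace(table: str, row: Row) -> str:
    cols = ", ".join(f"`{c}`" for c in row)
    vals = ", ".join(_lit(v) for v in row.values())
    return f"REPLACE INTO `{table}` ({cols}) VALUES ({vals})"


def to_sql(op: str, table: str, old: Optional[Row] = None, new: Optional[Row] = None,
           key: Sequence[str] = ("id",), safe_mode: bool = False) -> List[str]:
    """Translate one row change into SQL, following the safe-mode rewrite rule."""
    if op == "insert":
        if safe_mode:
            return [_replace(table, new)]
        cols = ", ".join(f"`{c}`" for c in new)
        vals = ", ".join(_lit(v) for v in new.values())
        return [f"INSERT INTO `{table}` ({cols}) VALUES ({vals})"]
    if op == "update":
        if safe_mode:
            return [f"DELETE FROM `{table}` WHERE {_where(old, key)}", _replace(table, new)]
        sets = ", ".join(f"`{c}` = {_lit(v)}" for c, v in new.items())
        return [f"UPDATE `{table}` SET {sets} WHERE {_where(old, key)}"]
    if op == "delete":
        return [f"DELETE FROM `{table}` WHERE {_where(old, key)}"]
    raise ValueError(f"unknown op: {op}")


for stmt in to_sql("update", "t", old={"id": 1, "name": "a"},
                   new={"id": 1, "name": "b"}, safe_mode=True):
    print(stmt)
# DELETE FROM `t` WHERE `id` = 1
# REPLACE INTO `t` (`id`, `name`) VALUES (1, 'b')
```

Splitting UPDATE into DELETE + REPLACE also covers updates that change the key: the old row is removed by its old key and the new row is written under the new one.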

Complete configuration example for a Kafka downstream

pump_servers:
  - host: ***
    ssh_port: 22
    port: 8250
    deploy_dir: "/data/tidb/pump-8250/deploy"
    data_dir: "/data/tidb/pump-8250/data"
    # The following configs are used to overwrite the `server_configs.pump` values.
    config:
      gc: 7
  - host: ***
    ssh_port: 22
    port: 8250
    deploy_dir: "/data/tidb/pump-8250/deploy"
    data_dir: "/data/tidb/pump-8250/data"
    # The following configs are used to overwrite the `server_configs.pump` values.
    config:
      gc: 7
  - host: ***
    ssh_port: 22
    port: 8250
    deploy_dir: "/data/tidb/pump-8250/deploy"
    data_dir: "/data/tidb/pump-8250/data"
    # The following configs are used to overwrite the `server_configs.pump` values.
    config:
      gc: 7
drainer_servers:
  - host: ***
    port: 8249
    data_dir: "/data/tidb/drainer-8249/data"
    # If drainer doesn't have a checkpoint, use initial commitTS as the initial checkpoint.
    # Will get a latest timestamp from pd if commit_ts is set to -1 (the default value).
    commit_ts: -1
    deploy_dir: "/data/tidb/drainer-8249/deploy"
    # The following configs are used to overwrite the `server_configs.drainer` values.
    config:
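      syncer.replicate-do-db: ["abc"] # Pay special attention here: many people still cannot configure database filtering after reading the official docs because they leave out the "syncer." prefix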
      syncer.db-type: "kafka"
      syncer.to.kafka-addrs: "***:9092,***:9092,***:9092"
      syncer.to.topic-name: "test"
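The `syncer.replicate-do-db` value above follows the same rules as replicate-do-db in drainer.toml: a plain entry names a database exactly, and an entry starting with '~' is a regular expression. A hypothetical Python sketch of that matching; case-sensitive comparison, unanchored regex search, and "an empty list replicates everything" are assumptions of this sketch, not confirmed Drainer behavior.

```python
import re
from typing import Callable, List, Sequence


def make_db_filter(rules: Sequence[str]) -> Callable[[str], bool]:
    """Return a predicate telling whether a schema should be replicated."""
    matchers: List[Callable[[str], bool]] = []
    for rule in rules:
        if rule.startswith("~"):
            pattern = re.compile(rule[1:])  # '~' marks a regular expression
            matchers.append(lambda name, p=pattern: p.search(name) is not None)
        else:
            matchers.append(lambda name, r=rule: name == r)  # exact database name

    def should_replicate(schema: str) -> bool:
        if not matchers:  # assumption: no rules means no filtering
            return True
        return any(match(schema) for match in matchers)

    return should_replicate


only_abc = make_db_filter(["abc"])
print(only_abc("abc"), only_abc("abcd"))  # True False

mixed = make_db_filter(["~^b.*", "s1"])
print([db for db in ["bank", "s1", "s10", "abc"] if mixed(db)])  # ['bank', 's1']
```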

Official documentation: TiDB Binlog Overview | PingCAP Docs
