mysql binlog的应用场景有很多,如主从同步、数据恢复、数据备份等等,那binlog到底是个什么玩意?参考文章:mysql主从同步机制
binlog本质是一个二进制文件,那又如何解析和使用呢?现成的工具已经有很多了,如Canal、maxwell。
Canal是阿里巴巴旗下的一款开源项目,纯Java开发。基于数据库增量日志解析,提供增量数据订阅&消费。Canal分为服务端和客户端,拥有众多的衍生应用,性能稳定,功能强大。
Maxwell由zendesk开源,也是由java开发,解析出来的结果为json,可以方便的发送到kafka、rabbitmq、redis等下游应用,进行处理。快速开始
Python-Mysql-Replication
回归正题,下面使用Python-Mysql-Replication来读取binlog日志并消费。
Mysql-Replication和Canal、maxwell工作原理类似,都是伪装成Mysql Slave,模拟Mysql Slave的交互协议,向Master发送dump协议,Master收到dump请求,就会推送binary log给Slave。但Python-Mysql-Replication更轻量,它只负责接收和解析数据,并不负责对接下游如发送到kafka等消息队列。
官网
https://github.com/julien-duponchelle/python-mysql-replication
安装
pip install mysql-replication==0.25 (较新版本对接mysql8.0时有bug)
mysql配置
#mysql master config
[mysqld]
server-id = 1 # 节点ID,确保唯一
# log config
log-bin = mysql-bin #开启mysql的binlog日志功能
binlog_format = row #binlog日志格式,mysql默认采用statement,这里必须使用row
expire_logs_days = 7 #binlog过期清理时间
max_binlog_size = 100m #binlog每个日志文件大小
slave-skip-errors = all #跳过从库错误
使用
dump解析数据
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent
mysql_settings = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'passwd': 'pwd123456'}
stream = BinLogStreamReader(connection_settings=mysql_settings, server_id=1,
only_events=[DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent])
for binlogevent in stream:
binlogevent.dump()
stream.close()
"""
输出结果:
=== WriteRowsEvent ===
Date: 2022-04-08T15:38:13
Log position: 443
Event size: 28
Read bytes: 12
Table: cmdb-default.auth_group
Affected columns: 2
Changed rows: 1
Values:
--
* id : 1
* name : 运维组
=== WriteRowsEvent ===
Date: 2022-04-08T15:40:32
Log position: 758
Event size: 25
Read bytes: 12
Table: cmdb-default.auth_group
Affected columns: 2
Changed rows: 1
Values:
--
* id : 2
* name : 效率
=== UpdateRowsEvent ===
Date: 2022-04-08T16:18:04
Log position: 1105
Event size: 48
Read bytes: 13
Table: cmdb-default.auth_group
Affected columns: 2
Changed rows: 1
Affected columns: 2
Values:
--
*id:1=>1
*name:运维组=>运维组123
=== DeleteRowsEvent ===
Date: 2022-04-08T16:29:42
Log position: 1426
Event size: 31
Read bytes: 12
Table: cmdb-default.auth_group
Affected columns: 2
Changed rows: 1
Values:
--
* id : 1
* name : 运维组123
"""
写入/更新redis缓存
import redis
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
DeleteRowsEvent,
UpdateRowsEvent,
WriteRowsEvent,
)
MYSQL_SETTINGS = {
"host": "127.0.0.1",
"port": 3306,
"user": "root",
"passwd": "pwd123456"
}
def main():
r = redis.Redis()
stream = BinLogStreamReader(
connection_settings=MYSQL_SETTINGS,
only_events=[DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent])
for binlogevent in stream:
prefix = "%s:%s:" % (binlogevent.schema, binlogevent.table)
for row in binlogevent.rows:
if isinstance(binlogevent, DeleteRowsEvent):
vals = row["values"]
r.delete(prefix + str(vals["id"]))
elif isinstance(binlogevent, UpdateRowsEvent):
vals = row["after_values"]
r.hmset(prefix + str(vals["id"]), vals)
elif isinstance(binlogevent, WriteRowsEvent):
vals = row["values"]
r.hmset(prefix + str(vals["id"]), vals)
stream.close()