1 MAXWELL OVERVIEW
Maxwell is an application that reads the MySQL binary log (binlog) in real time, converts each row change into a JSON message, and, acting as a producer, sends it to Kafka, Kinesis, RabbitMQ, Redis, Google Cloud Pub/Sub, files, or other platforms. Common use cases include ETL, cache maintenance, collecting table-level DML metrics, incremental feeds to search engines, data partition migration, and binlog-based rollback schemes when switching databases.
Official site: http://maxwells-daemon.io/quickstart/
1.1 HOW MAXWELL WORKS
Maxwell works by masquerading as a MySQL replica (slave) and leveraging MySQL's primary-replica replication protocol. This is the same mechanism Canal uses, so it is not repeated here; see section 1.1 of 《Canal技术实践V1.0.docx》.
1.2 MAXWELL FEATURES
Maxwell's main features:
1. Monitors changes to MySQL data in real time
2. Outputs changes directly in JSON format
3. Supports filtering by database, table, and column, as well as blacklists
1.3 MAXWELL USE CASES
- Monitoring and replicating MySQL data changes in real time.
2 MAXWELL DEPLOYMENT
2.1 Middleware Versions

| Middleware | Version |
|---|---|
| CentOS | CentOS 6.8 |
| MySQL | 5.6.24 |
| Kafka | 0.11.0.2 |
| Maxwell | 1.27.0 |
2.2 Environment Preparation
In this practice, Maxwell is installed standalone on host hadoop102; high availability (HA) will be planned separately later.
2.2.1 CentOS6.8
The CentOS 6.8 installation process is omitted. Create the user/group zhouchen in advance.
Install JDK 1.8.0_92 or later in advance.
Install MySQL in advance.
2.2.2 Disable the Firewall (as root)
For CentOS versions below 7:
1. Check firewall status
service iptables status
2. Stop the firewall
service iptables stop
3. Start the firewall
service iptables start
2.3 Installation
2.3.1 MySQL Configuration
1. Grant privileges
mysql> GRANT ALL on maxwell.* to 'maxwell'@'%' identified by 'maxwell';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
2. Enable binlog
Open the my.cnf file; create it if it does not exist.
[zhouchen@hadoop102 /]$ sudo find ./ -iname my.cnf
./usr/my.cnf
Add the following configuration:
[zhouchen@hadoop102 /]$ sudo vim /usr/my.cnf
[mysqld]
server-id= 1
log-bin= mysql-bin #enable binlog
binlog_format= row #use row-based logging
3. Restart MySQL for the binlog settings to take effect
[zhouchen@hadoop102 /]$ sudo service mysql restart
4. Verify that binlog is enabled
1) In the mysql client:
mysql> show variables like 'log_%';
+----------------------------------------+--------------------------------+
| Variable_name | Value |
+----------------------------------------+--------------------------------+
| log_bin | ON |
| log_bin_basename | /var/lib/mysql/mysql-bin |
| log_bin_index | /var/lib/mysql/mysql-bin.index |
| log_bin_trust_function_creators | OFF |
| log_bin_use_v1_row_events | OFF |
| log_error | /var/lib/mysql/hadoop201.err |
| log_output | FILE |
| log_queries_not_using_indexes | OFF |
| log_slave_updates | OFF |
| log_slow_admin_statements | OFF |
| log_slow_slave_statements | OFF |
| log_throttle_queries_not_using_indexes | 0 |
| log_warnings | 1 |
+----------------------------------------+--------------------------------+
*log_bin: ON means binlog was enabled successfully
2) Check that mysql-bin files exist under /var/lib/mysql
2.3.2 Maxwell Installation
1. Download Maxwell
[zhouchen@hadoop102 software]$ wget https://github.com/zendesk/maxwell/releases/download/v1.27.0/maxwell-1.27.0.tar.gz
2. Extract
[zhouchen@hadoop102 software]$ tar -zxvf maxwell-1.27.0.tar.gz -C /opt/module/
3. Edit Maxwell's general configuration file config.properties
[zhouchen@hadoop102 maxwell-1.27.0]$ vim config.properties
# tl;dr config
log_level=info
# mysql login info
host=hadoop102
user=maxwell
password=maxwell
2.3.3 Starting Maxwell
Start:
[zhouchen@hadoop102 maxwell-1.27.0]$ bin/maxwell --user='maxwell' --password='maxwell' --host='hadoop102' --producer=stdout
Using kafka version: 1.0.0
15:20:33,541 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured.
15:20:33,831 INFO SchemaStoreSchema - Creating maxwell database
15:20:33,912 INFO Maxwell - Maxwell v1.27.0 is booting (StdoutProducer), starting at Position[BinlogPosition[mysql-bin.000070:4928], lastHeartbeat=0]
15:20:33,998 INFO AbstractSchemaStore - Maxwell is capturing initial schema
15:20:34,209 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000070:4928
15:20:34,220 INFO BinaryLogClient - Connected to hadoop102:3306 at mysql-bin.000070/4928 (sid:6379, cid:25)
15:20:34,220 INFO BinlogConnectorReplicator - Binlog connected.
2.3.4 Installation Check
1. Check the process
[zhouchen@hadoop102 ~]$ jps
6466 Maxwell
6573 Jps
3 BASIC MAXWELL OPERATIONS
3.1 VERIFY THAT MAXWELL RECEIVES BINLOG EVENTS
1. Start Maxwell
See 2.3.3.
2. Insert data into MySQL
mysql> insert into test_maxwell values(4,6,'jack',4);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test_maxwell values(3,5,'jinb',5);
Query OK, 1 row affected (0.01 sec)
3. Check the logs printed in real time by the Maxwell foreground process (note that the binlog events are printed as JSON):
[zhouchen@hadoop102 maxwell-1.27.0]$ bin/maxwell --user='maxwell' --password='maxwell' --host='hadoop102' --producer=stdout
Using kafka version: 1.0.0
15:20:33,541 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured.
15:20:33,831 INFO SchemaStoreSchema - Creating maxwell database
15:20:33,912 INFO Maxwell - Maxwell v1.27.0 is booting (StdoutProducer), starting at Position[BinlogPosition[mysql-bin.000070:4928], lastHeartbeat=0]
15:20:33,998 INFO AbstractSchemaStore - Maxwell is capturing initial schema
15:20:34,209 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000070:4928
15:20:34,220 INFO BinaryLogClient - Connected to hadoop102:3306 at mysql-bin.000070/4928 (sid:6379, cid:25)
15:20:34,220 INFO BinlogConnectorReplicator - Binlog connected.
{"database":"test","table":"test_maxwell","type":"insert","ts":1595402774,"xid":1207,"commit":true,"data":{"age":4,"tall":6,"name":"jack","record":4}}
{"database":"test","table":"test_maxwell","type":"insert","ts":1595402775,"xid":1210,"commit":true,"data":{"age":3,"tall":5,"name":"jinb","record":5}}
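Each line of output above is one self-contained JSON document describing a row change. A minimal Python sketch of consuming such an event downstream (the event string is copied verbatim from the output above):

```python
import json

# One Maxwell insert event, as printed by the stdout producer above.
event_str = ('{"database":"test","table":"test_maxwell","type":"insert",'
             '"ts":1595402774,"xid":1207,"commit":true,'
             '"data":{"age":4,"tall":6,"name":"jack","record":4}}')

event = json.loads(event_str)

# Envelope fields: source database/table, DML type, commit timestamp.
print(event["database"], event["table"], event["type"])  # test test_maxwell insert

# "data" holds the full row image after the change.
row = event["data"]
print(row["name"], row["age"])  # jack 4
```

This is why Maxwell pairs naturally with message queues: consumers need only a JSON parser, no knowledge of the binlog format.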
3.2 SYNCING DATA TO KAFKA IN REAL TIME
3.2.1 Configure Kafka settings
[zhouchen@hadoop102 maxwell-1.27.0]$ vim config.properties
producer=kafka
# Kafka cluster
kafka.bootstrap.servers=hadoop102:9092,hadoop103:9092,hadoop104:9092
# target Kafka topic; create it in advance
kafka_topic=maxwell
kafka.retries=1
kafka.acks=all
# kinesis_stream=maxwell  (only used by the Kinesis producer; not needed here)
3.2.2 Copy the kafka-clients jar into Maxwell's lib/kafka-clients directory
[zhouchen@hadoop102 kafka-clients]$ cp /opt/module/kafka/libs/kafka-clients-0.11.0.2.jar /opt/module/maxwell-1.27.0/lib/kafka-clients/
3.2.3 Start Kafka and create the maxwell topic
[zhouchen@hadoop102 kafka]$ bin/kafka-topics.sh --create --zookeeper hadoop102:2181 --replication-factor 1 --partitions 3 --topic maxwell
3.2.4 Start a Kafka console consumer on the maxwell topic
[zhouchen@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic maxwell --from-beginning
3.2.5 Start Maxwell
[zhouchen@hadoop102 maxwell-1.27.0]$ bin/maxwell --user='maxwell' --password='maxwell' --host='hadoop102' --producer=kafka --kafka_version=0.11.0.2 --kafka.bootstrap.servers=hadoop102:9092,hadoop103:9092,hadoop104:9092 --kafka_topic=maxwell
Using kafka version: 0.11.0.2
16:01:39,871 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured.
16:01:40,194 INFO ProducerConfig - ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [hadoop102:9092, hadoop103:9092, hadoop104:9092]
buffer.memory = 33554432
client.id =
compression.type = snappy
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = null
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
16:01:40,224 INFO AppInfoParser - Kafka version : 0.11.0.2
16:01:40,224 INFO AppInfoParser - Kafka commitId : 73be1e1168f91ee2
16:01:40,239 INFO Maxwell - Maxwell v1.27.0 is booting (MaxwellKafkaProducer), starting at Position[BinlogPosition[mysql-bin.000070:166226], lastHeartbeat=1595404689401]
16:01:40,370 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin.000070:4928], lastHeartbeat=0])
16:01:40,427 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000070:166226
16:01:40,439 INFO BinaryLogClient - Connected to hadoop102:3306 at mysql-bin.000070/166226 (sid:6379, cid:41)
16:01:40,439 INFO BinlogConnectorReplicator - Binlog connected
3.2.6 Insert data into MySQL
mysql> insert into test_maxwell values(1,1,'jim',2);
mysql> insert into test_maxwell values(2,1,'grace',2);
mysql> insert into test_maxwell values(3,5,'jinb',5);
mysql> insert into test_maxwell values(4,6,'jack',4);
mysql> insert into test_maxwell values(18,198,'lily',18);
mysql> insert into test_maxwell values(19,187,'green',36);
3.2.7 Kafka receives the inserted data
Note:
1. The Maxwell foreground process no longer prints the replicated binlog events itself.
2. The messages consumed from the maxwell topic are already in JSON format.
[zhouchen@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic maxwell --from-beginning
{"database":"test","table":"test_maxwell","type":"insert","ts":1595404919,"xid":6248,"commit":true,"data":{"age":1,"tall":1,"name":"jim","record":2}}
{"database":"test","table":"test_maxwell","type":"insert","ts":1595404919,"xid":6249,"commit":true,"data":{"age":2,"tall":1,"name":"grace","record":2}}
{"database":"test","table":"test_maxwell","type":"insert","ts":1595404919,"xid":6250,"commit":true,"data":{"age":3,"tall":5,"name":"jinb","record":5}}
{"database":"test","table":"test_maxwell","type":"insert","ts":1595404919,"xid":6251,"commit":true,"data":{"age":4,"tall":6,"name":"jack","record":4}}
{"database":"test","table":"test_maxwell","type":"insert","ts":1595404919,"xid":6252,"commit":true,"data":{"age":18,"tall":198,"name":"lily","record":18}}
{"database":"test","table":"test_maxwell","type":"insert","ts":1595404920,"xid":6256,"commit":true,"data":{"age":19,"tall":187,"name":"green","record":36}}
3.3 RESUMING FROM A CHECKPOINT
Maxwell resumes from the last processed binlog position by default; it persists its position in the maxwell database it creates in MySQL.
1. First kill the Maxwell process
09:47:14,409 INFO TaskManager - Stopping 3 tasks
09:47:14,409 INFO TaskManager - Stopping: com.zendesk.maxwell.schema.PositionStoreThread@7c690249
09:47:14,409 INFO TaskManager - Stopping: com.zendesk.maxwell.bootstrap.BootstrapController@72aee3b4
09:47:14,409 INFO PositionStoreThread - Storing final position: Position[BinlogPosition[mysql-bin.000071:1407], lastHeartbeat=1595468834305]
09:47:14,409 INFO TaskManager - Stopping: com.zendesk.maxwell.replication.BinlogConnectorReplicator@76916939
09:47:14,722 INFO TaskManager - Stopped all tasks
2. Then modify some data in the database:
mysql> update test.test_maxwell set age = 20 where name = 'grace';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
3. Restart the Maxwell process; the changes made while it was down are still received
[zhouchen@hadoop102 maxwell-1.27.0]$ bin/maxwell --user='maxwell' --password='maxwell' --host='hadoop102' --producer=stdout
Using kafka version: 1.0.0
09:50:37,462 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured.
09:50:37,940 INFO Maxwell - Maxwell v1.27.0 is booting (StdoutProducer), starting at Position[BinlogPosition[mysql-bin.000071:1407], lastHeartbeat=1595468834305]
09:50:38,064 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin.000070:4928], lastHeartbeat=0])
09:50:38,403 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000071:1407
09:50:38,414 INFO BinaryLogClient - Connected to hadoop102:3306 at mysql-bin.000071/1407 (sid:6379, cid:33)
09:50:38,414 INFO BinlogConnectorReplicator - Binlog connected.
{"database":"test","table":"test_maxwell","type":"update","ts":1595468985,"xid":180,"commit":true,"data":{"age":20,"tall":1,"name":"grace","record":2},"old":{"age":2}}
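As the output above shows, an update event carries two payloads: "data" is the full row after the update, while "old" lists only the changed columns with their previous values. A small Python sketch reconstructing the before-image from these two fields (the event is copied from the output above):

```python
import json

# The update event printed by Maxwell above.
event = json.loads('{"database":"test","table":"test_maxwell","type":"update",'
                   '"ts":1595468985,"xid":180,"commit":true,'
                   '"data":{"age":20,"tall":1,"name":"grace","record":2},'
                   '"old":{"age":2}}')

after = event["data"]
# Overlay the previous values from "old" onto the after-image
# to recover the row as it looked before the update.
before = {**after, **event.get("old", {})}

print(before)  # {'age': 2, 'tall': 1, 'name': 'grace', 'record': 2}
print(after)   # {'age': 20, 'tall': 1, 'name': 'grace', 'record': 2}
```

This before/after pair is what makes binlog-based rollback schemes (mentioned in section 1) feasible: the inverse statement can be derived from the event alone.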
3.4 MAXWELL FILTER CONFIGURATION
Maxwell accepts filter rules through the --filter option: exclude rules drop matching changes and include rules keep them. Rules can target a specific database, table, or column value, can be written with regular expressions, and complex rules can even be defined in JavaScript. Some examples from the official documentation:
# exclude everything in foodb except the tbl table and all tables named table_<digits>
--filter='exclude: foodb.*, include: foodb.tbl, include: foodb./table_\d+/'
# exclude all databases and tables, include only database db1
--filter = 'exclude: *.*, include: db1.*'
# exclude any update where column db.tbl.col has the value reject
--filter = 'exclude: db.tbl.col = reject'
# exclude any update containing column col_a
--filter = 'exclude: *.*.col_a = *'
# blacklist: completely ignore the bad_db database; to undo this, the maxwell schema must be dropped
--filter = 'blacklist: bad_db.*'
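To make the rule semantics concrete, here is an illustrative Python sketch of include/exclude matching on database.table patterns. This is a simplified model, not Maxwell's actual implementation: it assumes later rules override earlier ones and only handles `*` wildcards, ignoring the regex and column-value forms shown above.

```python
import fnmatch

def passes(filters, database, table):
    """Illustrative filter evaluation: rules are applied in order,
    and the last matching rule decides. '*' is a wildcard."""
    allowed = True  # with no rules, everything passes
    for action, pattern in filters:
        db_pat, tbl_pat = pattern.split(".")
        if fnmatch.fnmatch(database, db_pat) and fnmatch.fnmatch(table, tbl_pat):
            allowed = (action == "include")
    return allowed

# Modeled after --filter='exclude: *.*, include: db1.*'
rules = [("exclude", "*.*"), ("include", "db1.*")]
print(passes(rules, "db1", "users"))    # True
print(passes(rules, "other", "users"))  # False
```

Under this model, the broad exclude knocks out every table, and the narrower include re-admits only db1, matching the intent of the second official example above.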