mysql5.7.17 binglog_5.7.17版本mysqlbinlog实时拉取的二进制日志不完整的原因分析

问题描述:

同事使用mysqlbinlog工具的--read-from-remote-server --raw选项,从远程实例实时拉取二进制日志时,发现得到的二进制日志文件大小与远程实例上的源文件大小不相同,并且使用mysqlbinlog解析时会报错。

测试环境版本信息如下:

MySQL版本:5.7.17 log MySQL Community Server (GPL)  通用tar包安装

Mysqlbinlog版本:5.7.17 自带版本,mysqlbinlog Ver 3.4 for linux-glibc2.5 at x86_64

操作系统版本:CentOS Linux release 7.6.1810 (Core)

分析过程:

1、对比拉取到的二进制日志文件、源文件的大小,发现拉取到的二进制日志文件比源文件小,并且该文件大小:7315456/4096,刚好可以整除,源库上进行多次更新后,多次观察,发现该数值都可以被4096整除。

因为Linux块大小是4096 Bytes,所以,我先做个假设:mysqlbinlog工具以4096字节为单位进行写入

[root@192_168_129_128 tmp]# ll /usr/local/mysql-5.7.17-linux-glibc2.5-x86_64/data/mybinlog.000006 ; ll mybinlog.000006

-rw-r----- 1 mysql mysql 7318723 May 12 16:55 /usr/local/mysql-5.7.17-linux-glibc2.5-x86_64/data/mybinlog.000006

-rw-r----- 1 root root 7315456 May 12 16:55 mybinlog.000006

2、我在8.0.19版本中测试,问题却不能复现。

3、写一个简单的shell脚本,将mysqlbinlog拉取进程放到后台执行,获取到上一步的pid之后,使用pstack命令和strace命令,分别查看该进程的函数调用和系统调用

#!/bin/bash

nohup mysqlbinlog--read-from-remote-server --host=192.168.129.128 --user=dba --password=123456 --raw --to-last-log binlog.000002 --stop-never &pid=`ps -ef | grep mysqlbinlog | grep -v 'grep' | awk '{print $2}'`

pstack $pid>pstack.log

strace-f -t -p $pid -o strace.log

pstack命令的输出结果如下,虽然没有直接的线索,但是后续可以利用这些函数栈名,去源码中找线索:

#0 0x00007faf58f60a2d in recv () from /lib64/libpthread.so.0#1 0x000000000045e9f9 in inline_mysql_socket_recv (flags=0, n=5826, buf=0x1d959d1, mysql_socket=..., src_line=123, src_file=0x550918 "/export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/vio/viosocket.c") at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/include/mysql/psi/mysql_socket.h:823#2 vio_read (vio=0x1d90e60, buf=0x1d959d1 "27399199-10542799900-96491182343-85303866742-80196460272-89060617578-51177070778-10421134155;80308169113-21291571753-18876715410-91134905277-85771492482\360j\003", size=5826) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/vio/viosocket.c:123#3 0x0000000000434873 in net_read_raw_loop (count=5826, net=0x1d8e550) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/sql/net_serv.cc:672#4 net_read_packet (net=0x1d8e550, complen=0x7fff6d023f88) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/sql/net_serv.cc:859#5 0x0000000000434a7c in my_net_read (net=0x1d8e550) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/sql/net_serv.cc:899#6 0x000000000043985b in cli_safe_read_with_ok (mysql=0x1d8e550, parse_ok=0 '\000', is_data_packet=0x0) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/sql-common/client.c:1040#7 0x0000000000426b36 in dump_remote_log_entries (logname=0x7fff6d0277e9 "binlog.000002", print_event_info=0x7fff6d0250f0) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysqlbinlog.cc:2664#8 dump_single_log (logname=0x7fff6d0277e9 "binlog.000002", print_event_info=0x7fff6d0250f0) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysqlbinlog.cc:2334#9 dump_multiple_logs (argc=1, argv=) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysqlbinlog.cc:2374#10 0x0000000000427821 in main (argc=1, argv=0x1d58468) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysqlbinlog.cc:3436

strace命令的输出结果如下:

....................省略若干行....................20528 11:24:25 recvfrom(3, "6813690-49945547265-44609719076-"..., 6605, 0, NULL, NULL) = 6605

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3424256

20528 11:24:25 write(4, "553290165-79173089006-0778194955"..., 4096) = 4096

20528 11:24:25 write(4, "2\360E\t\0\0p\3\0\0w71247594556-464035229"..., 4096) = 4096

20528 11:24:25 recvfrom(3, "\0\0\211\0K\200\266^\20/\201\0\0\37\0\0\0\320d4\0\0\0\21\1\0\0\0\0\0\0"..., 16384, 0, NULL, NULL) = 1523

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 lseek(4, 0, SEEK_CUR) = 3432448

20528 11:24:25 recvfrom(3,

strace到这里就没有后续的输出了。

其中,recvfrom系统调用,是经socket接收数据;lseek是一个用于改变读写一个文件时读写指针位置的一个系统调用;write把buf中nbyte写入文件描述符handle所指的文档,成功时返回写的字节数,错误时返回-1.

分析strace日志中的多条write记录,发现每次write的写入字节数都是4096,看起来与前面的假设吻合。

但是,每达到4096字节才写入磁盘,那么拉取到的binlog几乎不可能与源库一致。而mysqlbinlog的help显示,--read-from-remote-server参数,实质上是调用MySQL Server的BINLOG-DUMP-NON-GTIDS接口,远程实例应该是实时发送binlog events才对。

-R, --read-from-remote-server

Read binary logs from a MySQL server. This is an aliasfor read-from-remote-master=BINLOG-DUMP-NON-GTIDS.

好吧,从pstack日志中得到的函数名,我们去源码中逐个查看,最终与本问题相关的是client/mysqlbinlog.cc中的dump_remote_log_entries()函数,如下:

/**

Requests binlog dump from a remote server and prints the events it

receives.

@param[in,out] print_event_info Parameters and context state

determining how to print.

@param[in] logname Name of input binlog.

@retval ERROR_STOP An error occurred - the program should terminate.

@retval OK_CONTINUE No error, the program should continue.

@retval OK_STOP No error, but the end of the specified range of

events to process has been reached and the program should terminate.*/

static Exit_status dump_remote_log_entries(PRINT_EVENT_INFO *print_event_info,const char*logname)

{

....................省略若干行....................for(;;)

{const char *error_msg=NULL;

Log_event*ev=NULL;

Log_event_type type=binary_log::UNKNOWN_EVENT;

len=cli_safe_read(mysql, NULL);if (len ==packet_error)

{

error("Got error reading packet from server: %s", mysql_error(mysql));

DBUG_RETURN(ERROR_STOP);

}if (len < 8 && net->read_pos[0] == 254)break; //end of data

DBUG_PRINT("info",( "len: %lu net->read_pos[5]: %d\n",

len, net->read_pos[5]));/*In raw mode We only need the full event details if it is a

ROTATE_EVENT or FORMAT_DESCRIPTION_EVENT*/....................省略若干行....................if (raw_mode || (type !=binary_log::LOAD_EVENT))

{....................省略若干行....................if(raw_mode)

{

DBUG_EXECUTE_IF("simulate_result_file_write_error",

DBUG_SET("+d,simulate_fwrite_error"););if (my_fwrite(result_file, net->read_pos + 1 , len - 1, MYF(MY_NABP)))/*可以看到,cli_safe_read()方法读取到的binlog event,都会调用my_fwrite函数进行写入,my_fwrite是对fwrite()函数的封装

这里并没有以4096字节为单位写入,而是读多少就写入多少

这就无法解释了,代码逻辑显示每次拿到数据之后,都会写入磁盘,为什么实际上却不是呢?*/{

error("Could not write into log file '%s'", log_file_name);

retval=ERROR_STOP;

}if(ev)

reset_temp_buf_and_delete(ev);

}

....................省略若干行....................if (retval !=OK_CONTINUE)

DBUG_RETURN(retval);

}else{

....................省略若干行....................

old_off+= len-1;

}

DBUG_RETURN(OK_CONTINUE);

}

请看我用中文注释在上述源码中的分析。

代码逻辑没有问题,那就有可能是BUG了。。。

结论:

Replication:

mysqlbinlog, if invoked with the --raw option, does not flush the output file until the process terminates. But if also invoked with the --stop-never option,

the process never terminates, thus nothing is ever written to the output file. Now the output is flushed after each event. (Bug #24609402)

但是这个BUG在BUG system中找不到,应该是需要有权限的MOS账号才能查看。

bff63e45c7c19726a1ef53ba28603dad.png

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值