在一次对MySQL的tpch测试时,将数据存储在另外一台服务器上时出现宕机的问题,做了以下尝试以及解决方案
刚开始时,由于MySQL实例宕机,以为是MySQL本身的问题,当时使用的版本是8.0.25
服务器1:IP1
服务器2:IP2
在IP1上进行MySQL的tpch测试,但是,数据写到挂载盘上,挂载的盘在另外一台服务器IP2:/benchmark/test上。
出现问题时,排查的思路是:MySQL参数调整;服务器磁盘是否有坏块;挂载是否正确
开始出现的问题是MySQL实例宕机,就从MySQL参数调整开始,具体思路如下:
(1)、调整MySQL的一些参数
当时出现的问题有以下:
a.实例启动时,initdb时MySQL实例宕机
b.将redo刷盘时宕机,当时有宕机时的stack信息
c.写入ibd文件时宕机,错误信息有tablespace
做的一些尝试:
<1>、O_DIRECT
innodb_flush_method=O_DIRECT
直接使用direct I/O, 当时MySQL实例宕机时,提示os error。
<2>、将挂载的目录调整为:/home/uos/nas,这样做是为了“骗过”操作系统,让他以为是本机的磁盘目录,实际上这个目录是另外一个台挂载服务的路径,测试依然失败
<3>、innodb_flush_log_at_trx_commit=0
每次事务提交时,将事务日志写入到os的缓存中,什么时候落盘由os决定。
缺点是 : 服务器突然宕机可能导致数据丢失。
<4>、innodb_flush_log_at_trx_commit=2
每次事务提交时,将事务日志写出到os缓存中,后台线程每隔1s将os缓存中的
事务日志flush到磁盘文件中。这种情况下最多丢失1s的数据。
(2)、挂载的磁盘坏块检测
根据报错信息"Error number 5 means 'Input/output error'",以为是挂载的磁盘有坏块,因此对挂载的磁盘进行了坏块检测。
使用的命令如下:
badblock -v /dev/sdd
结果是磁盘没有坏块,结果图就不贴了自己去验证就可以了。
(3)、排查mount时参数设置是否有问题
最后,能想到的问题都排查了一遍之后,怀疑mount时挂载方式有问题导致写入数据时出现问题。
经过在网上搜集mount的问题之后,使用了新的参数,mount命令如下:
mount -t nfs -o rw,intr,soft,timeo=30,retry=3 192.168.xx.xx:/test /home/zgaoq/mysql_tpch_test
具体的mount参数网上都有,不再啰嗦了,不明白的自己可以mount --help查看。
总之,导致此问题的原因是在使用 nfs 挂载的时候使用的是默认的 hard-mount 挂载功能。
当服务端停止服务时,客户端加载 nfs 不成功,就会不断的重试,直到服务端恢复之前,挂载目录都会出现卡死的情况。
因此需要在挂载时更换为 soft-mount,使用此功能挂载后,当服务端出现停止服务的情况时,会重试 retry 设定的固定次数。如果尝试所设定的次数后都不成功,则放弃此操作,返回错误信息 "Connect time out"。
附图是宕机信息,是其中的2次宕机:
2024-03-28T01:06:52.588029Z 0 [System] [MY-013169] [Server] /home/zgaoq/zz/mysql/bin/mysqld (mysqld 8.0.25) initializing of server in progress as process 107141
2024-03-28T01:06:52.588053Z 0 [ERROR] [MY-010338] [Server] Can't find error-message file '/home/zgaoq/zz/mysql/share/errmsg.sys'. Check error-message file location and 'lc-messages-dir' configuration directive.
2024-03-28T01:06:52.604388Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-03-28T01:07:08.300880Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-03-28T01:07:11.831364Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
2024-03-28T01:07:17.524960Z 0 [System] [MY-010116] [Server] /home/zgaoq/zz/mysql/bin/mysqld (mysqld 8.0.25) starting as process 107294
2024-03-28T01:07:17.524998Z 0 [ERROR] [MY-010338] [Server] Can't find error-message file '/home/zgaoq/zz/mysql/share/errmsg.sys'. Check error-message file location and 'lc-messages-dir' configuration directive.
2024-03-28T01:07:17.551098Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-03-28T01:07:20.007887Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-03-28T01:07:20.504703Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /tmp/mysqlx.sock
2024-03-28T01:07:21.584623Z 2 [Warning] [MY-012638] [InnoDB] Retry attempts for reading partial data failed.
2024-03-28T01:07:21.584744Z 2 [ERROR] [MY-012642] [InnoDB] Tried to read 16384 bytes at offset 15859712, but was only able to read 0
2024-03-28T01:07:21.584824Z 2 [ERROR] [MY-012592] [InnoDB] Operating system error number 5 in a file operation.
2024-03-28T01:07:21.584903Z 2 [ERROR] [MY-012596] [InnoDB] Error number 5 means 'Input/output error'
2024-03-28T01:07:21.585058Z 2 [ERROR] [MY-012646] [InnoDB] File mysql.ibd: 'read' returned OS error 105. Cannot continue operation
2024-03-28T01:07:21.585138Z 2 [ERROR] [MY-012981] [InnoDB] Cannot continue operation.
2024-03-28T01:26:29.544884Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2024-03-28T01:26:29.545007Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2024-03-28T01:26:29.632701Z 0 [System] [MY-010931] [Server] /home/zgaoq/zz/mysql/bin/mysqld: ready for connections. Version: '8.0.25' socket: '/tmp/mysql.sock' port: 3306 Source distribution.
2024-03-28T01:26:47.972192Z 14 [ERROR] [MY-012144] [InnoDB] posix_fallocate(): Failed to preallocate data for file ./tpch/customer.ibd, desired size 1048576 bytes. Operating system error number 5. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Refer to your operating system documentation for operating system error code information.
2024-03-28T01:26:47.973382Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL] fsync() returned EIO, aborting.
2024-03-28T01:26:47.973424Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:555 thread 140135354058496
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-03-28T01:26:47.973452Z 14 [ERROR] [MY-013622] [InnoDB] [FATAL] fsync() returned EIO, aborting.
01:26:47 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2024-03-28T01:26:47.973468Z 14 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:555 thread 140144477812480
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.