mysql数据启动大量报错且无法启动(The InnoDB memory heap is disabled,Status: NOT_KILLED)故障排查

周一上班巡检发现公司的有一组mysql主从同步进程报警,登录报警服务器,进行查看排查步骤如下:

1,首先登录保障服务器,执行mysql命令登录mysql,发现报错,提示SOCK无法连接详如下:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

2.执行 lsof -n -i:3306 查看发现mysql监听端口不在!

3. 启动数据库;

/etc/init.d/mysqld start

提示启动成功!

4.再次查看mysql监听端口,执行 lsof -n -i:3306 

查看发现mysql监听端口还是不在!呀郁闷了,这种现象还是第一次见,同时感觉到问题有些严重。

5.查看mysql进程,发现此时 mysql 进程竟然还在!!!!!!!!! 真实坑。。。

ps -ef |grep mysql     

mysql    19143     1  0 12:26 pts/6    00:00:01 /mysql/bin/mysqld --basedir=/mysql --datadir=/data/3306/data 

6.此时查看一下mysql系统日志 

tail -f /var/log/mysqlerror.log  

发现mysql在不停打印一下错误日志,篇幅还挺大,详细报错内容如下: 

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains




information that should help you find out what is causing the crash.




121210 12:25:40 mysqld_safe Number of processes running now: 0




121210 12:25:40 mysqld_safe mysqld restarted




121210 12:25:41 InnoDB: The InnoDB memory heap is disabled




121210 12:25:41 InnoDB: Mutexes and rw_locks use GCC atomic builtins




121210 12:25:41 InnoDB: Compressed tables use zlib 1.2.3




121210 12:25:41 InnoDB: Initializing buffer pool, size = 2.0G




121210 12:25:41 InnoDB: Completed initialization of buffer pool




121210 12:25:41 InnoDB: highest supported file format is Barracuda.




121210 12:25:41 InnoDB: Waiting for the background threads to start




121210 12:25:42 InnoDB: 1.1.8 started; log sequence number 7102612557




121210 12:25:42 [Note] Semi-sync replication initialized for transactions.




121210 12:25:42 [Note] Semi-sync replication enabled on the master.




121210 12:25:42 [Note] Recovering after a crash using mysql-bin




121210 12:25:42 [Note] Starting crash recovery...




121210 12:25:42 [Note] Crash recovery finished.




121210 12:25:42 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306




121210 12:25:42 [Note] - '0.0.0.0' resolves to '0.0.0.0';




121210 12:25:42 [Note] Server socket created on IP: '0.0.0.0'.




121210 12:25:42 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql04-relay-bin' to avoid this problem.




121210 12:25:42 [Note] Slave I/O thread: Start semi-sync replication to master 'rsync@172.17.2.13:3306' in log 'mysql-bin.000605' at position 607146584




121210 12:25:42 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000605' at position 607113383, relay log './mysql04-relay-bin.000285' position: 607113529




121210 12:25:42 [Note] Event Scheduler: Loaded 0 events




121210 12:25:42 [Note] /mysql/bin/mysqld: ready for connections.




Version: '5.5.25-log' socket: '/tmp/mysqld.sock' port: 3306 Source distribution




03:25:42 UTC - mysqld got signal 11 ;




This could be because you hit a bug. It is also possible that this binary




or one of the libraries it was linked against is corrupt, improperly built,




or misconfigured. This error can also be caused by malfunctioning hardware.




We will try our best to scrape up some info that will hopefully help




diagnose the problem, but since we have already crashed,




something is definitely wrong and this may fail.









key_buffer_size=8589934592




read_buffer_size=2097152




max_used_connections=0




max_threads=5000




thread_count=0




connection_count=0




It is possible that mysqld could use up to




key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 59645600 K bytes of memory




Hope that's ok; if not, decrease some variables in the equation.









Thread pointer: 0x7efd78000990




Attempting backtrace. You can use the following information to find out




where mysqld died. If you see no messages after this, something went




terribly wrong...




stack_bottom = 7efd821b8e58 thread_stack 0x30000




/mysql/bin/mysqld(my_print_stacktrace+0x2e)[0x758f8e]




/mysql/bin/mysqld(handle_fatal_signal+0x41a)[0x6500ea]




/lib64/libpthread.so.0(+0xf500)[0x7f0022539500]




/lib64/libc.so.6(memcpy+0x15b)[0x7f00212271ab]




/mysql/bin/mysqld(_ZN15Field_varstring6unpackEPhPKhjb+0x54)[0x63c154]




/mysql/bin/mysqld(_Z10unpack_rowPK14Relay_log_infoP5TABLEjPKhPK9st_bitmapPS5_Pm+0x3bf)[0x6e8a8f]




121210 12:25:42 [Note] Slave I/O thread: connected to master 'rsync@172.17.2.13:3306',replication started in log 'mysql-bin.000605' at position 607146584




/mysql/bin/mysqld(_ZN14Rows_log_event9write_rowEPK14Relay_log_infob+0xa9)[0x6e2a39]




/mysql/bin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEPK14Relay_log_info+0x20)[0x6e2d60]




/mysql/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x1da)[0x6e38ba]




/mysql/bin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x11d)[0x515bcd]




/mysql/bin/mysqld[0x519c98]




/mysql/bin/mysqld(handle_slave_sql+0x8f5)[0x51afc5]




/lib64/libpthread.so.0(+0x7851)[0x7f0022531851]




/lib64/libc.so.6(clone+0x6d)[0x7f002128611d]









Trying to get some variables.




Some pointers may be invalid and cause the dump to abort.




Query (0): is an invalid pointer




Connection ID (thread ID): 2




Status: NOT_KILLED

7.杀掉mysql进程后,mysql系统日志停止打印,

kill -9  19143 

ps -ef |grep mysql

8.看了一下mysql日志,大致报3一下几块内容, InnoDB 引擎缓存参数设置,mysql同步日志还有mysql关于缓存参数设置等;

我这边先核实了一下自己的mysql 的my.cnf的配置参数,进行了相关的调整后启动数据库和之前的现象一样,提示mysql启动成功,但是监听端口不在,后台残留有一个mysql进程,不停打印错误日志。

根据在日志的报错内容查了一下,有些文档都说在 mysql建临时表的时候指定了使用InnoDB引擎,mysql数据库建立临时表不允许使用InnoDB引擎等内容,问了一下在使用数据库的同事没有进行这方面的操作,应该不是此问题导致。

继续看日志内容发现下面的内容:

 Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql04-relay-bin' to avoid this problem.

#提示说我的一个同步服务器改了名称导致,用 --relay-log=mysql04-relay-bin' 方式来解决问题;

这台mysql数据库配置了主从,试着从这个角度来解决问题,用下面的方法启动数据库;

9.使用下面的方法启动数据库;数据库能够正常启动;

/mysql/bin/mysqld_safe --relay-log nor --relay-log-index

数据库正常启动,查看mysql监听3306端口,端口能够正常监听端口;

lsof -n -i:3306

COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME

mysqld  2303 mysql   11u  IPv4  40725      0t0  TCP *:mysql (LISTEN)

10.进入数据库,将mysql同步slave进程关闭;然后充值mysql ,master 和 slave 进程,然后重启一下mysql数据库,数据库能够正常启动,且监听端正常了。

mysql> stop slave;

Query OK, 0 rows affected (0.00 sec)

mysql> reset master;

Query OK, 0 rows affected (0.01 sec)

mysql> reset slave;

Query OK, 0 rows affected (0.01 sec)

11.我这台mysql 和另外一台配置的是互为主从,我在两端配置都执行了上面步骤 10 的操作,然后按照主从同步的方法重新配置mysql主从同步。两台mysql都能够正常工作了。

问题排查总结:

此次mysql数据库无法正常启动原因,为mysql同步日志文件导致,在mysql启动时无法正常读取,用mysql_safe --relay-log nor --relay-log-index 方式启动数据库,将mysql同步建成重置即可解决问题。