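I recently ran into a MySQL machine that was running out of disk space. It turned out that the slow log file was taking up most of the space (on ext4), so I promptly rm'ed it and then ran flush logs. During the flush logs, MySQL hung. I went through the flush logs logic: when flush logs is executed, MySQL performs a reopen_file operation. reopen_file() first closes the slow log file and then opens it again, and it holds the LOCK_log lock for the whole process. At present, every write to MySQL's server-layer logs (slow/general/binary log) has to hold this lock.
That is where the problem comes from. When the rm was executed, mysqld was still writing to the slow log, so the system only removed the file name and did not actually delete the file. When flush logs is then executed, MySQL calls close() on it. The manual for close says: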
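To make the locking behavior concrete, here is a minimal simulation in Python. It is not MySQL code: the `LOCK_log` object is just a `threading.Lock` standing in for the server mutex, `reopen_file` and `write_log_entry` are hypothetical stand-ins, and the sleep is an assumption that represents a close() call that takes a long time.

```python
import threading
import time

# Stand-in for the server's LOCK_log mutex (an assumption for this sketch).
LOCK_log = threading.Lock()


def reopen_file(close_seconds):
    """Simulate reopen_file(): close and reopen while holding the lock."""
    with LOCK_log:
        # Assumption: the sleep stands in for a close() that takes a long time.
        time.sleep(close_seconds)


def write_log_entry():
    """Simulate a log write; return how long it waited for the lock."""
    start = time.monotonic()
    with LOCK_log:
        pass  # the actual write would happen here
    return time.monotonic() - start


def simulate(close_seconds=0.3):
    """Start a slow reopen, then try to write a log entry concurrently."""
    flusher = threading.Thread(target=reopen_file, args=(close_seconds,))
    flusher.start()
    time.sleep(0.05)  # give the flusher time to take the lock
    waited = write_log_entry()
    flusher.join()
    return waited
```

Calling `simulate()` returns the time the writer spent blocked, which is roughly the duration of the simulated close. An uncontended `write_log_entry()` returns almost immediately.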
NAME
close - close a file descriptor
DESCRIPTION
close() closes a file descriptor, so that it no longer refers to any file and
may be reused. Any record locks (see fcntl(2)) held on the file it was
associated with, and owned by the process, are removed (regardless of the file
descriptor that was used to obtain the lock).
If fd is the last copy of a particular file descriptor the resources associated
with it are freed; if the descriptor was the last reference to a file which has
been removed using unlink(2) the file is deleted.
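The behavior described in the manual can be observed with a small Python sketch. It assumes Linux and a local filesystem; the function name `unlink_while_open` is made up for illustration. The file is unlinked while a descriptor is still open, and it stays usable until that descriptor is closed.

```python
import os
import tempfile


def unlink_while_open():
    """Unlink a file that is still open and inspect it through the fd."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * 4096)

    os.unlink(path)  # same effect as rm: only the name goes away
    exists_by_name = os.path.exists(path)

    st = os.fstat(fd)
    links = st.st_nlink  # number of names still pointing at the inode
    size = st.st_size    # the data is still there

    os.write(fd, b"more")  # the process can keep writing
    size_after = os.fstat(fd).st_size

    os.close(fd)  # last reference dropped: now the file is really deleted
    return exists_by_name, links, size, size_after
```

The space is only given back at the final `os.close(fd)`, which is exactly the call that flush logs ends up making on the removed slow log.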
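Linux's file deletion mechanism combined with MySQL's big LOCK_log lock is what makes MySQL hang. Because that close() drops the last reference to a file that has already been unlinked, the kernel actually deletes the huge file at that point, while LOCK_log is still held, so every thread that needs to write a log has to wait. The general log has the same problem. The correct approach is to mv the slow/general log to another name, run flush logs, and only then rm the renamed file. That way the close() inside flush logs is on a file that still has a name, and the expensive deletion happens later, outside the lock.
Links: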
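The rename, flush, remove order can be sketched as follows. This is a toy model under stated assumptions, not a tool for a real server: `LogWriter` is a hypothetical class that keeps a log file open the way mysqld does, and the `flush_logs` callback stands in for running flush logs against the server.

```python
import os
import tempfile


class LogWriter:
    """Toy writer that keeps a log file open, as mysqld does."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def write(self, data):
        self.f.write(data)
        self.f.flush()

    def reopen(self):
        """Close the current file and open the configured path again."""
        self.f.close()
        self.f = open(self.path, "ab")

    def close(self):
        self.f.close()


def rotate(path, flush_logs):
    """Rotate a log safely: rename first, flush, and only then remove."""
    old = path + ".old"
    os.rename(path, old)  # the writer's descriptor now refers to `old`
    flush_logs()          # writer closes `old` (still linked) and reopens `path`
    os.remove(old)        # the real deletion happens outside the writer
```

With a real server the callback would be replaced by something like `mysqladmin flush-logs`, and the final remove could be deferred or replaced by compressing the renamed file.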
http://www.mysqlperformanceblog.com/2007/12/09/be-careful-rotating-mysql-logs/
http://digital-forensics.sans.org/blog/2010/12/20/digital-forensics-understanding-ext4-part-1-extents
http://digital-forensics.sans.org/blog/2011/03/28/digital-forensics-understanding-ext4-part-3-extent-trees
http://digital-forensics.sans.org/blog/2011/04/08/understanding-ext4-part-4-demolition-derby