Previously, our MySQL slow logs and error logs were rotated and compressed daily by the OS's logrotate, configured with copytruncate: copy the file, compress the copy, then truncate the live log in place. This approach has several problems:
1. Log entries can be lost (anything written between the copy and the truncate is gone).
2. Truncating a large log file hurts performance.
3. The log is "rotated" but the file itself is never replaced (its inode does not change), which is unfriendly to log-collection agents: after rotation it is hard for them to tell when the log file needs to be reopened.
Of course, we could drop copytruncate and instead use postrotate to run a flush statement, but with multiple instances deployed on one machine it is hard to tell which instance's log is being rotated. So sticking with logrotate is still problematic.
So this time we reworked log rotation to solve the problems above by rotating via hard links. The procedure:
1. Find the MySQL instances deployed on the machine.
2. Get each instance's log file paths.
3. Create a hard link to each log.
4. Delete the current log file.
5. Flush the logs so the server automatically creates new log files.
6. Compress the historical logs.
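Steps 3-5 are safe because a hard link is just a second directory entry for the same inode: after the `rm` in step 4 the data lives on under the dated name, and mysqld keeps writing through its open file descriptor until the flush in step 5 makes it reopen (and thus recreate) the log file. A minimal sketch of steps 3, 4, and 6 with a throwaway file (the flush itself needs a live server):

```shell
#!/bin/bash
set -e
DEMO=/tmp/hardlink-rotate-demo
rm -rf "${DEMO}" && mkdir -p "${DEMO}"
LOG="${DEMO}/slow.log"
DT=$(date +%Y%m%d)

echo "old slow-query entries" > "${LOG}"   # stand-in for the live log

ln "${LOG}" "${LOG}-${DT}"                 # step 3: hard link (same inode, link count 2)
rm -f "${LOG}"                             # step 4: drop the original name
cat "${LOG}-${DT}"                         # the data is still intact via the link

# step 5 would be a FLUSH against the server, after which mysqld
# recreates ${LOG}; step 6 compresses the rotated copy:
gzip "${LOG}-${DT}"
ls "${DEMO}"
```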
The full script:
#!/bin/bash
#
# Created by beebel
# logrotate mysql error/slow/general logs
#
DT=$(date +%Y%m%d)
PORTLIST=$(netstat -nltp | grep -w mysqld | awk '{print $4}' | awk -F ":" '{print $NF}')
LOCAL_IP=$(ip addr | sed -r 's/(.*inet )(1.*)(\/.*)/\2/' | egrep "^10\.|^192\.|^172\.1[6-9]\.|^172\.2[0-9]\.|^172\.3[0-1]\." | head -n 1)
CMD="/usr/local/mysql/bin/mysql -uroot -p$(cat /home/mysql/.root_localhost) -h ${LOCAL_IP}"
for port in ${PORTLIST}
do
    # slow_log
    slow_log_file=$($CMD -P${port} -BNe "show variables like 'slow_query_log_file'" | awk '{print $NF}')
    if [ -f "${slow_log_file}-${DT}.gz" ];then
        mv "${slow_log_file}-${DT}.gz" "${slow_log_file}-${DT}$(date +%H%M -d '1 minute ago').gz"
    fi
    if [ -s "${slow_log_file}" ];then
        ln "${slow_log_file}" "${slow_log_file}-${DT}" && rm -f "${slow_log_file}"
        # flush before compressing so mysqld stops writing to the linked file
        $CMD -P${port} -e "flush slow logs;"
        gzip "${slow_log_file}-${DT}"
    fi
    # error_log
    error_log_file=$($CMD -P${port} -BNe "show variables like 'log_error'" | awk '{print $NF}')
    if [ -f "${error_log_file}-${DT}.gz" ];then
        mv "${error_log_file}-${DT}.gz" "${error_log_file}-${DT}$(date +%H%M -d '1 minute ago').gz"
    fi
    if [ -s "${error_log_file}" ];then
        ln "${error_log_file}" "${error_log_file}-${DT}" && rm -f "${error_log_file}"
        $CMD -P${port} -e "flush error logs;"
        gzip "${error_log_file}-${DT}"
    fi
    # general_log
    general_log=$($CMD -P${port} -BNe "show variables like 'general_log'" | awk '{print $NF}')
    if [ "${general_log}" == "ON" ];then
        general_log_file=$($CMD -P${port} -BNe "show variables like 'general_log_file'" | awk '{print $NF}')
        if [ -f "${general_log_file}-${DT}.gz" ];then
            mv "${general_log_file}-${DT}.gz" "${general_log_file}-${DT}$(date +%H%M -d '1 minute ago').gz"
        fi
        if [ -s "${general_log_file}" ];then
            ln "${general_log_file}" "${general_log_file}-${DT}" && rm -f "${general_log_file}"
            $CMD -P${port} -e "flush general logs;"
            gzip "${general_log_file}-${DT}"
        fi
    fi
done
# purge compressed logs older than 7 days
find /var/log/mysql/ -mtime +7 -type f -name "*.gz" | xargs -I{} rm -f {}
The script looked fine, but once it ran in production we found that replication had broken on some instances; the cause was a failed flush error logs on the replica. That raised two questions:
1. Why couldn't the replica execute flush error logs?
2. Why was flush error logs replicated from the primary at all?
The first question was easy to reproduce: running the statement by hand also returned "unknown error". Checking the error-log setup showed that the log file did not exist and that its directory was not writable by the mysql user; after creating an empty file and granting it to the mysql user, flush error logs ran normally. This was a deployment environment problem: /etc/my.cnf had not been cleaned out before installation, so after installation log_error pointed at the default /var/log/mysqld.log.
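The fix on the affected replica boils down to recreating the missing file with the right ownership. A sketch, run in a scratch directory here since the real commands need root; on the actual host the path is /var/log/mysqld.log and the owner is the mysql user:

```shell
#!/bin/bash
set -e
# Scratch-dir stand-in; on the broken replica the fix amounts to:
#   touch /var/log/mysqld.log
#   chown mysql:mysql /var/log/mysqld.log
SCRATCH=/tmp/mysqld-errlog-fix
rm -rf "${SCRATCH}" && mkdir -p "${SCRATCH}"

ERR_LOG="${SCRATCH}/mysqld.log"
touch "${ERR_LOG}"               # recreate the missing error log
chown "$(id -un)" "${ERR_LOG}"   # give it to the server's run user (mysql on the real host)
chmod 640 "${ERR_LOG}"           # typical log-file permissions
ls -l "${ERR_LOG}"
```

With the file in place and owned by the server's user, FLUSH ERROR LOGS can reopen it and succeeds.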
For the second question I checked the primary's binlog, and it had indeed recorded flush error logs, flush general logs, and flush slow logs. This was surprising: when rotating logs by hand in the past I had run plain flush logs, and it never showed up in the binlog, so I assumed the targeted flush error/general/slow logs statements would not be recorded either.
Checking the official documentation confirmed that this assumption was wrong:
By default, the server writes FLUSH statements to the binary log so that they replicate to replicas. To suppress logging, specify the optional NO_WRITE_TO_BINLOG keyword or its alias LOCAL.
In other words, FLUSH statements are written to the binlog and replicated to replicas by default; to suppress that, use NO_WRITE_TO_BINLOG or its alias LOCAL.
A bit further down there is also a note:
Note
FLUSH LOGS, FLUSH BINARY LOGS, FLUSH TABLES WITH READ LOCK (with or without a table list), and FLUSH TABLES tbl_name ... FOR EXPORT are not written to the binary log in any case because they would cause problems if replicated to a replica.
So FLUSH LOGS, FLUSH BINARY LOGS, FLUSH TABLES WITH READ LOCK, and FLUSH TABLES tbl_name ... FOR EXPORT, the forms I had run before, are never written to the binlog. That settles the second question.
So the final script adds NO_WRITE_TO_BINLOG to each flush statement:
#!/bin/bash
#
# Created by beebel
# logrotate mysql error/slow/general logs
#
DT=$(date +%Y%m%d)
PORTLIST=$(netstat -nltp | grep -w mysqld | awk '{print $4}' | awk -F ":" '{print $NF}')
LOCAL_IP=$(ip addr | sed -r 's/(.*inet )(1.*)(\/.*)/\2/' | egrep "^10\.|^192\.|^172\.1[6-9]\.|^172\.2[0-9]\.|^172\.3[0-1]\." | head -n 1)
CMD="/usr/local/mysql/bin/mysql -uroot -p$(cat /root/.dba/db.pwd/root_localhost) -h ${LOCAL_IP}"
for port in ${PORTLIST}
do
    # slow_log
    slow_log_file=$($CMD -P${port} -BNe "show variables like 'slow_query_log_file'" | awk '{print $NF}')
    if [ -f "${slow_log_file}-${DT}.gz" ];then
        mv "${slow_log_file}-${DT}.gz" "${slow_log_file}-${DT}$(date +%H%M -d '1 minute ago').gz"
    fi
    if [ -s "${slow_log_file}" ];then
        ln "${slow_log_file}" "${slow_log_file}-${DT}" && rm -f "${slow_log_file}"
        # flush before compressing so mysqld stops writing to the linked file
        $CMD -P${port} -e "flush NO_WRITE_TO_BINLOG slow logs;"
        gzip "${slow_log_file}-${DT}"
    fi
    # error_log
    error_log_file=$($CMD -P${port} -BNe "show variables like 'log_error'" | awk '{print $NF}')
    if [ -f "${error_log_file}-${DT}.gz" ];then
        mv "${error_log_file}-${DT}.gz" "${error_log_file}-${DT}$(date +%H%M -d '1 minute ago').gz"
    fi
    if [ -s "${error_log_file}" ];then
        ln "${error_log_file}" "${error_log_file}-${DT}" && rm -f "${error_log_file}"
        $CMD -P${port} -e "flush NO_WRITE_TO_BINLOG error logs;"
        gzip "${error_log_file}-${DT}"
    fi
    # general_log
    general_log=$($CMD -P${port} -BNe "show variables like 'general_log'" | awk '{print $NF}')
    if [ "${general_log}" == "ON" ];then
        general_log_file=$($CMD -P${port} -BNe "show variables like 'general_log_file'" | awk '{print $NF}')
        if [ -f "${general_log_file}-${DT}.gz" ];then
            mv "${general_log_file}-${DT}.gz" "${general_log_file}-${DT}$(date +%H%M -d '1 minute ago').gz"
        fi
        if [ -s "${general_log_file}" ];then
            ln "${general_log_file}" "${general_log_file}-${DT}" && rm -f "${general_log_file}"
            $CMD -P${port} -e "flush NO_WRITE_TO_BINLOG general logs;"
            gzip "${general_log_file}-${DT}"
        fi
    fi
done
# purge compressed logs older than 7 days
find /var/log/mysql/ -mtime +7 -type f -name "*.gz" | xargs -I{} rm -f {}
For reference, FLUSH LOGS is equivalent to executing all of the following:
FLUSH BINARY LOGS
FLUSH ENGINE LOGS
FLUSH ERROR LOGS
FLUSH GENERAL LOGS
FLUSH RELAY LOGS
FLUSH SLOW LOGS