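Our company once optimized one of its servers, which involved rebooting it. When we started the Redis nodes again afterwards, our Java application could no longer connect, even though we could connect to each node individually. It was really strange. After a lot of back and forth we finally found the problem: one of the nodes was crashing on its own a short while after starting up!!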
That node's Redis log contained the following errors:
[2716] 28 Apr 10:16:51.234 # Server started, Redis version 2.8.8
[2716] 28 Apr 10:16:51.234 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[2716] 28 Apr 10:17:27.915 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
[2761] 28 Apr 10:19:40.866 * Increased maximum number of open files to 10032 (it was originally set to 1024).
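After some searching on Baidu and Google, I found the solution. The cause was that Redis's data file (the append-only file, AOF) had been corrupted when the server was rebooted, which is why the node died shortly after starting. The file has to be repaired, as follows: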
Go to the Redis data directory, back up the AOF file, and repair it with the redis-check-aof tool:
[root@db redis]# cp appendonly.aof appendonly.aof.bak
[root@db redis]# redis-check-aof --fix appendonly.aof
0x c93488e5: Expected prefix '
AOF analyzed: size=3375772775, ok_up_to=3375663333, diff=109442
This will shrink the AOF from 3375772775 bytes, with 109442 bytes, to 3375663333 bytes
Continue? [y/N]: y
Successfully truncated AOF
If the redis-check-aof tool is not installed, install it first (on Debian/Ubuntu it ships in the redis-tools package):
apt-get install redis-tools
Once it is installed, run the AOF repair command; when it finishes, restart Redis and the problem is solved!!
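To make the repair step less of a black box, here is a minimal Python sketch of the kind of check redis-check-aof performs on a Redis 2.8-style AOF. It walks the file as a sequence of RESP commands (`*<argc>` followed by `$<len>` bulk strings), records the offset just past the last complete command (the `ok_up_to` value in the tool output above), and can optionally truncate the file there. This is not the tool's actual source: the function names `valid_prefix_length` and `check_aof` are hypothetical, the sketch reads the whole file into memory (fine for a demo, not for a 3 GB AOF), and it ignores newer AOF layouts such as an RDB preamble. Keep the .bak copy before truncating anything.

```python
from typing import Optional, Tuple


def _read_line(data: bytes, pos: int) -> Tuple[Optional[bytes], int]:
    """Return the line starting at pos (without CRLF) and the offset after it."""
    end = data.find(b"\r\n", pos)
    if end == -1:
        return None, pos  # no complete line left
    return data[pos:end], end + 2


def _parse_int(raw: bytes) -> Optional[int]:
    try:
        return int(raw)
    except ValueError:
        return None


def valid_prefix_length(data: bytes) -> int:
    """Offset just past the last complete RESP command in data (ok_up_to)."""
    pos = 0
    ok_up_to = 0
    while pos < len(data):
        # Each command starts with "*<argc>\r\n".
        line, pos = _read_line(data, pos)
        if line is None or not line.startswith(b"*"):
            break
        argc = _parse_int(line[1:])
        if argc is None or argc < 1:
            break
        complete = True
        for _ in range(argc):
            # Each argument is "$<len>\r\n<payload>\r\n".
            line, pos = _read_line(data, pos)
            if line is None or not line.startswith(b"$"):
                complete = False
                break
            size = _parse_int(line[1:])
            if size is None or size < 0:
                complete = False
                break
            if data[pos + size:pos + size + 2] != b"\r\n":
                complete = False  # payload cut off or missing its CRLF
                break
            pos += size + 2
        if not complete:
            break
        ok_up_to = pos  # this whole command is intact
    return ok_up_to


def check_aof(path: str, fix: bool = False) -> Tuple[int, int]:
    """Return (size, ok_up_to); truncate the file to ok_up_to when fix=True."""
    with open(path, "rb") as f:
        data = f.read()
    ok_up_to = valid_prefix_length(data)
    if fix and ok_up_to < len(data):
        # Like --fix: drop the damaged tail. Back up the file first!
        with open(path, "r+b") as f:
            f.truncate(ok_up_to)
    return len(data), ok_up_to
```

For example, `size, ok = check_aof("appendonly.aof")` gives the same three numbers the tool prints: `size`, `ok_up_to` and the difference `size - ok`. Truncating at the last complete command is why a repair loses only the final, half-written write rather than the whole file.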
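Lessons learned:
1. When a multi-node setup misbehaves, first confirm that every single node is actually working instead of hunting for causes blindly. If every node works on its own, the multi-node Redis cluster will work too.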
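2. Before rebooting a server, shut down the application processes running on it first. Never forget this! (Power failures excepted, of course.)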
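Lesson 1 is easy to automate. Below is a minimal Python sketch, assuming plain TCP access to each node and no password (a node configured with requirepass would answer PING with an error and be reported as unhealthy here). It sends a raw RESP `PING` and expects `+PONG`, the same check `redis-cli -h <host> -p <port> ping` does by hand. Because the broken node in this story only crashed some time after starting, the hypothetical `check_nodes` helper pings every node over several rounds and reports a node as healthy only if it answered every time. The function names, hosts, ports, round count and interval are all placeholders.

```python
import socket
import time
from typing import Dict, List, Tuple


def ping_node(host: str, port: int, timeout: float = 1.0) -> bool:
    """Send a RESP-encoded PING and return True if the node replies +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = b""
            # Read until the first CRLF-terminated reply line arrives.
            while b"\r\n" not in reply:
                chunk = s.recv(64)
                if not chunk:
                    break  # node closed the connection
                reply += chunk
            return reply.startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, reset: all count as "not working".
        return False


def check_nodes(nodes: List[Tuple[str, int]], rounds: int = 1,
                interval: float = 0.0) -> Dict[str, bool]:
    """Map "host:port" to True only if the node answered PING in every round."""
    healthy = {f"{host}:{port}": True for host, port in nodes}
    for i in range(rounds):
        for host, port in nodes:
            if not ping_node(host, port):
                healthy[f"{host}:{port}"] = False
        if i < rounds - 1:
            time.sleep(interval)  # give a flaky node time to crash
    return healthy
```

A typical call might be `check_nodes([("10.0.0.1", 6379), ("10.0.0.2", 6379)], rounds=3, interval=10.0)` with your own addresses; any `False` entry points at the node to inspect first, starting with its log.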