1. Abnormal logs
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
* Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
* Starting automatic rewriting of AOF on 107914% growth
* Background append only file rewriting started by pid 4143
* AOF rewrite child asks to stop sending diffs.
* Parent agreed to stop sending diffs. Finalizing AOF...
* Concatenating 0.00 MB of AOF diff received from parent.
* SYNC append only file rewrite performed
* AOF rewrite: 2 MB of memory used by copy-on-write
* Background AOF rewrite terminated with success
* Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
* Background AOF rewrite finished successfully
* Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
* Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
* Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
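2. Problem analysis
'Configuration file settings'
appendonly yes # enable AOF
appendfsync everysec # AOF fsync policy: fsync once per second
aof-use-rdb-preamble yes # enable the mixed RDB + AOF format
aof-load-truncated yes # when loading the AOF at startup, skip the broken (truncated) commands and load as many usable commands as possible
aof-rewrite-incremental-fsync yes # fsync the rewritten AOF in batches, which makes good use of sequential I/O
no-appendfsync-on-rewrite no # lose as little data as possible: with no, at most 2 s of data is lost; with yes, up to 30 s can be lost
auto-aof-rewrite-min-size 67108864 # AOF file size threshold: 64 MB
auto-aof-rewrite-percentage 100 # (aof_current_size - aof_base_size) / aof_base_size is compared against 100%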
'A rewrite is triggered when both of the following conditions hold'
1. The current AOF file size (aof_current_size) > 64 MB
2. (aof_current_size - aof_base_size) / aof_base_size > 100%
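As a minimal sketch, the two conditions above can be written as one predicate. The function name should_rewrite and its parameter names are hypothetical. The defaults mirror the configuration shown earlier, and the comparison follows the conditions as written here rather than Redis's exact internal arithmetic.

```python
def should_rewrite(current_size: int, base_size: int,
                   min_size: int = 64 * 1024 * 1024,
                   percentage: int = 100) -> bool:
    """Return True when both rewrite conditions listed above hold.

    Hypothetical helper for illustration; this is not Redis's own code.
    """
    if percentage == 0:
        # A percentage of 0 disables automatic rewriting in Redis.
        return False
    # Guard against a zero base size (assumption: treat it as 1 byte).
    base = base_size if base_size > 0 else 1
    growth = (current_size - base) / base * 100
    return current_size > min_size and growth > percentage


# A 70 MB AOF that was 30 MB after its last rewrite: both conditions hold.
print(should_rewrite(70 * 1024 * 1024, 30 * 1024 * 1024))
# A 50 MB AOF is below the 64 MB minimum, however much it has grown.
print(should_rewrite(50 * 1024 * 1024, 1 * 1024 * 1024))
```

The "107914% growth" in the log suggests that the base size after each rewrite was tiny compared with the current size, which would be expected when queue entries are consumed soon after they are written. In that situation the percentage condition is almost always satisfied, and the 64 MB minimum is effectively the only thing spacing rewrites apart.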
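Analysis based on monitoring
On the right, the aof_delayed_fsync metric keeps increasing, which means AOF writes are repeatedly being blocked.
On the left, the rewrite conditions described above are already met, and the AOF is being rewritten frequently.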
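If no dashboard is available, the same signal can be read from the text returned by INFO persistence. The sketch below only parses that text and compares two samples. The helper names parse_info and delayed_fsync_delta are hypothetical, and the sample strings are made-up values for illustration, not data from this incident.

```python
def parse_info(text: str) -> dict:
    """Parse the 'key:value' lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def delayed_fsync_delta(before: str, after: str) -> int:
    """Number of delayed fsyncs that happened between two INFO samples."""
    first = int(parse_info(before).get("aof_delayed_fsync", 0))
    second = int(parse_info(after).get("aof_delayed_fsync", 0))
    return second - first


# Made-up sample values, for illustration only.
sample_1 = "# Persistence\r\naof_enabled:1\r\naof_rewrite_in_progress:0\r\naof_delayed_fsync:12\r\n"
sample_2 = "# Persistence\r\naof_enabled:1\r\naof_rewrite_in_progress:1\r\naof_delayed_fsync:19\r\n"
print(delayed_fsync_delta(sample_1, sample_2))  # 7
```

A delta that stays above zero across successive samples corresponds to the steadily rising curve described above.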
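3. Root cause
After reviewing the monitored commands and the commands recorded in the AOF file, the causes can be summarized as follows:
1. The client uses Redis as a queue and, to guard against data loss, chose AOF for persistence. The values in the queue are all large, mostly around 30 KB each, even though monitoring shows that memory usage is not high.
2. A large number of large commands pile up in the AOF file, so the file quickly reaches the rewrite trigger conditions and Redis keeps rewriting it.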
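3. Because no-appendfsync-on-rewrite is set to no, the main process keeps issuing fsync on the AOF while a rewrite is running, so it competes with the rewrite child for disk I/O. Combined with the frequent rewrites, this is what causes the AOF blocking.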
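4. Solution
Redis is still best used as a cache. Using it as a queue and also relying on AOF for persistence is not recommended, and the case above is a good example of why. The recommendation is to move the queue function to a dedicated message queue such as kafka/rabbitmq/rocketmq. If you want to keep using Redis for this, disable AOF persistence and reduce the size of the values to avoid blocking Redis. As for data loss, add a separate data compensation mechanism so that the data can be pushed again if Redis crashes or another unexpected event occurs.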
5. appendfsync everysec is not exactly 1 s
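no-appendfsync-on-rewrite no / appendfsync everysec
Data is nominally flushed to disk once per second, but the real bound is not 1 s. See the logic diagram below: when the main thread compares timestamps, the threshold it checks is 2 s, so at most 2 s of data can be lost.
no-appendfsync-on-rewrite yes / appendfsync everysec
While a rewrite is in progress, this is equivalent to appendfsync no. The data in the buffer reaches disk only when Linux itself syncs it, which happens every 30 s by default, so at most 30 s of data can be lost.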
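The 2 s check described above can be modelled in a few lines. This is a simplified sketch of the decision the main thread makes under appendfsync everysec when a background fsync is still running; it is not Redis source code. The names AofState and flush_everysec are hypothetical, and time is passed in as whole seconds to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class AofState:
    fsync_in_progress: bool = False  # a background fsync has not finished yet
    postponed_since: int = 0         # second at which the flush was first postponed (0 = not postponed)
    delayed_fsync: int = 0           # plays the role of the aof_delayed_fsync counter


def flush_everysec(state: AofState, now: int) -> str:
    """Decide what one flush attempt does under appendfsync everysec.

    Simplified model for illustration only.
    """
    if state.fsync_in_progress:
        if state.postponed_since == 0:
            # First time the background fsync is seen to be busy: start waiting.
            state.postponed_since = now
            return "postponed"
        if now - state.postponed_since < 2:
            # Still inside the 2 s grace period: keep waiting.
            return "postponed"
        # Waited 2 s already: count the delay and write anyway (this write may block).
        state.delayed_fsync += 1
    state.postponed_since = 0
    return "write"


# The disk stays busy for the whole run, so the third attempt gives up waiting.
state = AofState(fsync_in_progress=True)
for second in (100, 101, 102):
    print(second, flush_everysec(state, second))
print("delayed_fsync =", state.delayed_fsync)
```

In this model the counter goes up exactly when the write proceeds without waiting for the pending fsync, which is the point at which the "Asynchronous AOF fsync is taking too long" line from section 1 is logged.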