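Our system uses MongoDB as its database.
During a stress test, MongoDB's CPU usage climbed to 400% and the whole face recognition system stopped working properly. Analysis of the saved logs showed the following: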
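Our first guess was that the failure was very likely related to MongoDB.
Before we could confirm the cause, we ran systemctl restart mongo to restart MongoDB, and the system went back to normal.
Reviewing the system afterwards, we found: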
Jan 14 14:21:55 localhost mongod[677]: 2019-01-14T14:21:55.943+0800 E STORAGE [thread1] WiredTiger (28) [1547446915:943815][677:0x7f93ffeca0], file:WiredTiger.wt, WT_SESSION.checkpoint: /data/db/WiredTiger.turtle.set: handle-write: pwrite: failed to write 980 bytes at offset 0: No space left on device
Jan 14 14:21:55 localhost mongod[677]: 2019-01-14T14:21:55.947+0800 E STORAGE [thread1] WiredTiger (28) [1547446915:947427][677:0x7f93ffeca0], file:WiredTiger.wt, WT_SESSION.checkpoint: /data/db/WiredTiger.turtle.set: handle-write: pwrite: failed to write 980 bytes at offset 0: No space left on device
Jan 14 14:21:55 localhost mongod[677]: 2019-01-14T14:21:55.950+0800 E STORAGE [thread1] WiredTiger (28) [1547446915:950435][677:0x7f93ffeca0], checkpoint-server: checkpoint server error: No space left on device
Jan 14 14:21:55 localhost mongod[677]: 2019-01-14T14:21:55.950+0800 E STORAGE [thread1] WiredTiger (-31804) [1547446915:950606][677:0x7f93ffeca0], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
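A Google search showed that this is a known MongoDB issue: the journal log files had grown beyond the expected size. That would be consistent with the "No space left on device" errors in the log above, after which WiredTiger panicked and asked for the process to be restarted.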
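To check this kind of diagnosis on a live machine, a small script can report how much free space is left on the volume holding the data directory and how much of it the journal files take up. The following is only a minimal sketch, not part of the original investigation. It assumes the data directory is /data/db (the path that appears in the log above) and that journal files live in a journal subdirectory of it, which is the usual WiredTiger layout. The function names and the 10% threshold are made up for illustration.

```python
import os
import shutil


def dir_size_bytes(path):
    """Return the total size of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file can disappear while mongod rotates journal files.
                pass
    return total


def disk_report(dbpath="/data/db", min_free_ratio=0.10):
    """Summarize free space on the dbpath volume and the journal size.

    dbpath and min_free_ratio are assumptions for this sketch; adjust
    them to the real deployment.
    """
    usage = shutil.disk_usage(dbpath)
    journal_dir = os.path.join(dbpath, "journal")
    journal_bytes = dir_size_bytes(journal_dir) if os.path.isdir(journal_dir) else 0
    free_ratio = usage.free / usage.total if usage.total else 0.0
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_ratio": free_ratio,
        "journal_bytes": journal_bytes,
        "low_space": free_ratio < min_free_ratio,
    }
```

Calling disk_report("/data/db") on the affected host and looking at free_bytes and journal_bytes would show whether the journal is really what is filling the disk, or whether something else on the same volume is responsible.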
The fix is as follows: