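When a systemd service (managed with systemctl) is used to keep a mount process running, the mount script that the unit's ExecStart points to should run umount before it runs mount. Once the process backing a mounted directory has been killed by the OOM killer, the mount point is still occupied, and it has to be unmounted before it can be mounted again.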
The mount script:
root@DAILAPGDBUP001:~# cat /root/mountdatadomaindir.sh
/opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
The ExecStart line of the service unit points to this script. The unit file:
root@DAILAPGDBUP001:~# vim /usr/lib/systemd/system/mountdatadomaindir.service
[Unit]
Description=mountdatadomaindir
After=network.target
[Service]
User=root
Group=root
Type=forking
ExecStart=/bin/bash /root/mountdatadomaindir.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
root@DAILAPGDBUP001:~# systemctl enable mountdatadomaindir
On one occasion the OOM killer was triggered. The unit already had Restart=on-failure, yet /mnt/datadomaindir was not mounted again. /var/log/syslog contained the following entries:
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: A process of this unit has been killed by the OOM killer.
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Main process exited, code=killed, status=9/KILL
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Failed with result 'oom-kill'.
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Consumed 36min 37.125s CPU time.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Scheduled restart job, restart counter is at 1.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: Stopped mountdatadomaindir.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Consumed 36min 37.125s CPU time.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: Starting mountdatadomaindir...
Oct 15 01:01:04 DAILAPGDBUP001 bash[1896219]: Not able to access the mount point /mnt/datadomaindir
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Control process exited, code=exited, status=1/FAILURE
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Failed with result 'exit-code'.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Failed to start mountdatadomaindir.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Scheduled restart job, restart counter is at 2.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Stopped mountdatadomaindir.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Starting mountdatadomaindir...
Oct 15 01:01:06 DAILAPGDBUP001 bash[1896286]: Not able to access the mount point /mnt/datadomaindir
Oct 15 01:01:06 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Control process exited, code=exited, status=1/FAILURE
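To pull these entries out of a busy log, a small filter like the one below can help. It is only a sketch: the function name unit_oom_events is hypothetical, and the default log path and unit name are assumptions taken from the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the OOM-kill, failure and restart entries
# that systemd logged for one unit in a syslog-style file.
# Both defaults are assumptions based on the log excerpt above.
unit_oom_events() {
  local log="${1:-/var/log/syslog}" unit="${2:-mountdatadomaindir.service}"
  # First keep only this unit's lines, then only the interesting event types.
  grep -F -- "$unit" "$log" \
    | grep -E -- "OOM killer|oom-kill|Failed with result|Scheduled restart job"
}
```

For example: unit_oom_events /var/log/syslog mountdatadomaindir.service. On hosts that keep the systemd journal, journalctl -u mountdatadomaindir.service shows the same unit messages straight from the journal.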
In addition, ll /mnt/ reported "ls: cannot access 'datadomaindir': Transport endpoint is not connected", and every attribute of the mount point was displayed as a question mark:
root@DAILAPGDBUP001:~# ll /mnt/
ls: cannot access 'datadomaindir': Transport endpoint is not connected
total 8
drwxr-xr-x 3 root root 4096 Sep 16 04:16 ./
drwxr-xr-x 20 root root 4096 Aug 31 06:36 ../
d????????? ? ? ? ? ? datadomaindir/
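The d????????? line means that even stat() on the mount point fails, here with "Transport endpoint is not connected" (ENOTCONN). This is the usual symptom of a user-space (FUSE-style) filesystem whose daemon has died; that boostfs works this way is an assumption based on the error, not something stated above. A hypothetical helper to tell a stale mount point from a healthy one might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a directory as "accessible", "stale"
# (the "Transport endpoint is not connected" state shown by ls above),
# or "error" for any other stat failure (missing path, permissions, ...).
classify_mount_dir() {
  local dir="$1" err
  # Capture only stat's stderr (stdout goes to /dev/null);
  # force the C locale so the error text is in English.
  if err=$(LC_ALL=C stat -- "$dir" 2>&1 >/dev/null); then
    echo "accessible"
  elif [[ "$err" == *"Transport endpoint is not connected"* ]]; then
    echo "stale"
  else
    echo "error"
  fi
}
```

In the situation above, classify_mount_dir /mnt/datadomaindir should print stale.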
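Solution: add umount /mnt/datadomaindir to /root/mountdatadomaindir.sh, ahead of the mount command. Once the process backing a mounted directory has been OOM-killed, the mount point is still occupied and has to be unmounted before it can be mounted again; this is why every automatic restart in the log above failed with "Not able to access the mount point". The updated script: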
root@DAILAPGDBUP001:~# vim /root/mountdatadomaindir.sh
umount /mnt/datadomaindir
/opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
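A slightly more defensive version of the same fix is sketched below as a shell function. The function name remount_datadomaindir, the fallback to a lazy unmount (umount -l) and the environment overrides are additions for illustration, not part of the original script; the boostfs mount command itself is copied unchanged. Errors from umount are deliberately ignored, because on a normal boot nothing is mounted there yet.

```shell
#!/usr/bin/env bash
# Hypothetical hardened variant of /root/mountdatadomaindir.sh.
# MOUNT_DIR and BOOSTFS default to the values in the original script;
# the environment overrides exist only to make the sketch easier to test.
MOUNT_DIR="${MOUNT_DIR:-/mnt/datadomaindir}"
BOOSTFS="${BOOSTFS:-/opt/emc/boostfs/bin/boostfs}"

remount_datadomaindir() {
  # A boostfs process killed by the OOM killer leaves the mount point occupied
  # ("Transport endpoint is not connected"), so detach it before mounting.
  # Try a normal umount first and fall back to a lazy one (umount -l);
  # if both fail, nothing was mounted there and we simply carry on.
  umount "$MOUNT_DIR" 2>/dev/null || umount -l "$MOUNT_DIR" 2>/dev/null || true

  # Same mount command as the original script; its exit status is the function's.
  "$BOOSTFS" mount "$MOUNT_DIR" -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
}
```

The script would define this function and end with a single call to remount_datadomaindir, so that the exit status systemd sees for ExecStart is that of the boostfs mount command.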