Unfortunately, the GPFS daemon “mmfsd” running on the compute nodes is sometimes killed by the out-of-memory (OOM) killer. This usually unmounts GPFS from the compute node and makes the node unavailable in LSF. The mount point /gpfs3 then reports “Stale file handle”. The problem can persist even after the node has been rebooted.
The good news is that the fix is quite easy.
Log in to the compute node with SSH.
Check the /gpfs3 mount point:
[root@b35n20 ~]# ls -l /gpfs3/
ls: /gpfs3/: Stale file handle
ls: cannot open directory /gpfs3/: Stale file handle
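If several nodes need checking, the same test can be scripted. The following is only a minimal sketch: the helper name is_stale_mount is made up for illustration and is not part of GPFS, and it assumes the error text matches the "Stale file handle" message shown above.

```shell
# Hypothetical helper (not a GPFS command): returns 0 if listing the
# directory fails with "Stale file handle", and 1 in every other case.
is_stale_mount() {
    # LC_ALL=C keeps the ls error message in English so the match is reliable.
    # "2>&1 >/dev/null" sends only stderr into the pipe and discards stdout.
    if LC_ALL=C ls "$1" 2>&1 >/dev/null | grep -q 'Stale file handle'; then
        return 0
    fi
    return 1
}

# Example use on a compute node:
#   is_stale_mount /gpfs3 && echo "/gpfs3 is stale"
```

Note that a healthy directory and a missing directory both count as "not stale" here, so this only detects the specific failure described in this post.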
Shut down GPFS locally, start it again, and wait a few seconds:
[root@b35n20 ~]# mmshutdown
Wed Oct 2 09:32:10 CEST 2019: mmshutdown: Starting force unmount of GPFS file systems
Wed Oct 2 09:32:15 CEST 2019: mmshutdown: Shutting down GPFS daemons
Shutting down!
Unloading modules from /lib/modules/3.10.0-957.21.3.el7.x86_64/extra
Unloading module mmfs26
Unloading module mmfslinux
Wed Oct 2 09:32:19 CEST 2019: mmshutdown: Finished
[root@b35n20 ~]# mmstartup
Wed Oct 2 09:32:31 CEST 2019: mmstartup: Starting GPFS …
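How long "a few seconds" takes can vary, so instead of guessing it is possible to poll until the file system is mounted again. This is a rough sketch under two assumptions that are not confirmed by the output above: that the mount appears in /proc/mounts with file system type gpfs, and that a retry limit of 30 seconds is enough. The helper name wait_for_gpfs_mount and its parameters are invented for illustration.

```shell
# Hypothetical helper: wait until MOUNT_POINT is listed as a gpfs mount.
# Arguments: mount point, number of tries (default 30), and the mounts
# table to read (default /proc/mounts; assumed to list type "gpfs").
wait_for_gpfs_mount() {
    wfg_mount_point=$1
    wfg_tries=${2:-30}
    wfg_mounts_file=${3:-/proc/mounts}
    wfg_i=0
    while [ "$wfg_i" -lt "$wfg_tries" ]; do
        # Field 2 of each line is the mount point, field 3 the fs type.
        if awk -v mp="$wfg_mount_point" \
            '$2 == mp && $3 == "gpfs" { found = 1 } END { exit !found }' \
            "$wfg_mounts_file"; then
            return 0
        fi
        wfg_i=$((wfg_i + 1))
        sleep 1
    done
    return 1
}

# Example use after mmstartup:
#   wait_for_gpfs_mount /gpfs3 && ls -l /gpfs3
```

Checking the mounts table rather than running ls avoids a false positive: before GPFS is mounted, /gpfs3 may simply be an empty local directory that lists without error.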
Check the /gpfs3 mount point again:
[root@b35n20 xcatpost]# ls -l /gpfs3
total 123534208
drwxrwxr-x 171 root root 32768 Sep 30 10:04 applications
…
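To confirm that the OOM killer really was the reason mmfsd went away, the kernel log can be searched. The sketch below assumes the kernel writes a line of the form "Killed process <pid> (mmfsd)", which is the usual wording but is not shown in the output above. The helper name mmfsd_oom_kills is made up for illustration. The kernel ring buffer is cleared on reboot, so after a reboot the entry may no longer be there.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints only the
# lines recording that an mmfsd process was killed.
# Assumption: the kernel logs "Killed process <pid> (mmfsd) ...".
mmfsd_oom_kills() {
    grep -E 'Killed process [0-9]+ \(mmfsd\)'
}

# Example use on the compute node (as root):
#   dmesg | mmfsd_oom_kills
```

If nothing is printed, mmfsd was either not killed by the OOM killer or the log entry has already been rotated away.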