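Symptom: after a reboot, a Fujitsu M5000 host could not start the NFS service, so it could not act as an NFS server. Checking the SMF services showed that svc:/network/nfs/status:default was absent (the service is missing and cannot be found on the system):
bash-3.00# svcs -xv <--------------- show services that failed to start normally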
svc:/network/nfs/nlockmgr:default (NFS lock manager)
State: offline since Fri Jul 22 12:58:00 2011
Reason: Dependency svc:/network/nfs/status is absent.
See: http://sun.com/msg/SMF-8000-E2
See: man -M /usr/share/man -s 1M lockd
Impact: 2 dependent services are not running:
svc:/network/nfs/client:default
svc:/network/nfs/server:default
bash-3.00# svcs -l svc:/network/nfs/nlockmgr:default <----------------- show detailed information for this service
fmri svc:/network/nfs/nlockmgr:default
name NFS lock manager
enabled true
state offline
next_state none
state_time Fri Jul 22 12:58:00 2011
restarter svc:/system/svc/restarter:default
dependency require_any/none svc:/milestone/network (online)
dependency require_all/none svc:/network/rpc/bind (online)
dependency require_all/none svc:/network/nfs/status (absent)
dependency require_all/none svc:/system/filesystem/minimal (online)
bash-3.00# svcs -D svc:/network/nfs/nlockmgr:default <---------------------- list the services that depend on this service
STATE STIME FMRI
offline 12:58:00 svc:/network/nfs/client:default
offline 12:58:01 svc:/network/nfs/server:default
bash-3.00# svcs -a|grep nfs <------------------- show all NFS-related services; the status service has not started and does not appear, because it is missing (absent)
online 12:58:08 svc:/network/nfs/mapid:default
online 12:58:08 svc:/network/nfs/cbd:default
online 12:58:11 svc:/network/nfs/rquota:default
offline 12:58:00 svc:/network/nfs/nlockmgr:default
offline 12:58:00 svc:/network/nfs/client:default
offline 12:58:01 svc:/network/nfs/server:default
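The output above shows the chain of failures: because the status service is missing (absent), nlockmgr cannot start, and as a result the two services that depend on nlockmgr, server and client, did not start either.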
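The same check can be scripted. The sketch below is only an illustration: the function name absent_deps is made up here, and it assumes the `svcs -l` layout shown above, where each dependency line lists FMRIs followed by their state in parentheses.

```shell
# Print every dependency that `svcs -l` reports as absent.
# Reads `svcs -l` output on stdin, so it also works on saved output.
# absent_deps is a hypothetical helper name, not part of SMF.
absent_deps() {
    awk '$1 == "dependency" {
        # Fields 3, 5, 7, ... are FMRIs; each is followed by its state.
        for (i = 3; i < NF; i += 2)
            if ($(i + 1) == "(absent)")
                print $i
    }'
}

# Possible usage on the affected host:
#   svcs -l svc:/network/nfs/nlockmgr:default | absent_deps
```

Run against the listing above, this would print only svc:/network/nfs/status, which is the service that needs to be restored.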
Recovery: since the service is missing, try importing the manifest for the status service by hand with svccfg import service_manifest.xml. The manifests are stored in /var/svc/manifest/network/nfs.
bash-3.00# pwd
/var/svc/manifest/network/nfs
bash-3.00# svccfg import status.xml <------------------ manually import the manifest into the SMF repository
bash-3.00# svcs -a|grep nfs
online 12:58:08 svc:/network/nfs/cbd:default
online 12:58:11 svc:/network/nfs/rquota:default
online 14:05:45 svc:/network/nfs/mapid:default
online 21:39:32 svc:/network/nfs/nlockmgr:default
online 21:39:33 svc:/network/nfs/client:default
online 21:39:33 svc:/network/nfs/server:default
maintenance 21:40:55 svc:/network/nfs/status:default
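A slightly more careful version of the import step might validate the manifest first. This is a sketch, not what was run above: import_manifest and the SVCCFG override are names invented for illustration, and it assumes the `validate` subcommand of svccfg is available on this release.

```shell
# Validate a manifest, then import it into the SMF repository.
# import_manifest is a hypothetical helper; SVCCFG lets you substitute
# another command (for example `echo`) for a dry run.
import_manifest() {
    manifest=$1
    svccfg_cmd=${SVCCFG:-svccfg}
    if [ ! -r "$manifest" ]; then
        echo "manifest not readable: $manifest" >&2
        return 1
    fi
    # Stop before touching the repository if the manifest does not load.
    "$svccfg_cmd" validate "$manifest" || return 1
    "$svccfg_cmd" import "$manifest"
}

# Possible usage:
#   import_manifest /var/svc/manifest/network/nfs/status.xml
```

Setting SVCCFG=echo prints the two svccfg invocations instead of running them, which is handy for checking the path before changing anything.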
bash-3.00# svcs -xv svc:/network/nfs/status:default <--------------------- detailed state of the status service
svc:/network/nfs/status:default (NFS status monitor)
State: maintenance since Fri Jul 22 21:39:33 2011
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: man -M /usr/share/man -s 1M statd
See: /var/svc/log/network-nfs-status:default.log
Impact: This service is not running.
bash-3.00# cat /var/svc/log/network-nfs-status:default.log <----------------------- check the log
[ Jul 22 21:39:08 Disabled. ]
[ Jul 22 21:39:08 Rereading configuration. ]
[ Jul 22 21:39:32 Enabled. ]
[ Jul 22 21:39:32 Executing start method ("/usr/lib/nfs/statd") ]
[ Jul 22 21:39:32 Method "start" exited with status 0 ]
[ Jul 22 21:39:32 Stopping because all processes in service exited. ]
[ Jul 22 21:39:32 Executing stop method (:kill) ]
[ Jul 22 21:39:32 Executing start method ("/usr/lib/nfs/statd") ]
[ Jul 22 21:39:32 Method "start" exited with status 0 ]
[ Jul 22 21:39:33 Stopping because all processes in service exited. ]
[ Jul 22 21:39:33 Executing stop method (:kill) ]
[ Jul 22 21:39:33 Executing start method ("/usr/lib/nfs/statd") ]
[ Jul 22 21:39:33 Method "start" exited with status 0 ]
[ Jul 22 21:39:33 Stopping because all processes in service exited. ]
[ Jul 22 21:39:33 Executing stop method (:kill) ]
[ Jul 22 21:39:33 Restarting too quickly, changing state to maintenance ]
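The pattern in this log (the start method exits with status 0, then every process in the service is gone, three times in about a second) can be summarized with a small filter. This is a sketch; summarize_smf_log is a made-up name and the patterns are simply the message texts seen in the log above.

```shell
# Summarize an SMF service log read from stdin: how many times the start
# method ran, how many of those exited with status 0, and whether the
# service was put into maintenance for restarting too quickly.
# summarize_smf_log is a hypothetical helper name.
summarize_smf_log() {
    awk '
        BEGIN { starts = 0; clean = 0; maint = "no" }
        /Executing start method/              { starts++ }
        /Method "start" exited with status 0/ { clean++ }
        /Restarting too quickly/              { maint = "yes" }
        END {
            printf "starts=%d clean_exits=%d maintenance=%s\n", starts, clean, maint
        }
    '
}

# Possible usage:
#   summarize_smf_log < /var/svc/log/network-nfs-status:default.log
```

When the start count equals the clean exit count and the service still ends up in maintenance, the start method itself is not failing; the daemon is exiting right after it starts, which points at something outside the method.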
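Then I remembered that earlier I had run /usr/lib/nfs/statd, the start method of the nfs/status service, by hand. That is probably why the service kept failing here and could not come online: most likely the statd instances started by SMF exited at once because my manually started copy was still running.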
bash-3.00# ps -ef |grep nfs
daemon 815 1 0 12:58:09 ? 0:00 /usr/lib/nfs/nfs4cbd
daemon 802 1 0 12:58:09 ? 0:00 /usr/lib/nfs/nfsmapid
root 2435 1729 0 21:46:49 pts/2 0:00 grep nfs
root 1997 1 0 21:39:34 ? 0:00 /usr/lib/nfs/mountd
daemon 8022 1 0 14:00:25 ? 0:00 /usr/lib/nfs/lockd
daemon 1999 1 0 21:39:34 ? 0:00 /usr/lib/nfs/nfsd
daemon 7748 1 0 13:57:43 ? 0:00 /usr/lib/nfs/statd
bash-3.00# kill -9 7748 <---------------------- kill the background process left over from running the method by hand
bash-3.00# ps -ef |grep nfs
daemon 815 1 0 12:58:09 ? 0:00 /usr/lib/nfs/nfs4cbd
daemon 802 1 0 12:58:09 ? 0:00 /usr/lib/nfs/nfsmapid
root 1997 1 0 21:39:34 ? 0:00 /usr/lib/nfs/mountd
daemon 8022 1 0 14:00:25 ? 0:00 /usr/lib/nfs/lockd
root 2469 1729 0 21:47:33 pts/2 0:00 grep nfs
daemon 1999 1 0 21:39:34 ? 0:00 /usr/lib/nfs/nfsd
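Picking the stray PID out of the listing by eye works, but it can also be done with a small filter. The sketch below is illustrative only: pids_of is a hypothetical name, and it assumes the `ps -ef` layout shown above, with a single-field STIME and the command line starting at the eighth column.

```shell
# Print the PIDs of processes whose full command line equals the given
# string. Reads `ps -ef` output on stdin.
# pids_of is a hypothetical helper name.
pids_of() {
    # Columns: UID PID PPID C STIME TTY TIME CMD (CMD takes the rest).
    while read -r uid pid ppid c stime tty time cmd; do
        if [ "$cmd" = "$1" ]; then
            echo "$pid"
        fi
    done
    return 0
}

# Possible usage:
#   ps -ef | pids_of /usr/lib/nfs/statd
```

Because the whole command line is compared, the `grep nfs` line never matches. It may also be worth comparing the result with `svcs -p svc:/network/nfs/status:default`, which lists the processes SMF itself associates with the service, and trying a plain `kill` before resorting to `kill -9`.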
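bash-3.00# svcadm refresh svc:/network/nfs/status:default <------------------- refresh the service configuration
bash-3.00# svcadm clear svc:/network/nfs/status:default <------------------ with the cause fixed, clear the maintenance state so the service is started again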
bash-3.00# svcs -a|grep nfs
online 12:58:08 svc:/network/nfs/cbd:default
online 12:58:11 svc:/network/nfs/rquota:default
online 14:05:45 svc:/network/nfs/mapid:default
online 21:39:32 svc:/network/nfs/nlockmgr:default
online 21:39:33 svc:/network/nfs/client:default
online 21:39:33 svc:/network/nfs/server:default
online 21:47:59 svc:/network/nfs/status:default
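Instead of re-running `svcs -a` by hand after the clear, the state can be polled. This is a sketch under stated assumptions: wait_online, its return codes, and the SVCS override are invented for illustration; only `svcs -H -o state` is the real command being wrapped.

```shell
# Poll a service until it is online.
# Returns 0 when online, 2 if it falls into maintenance, 1 on timeout.
# wait_online is a hypothetical helper; SVCS lets you substitute another
# command for testing.
wait_online() {
    fmri=$1
    tries=${2:-10}
    svcs_cmd=${SVCS:-svcs}
    while [ "$tries" -gt 0 ]; do
        # -H drops the header, -o state prints only the state column.
        state=$("$svcs_cmd" -H -o state "$fmri" 2>/dev/null) || state=unknown
        case $state in
            online)      return 0 ;;
            maintenance) return 2 ;;
        esac
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Possible usage:
#   svcadm clear svc:/network/nfs/status:default
#   wait_online svc:/network/nfs/status:default 30 && echo "status is online"
```

Returning a distinct code for maintenance means a script can stop early and go read the service log rather than waiting out the full timeout.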
Seeing the service finally online, I nearly cried.
Problem solved.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
For the commands in detail and more on SMF, see solarisSMF.pdf and smf-workshop-ganesh.pdf.
Special thanks to asx_liu for the generous help.
Original post: http://bbs.chinaunix.net/forum.php?mod=viewthread&tid=3572042