NFS troubleshooting

To troubleshoot an NFS mounting problem (NOT in order!):

  1. If you're automounting, try static mounting of the same filesystem, to a different mount point, like /mnt/nfs or /mnt. It'll probably give more useful error messages.
  2. Try a sniffer, like ethereal, tethereal, snoop or tcpdump -v. Look for NFS or RPC errors in the sniffer output.
  3. Try truss/strace/par/trace against rpc.mountd. You probably don't want to do this with nfsd - it tends to just sit in kernel space all the time.
  4. Check your logs.
  5. Please see this URL if you're experiencing an NFS timeout.
  6. Make sure you're exporting to and mounting from an FQDN. Sometimes weird things happen when you use short hostnames.
  7. Try exporting "insecure", in case you have a host checking for a specific port range. Or alternatively, see if you can persuade the host that's not using reserved ports to use reserved ports - e.g., on AIX, this can be done with:
    • echo "/usr/sbin/nfso -o nfs_use_reserved_ports=1" >> /etc/rc.net
  8. Make sure the user doing the NFS mount isn't in too many groups. If you're in a large number of groups, NFS mounts can fail, seemingly inexplicably. You can usually check this with the "id" command. If the group count is above some OS-specific threshold (most likely 8, 16 or 32), then NFS may refuse the mount because of the large number of groups.
  9. Try unexporting everything, and reexporting.
  10. Try completely shutting down NFS and restarting it.
  11. Make sure there isn't a firewall blocking some important traffic. Sometimes even NFS clients will require accepting some incoming traffic, initiated by the server. This command can be very useful for this:
    • nmap -sR -I RPC dcs.nac.uci.edu
    It may or may not help to add -p1-65535 to the options.

    I suggest running this on the server against the server, on the server against the client, on the client against the client, and on the client against the server - then compare the results. The runs against the client should be the same, and the runs against the server should be the same. If something is getting blocked over the network that isn't blocked via localhost, then you can be pretty assured that there's a firewall or something (network problem?) blocking some traffic.

    You can expect the server to have greater RPC service requirements than the client. The client, if it is also an NFS server, may have the same RPC services registered, but usually an NFS client will actually use a proper subset of the RPC services on an NFS server (it may even be a set of size 0 :).
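
The comparison of scan runs described in step 11 can be sketched as a small shell function. This is only a minimal sketch: it assumes the scan output was saved to files and that open ports appear on lines starting with something like 111/tcp, as nmap's normal port table does. The function name compare_scans and the file names in the usage note are hypothetical.

```sh
# Compare the port lines of two saved scan outputs, ignoring line order.
# Assumption: port lines start with "<number>/tcp" or "<number>/udp";
# banner and timing lines are ignored because they differ on every run.
# Prints the lines found in only one file; returns 0 when the port lines match.
compare_scans() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    grep -E '^[0-9]+/(tcp|udp)' "$1" | sort > "$a"
    grep -E '^[0-9]+/(tcp|udp)' "$2" | sort > "$b"
    # comm -3 suppresses the lines common to both sorted inputs
    diffs=$(comm -3 "$a" "$b")
    rm -f "$a" "$b"
    [ -z "$diffs" ] && return 0
    printf '%s\n' "$diffs"
    return 1
}
```

For example, after saving the two runs against the server as server-from-server.txt and server-from-client.txt, compare_scans server-from-server.txt server-from-client.txt prints whatever is visible from one vantage point but not the other. Lines unique to the second file are indented with a tab.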

  12. If you're automounting, and you have static mounting working, there are two scenarios to consider:
    1. On systems that have both automount and automountd programs, automountd is the daemon, and automount is a program that is supposed to make automountd notice changes in its maps.
    2. On systems that only have an automount program, automount is the daemon, and you need to kill and restart it (without using the -9 signal!) to make it see changes.
  13. Are all of the relevant daemons running? You probably want something like the following in rpcinfo -p:
    program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp  32775  nlockmgr
    100021    3   udp  32775  nlockmgr
    100021    4   udp  32775  nlockmgr
    100021    1   tcp  32768  nlockmgr
    100021    3   tcp  32768  nlockmgr
    100021    4   tcp  32768  nlockmgr
    100024    1   udp  32776  status
    100024    1   tcp  32769  status
    100011    1   udp    671  rquotad
    100011    2   udp    671  rquotad
    100011    1   tcp    690  rquotad
    100011    2   tcp    690  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100005    1   udp    693  mountd
    100005    1   tcp    708  mountd
    100005    2   udp    693  mountd
    100005    2   tcp    708  mountd
    100005    3   udp    693  mountd
    100005    3   tcp    708  mountd
    (the numbers in the left column are more significant than the names in the right column)

    From there, you can probably get to the daemon names using netstat -ap and/or lsof.

    Make sure that the actual daemon names sound NFS-related; sometimes a non-RPC program will steal a port that rpcbind/portmap thought it could allocate - but couldn't.

    Alternatively, you can just run my rpc-health script - but note that it won't detect missing services, only services that are registered but not responding to a minimal test.
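
Since the program numbers in the left column are what matter, the rpcinfo -p check in step 13 can be partly automated. The following is a minimal sketch, not the rpc-health script mentioned above: the function name missing_rpc_programs is hypothetical, and the list of "required" program numbers is an assumption taken from the sample listing (rquotad is left out because not every server runs it).

```sh
# Read `rpcinfo -p` output on stdin and print each expected RPC program
# number that is not registered. Prints nothing when all are present.
# Assumed required set (from the sample listing):
#   100000 portmapper, 100003 nfs, 100005 mountd,
#   100021 nlockmgr,   100024 status
missing_rpc_programs() {
    awk -v required="100000 100003 100005 100021 100024" '
        # The header line has a non-numeric first field and is skipped.
        $1 ~ /^[0-9]+$/ { seen[$1] = 1 }
        END {
            n = split(required, want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }'
}
```

Usage would be along the lines of rpcinfo -p nfs.server.com | missing_rpc_programs. Like any registration check, this only shows what is registered with the portmapper; it says nothing about whether the daemon behind the port is actually responding.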

  14. Try the mount with TCP or UDP, whichever you haven't tried already. TCP should be better on long hauls or flakey networks, and UDP should be better on close, reliable networks. But if one isn't working, go ahead and try the other anyway.
  15. Are you using a flakey version of NFS? E.g., are both of the systems that cannot communicate via NFS using the still-rough NFSv4 (Wed Feb 23 14:16:34 PST 2005)? IIRC, idmap is indicative of NFSv4 on a Fedora Core 3 system. NFSv4 reportedly worked better in FC2 than it does in FC3, though yum -y update may have changed that by now. It's probably worth it to try at least NFS v2 and v3, and maybe v4 as well.
  16. Try a different blocksize for read and/or write. 8192 is a good number to try, if you haven't yet (most systems default to this). 8192 is -not- always optimal though. Some Sun systems used to crash if you used a blocksize of 32768. Also, some Linux systems default to 1024, which is a good choice on particularly flakey networks, or when you're stuck with a poor network card.
  17. Can you mount a different filesystem from the NFS server, but not the one you want?
  18. Are there permissions on the -mount-point-, underneath a mounted filesystem, that are confusing matters? I once saw an NFS problem that turned out to be due to this on a SunOS 4.1.x system.
  19. Do you have a firewall that is blocking ICMP packets inappropriately? Some ICMP messages are hazardous, but others can be essential to non-flakey network communication.
  20. Check showmount -e nfs.server.com
  21. Check your netgroups, and NIS in general, if you're exporting to netgroups. Also try removing the netgroup export temporarily, and just exporting directly to the host that needs access but isn't getting it.
  22. If you have a large number of mounts, and suddenly subsequent mounts start failing, and the same thing happens after a reboot, you may be running out of privileged ports.
  23. If you run man for each NFS daemon in turn, do they have an option for cranking up verbosity? If so, and you've gotten this far, you may as well try it. :)
  24. Linux: Try enabling debugging facilities and checking for errors:
    • RPC debugging:
      • echo 2048 > /proc/sys/sunrpc/rpc_debug
      • grep . /proc/net/rpc/*/content
      • ls -l /proc/fs/nfsd
      • cat /proc/fs/nfs/exports
    • NFS debugging:
      • # turn on linux nfs debug
      • echo 1 > /proc/sys/sunrpc/nfs_debug
      • # turn off linux nfs debug
      • echo 0 > /proc/sys/sunrpc/nfs_debug
    • Facilities in perspective:
      • Actually, there is a whole bitmask of values you can use here in order to selectively turn on or off parts of the debugging code. See the NFSDBG_* defines in include/linux/nfs_fs.h.

        There are similar bitmasks for the RPC, NLM (i.e. lockd) and nfsd subsystems in include/linux/sunrpc/debug.h, include/linux/lockd/debug.h and include/linux/nfsd/debug.h. These bitmasks act on /proc/sys/sunrpc/rpc_debug, /proc/sys/sunrpc/nlm_debug and /proc/sys/sunrpc/nfsd_debug respectively.

        Note as I said earlier, though, this is really designed for debugging purposes. There are no plans to convert it into an administrative tool.
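
Because these debug files take a bitmask, several flags can be OR'ed together into one value. Here is a minimal sketch of that arithmetic; the helper name debug_mask is hypothetical. As far as I recall from include/linux/sunrpc/debug.h, 0x0800 (the 2048 used above) is RPCDBG_CACHE and 0x0002 is RPCDBG_CALL, but treat those names as an assumption and check the headers for your own kernel.

```sh
# OR together any number of flag values (decimal or 0x-prefixed hex)
# and print the resulting mask in decimal. With no arguments, prints 0,
# which is the value that switches debugging off.
debug_mask() {
    mask=0
    for flag in "$@"; do
        # $flag is expanded textually so that hex constants are parsed
        # by the shell's arithmetic evaluator.
        mask=$(( mask | $flag ))
    done
    printf '%d\n' "$mask"
}
```

As root, something like debug_mask 0x0800 0x0002 > /proc/sys/sunrpc/rpc_debug would then enable both flags at once, and echo 0 into the same file turns them off again.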

  25. Linux: Run this once while the NFS server is working, and then again when the NFS server is having problems:
    • cat /etc/exports
    • cat /proc/fs/nfsd/exports
    • grep . /proc/net/rpc/*/content
  26. Post to any and all relevant mailing lists and newsgroups :) Do this sequentially, not in parallel - to keep the people you want help from, from getting annoyed by reading and rereading the same message over and over again unnecessarily. Do not cross post.
  27. Call the relevant vendors :)
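
The group-count check from step 8 can be scripted roughly as follows. This is a sketch under stated assumptions: the function name group_count_ok is hypothetical, and the default limit of 16 is an assumption (it is the number of group IDs an AUTH_SYS credential can carry); your OS may enforce 8, 32 or something else.

```sh
# Check a list of group IDs against a limit.
#   $1: space-separated group IDs, as printed by `id -G`
#   $2: limit (optional; the default of 16 is an assumption, adjust per OS)
# Returns 1 and says so when the list is longer than the limit.
group_count_ok() {
    gids=$1
    limit=${2:-16}
    # The arithmetic expansion strips any padding wc adds on some systems.
    count=$(( $(printf '%s\n' "$gids" | wc -w) ))
    if [ "$count" -gt "$limit" ]; then
        echo "$count groups exceeds limit of $limit"
        return 1
    fi
    echo "$count groups, within limit of $limit"
}
```

Run it as group_count_ok "$(id -G)" for the current user, or group_count_ok "$(id -G someuser)" 8 to test another account against a stricter limit.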

From

http://stromberg.dnsalias.org/~strombrg/NFS-troubleshooting-2.html

