1. Silly Name
What is a "silly rename"? Why do these .nfsXXXXX files keep showing up?
A. Unix applications often open a scratch file and then unlink it. They do this so that the file is not visible in the file system name space to any other applications, and so that the system will automatically clean up (delete) the file when the application exits. This is known as "delete on last close", and is a tradition among Unix applications.
Because of the design of the NFS protocol, there is no way for a file to be deleted from the name space but still remain in use by an application. Thus NFS clients have to emulate this using what already exists in the protocol. If an open file is unlinked, an NFS client renames it to a special name that looks like ".nfsXXXXX". This "hides" the file while it remains in use. This is known as a "silly rename." Note that NFS servers have nothing to do with this behavior.
After all applications on a client have closed the silly-renamed file, the client automatically finishes the unlink by deleting the file on the server. Generally this is effective, but if the client crashes before the file is removed, it will leave the .nfsXXXXX file. If you are sure that the applications using these files are no longer running, it is safe to delete these files manually.
The NFS version 4 protocol is stateful, and could actually support delete-on-last-close. Unfortunately there isn't an easy way to do this and remain backwards-compatible with version 2 and 3 accessors.
2. nfs v4.0协议下的服务器load升高问题
在使用nfs v4.0协议挂载nfs文件系统的时候,存在一个seq id序列号瓶颈问题,具体内容请看https://access.redhat.com/solutions/2142081
- There is a limitation to the Linux NFS4.0 client implementation that an "open_owner" is mapped to a userid. This results in a bottleneck if one user opens and closes a lot of files in a short period of time. Each OPEN / CLOSE operation has to wait for a sequence id, which essentially serializes each OPEN / CLOSE request. If an NFS server's response time for OPEN / CLOSE requests increases due to some secondary load or complication, this NFS4 client limitation can become pronounced, and in some cases, cause an unresponsive machine.
- The NFS4.1 protocol addresses the limitation of serialization of OPENs per open_owner. For more information, see RFC 5661 Section 9.10
故障时的现象:服务器load无限升高,但是应用的cpu使用率并不高
故障触发条件:短时间内read close大量的文件
解决的方案:1. 使用nfs v3协议挂载规避; 2. 使用nfs v4.1协议挂载