CIFS: A Common Internet File System


This article assumes you're familiar with: HTTP




Paul Leach and Dan Perry


Though based on the low-level file system implemented by Windows, CIFS is a platform-independent file sharing system.


The Internet is rapidly opening up new ways of communicating for individuals and organizations alike. Until now, most Internet usage has been limited to simple one-way file transfers or read-only browsing. However, the demand for greater interactivity on the Internet is exploding. Now, the Common Internet File System (CIFS) protocol has been introduced to support rich, collaborative applications over the Internet.
CIFS defines a standard remote file-system access protocol for use over the Internet, enabling groups of users to work together and share documents across the Internet or within corporate intranets. CIFS is an open, cross-platform technology based on the native file-sharing protocols built into Microsoft® Windows® and other popular PC operating systems, and supported on dozens of other platforms. With CIFS, millions of computer users can open and share remote files on the Internet without having to install new software or change the way they work.


CIFS in a Nutshell

CIFS enables collaboration on the Internet by defining a remote file-access protocol that is compatible with the way applications already share data on local disks and network file servers. CIFS incorporates the same high-performance, multiuser read and write operations, locking, and file-sharing semantics that are the backbone of today's sophisticated enterprise computer networks. CIFS runs over TCP/IP, uses the Internet's global Domain Name System (DNS) for scalability, and is optimized to support the slower dial-up connections common on the Internet.
CIFS is an enhanced version of Microsoft's open, cross-platform Server Message Block (SMB) protocol, the native file-sharing protocol in the Windows 95, Windows NT®, and OS/2 operating systems and the standard way that millions of PC users share files across corporate intranets. CIFS is also widely available on Unix, VMS, and other platforms.
Microsoft is making sure that CIFS technology is open, published, and widely available for all computer users. Microsoft submitted the CIFS 1.0 protocol specification to the Internet Engineering Task Force (IETF) as an Internet-Draft document and is working with interested parties for CIFS to be published as an Informational RFC. SMB has been an Open Group (formerly X/Open) standard for PC and Unix interoperability since 1992 (X/Open CAE Specification C209).
Not intended to replace HTTP, CIFS complements HTTP while providing more sophisticated file sharing and file transfer than older protocols such as FTP. CIFS is designed to enable all applications, not just Web browsers, to open and share files securely across the Internet.

CIFS Benefits

CIFS allows multiple clients to access and update the same file, while preventing conflicts with sophisticated file-sharing and locking semantics. These mechanisms also permit aggressive caching and read-ahead/write-behind without loss of cache coherency. CIFS also supports fault tolerance in the face of network and server failures.
The CIFS protocol has been tuned to run efficiently over slow dial-up lines. The effect is improved performance for the vast number of users who today access the Internet using a modem. CIFS servers support both anonymous transfers and secure, authenticated access to named files. File and directory security policies are easy to administer. Microsoft CIFS servers are highly integrated with the operating system, tuned for maximum system performance, and easy to administer.
File names can be in any character set, not just ones designed mainly for English or Western European languages. (They can even be in Klingon if you don't have a life.) Users do not have to mount remote file systems, but can refer to them directly with globally significant names instead of ones that have only local significance.
There is also significant industry support for the CIFS protocol. Industry leaders AT&T, Data General, Digital Equipment, Intel, Intergraph, Network Appliance, and SCO are working actively with Microsoft in support of the CIFS initiative. CIFS is already widely supported in commercial software products such as AT&T Advanced Server for Unix, Digital's PATHWORKS, HP Advanced Server 9000, IBM Warp Connect, IBM LAN Server, and Novell Enterprise Toolkit, among others. In addition, CIFS is the featured file and print-sharing protocol of Samba, a popular freeware network file system available for Linux and many Unix platforms, OS/2, and VMS.


Finding a File

CIFS is based on the SMB protocol widely in use by personal computers and workstations running a variety of operating systems. The full specification (at ftp://ietf.cnri.reston.va.us/internet-drafts/draft-heizer-cifs-v1-spec-00.txt) runs 155 pages, so we'll only look at some of the pertinent info.
For any particular file, it is assumed that the client machine will be able to determine the name of the server and the relative name within the server. Given the URL "file://fs.megacorp.com/users/fred/stuff.txt", the client should know how to parse the string so it knows that this represents a file on the server fs.megacorp.com, located at the path /users/fred/stuff.txt.
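As a rough illustration, the split into server name and relative path can be sketched with Python's standard urllib.parse module. The helper name split_file_url is hypothetical; the specification does not prescribe a particular parsing routine.

```python
from urllib.parse import urlsplit


def split_file_url(url: str) -> tuple[str, str]:
    """Return (server, path) for a file:// URL (illustrative helper)."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + url)
    return parts.hostname or "", parts.path


server, path = split_file_url("file://fs.megacorp.com/users/fred/stuff.txt")
print(server)  # fs.megacorp.com
print(path)    # /users/fred/stuff.txt
```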
Once the server name has been determined, the client needs to resolve that name to a transport address. This specification defines two ways of doing so: using the DNS or NetBIOS name resolution. The method used is configuration-dependent; the default is DNS to encourage interoperability over the Internet. The name-resolution mechanism will place constraints on the form of the server name. In the case of NetBIOS, the server name must be 15 characters or fewer and uppercase. The server name can also be specified as the string form of an IPv4 address in the usual dotted notation (for example, "157.33.135.101"). In this case, resolution consists of converting to the 32-bit IPv4 address.
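The dotted-notation case and the NetBIOS constraint can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and real DNS or NetBIOS lookups are left out entirely.

```python
import ipaddress


def classify_server_name(name: str):
    """Return ("ipv4", 32-bit int) for a dotted address, else ("name", name)."""
    try:
        # Dotted IPv4 string: "resolution" is just conversion to 32 bits.
        return "ipv4", int(ipaddress.IPv4Address(name))
    except ipaddress.AddressValueError:
        return "name", name


def is_valid_netbios_name(name: str) -> bool:
    """Constraint stated in the text: at most 15 characters, uppercase."""
    return 0 < len(name) <= 15 and name == name.upper()


kind, value = classify_server_name("157.33.135.101")
print(kind, hex(value))                    # ipv4 0x9d218765
print(is_valid_netbios_name("FILESRV1"))   # True
print(is_valid_netbios_name("fileserver")) # False
```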

Messages

Figure 1 illustrates a typical message-exchange sequence for a client connecting to a user-level server, opening a file, reading its data, closing the file, and disconnecting from the server. Note that, when using the SMB request-batching mechanism (called AndX), the second to sixth messages in this sequence can be combined into one; there are really only three round trips in the sequence, and the last one can be done asynchronously by the client.
Clients exchange messages with a server to access resources on that server. These messages are the previously mentioned Server Message Blocks (SMBs), and every SMB message has a common format. Multibyte values are always transmitted least-significant byte first (see Figure 2).
All SMBs have the same format up to the ParameterWords fields. Different SMBs have a different number and interpretation of ParameterWords and Buffer. All reserved fields in the SMB header must be zero. All quantities are sent in native Intel format, that is, little-endian.
Command is the operation code this SMB is requesting or responding to. Status.DosError.ErrorClass and Status.DosError.Error are set by the server and combine to give the error code of any failed server operation. If the client is capable of receiving 32-bit error returns, the status is returned in Status.Status instead. When an error is returned, the server may choose to return only the header portion of the response SMB. Flags and Flags2 contain bits that, depending on the negotiated protocol dialect, indicate various client capabilities.
Tid identifies the subdirectory, or "tree," on the server that the client is accessing. SMBs that do not reference a particular tree should set Tid to 0xFFFF. Pid and PidHigh are the caller's process ID and are generated by the client to uniquely identify a process within the client computer. Mid is reserved for multiplexing multiple messages on a single virtual circuit. A response message will always contain the same Mid value as the corresponding request message.
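To make the header fields concrete, here is a minimal Python sketch that packs and unpacks a fixed-size SMB header with the struct module. The 32-byte field order used here is an assumption based on the commonly documented SMB header (Figure 2 remains the authority), the reserved area is simplified to ten zero bytes, and the command byte in the usage line is just a sample value.

```python
import struct

# Assumed layout, little-endian ("<"): Protocol[4], Command, Status (32-bit),
# Flags, Flags2, PidHigh, 10 reserved bytes, Tid, Pid, Uid, Mid.
SMB_HEADER = struct.Struct("<4sBIBHH10sHHHH")
SMB_MAGIC = b"\xffSMB"
NO_TREE = 0xFFFF  # Tid for SMBs that do not reference a particular tree


def pack_header(command, tid=NO_TREE, pid=0, uid=0, mid=0, flags=0, flags2=0):
    """Build a request header; Status and all reserved fields are zero."""
    return SMB_HEADER.pack(SMB_MAGIC, command, 0, flags, flags2,
                           0, bytes(10), tid, pid, uid, mid)


def unpack_header(raw):
    """Parse the fixed part of an SMB message into a dict."""
    (magic, command, status, flags, flags2, pid_high,
     _reserved, tid, pid, uid, mid) = SMB_HEADER.unpack(raw[:SMB_HEADER.size])
    if magic != SMB_MAGIC:
        raise ValueError("not an SMB message")
    return {"command": command, "status": status, "flags": flags,
            "flags2": flags2, "tid": tid, "pid": pid, "uid": uid, "mid": mid}


request = pack_header(0x72, pid=42, mid=0x1234)  # 0x72: sample command byte
reply_fields = unpack_header(request)
print(len(request), hex(reply_fields["tid"]), hex(reply_fields["mid"]))
```

A client matching responses to requests would compare the Mid of each incoming header against the Mid values of its outstanding requests.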

Opportunistic Locks

Network performance can be increased if the client can buffer file data locally. For example, the client does not have to write information into a file on the server if the client knows that no other process is accessing the data. Likewise, the client can buffer read-ahead data from the file if the client knows that no other process is writing the data. The mechanism that allows clients to dynamically alter their buffering strategy in a consistent manner is known as opportunistic locks or oplocks. Versions of the SMB file-sharing protocol including and newer than the LANMAN1.0 dialect support oplocks.
There are three different types of oplocks. An exclusive oplock allows a client to open a file for exclusive access and allows the client to perform arbitrary buffering. A batch oplock allows a client to keep a file open on the server even though the local accessor on the client machine has closed the file. A Level II oplock indicates that there are multiple readers of a file and no writers.
When a client opens a file, it requests the server to grant it a particular type of oplock on the file. The response from the server indicates the type of oplock granted to the client. The client uses the granted oplock type to adjust its buffering policy. The SMB_COM_LOCKING_ANDX SMB is used to convey oplock break and response information.




Exclusive Oplocks

If a client is granted an exclusive oplock, it may buffer byte range lock information, read-ahead data, and write data on the client because the client knows that it is the only accessor to the file. The basic protocol requires that the client open the file, requesting that an oplock be given to the client. If the file was opened by anyone else, then the client is refused the oplock and no local buffering may be performed. This also means that no read-ahead may be performed to the file unless the client knows that it has the read-ahead range locked. If the server grants the exclusive oplock, the client can perform certain optimizations for the file such as buffering lock, read, and write data.
Figure 3: Exclusive oplocks


The exclusive oplock protocol is shown in Figure 3. When client A opens the file, it can request an exclusive oplock. Provided no one else has the file open on the server, the oplock is granted to client A. If at some point in the future another client, such as client B, wants to open the same file, then the server must have client A break its oplock.
Breaking the oplock involves client A sending the server any lock or write data that it has buffered, and then letting the server know it has acknowledged that the oplock has been broken. This synchronization message informs the server that it can allow client B to complete its open. Client A must also purge any of its read-ahead buffers for the file. This is not shown in the diagram since no network traffic is needed to do this.
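The break sequence can be mimicked with a toy in-memory model. This is a simulation under simplifying assumptions (one file, direct method calls instead of SMB messages, hypothetical class names), not an implementation of the protocol.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.has_oplock = False
        self.buffered_writes = []  # writes not yet sent to the server

    def write(self, server, path, data):
        if self.has_oplock:
            self.buffered_writes.append(data)   # sole accessor: buffer locally
        else:
            server.files[path].append(data)     # no oplock: write through

    def break_oplock(self, server, path):
        # Send buffered data to the server, then give up the oplock.
        server.files[path].extend(self.buffered_writes)
        self.buffered_writes.clear()
        self.has_oplock = False


class Server:
    def __init__(self):
        self.files = {}    # path -> list of written chunks
        self.openers = {}  # path -> clients that have the file open

    def open(self, client, path):
        self.files.setdefault(path, [])
        openers = self.openers.setdefault(path, [])
        for other in openers:
            if other.has_oplock:
                other.break_oplock(self, path)  # finishes before the new open
        client.has_oplock = not openers         # granted only to a sole opener
        openers.append(client)
        return client.has_oplock


server, a, b = Server(), Client("A"), Client("B")
print(server.open(a, "report.txt"))  # True: A gets the exclusive oplock
a.write(server, "report.txt", "draft")
print(server.files["report.txt"])    # []: the write is still buffered on A
print(server.open(b, "report.txt"))  # False: B opens without an oplock
print(server.files["report.txt"])    # ['draft']: A flushed during the break
```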

Batch Oplocks

Batch oplocks are used when client programs cause the amount of network traffic to go beyond an acceptable level for the functionality provided by the program. For example, the MS-DOS® command processor executes commands from within a command procedure by performing the following steps:
  • Opening the command procedure.
  • Seeking to the next line in the file.
  • Reading the line from the file.
  • Closing the file.
  • Executing the command.
This process is repeated for each command executed from the command procedure file. This type of programming model causes an inordinate amount of processing of files, thereby creating a lot of network traffic that could otherwise be curtailed if the program were to simply open the file, read a line, execute the command, and then read the next line.
Batch oplocking curtails the amount of network traffic by allowing the client to skip the extraneous open and close requests. When the MS-DOS command processor then asks for the next line in the file, the client can either ask for the next line from the server, or it may have already read the data from the file as read-ahead data. In either case, the amount of network traffic from the client is greatly reduced.
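A back-of-the-envelope count shows the scale of the saving. The sketch below assumes that each open, read, and close costs one request, that the seek is handled on the client, and that no read-ahead takes place; all three are simplifications for illustration.

```python
def requests_without_batch_oplock(commands: int) -> int:
    """Open + read + close are sent to the server for every command."""
    return 3 * commands


def requests_with_batch_oplock(commands: int) -> int:
    """One real open, one read per command, one real close at the end."""
    return 1 + commands + 1


for n in (10, 100):
    print(n, requests_without_batch_oplock(n), requests_with_batch_oplock(n))
# 10 30 12
# 100 300 102
```

With read-ahead, several lines may arrive in a single read, so the second figure can drop further.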
Figure 4: Batch oplocks


If the server receives either a rename or a delete request for the file that has a batch oplock, it must inform the client that the oplock is to be broken. The client can then switch to a mode where the file is repeatedly opened and closed (see Figure 4). When client A opens the file, it can request an oplock. Provided no one else has the file open on the server, then the oplock is granted to client A. In this case, client A keeps the file open for its caller across multiple open/close operations. Data may be read ahead for the caller, and other optimizations, such as buffering locks, can also be performed.
When another client requests an open, rename, or delete operation from the server for the file, client A must clean up its buffered data and synchronize with the server. Most of the time this involves actually closing the file, provided that client A's caller actually believes that it has closed the file. Once the file is actually closed, client B's open request can be completed.

Level II Oplocks

Level II oplocks allow multiple clients to have the same file open as long as no client is performing write operations to the file. This is important for many environments because many clients open files with read/write access even though they never write to the file. While it makes sense to do this, it also tends to break oplocks for other clients even though neither client intends to write to the file.
Figure 5: Level II oplock


The Level II oplock protocol is shown in Figure 5. This sequence of events is very much like an exclusive oplock. The basic difference is that the server informs the client that it should break to a Level II lock when no one has been writing the file. Client A, for example, may have opened the file for a desired access of read and a share access of read/write. This means, by definition, that client A will not perform any writes to the file.
When client B opens the file, the server must synchronize with client A in case client A has any buffered locks. Once it is synchronized, client B's open request may be completed. Client B, however, is informed that it has a Level II oplock rather than an exclusive oplock.
In this case, no client that has the file open with a Level II oplock may buffer any lock information on the local client machine. This allows the server to guarantee that if any write operation is performed, it need only notify the Level II clients that the lock should be broken without having to synchronize all of the accessors of the file.
The Level II oplock may be broken and set to none, meaning that some client that opened the file performed a write operation to the file. Because no Level II client may buffer lock information, the server is in a consistent state. The writing client, for example, could not have written to a locked range by definition. Read-ahead data may be buffered in the client machines, however, thereby cutting down on the amount of network traffic to the file. Once the Level II oplock is broken, the buffering client must discard its buffers and degrade to performing all operations on the file across the network. No oplock break response is expected from a client when the server breaks a client from Level II to none.
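The break from Level II to none can be pictured with another toy model. The class and method names are hypothetical and the "messages" are plain method calls; the point is only that every holder drops its read-ahead data and that the server does not wait for any reply.

```python
class Reader:
    def __init__(self, name):
        self.name = name
        self.oplock = "none"
        self.read_ahead = b""

    def on_break_to_none(self):
        # Discard cached data; no response is sent back to the server.
        self.read_ahead = b""
        self.oplock = "none"


class SharedFile:
    def __init__(self, content: bytes):
        self.content = bytearray(content)
        self.level2_holders = []

    def open_level2(self, client: Reader):
        client.oplock = "level2"
        client.read_ahead = bytes(self.content)  # reads may be cached
        self.level2_holders.append(client)

    def write(self, offset: int, data: bytes):
        # Any write breaks every Level II oplock to none, without waiting.
        for client in self.level2_holders:
            client.on_break_to_none()
        self.level2_holders.clear()
        self.content[offset:offset + len(data)] = data


shared = SharedFile(b"hello")
a, b = Reader("A"), Reader("B")
shared.open_level2(a)
shared.open_level2(b)
print(a.read_ahead, b.oplock)   # b'hello' level2
shared.write(0, b"J")
print(a.read_ahead, b.oplock)   # b'' none
print(bytes(shared.content))    # b'Jello'
```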


Security Model

Each server makes a set of resources available to clients on the network. A shared resource may be a directory tree, a named pipe, or a printer. As far as clients are concerned, the server has no storage or service dependencies on any other servers; a client considers the server to be the sole provider of the file (or other resource) being accessed.
The SMB protocol requires server authentication of users before file accesses are allowed, and each server authenticates its own users. A client system must send authentication information to the server before the server will allow access.
The SMB protocol defines two methods which can be selected by the server for security: share level and user level. A share-level server makes some directory on a disk device (or other resource) available. An optional password may be required to gain access. Thus, any user on the network who knows the name of the server, the name of the resource, and the password has access to the resource. Share-level security servers may use different passwords for the same shared resource with different passwords allowing different levels of access.
A user-level server makes some directory on a disk device (or other resource) available, but also requires the client to provide a username and corresponding password to gain access. User-level servers are preferred over share-level servers for any new server implementation, since organizations generally find user-level servers easier to administer as employees come and go. User-level servers may use the account name to check access-control lists on individual files, or may have one access control list that applies to all files in the directory.
When a user-level server validates the username and password presented by the client, an identifier representing that authenticated instance of the user is returned to the client in the Uid field of the response SMB. This Uid must be included in all further requests made on behalf of the user from that client. A share-level server returns no useful information in the Uid field.
The user-level security model was added after the original dialect of the SMB protocol was issued, and consequently some clients may not be capable of sending usernames and passwords to the server. A server in user-level security mode communicating with one of these clients may decide to permit a client to connect to resources even if the client has not sent username information; for example, by deriving a username as follows: if the client's computer name is identical to a username known on the server, and if the password supplied to connect to the shared resource matches the password for that username, an implicit user logon may be performed using those values. If this fails, the server may fail the request or assign a default account name of its choice (a so-called "guest account").
The value of Uid in subsequent requests by the client will be ignored and all access will be validated assuming the username selected. Servers built to CIFS specifications should operate in user mode.


Authentication

An SMB server keeps an encrypted form of a client's password. Before granting authenticated access to server resources, the server sends the client a challenge, and the client must respond in a way that proves it knows the password.
Authentication makes use of DES encryption in block mode. We denote the DES encryption function as E(K,D), which accepts a seven-byte key (K) and an eight-byte data block (D) and produces an eight-byte encrypted data block as its value. If the data to be encrypted is longer than eight bytes, the encryption function is applied to each block of eight bytes in sequence and the results are appended together. If the key is longer than seven bytes, the data is first completely encrypted using the first seven bytes of the key, then the second seven bytes, and so on, appending the results each time. To encrypt the 16-byte quantity D0D1 with the 14-byte key K0K1, the function would appear as

  E(K0K1,D0D1) = E(K0,D0)E(K0,D1)E(K1,D0)E(K1,D1)
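The ordering in this expansion can be traced with a short Python sketch. Note that the block function used here is a labelling stand-in, not DES (which is not part of the Python standard library); only the way key chunks and data blocks are paired reflects the description above, and the function names are hypothetical.

```python
def chunks(data: bytes, size: int):
    """Split data into consecutive pieces of the given size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chained_encrypt(encrypt_block, key: bytes, data: bytes) -> bytes:
    """Apply encrypt_block(k, d) for each 7-byte key chunk, in order,
    over every 8-byte data block, appending the results."""
    out = b""
    for k in chunks(key, 7):
        for d in chunks(data, 8):
            out += encrypt_block(k, d)
    return out


def trace_block(k: bytes, d: bytes) -> bytes:
    # Stand-in for E(K,D): NOT encryption, it only labels which key chunk
    # met which data block (and happens to return eight bytes).
    return b"E(" + k[:2] + b"," + d[:2] + b")"


key = b"K0-----K1-----"       # 14 bytes: two 7-byte chunks
data = b"D0------D1------"    # 16 bytes: two 8-byte blocks
print(chained_encrypt(trace_block, key, data))
# b'E(K0,D0)E(K0,D1)E(K1,D0)E(K1,D1)'
```

Swapping trace_block for a real single-block DES routine would leave the chaining logic unchanged.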
The EncryptionKey field in the SMB_COM_NEGPROT response contains an eight-byte challenge denoted below as C8, chosen to be unique to prevent replay attacks. The client responds with a 24-byte response denoted P24 and computed as described below. (The name EncryptionKey is historical—it doesn't actually hold an encryption key.)
Clients send the response to the challenge in the SMB_COM_TREE_CONNECT, SMB_COM_TREE_CONNECT_ANDX, and one or more of the SMB_COM_SESSION_SETUP_ANDX requests, which follow the SMB_COM_NEGPROT message exchange. The server must validate the response by performing the same computations the client did to create it, and ensuring the strings match. If the comparison fails, the client system may be incapable of encryption. If so, the string may be the user password in clear text. The server should try to validate the string as though it were the unencrypted password.


File Names

File names in the SMB protocol consist of components separated by a backslash. Early clients of the SMB protocol required that the name components adhere to an 8.3 naming format. These names consist of two parts: a base name of no more than eight characters, and an extension of no more than three characters. The base name and extension are separated by a period. All characters are legal in the base name and extension except the space character (0x20) and the following: " . / \ [ ] : + | < > = ; , * ?
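A check for a single 8.3 name component might look like the sketch below. It assumes the illegal set is the space character plus " . / \ [ ] : + | < > = ; , * ? (reading the second slash in the list as a backslash), and it assumes the base name must be non-empty while the extension is optional. The function name is hypothetical.

```python
# Assumed illegal characters for the base name and the extension.
ILLEGAL = set(' "./\\[]:+|<>=;,*?')


def is_valid_83(name: str) -> bool:
    """Check one path component against the 8.3 rules described above."""
    base, _sep, ext = name.partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    # A second period would land in ext and be rejected here.
    return not ((set(base) | set(ext)) & ILLEGAL)


for candidate in ("STUFF.TXT", "README", "LONGFILENAME.TXT", "MY FILE.TXT", "A.B.C"):
    print(candidate, is_valid_83(candidate))
# STUFF.TXT True
# README True
# LONGFILENAME.TXT False
# MY FILE.TXT False
# A.B.C False
```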
If the client has indicated long-name support by setting a flag in the SMB header, the client is not bound by the 8.3 convention. Specifically, this indicates that any SMB returning file names to the client may return names that do not adhere to the 8.3 convention. In addition, these names may have a total length of up to 255 characters. This capability was introduced with the LM1.2X002 protocol dialect.


Wildcards

Some SMB requests allow wildcards to be given for the file name. If the client is using 8.3 names, each part of the name (base or extension) is treated separately. For long file names, the period in the name is significant even though there is no longer a restriction on the size of the components.
The ? character is a wildcard for a single character, as in MS-DOS. If a file-name part commences with one or more ?s, then exactly that number of characters will be matched by the wildcards. When a file-name part has trailing ?s, then it matches the specified number of characters or fewer. For example, "x??" matches "xab", "xa", and "x", but not "xabc". If only ?s are present in the file-name part, then it is handled as for trailing ?s.
The * character matches an entire part of the name, as does an empty specification for that part. A part consisting of * means that the rest of the component should be filled with ? and the search should be performed with this wildcard character. For example, "*.abc" or ".abc" match any file with an extension of "abc"; searches for "*.*" or "*" or "null" match all files in a directory.
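The rules for one file-name part (base or extension) can be sketched with a regular expression. This is an interpretation of the prose above, not code from the specification: the function name and the width parameter (8 for a base name, 3 for an extension) are assumptions, and matching is case-sensitive for simplicity.

```python
import re


def part_matches(pattern: str, text: str, width: int = 8) -> bool:
    """Match one 8.3 name part against a pattern containing ? and *."""
    if pattern == "":
        pattern = "*"                    # an empty part behaves like *
    star = pattern.find("*")
    if star != -1:
        # Fill the rest of the component with ? characters.
        pattern = pattern[:star] + "?" * (width - star)
    stem = pattern.rstrip("?")
    trailing = len(pattern) - len(stem)
    # A ? before the last literal matches exactly one character;
    # trailing ?s match that many characters or fewer.
    regex = "".join("." if ch == "?" else re.escape(ch) for ch in stem)
    regex += ".{0,%d}" % trailing
    return re.fullmatch(regex, text) is not None


for name in ("xab", "xa", "x", "xabc"):
    print(name, part_matches("x??", name))
# xab True
# xa True
# x True
# xabc False
```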
If the negotiated dialect is NT LM 0.12 or later and the client requires MS-DOS wildcard-matching semantics, Unicode wildcards should be translated according to the following rules:
  • Translate the ? literal to >.
  • Translate the . literal to " if it is followed by a ? or a *.
  • Translate the * literal to < if it is followed by a . (period).
The translation can be performed in-place.
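The three translation rules can be sketched as a single left-to-right pass. The function name is hypothetical, and the sketch assumes each rule looks at the next character of the original, untranslated pattern.

```python
def translate_wildcards(pattern: str) -> str:
    """Apply the three MS-DOS wildcard translation rules to a pattern."""
    out = []
    for i, ch in enumerate(pattern):
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "?":
            out.append(">")
        elif ch == "." and nxt in ("?", "*"):
            out.append('"')
        elif ch == "*" and nxt == ".":
            out.append("<")
        else:
            out.append(ch)
    return "".join(out)


for p in ("*.?", "a*.txt", "*.*", "file.txt"):
    print(p, "->", translate_wildcards(p))
# *.? -> <">
# a*.txt -> a<.txt
# *.* -> <"*
# file.txt -> file.txt
```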


DFS Path Names

A Distributed File System (DFS) path name adheres to the standard described in the File Names section. A DFS-enabled client accessing a DFS share sets a flag in all name-based SMB headers, indicating to the server that the enclosed path name should be resolved in the DFS namespace. The path name should always have the full file name, including the server name and share name. If the server can resolve the DFS name to local storage, the local storage will be accessed.
If the server determines that the DFS name actually maps to a different server share, access will fail with the distinguished error STATUS_PATH_NOT_COVERED (SMB status code 0xC0000257). On receiving this error, the DFS-enabled client should ask the server for a referral. The response to the referral request will contain a list of server and share names to try and the part of the request file name that links to the list of server shares. If the ServerType field of the referral is set to one (SMB server), then the client should resubmit the request with the original file name to one of the server shares in the list, once again setting bit 12 of Flags2 in the SMB. If the ServerType field is not one, then the client should strip off the part of the file name that links to the server share before resubmitting the request to one of the servers in the list.
A referral request may elicit a response that does not have the StorageServers bit set. In that case, the client should resubmit the referral request to servers in the list until it obtains a referral response that has the StorageServers bit set, at which point the client can resubmit the request SMB to one of the listed server shares.
If, after getting a referral with the StorageServers bit set and resubmitting the request to one of the server shares in the list, the server fails the request with STATUS_PATH_NOT_COVERED, there is an inconsistency between the view of the DFS namespace held by the server granting the referral and the server listed in that referral. In this case, the client may inform the server granting the referral of this inconsistency via the TRANS2_REPORT_DFS_INCONSISTENCY SMB.


Message Sending

Before two machines can start communicating with SMBs, they must negotiate the dialect of CIFS to use. The base protocol is called PC NETWORK PROGRAM 1.0. The LANMAN 1.0 dialect adds more operational messages. There are a few other dialects, culminating in NT LM 0.12, which supports the most operations. We'll limit discussion here to the default protocol, which recognizes 28 separate message-based file operations (see Figure 6). These messages are a superset of the abbreviated session illustrated previously. Each of these messages is followed by a different data block. Now let's look at an example. Let's examine how to search for a file on a server.


Searching for a Server File

Before a search message can be sent to the server, we're assuming that the low-level connection has been made and the appropriate dialect has been negotiated between machines. First, the client sends an SMB_COM_SEARCH message to the server. This is followed by the data block shown in Figure 7. FileName specifies the file to be sought. SearchAttributes indicates the attributes that the file must have as a bitmask. If SearchAttributes is zero, then only normal files are returned. If the system file, hidden, or directory attributes are specified, then the search is inclusive—both the specified types of files and normal files are returned. If the volume label attribute is specified, then the search is exclusive and only the volume label entry is returned. MaxCount specifies the number of directory entries to be returned.
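The inclusive and exclusive behavior of SearchAttributes can be sketched as a filter over directory entries. The bit values below are the conventional MS-DOS attribute bits and are an assumption here (Figure 7 is the reference); the function names and sample entries are invented for illustration, and the sketch assumes an entry is returned only if all of its hidden, system, and directory bits were requested.

```python
# Conventional MS-DOS attribute bits (assumed, not taken from Figure 7).
READ_ONLY, HIDDEN, SYSTEM, VOLUME, DIRECTORY, ARCHIVE = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
SPECIAL = HIDDEN | SYSTEM | DIRECTORY


def entry_matches(entry_attrs: int, search_attrs: int) -> bool:
    if search_attrs & VOLUME:
        # Exclusive search: only the volume label entry qualifies.
        return bool(entry_attrs & VOLUME)
    if entry_attrs & VOLUME:
        return False
    # Inclusive search: normal files always qualify; special entries
    # qualify only if every special bit they carry was requested.
    return (entry_attrs & SPECIAL & ~search_attrs) == 0


def search(entries, search_attrs: int, max_count: int):
    hits = [name for name, attrs in entries if entry_matches(attrs, search_attrs)]
    return hits[:max_count]  # never more than MaxCount entries


entries = [("README.TXT", ARCHIVE), ("IO.SYS", HIDDEN | SYSTEM),
           ("DOCS", DIRECTORY), ("DISK1", VOLUME)]
print(search(entries, 0, 10))          # ['README.TXT']
print(search(entries, DIRECTORY, 10))  # ['README.TXT', 'DOCS']
print(search(entries, VOLUME, 10))     # ['DISK1']
```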
The server responds with the block shown in Figure 8. The response will contain one or more directory entries as determined by the Count field. No more than MaxCount entries will be returned. Only entries that match the requested FileName and SearchAttributes combination will be returned.
ResumeKey must be null (that is, length=0) on the initial search request. Subsequent search requests intended to continue a search must contain the ResumeKey field extracted from the last directory entry of the previous response. ResumeKey is self-contained; on calls containing a nonzero ResumeKey, neither the SearchAttributes nor FileName fields will be valid in the request. The ResumeKey format is shown in Figure 9. FileName is 8.3 format, with the three-character extension left-justified into FileName[9-11]. If the client supports a dialect prior to LANMAN 1.0, the returned FileName should be uppercase.
SMB_COM_SEARCH terminates when either the requested maximum number of entries that match the named file are found or the end of directory is reached without the maximum number of matches being found. A response containing no entries indicates that no matching entries were found between the starting point of the search and the end of directory.
There may be multiple matching entries in response to a single request as SMB_COM_SEARCH supports wildcards in the last component of FileName of the initial request. Returned directory entries in the DirectoryInformationData field are formatted as shown in Figure 10. Again, FileName must conform to 8.3 rules, and is padded after the extension with 0x20 characters if necessary. If the client has negotiated a dialect prior to the LANMAN 1.0 dialect, or if bit0 of the Flags2 SMB header field of the request is clear, the returned FileName should be uppercase.
Figure 11: Searching with CIFS


As can be seen from this structure, SMB_COM_SEARCH cannot return long file names, and cannot return Unicode file names. Files larger than 2^32 bytes should have the least significant 32 bits of their size returned in FileSize. Figure 11 shows an overview of the entire process.
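The truncation of large sizes is a simple mask, sketched below. The helper name is hypothetical.

```python
def search_file_size(actual_size: int) -> int:
    """Return the value placed in the 32-bit FileSize field."""
    return actual_size & 0xFFFFFFFF  # keep the least significant 32 bits


five_gib = 5 * 2**30
print(search_file_size(1000))      # 1000
print(search_file_size(five_gib))  # 1073741824
```

A client that needs the true size of such a file therefore cannot rely on this field alone.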


Conclusion

By using CIFS to communicate between machines, clients and servers of various types can share files and printing functions in a generic, extensible way. CIFS supplies a rich set of messages, security features, high performance, and file-safety specifications (so that multiple machines can access the same file without locking problems). It has already attracted the support of much of the industry, and is already available on a variety of platforms.

From the November 1996 issue of Microsoft Interactive Developer. Get it at your local newsstand, or better yet, subscribe.



©Microsoft Corporation. All rights reserved; reproduction in part
or in whole without permission is expressly prohibited.
