open (standard C library translation)

open, creat - open and possibly create a file or device

Required headers:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

int creat(const char *pathname, mode_t mode);

Given a pathname for a file, open() returns a file descriptor: a small, nonnegative integer for use in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.). The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
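
The "lowest-numbered" rule can be observed directly. The following is a minimal sketch. The helper name `reuses_lowest_fd` is hypothetical, and the sketch assumes a single-threaded caller, because another thread opening a file in between could take the freed number first.

```c
#include <fcntl.h>
#include <unistd.h>

/* Opens `path` twice, closes the first descriptor, then opens `path` once
   more. Returns 1 if the third open() received the number just freed (the
   lowest-numbered free descriptor), 0 if not, -1 on error.
   Assumes no other thread opens or closes descriptors meanwhile. */
int reuses_lowest_fd(const char *path)
{
    int a = open(path, O_RDONLY);
    if (a == -1)
        return -1;
    int b = open(path, O_RDONLY);   /* keeps a higher number busy */
    if (b == -1) {
        close(a);
        return -1;
    }
    close(a);                       /* `a` is now the lowest free number */
    int c = open(path, O_RDONLY);
    int reused = (c == a);
    if (c != -1)
        close(c);
    close(b);
    return c == -1 ? -1 : reused;
}
```

On Linux, `reuses_lowest_fd("/dev/null")` is expected to return 1.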

By default, the new file descriptor is set to remain open across an execve(2). That is, the FD_CLOEXEC file descriptor flag described in fcntl(2) is initially disabled; the O_CLOEXEC flag, described below, can be used to change this default. The file offset is set to the beginning of the file (see lseek(2)).
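
Both defaults can be checked with fcntl(2) F_GETFD and lseek(2). The following is a minimal sketch; `new_fd_defaults` is a hypothetical helper name. Note that F_GETFD reads the per-descriptor flags, which are separate from the file status flags returned by F_GETFL.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens `path` read-only with no extra flags and reports whether
   FD_CLOEXEC is set and what the initial file offset is.
   Returns 0 on success, -1 on error. */
int new_fd_defaults(const char *path, int *cloexec, off_t *offset)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    int fdflags = fcntl(fd, F_GETFD);     /* descriptor flags */
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* current offset */
    close(fd);
    if (fdflags == -1 || pos == (off_t)-1)
        return -1;
    *cloexec = (fdflags & FD_CLOEXEC) != 0;
    *offset = pos;
    return 0;
}
```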

A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl(2) F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2).
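
The point that the descriptor is unaffected when pathname is removed can be demonstrated with unlink(2). The following is a minimal sketch; `survives_unlink` is a hypothetical helper name. The function creates a file, removes its name, and still reads the data back through the open descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Creates `path`, removes the name with unlink(2), writes `msg` through the
   still-open descriptor, rewinds, and reads up to `outlen` bytes into `out`.
   Returns the number of bytes read back, or -1 on error. */
ssize_t survives_unlink(const char *path, const char *msg, char *out, size_t outlen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    if (unlink(path) == -1) {        /* the name is gone; the file is not */
        close(fd);
        return -1;
    }
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, out, outlen);
    close(fd);                       /* storage is released here */
    return n;
}
```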

The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively.

In addition, zero or more file creation flags and file status flags can be bitwise-or'd in flags. The file creation flags are O_CREAT, O_EXCL, O_NOCTTY, and O_TRUNC. The file status flags are all of the remaining flags listed below. The distinction between these two groups of flags is that the file status flags can be retrieved and (in some cases) modified using fcntl(2). The full list of file creation flags and file status flags is as follows:

O_APPEND
The file is opened in append mode. Before each write(2), the file offset is positioned at the end of the file, as if with lseek(2). O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.
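
A typical use of O_APPEND is a log file. The following is a minimal sketch; `append_line` is a hypothetical helper name, and the 0644 mode is an arbitrary choice. Because the kernel positions the offset at end-of-file before each write, the caller never calls lseek(2) itself.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Appends `line` to the file at `path`, creating it with mode 0644 if it
   does not exist. Returns 0 on success, -1 on error. */
int append_line(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);   /* offset moved to EOF first */
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```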

O_ASYNC
Enable signal-driven I/O: generate a signal (SIGIO by default, but this can be changed via fcntl(2)) when input or output becomes possible on this file descriptor. This feature is only available for terminals, pseudoterminals, sockets, and (since Linux 2.6) pipes and FIFOs. See fcntl(2) for further details.

O_CLOEXEC (Since Linux 2.6.23)
Enable the close-on-exec flag for the new file descriptor. Specifying this flag permits a program to avoid additional fcntl(2) F_SETFD operations to set the FD_CLOEXEC flag. Additionally, use of this flag is essential in some multithreaded programs, since using a separate fcntl(2) F_SETFD operation to set the FD_CLOEXEC flag does not suffice to avoid race conditions where one thread opens a file descriptor at the same time as another thread does a fork(2) plus execve(2).

O_CREAT
If the file does not exist, it will be created. The owner (user ID) of the file is set to the effective user ID of the process. The group ownership (group ID) is set either to the effective group ID of the process or to the group ID of the parent directory (depending on file system type and mount options, and the mode of the parent directory; see the mount options bsdgroups and sysvgroups described in mount(8)).
mode specifies the permissions to use in case a new file is created. This argument must be supplied when O_CREAT is specified in flags; if O_CREAT is not specified, then mode is ignored. The effective permissions are modified by the process's umask in the usual way: the permissions of the created file are (mode & ~umask). Note that this mode only applies to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor.
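
The (mode & ~umask) rule can be checked with fstat(2). The following is a minimal sketch; `created_perms` is a hypothetical helper name. It temporarily swaps the umask, which is process-wide, so it is only suitable for single-threaded demonstration code. The sketch also assumes that the directory has no default ACL, because a default ACL replaces the umask step.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `path` write-only with the requested `mode` while the umask is
   temporarily `mask`, then returns the permission bits that fstat(2)
   reports, or -1 on error. */
int created_perms(const char *path, mode_t mode, mode_t mask)
{
    mode_t old = umask(mask);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    umask(old);                           /* restore the caller's umask */
    if (fd == -1)
        return -1;
    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc == -1 ? -1 : (int)(st.st_mode & 07777);
}
```

With mode 0666 and umask 022, the file ends up 0644. Creating with mode 0444 still succeeds and returns a writable descriptor, as the paragraph notes.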

The following symbolic constants are provided for mode:

S_IRWXU  00700  user (file owner) has read, write and execute permission
S_IRUSR  00400  user has read permission
S_IWUSR  00200  user has write permission
S_IXUSR  00100  user has execute permission
S_IRWXG  00070  group has read, write and execute permission
S_IRGRP  00040  group has read permission
S_IWGRP  00020  group has write permission
S_IXGRP  00010  group has execute permission
S_IRWXO  00007  others have read, write and execute permission
S_IROTH  00004  others have read permission
S_IWOTH  00002  others have write permission
S_IXOTH  00001  others have execute permission

O_DIRECT (Since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See the notes on O_DIRECT below for further discussion. A semantically similar (but deprecated) interface for block devices is described in raw(8).

O_DIRECTORY
If pathname is not a directory, cause the open to fail. This flag is Linux-specific, and was added in kernel version 2.1.126, to avoid denial-of-service problems if opendir(3) is called on a FIFO or tape device, but should not be used outside of the implementation of opendir(3).

O_EXCL
Ensure that this call creates the file: if this flag is specified in conjunction with O_CREAT, and pathname already exists, then open() will fail. When these two flags are specified, symbolic links are not followed: if pathname is a symbolic link, then open() fails regardless of where the symbolic link points.

In general, the behavior of O_EXCL is undefined if it is used without O_CREAT. There is one exception: on Linux 2.6 and later, O_EXCL can be used without O_CREAT if pathname refers to a block device. If the block device is in use by the system (e.g., mounted), open() fails with the error EBUSY.

On NFS, O_EXCL is only supported when using NFSv3 or later on kernel 2.6 or later. In NFS environments where O_EXCL support is not provided, programs that rely on it for performing locking tasks will contain a race condition. Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same file system (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
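
Both locking strategies described above fit in a few lines. The following is a minimal sketch; the helper names are hypothetical. Building the unique name (for example, from hostname and PID) is left to the caller. In both cases, the lock is released with unlink(2) on the lockfile.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* O_CREAT | O_EXCL lock: only one caller can create `lockfile`.
   Returns 1 if acquired, 0 if already held, -1 on other errors. */
int try_lock_excl(const char *lockfile)
{
    int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return errno == EEXIST ? 0 : -1;
    close(fd);
    return 1;
}

/* link(2)-based lock for NFS without O_EXCL support. `unique` must be a
   per-process name on the same file system as `lockfile`.
   Returns 1 if acquired, 0 if not, -1 if `unique` could not be created. */
int try_lock_link(const char *unique, const char *lockfile)
{
    int fd = open(unique, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    int locked = 0;
    if (link(unique, lockfile) == 0) {
        locked = 1;
    } else {
        /* link() may report failure even though the link was made */
        struct stat st;
        if (stat(unique, &st) == 0 && st.st_nlink == 2)
            locked = 1;
    }
    unlink(unique);   /* an acquired lock lives on as `lockfile` */
    return locked;
}
```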

O_LARGEFILE
(LFS) Allow files whose sizes cannot be represented in an off_t (but can be represented in an off64_t) to be opened. The _LARGEFILE64_SOURCE macro must be defined (before including any header files) in order to obtain this definition. Setting the _FILE_OFFSET_BITS feature test macro to 64 (rather than using O_LARGEFILE) is the preferred method of accessing large files on 32-bit systems (see feature_test_macros(7)).
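
The preferred _FILE_OFFSET_BITS approach amounts to one definition placed before the first #include. The following is a minimal sketch; `off_t_bits` is a hypothetical helper name. On 64-bit glibc targets, off_t is already 64 bits, so the macro only changes anything on 32-bit builds.

```c
#define _FILE_OFFSET_BITS 64   /* must precede every #include */
#include <limits.h>
#include <sys/types.h>

/* Width of off_t in bits. With _FILE_OFFSET_BITS=64, glibc makes off_t
   64 bits on 32-bit targets too, so plain open()/lseek() handle large
   files without O_LARGEFILE. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```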

O_NOATIME (Since Linux 2.6.8)
Do not update the file last access time (st_atime in the inode) when the file is read(2). This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity. This flag may not be effective on all file systems. One example is NFS, where the server maintains the access time.
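
Because O_NOATIME fails with EPERM unless the caller owns the file or has CAP_FOWNER (see EPERM among the errors below), a backup tool typically retries without it. The following is a minimal sketch; `open_noatime` is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* O_NOATIME is Linux-specific */
#include <errno.h>
#include <fcntl.h>

/* Opens `path` read-only, asking for O_NOATIME when permitted and
   falling back to a plain open when the kernel refuses with EPERM. */
int open_noatime(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd == -1 && errno == EPERM)
        fd = open(path, O_RDONLY);   /* not owner and no CAP_FOWNER */
    return fd;
}
```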

O_NOCTTY
If pathname refers to a terminal device (see tty(4)), it will not become the process's controlling terminal even if the process does not have one.

O_NOFOLLOW
If pathname is a symbolic link, then the open fails. This is a FreeBSD extension, which was added to Linux in version 2.1.126. Symbolic links in earlier components of the pathname will still be followed.

O_NONBLOCK or O_NDELAY
When possible, the file is opened in nonblocking mode. Neither the open() nor any subsequent operations on the file descriptor which is returned will cause the calling process to wait. For the handling of FIFOs (named pipes), see also fifo(7). For a discussion of the effect of O_NONBLOCK in conjunction with mandatory file locks and with file leases, see fcntl(2).
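
FIFOs show both sides of nonblocking open(). A read-only open returns at once instead of waiting for a writer. A write-only open with no reader fails immediately with ENXIO (see ENXIO among the errors below). The following is a minimal sketch; `fifo_nonblock_demo` is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a FIFO at `path`, then checks that a nonblocking write-only
   open fails with ENXIO (no reader) and a nonblocking read-only open
   succeeds without waiting. Returns 1 if both hold, 0 if not, -1 if
   the FIFO could not be created. The FIFO is removed afterwards. */
int fifo_nonblock_demo(const char *path)
{
    if (mkfifo(path, 0600) == -1)
        return -1;
    int ok = 0;
    int w = open(path, O_WRONLY | O_NONBLOCK);      /* no reader yet */
    if (w == -1 && errno == ENXIO) {
        int r = open(path, O_RDONLY | O_NONBLOCK);  /* no writer needed */
        if (r != -1) {
            ok = 1;
            close(r);
        }
    } else if (w != -1) {
        close(w);
    }
    unlink(path);
    return ok;
}
```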

O_SYNC
The file is opened for synchronous I/O. Any write(2)s on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware. But see the notes on synchronized I/O below.

O_TRUNC
If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY), it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise, the effect of O_TRUNC is unspecified.


Some of these optional flags can be altered using fcntl(2) after the file has been opened.
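
A file status flag is changed with a read-modify-write through F_GETFL and F_SETFL, so the other flags are preserved. The following is a minimal sketch; `add_status_flag` is a hypothetical helper name. On Linux, F_SETFL can change only some status flags (such as O_APPEND and O_NONBLOCK); access-mode bits and file creation flags are ignored.

```c
#include <fcntl.h>

/* Adds file status flag `flag` (e.g. O_APPEND or O_NONBLOCK) to the
   open file description behind `fd`. Returns 0 on success, -1 on error. */
int add_status_flag(int fd, int flag)
{
    int flags = fcntl(fd, F_GETFL);   /* access mode + status flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | flag);
}
```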

creat() is equivalent to open() with flags equal to O_CREAT|O_WRONLY|O_TRUNC.
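
The equivalence can be written out directly. The following is a minimal sketch; `creat_via_open` and `fd_size` are hypothetical helper names. `fd_size` is included only to show the truncation.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Behaves like creat(path, mode): create or truncate, open write-only. */
int creat_via_open(const char *path, mode_t mode)
{
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
}

/* Size of the file behind `fd` according to fstat(2), or -1 on error. */
off_t fd_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == -1 ? -1 : st.st_size;
}
```

Calling `creat_via_open` on a file that already holds data leaves it open with size 0, exactly as creat() would.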

open() and creat() return the new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately). The errno values they can report include the following:
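
In practice, this means checking for -1 and reading errno immediately, before any other call can overwrite it. The following is a minimal sketch; `open_or_report` is a hypothetical helper name, and the message format is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Opens `path` read-only. On failure, prints the reason to stderr,
   leaves errno set to open()'s error, and returns -1. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int err = errno;                 /* save before fprintf */
        fprintf(stderr, "open(\"%s\"): %s\n", path, strerror(err));
        errno = err;
    }
    return fd;
}
```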

EACCES The requested access to the file is not allowed, or search permission is denied for one of the directories in the path prefix of pathname, or the file did not exist yet and write access to the parent directory is not allowed. (See also path_resolution(7).)

EEXIST pathname already exists and O_CREAT and O_EXCL were used.

EFAULT pathname points outside your accessible address space.

EFBIG  See EOVERFLOW.

EINTR  While blocked waiting to complete an open of a slow device (e.g., a FIFO; see fifo(7)), the call was interrupted by a signal handler; see signal(7).

EISDIR pathname refers to a directory and the access requested involved writing (that is, O_WRONLY or O_RDWR is set).

ELOOP  Too many symbolic links were encountered in resolving pathname, or O_NOFOLLOW was specified but pathname was a symbolic link.

EMFILE The process already has the maximum number of files open.

ENAMETOOLONG pathname was too long.

ENFILE The system limit on the total number of open files has been reached.

ENODEV pathname refers to a device special file and no corresponding device exists. (This is a Linux kernel bug; in this situation ENXIO must be returned.)

ENOENT O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link.

ENOMEM Insufficient kernel memory was available.

ENOSPC pathname was to be created but the device containing pathname has no room for the new file.

ENOTDIR A component used as a directory in pathname is not, in fact, a directory, or O_DIRECTORY was specified and pathname was not a directory.

ENXIO  O_NONBLOCK | O_WRONLY is set, the named file is a FIFO, and no process has the file open for reading. Or, the file is a device special file and no corresponding device exists.

EOVERFLOW pathname refers to a regular file that is too large to be opened. The usual scenario here is that an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 tried to open a file whose size exceeds (1<<31)-1 bytes; see also O_LARGEFILE above. This is the error specified by POSIX.1-2001; in kernels before 2.6.24, Linux gave the error EFBIG for this case.

EPERM  The O_NOATIME flag was specified, but the effective user ID of the caller did not match the owner of the file and the caller was not privileged (CAP_FOWNER).

EROFS  pathname refers to a file on a read-only file system and write access was requested.

ETXTBSY pathname refers to an executable image which is currently being executed and write access was requested.

EWOULDBLOCK The O_NONBLOCK flag was specified, and an incompatible lease was held on the file (see fcntl(2)).

open() and creat() conform to SVr4, 4.3BSD, and POSIX.1-2001. The O_DIRECTORY, O_NOATIME, and O_NOFOLLOW flags are Linux-specific, and one may need to define _GNU_SOURCE (before including any header files) to obtain their definitions.

The O_CLOEXEC flag is not specified in POSIX.1-2001, but is specified in POSIX.1-2008.

O_DIRECT is not specified in POSIX; one has to define _GNU_SOURCE (before including any header files) to get its definition.

Under Linux, the O_NONBLOCK flag indicates that one wants to open but does not necessarily have the intention to read or write. This is typically used to open devices in order to get a file descriptor for use with ioctl(2).

Unlike the other values that can be specified in flags, the access mode values O_RDONLY, O_WRONLY, and O_RDWR do not specify individual bits. Rather, they define the low-order two bits of flags, and are defined respectively as 0, 1, and 2. In other words, the combination O_RDONLY | O_WRONLY is a logical error, and certainly does not have the same meaning as O_RDWR. Linux reserves the special, nonstandard access mode 3 (binary 11) in flags to mean: check for read and write permission on the file and return a descriptor that can't be used for reading or writing. This nonstandard access mode is used by some Linux drivers to return a descriptor that is only to be used for device-specific ioctl(2) operations.
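
Because the access mode is a two-bit value rather than a set of flags, it is extracted with the O_ACCMODE mask and compared for equality. The following is a minimal sketch; `access_mode` is a hypothetical helper name. The obvious-looking test `flags & O_RDONLY` is always 0, because O_RDONLY is 0.

```c
#include <fcntl.h>

/* Returns "r", "w", or "rw" for the access mode of `fd`, or "?" on
   error or for the nonstandard mode 3. */
const char *access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return "?";
    switch (flags & O_ACCMODE) {       /* keep only the low two bits */
    case O_RDONLY: return "r";
    case O_WRONLY: return "w";
    case O_RDWR:   return "rw";
    default:       return "?";
    }
}
```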

The (undefined) effect of O_RDONLY | O_TRUNC varies among implementations. On many systems the file is actually truncated.

There are many infelicities in the protocol underlying NFS, affecting, amongst others, O_SYNC and O_NDELAY.

POSIX provides for three different variants of synchronized I/O, corresponding to the flags O_SYNC, O_DSYNC, and O_RSYNC. Currently (2.6.31), Linux only implements O_SYNC, but glibc maps O_DSYNC and O_RSYNC to the same numerical value as O_SYNC. Most Linux file systems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to user space, but only the O_DSYNC semantics, which require only actual file data and the metadata necessary to retrieve it to be on disk by the time the system call returns.


Note that open() can open device special files, but creat() cannot create them; use mknod(2) instead.

On NFS file systems with UID mapping enabled, open() may return a file descriptor but, for example, read(2) requests are denied with EACCES. This is because the client performs open() by checking the permissions, but UID mapping is performed by the server upon read and write requests.

If the file is newly created, its st_atime, st_ctime, and st_mtime fields (respectively, time of last access, time of last status change, and time of last modification; see stat(2)) are set to the current time, and so are the st_ctime and st_mtime fields of the parent directory. Otherwise, if the file is modified because of the O_TRUNC flag, its st_ctime and st_mtime fields are set to the current time.

The O_DIRECT flag may impose alignment restrictions on the length and address of user-space buffers and the file offset of  I/Os.   In  Linux  alignment restrictions  vary  by file system and kernel version and might be absent entirely.  However there is currently no file system-independent interface for an application to discover these restrictions for a given file or file system.  Some file systems provide their own interfaces for doing so, for example the XFS_IOC_DIOINFO operation in xfsctl(3).

Under  Linux  2.4,  transfer sizes, and the alignment of the user buffer and the file offset must all be multiples of the logical block size of the file system.  Under Linux 2.6, alignment to 512-byte boundaries suffices.
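
Put together, the alignment rules mean that the buffer address, the transfer length, and the file offset all have to be aligned. The following is a minimal sketch; `direct_write_block` and `DIO_ALIGN` are hypothetical names. The 4096-byte alignment is an assumption chosen to cover both the 512-byte rule above and devices with 4 KiB logical blocks, since there is no portable way to query the real requirement. The sketch treats EINVAL as "O_DIRECT not supported here", in line with the note on file systems that reject the flag.

```c
#define _GNU_SOURCE             /* O_DIRECT is not in POSIX */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment; the true requirement varies by file system. */
#define DIO_ALIGN 4096

/* Writes one DIO_ALIGN-byte block filled with `fill` at offset 0 of
   `path` via O_DIRECT. Returns the byte count written, 0 if the file
   system rejected O_DIRECT with EINVAL, or -1 on any other error. */
ssize_t direct_write_block(const char *path, int fill)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd == -1)
        return errno == EINVAL ? 0 : -1;

    void *buf;
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0) {  /* aligned address */
        close(fd);
        return -1;
    }
    memset(buf, fill, DIO_ALIGN);

    ssize_t n = write(fd, buf, DIO_ALIGN);   /* aligned length and offset */
    int saved = errno;                       /* free()/close() may clobber errno */
    free(buf);
    close(fd);
    if (n == -1 && saved == EINVAL)
        return 0;
    return n;
}
```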

O_DIRECT I/Os should never be run concurrently with the fork(2) system call, if the memory buffer is a private mapping (i.e., any mapping  created  with the  mmap(2)  MAP_PRIVATE  flag;  this includes memory allocated on the heap and statically allocated buffers).  Any such I/Os, whether submitted via an asynchronous I/O interface or from another thread in the process, should be completed before fork(2) is called.  Failure to do so  can  result  in  data corruption  and undefined behavior in parent and child processes.  This restriction does not apply when the memory buffer for the O_DIRECT I/Os was created using shmat(2) or mmap(2) with the MAP_SHARED flag.  Nor does this restriction apply when the memory buffer has been advised as MADV_DONTFORK  with madvise(2), ensuring that it will not be available to the child after fork(2).

The O_DIRECT flag was introduced in SGI IRIX, where it has alignment restrictions similar to those of Linux 2.4. IRIX also has an fcntl(2) call to query appropriate alignments and sizes. FreeBSD 4.x introduced a flag of the same name, but without alignment restrictions.

O_DIRECT support was added under Linux in kernel version 2.4.10.  Older Linux kernels simply ignore this flag.  Some file systems may not implement  the flag and open() will fail with EINVAL if it is used.

Applications  should  avoid mixing O_DIRECT and normal I/O to the same file, and especially to overlapping byte regions in the same file.  Even when the file system correctly handles the coherency issues in this situation, overall I/O throughput is likely to be slower than using either mode alone.  Likewise, applications should avoid mixing mmap(2) of files with direct I/O to the same files.

The behaviour of O_DIRECT with NFS will differ from local file systems.  Older kernels, or kernels configured in certain ways, may not support this combination.  The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O will only bypass the page cache on the client;  the  server may  still  cache the I/O.  The client asks the server to make the I/O synchronous to preserve the synchronous semantics of O_DIRECT.  Some servers will perform poorly under these circumstances, especially if the I/O size is small.  Some servers may also be configured to lie to clients about the I/O having  reached stable storage; this will avoid the performance penalty at some risk to data integrity in the event of server power failure.  The Linux NFS client places no alignment restrictions on O_DIRECT I/O.

In summary, O_DIRECT is a potentially powerful tool that should be used with caution.  It is recommended that applications treat use of  O_DIRECT  as  a performance option which is disabled by default.

"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances."—Linus

Currently, it is not possible to enable signal-driven I/O by specifying O_ASYNC when calling open(); use fcntl(2) to enable this flag.

