3-File I/O

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1
Welcome to my github: https://github.com/gaoxiangnumber1

3.1 Introduction

  • The functions described in this chapter are often referred to as unbuffered I/O, in contrast to the standard I/O routines (Chapter 5). The term unbuffered means that each read or write invokes a system call in the kernel.

3.2 File Descriptors

  • To the kernel, all open files are referred to by file descriptors. A file descriptor is a non-negative integer. When we open an existing file or create a new file, the kernel returns a file descriptor to the process. When we want to read or write a file, we identify the file with the file descriptor that was returned by open or creat as an argument to either read or write.
  • By convention, UNIX System shells associate file descriptor 0 with the standard input of a process, 1 with the standard output, and 2 with the standard error. In application code, the numbers 0, 1, and 2 should be replaced with the symbolic constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, which are defined in the <unistd.h> header.

3.3 open and openat Functions

  • A file is opened or created by calling either the open function or the openat function.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *path, int oflag, ... /* mode_t mode */ );
int openat(int fd, const char *path, int oflag, ... /* mode_t mode */ );
Both return: file descriptor if OK, −1 on error
  • The last argument is …, which is the ISO C way to specify that the number and types of the remaining arguments may vary. For these functions, the last argument is used only when a new file is being created. We’ll show how to specify mode in Section 4.5 when we describe a file’s access permissions.
  • The path parameter is the name of the file to open or create.
  • These functions have a multitude of options, which are specified by the oflag argument. This argument is formed by ORing together one or more constants from the <fcntl.h> header.

3.4 creat Function

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int creat(const char *path, mode_t mode);
Returns: file descriptor opened for write-only if OK, −1 on error
  • This function is equivalent to
    open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
  • In early versions of the UNIX System, the second argument to open could be only 0, 1, or 2. There was no way to open a file that didn’t exist. So a separate system call, creat, was needed to create new files. With the O_CREAT and O_TRUNC options now provided by open, a separate creat function is no longer needed.
  • One deficiency with creat is that the file is opened only for writing. Before the new version of open was provided, if we were creating a temporary file that we wanted to write and then read back, we had to call creat, close, and then open. A better way is to use open: open(path, O_RDWR | O_CREAT | O_TRUNC, mode);

3.5 close Function

#include <unistd.h>
int close(int fd);
Returns: 0 if OK, −1 on error
  • Closing a file releases any record locks that the process may have on the file.
  • When a process terminates, all of its open files are closed automatically by the kernel. Many programs take advantage of this fact and don’t explicitly close open files.

3.6 lseek Function

  • Every open file has an associated current file offset, normally a non-negative integer (there are exceptions) that measures the number of bytes from the beginning of the file. Read and write operations normally start at the current file offset and cause the offset to be incremented by the number of bytes read or written.
  • By default, this offset is initialized to 0 when a file is opened, unless the O_APPEND option is specified.
  • An open file’s offset can be set explicitly by calling lseek.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
Returns: new file offset if OK, −1 on error
  • The interpretation of the offset depends on the value of the whence argument. If whence is
    1. SEEK_SET: the file’s offset is set to offset bytes from the beginning of the file.
    2. SEEK_CUR: the file’s offset is set to its current value plus the offset. The offset can be positive or negative.
    3. SEEK_END: the file’s offset is set to the size of the file plus the offset. The offset can be positive or negative.
  • Because a successful call to lseek returns the new file offset, we can seek zero bytes from the current position to determine the current offset:
off_t currpos = lseek(fd, 0, SEEK_CUR);
  • This technique can be used to determine if a file is capable of seeking. If the file descriptor refers to a pipe, FIFO or socket, lseek sets errno to ESPIPE and returns −1.
  • Before the three symbolic constants were introduced, whence was specified as 0 (absolute), 1 (relative to the current offset) or 2 (relative to the end of file).

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    if(lseek(STDIN_FILENO, 0, SEEK_CUR) == -1)
    {
        printf("cannot seek\n");
    }
    else
    {
        printf("seek OK\n");
    }

    exit(0);
}
  • Figure 3.1 tests standard input to see whether it is capable of seeking.
$ ./a.out < /etc/passwd
seek OK
$ cat < /etc/passwd | ./a.out
cannot seek
  • Because negative offsets are possible, we should compare the return value from lseek as being equal to or not equal to −1, rather than testing whether it is less than 0.
  • lseek only records the current file offset within the kernel; it does not cause any I/O to take place. This offset is then used by the next read or write operation.
  • The file’s offset can be greater than the file’s current size, in which case the next write to the file will extend the file. This is referred to as creating a hole in a file and is allowed. Any bytes in a file that have not been written are read back as 0.
  • A hole in a file isn’t required to have storage backing it on disk. Depending on the file system implementation, when you write after seeking past the end of a file, new disk blocks might be allocated to store the data, but there is no need to allocate disk blocks for the data between the old end of file and the location where you start writing.

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>  // open()
#include <unistd.h>  // write(), lseek()

#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";

void Exit(char *string)
{
    printf("%s\n", string);
    exit(1);
}

void Write(int fd, void *ptr, size_t nbytes)
{
    if (write(fd, ptr, nbytes) != nbytes)
    {
        Exit("write error");
    }
}

int main()
{
    int fd;
    if((fd = open("hole.txt", O_WRONLY | O_CREAT | O_TRUNC, FILE_MODE)) == -1)
    {
        Exit("create file error");
    }

    Write(fd, buf1, 10);

    // Offset now = 10
    if(lseek(fd, 10000, SEEK_SET) == -1)
    {
        Exit("lseek error");
    }

    // Offset now = 10000
    Write(fd, buf2, 10);
    // Offset now = 10010

    close(fd);  // done with hole.txt; avoid leaking the descriptor

    // Create a file of the same size without a hole:
    if((fd = open("nohole.txt", O_WRONLY | O_CREAT | O_TRUNC, FILE_MODE)) == -1)
    {
        Exit("create file error");
    }
    for(int cnt = 1; cnt <= 1001; ++cnt)
    {
        Write(fd, buf1, 10);
    }

    exit(0);
}
  • The program shown in Figure 3.2 creates a file with a hole in it.
xiang :~/Gao/Notes/OS/APUE/Codes $ ls -l hole.txt 
-rw-r--r-- 1 xiang xiang 10010  729 10:36 hole.txt
xiang :~/Gao/Notes/OS/APUE/Codes $ od -c hole.txt 
0000000   a   b   c   d   e   f   g   h   i   j  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0023420   A   B   C   D   E   F   G   H   I   J
0023432
  • We use the od(1) command to look at the contents of the file. The -c flag tells it to print the contents as characters. The unwritten bytes in the middle are read back as zero. The seven-digit number at the beginning of each line is the byte offset in octal.
  • To prove that there is a hole in the file, let’s compare the file we just created with a file of the same size, but without holes:
xiang :~/Gao/Notes/OS/APUE/Codes $ ls -ls hole.txt nohole.txt 
 8 -rw-r--r-- 1 xiang xiang 10010  729 10:36 hole.txt
12 -rw-r--r-- 1 xiang xiang 10010  729 10:36 nohole.txt
  • Although both files are the same size, the file without holes consumes 12 disk blocks, whereas the file with holes consumes only 8 blocks.
  • The Single UNIX Specification provides a way for applications to determine which environments are supported through the sysconf function (Section 2.5.4). Figure 3.3 summarizes the sysconf constants that are defined.

  • The c99 compiler requires that we use the getconf(1) command to map the desired data size model to the flags necessary to compile and link our programs. Different flags and libraries might be needed, depending on the environments supported by each platform.

  • Figure 3.4 summarizes the size in bytes of the off_t data type for the platforms covered in this book when an application doesn’t define _FILE_OFFSET_BITS, as well as the size when an application defines _FILE_OFFSET_BITS to have a value of either 32 or 64.

3.7 read Function

  • Data is read from an open file with the read function.
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t nbytes);
Returns: number of bytes read, 0 if end of file, −1 on error
  • There are several cases in which the number of bytes actually read is less than the amount requested:
    1. When reading from a regular file, if the EOF is reached before the requested number of bytes has been read. For example, if 30 bytes remain until the end of file and we try to read 100 bytes, read returns 30. The next time we call read, it will return 0 (end of file).
    2. When reading from a terminal device. Normally, up to one line is read at a time.
    3. When reading from a network. Buffering within the network may cause less than the requested amount to be returned.
    4. When reading from a pipe or FIFO. If the pipe contains fewer bytes than requested, read will return only what is available.
    5. When reading from a record-oriented device. Some record-oriented devices, such as magnetic tape, can return up to a single record at a time.
    6. When interrupted by a signal and a partial amount of data has already been read.
  • The read operation starts at the file’s current offset. Before a successful return, the offset is incremented by the number of bytes actually read.

3.8 write Function

  • Data is written to an open file with the write function.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t nbytes);
Returns: number of bytes written if OK, −1 on error
  • The return value is usually equal to the nbytes argument; otherwise, an error has occurred. A common cause for a write error is either filling up a disk or exceeding the file size limit for a given process (Section 7.11 and Exercise 10.11).
  • For a regular file, the write operation starts at the file’s current offset. If the O_APPEND option was specified when the file was opened, the file’s offset is set to the current end of file before each write operation. After a successful write, the file’s offset is incremented by the number of bytes actually written.

3.9 I/O Efficiency

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFSIZE 4096

void Exit(char *string)
{
    fprintf(stderr, "%s\n", string);  // report on stderr; stdout carries the copied data
    exit(1);                          // nonzero status signals failure
}

int main()
{
    int n;
    char buf[BUFFSIZE];

    while((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
    {
        if(write(STDOUT_FILENO, buf, n) != n)
        {
            Exit("write error");
        }
    }

    if(n < 0)
    {
        Exit("read error");
    }

    exit(0);
}
  • Figure 3.5:
    1. All UNIX system shells provide a way to open a file for reading on standard input and to create(or rewrite) a file on standard output. This prevents the program from having to open the input and output files, and allows the user to take advantage of the shell’s I/O redirection facilities.
    2. The program uses the feature of the UNIX kernel that closes all open file descriptors in a process when that process terminates.
  • How to choose the BUFFSIZE value? Figure 3.6 shows the results for reading a 516581760-byte file, using 20 different buffer sizes.

  • The test file system was Linux-ext4 with 4,096-byte blocks. This accounts for the minimum in the system time occurring at the few timing measurements starting around a BUFFSIZE of 4,096. Increasing the buffer size beyond this limit has little positive effect.
  • Most file systems support read-ahead to improve performance. When sequential reads are detected, the system tries to read in more data than an application requests, assuming that the application will read it shortly. The effect of read-ahead in Figure 3.6: the elapsed time for buffer sizes as small as 32 bytes is as good as the elapsed time for larger buffer sizes.
  • Beware when measuring the performance of programs that read and write files: the operating system tries to cache the file in main memory, so successive timings will likely be better than the first. This improvement occurs because the first run causes the file to be entered into the system’s cache, and successive runs access the file from the system’s cache instead of from the disk.

3.10 File Sharing

  • The kernel uses 3 data structures to represent an open file, and the relationships among them determine the effect one process has on another with regard to file sharing.
    1. Every process has an entry in the process table. Within each process table entry is a table of open file descriptors. Associated with each file descriptor are:
      (a) The file descriptor flags (close-on-exec; refer to Figure 3.7 and Section 3.14)
      (b) A pointer to a file table entry
    2. The kernel maintains a file table for all open files. Each file table entry contains
      (a) The file status flags for the file, such as read, write; more in Section 3.14
      (b) The current file offset
      (c) A pointer to the v-node table entry for the file
    3. Each open file or device has a v-node structure that contains information about the type of file and pointers to functions that operate on the file. For most files, the v-node also contains the i-node for the file. This information is read from disk when the file is opened, so that all the pertinent information about the file is readily available. For example, the i-node contains the owner of the file, the size of the file, pointers to where the actual data blocks for the file are located on disk, and so on.
  • The v-node was invented to provide support for multiple file system types on a single computer system. Sun called this the Virtual File System and called the file system–independent portion of the i-node the v-node.
  • Linux has no v-node; it uses a generic i-node structure instead. The v-node is conceptually the same as a generic i-node. Both point to an i-node structure specific to the file system.

  • Figure 3.7 shows an arrangement of these three tables for a single process that has two different files open: one file is open on standard input (fd 0), and the other is open on standard output (fd 1).
  • If two independent processes have the same file open, we could have the arrangement shown in Figure 3.8.

  • Each process that opens the file gets its own file table entry, but only a single v-node table entry is required for a given file. One reason each process gets its own file table entry is that each process has its own current offset for the file.

What happens with the operations we have described?

  • write(): After each write is complete, the current file offset in the file table entry is incremented by the number of bytes written. If this causes the current file offset to exceed the current file size, the current file size in the i-node table entry is set to the current file offset.
  • open(): If a file is opened with the O_APPEND flag, a corresponding flag is set in the file status flags of the file table entry. Each time a write is performed for a file with this append flag set, the current file offset in the file table entry is first set to the current file size from the i-node table entry. This forces every write to be appended to the current end of file.
  • lseek(): If a file is positioned to its current end of file using lseek, then the current file offset in the file table entry is set to the current file size from the i-node table entry. This is not the same as opening the file with the O_APPEND flag (see atomic operations in Section 3.11). The lseek function modifies only the current file offset in the file table entry. No I/O takes place.
  • It is possible for more than one file descriptor entry to point to the same file table entry. This happens after the dup function in Section 3.12 or after a fork when the parent and the child share the same file table entry for each open descriptor.
  • Difference between the file descriptor flags and the file status flags: The former apply only to a single descriptor in a single process; the latter apply to all descriptors in any process that point to the given file table entry.

3.11 Atomic Operations

Appending to a File

  • Consider a single process that wants to append to the end of a file by using lseek followed by write:
if (lseek(fd, 0L, 2) < 0)               /* position to EOF */
    err_sys("lseek error");
if (write(fd, buf, 100) != 100) /* and write */
    err_sys("write error");
  • Assume 2 processes, A and B, are appending to the same file. Each has opened the file without the O_APPEND flag, the same as Figure 3.8(each process has its own file table entry, but they share a single v-node table entry).
  • Process A does lseek() and this sets the current offset for the file for process A to byte offset 1500(current end of file). Then the kernel switches to process B. Process B does lseek() also to set the current offset for the file for process B to byte offset 1500. Then B calls write, which increments B’s current file offset for the file to 1,600.
  • Because the file’s size has been extended, the kernel updates the current file size in the v-node to 1600. Then the kernel switches to process A. When A calls write, the data is written starting at the current file offset for A, which is byte offset 1,500. This overwrites the data that B wrote to the file.
  • The problem is that our operation of “position to the end of file and write” requires two separate function calls. The solution is to have the positioning to the current end of file and the write be an atomic operation with regard to other processes.
  • Any operation that requires more than one function call cannot be atomic, as there is the possibility that the kernel might temporarily suspend the process between the two function calls.

pread and pwrite Functions

  • The Single UNIX Specification includes two functions that allow applications to seek and perform I/O atomically: pread and pwrite.
#include <unistd.h>
ssize_t pread(int fd, void *buf, size_t nbytes, off_t offset);
Returns: number of bytes read, 0 if end of file, −1 on error
ssize_t pwrite(int fd, const void *buf, size_t nbytes, off_t offset);
Returns: number of bytes written if OK, −1 on error
  • Calling pread/pwrite is equivalent to calling lseek followed by a call to read/write, with the following differences:
    1. There is no way to interrupt the two operations that occur when we call pread/pwrite.
    2. The current file offset is not updated.
  • An atomic operation may be composed of multiple steps. If the operation is performed atomically, either all the steps are performed (on success) or none are performed (on failure). It is not possible for only a subset of the steps to be performed.

3.12 dup and dup2 Functions

  • An existing file descriptor is duplicated by either of the following functions:
#include <unistd.h>
int dup(int fd);
int dup2(int fd, int fd2);
Both return: new file descriptor if OK, −1 on error
  • The new file descriptor returned by dup is guaranteed to be the lowest-numbered available file descriptor.
  • With dup2, we specify the value of the new descriptor with the fd2 argument. If fd2 is already open, it is first closed. If fd equals fd2, then dup2 returns fd2 without closing it. Otherwise, the FD_CLOEXEC file descriptor flag is cleared for fd2, so that fd2 is left open if the process calls exec.
  • The new file descriptor that is returned as the value of the functions shares the same file table entry as the fd argument, shown in Figure 3.9.

  • Assume the process executes
    newfd = dup(1);
    when it’s started and assume the next available descriptor is 3. Both descriptors share the same file status flags(read, write, append…) and the same current file offset.
  • Each descriptor has its own set of file descriptor flags. As described in Section 3.14, the close-on-exec file descriptor flag for the new descriptor is always cleared by the dup functions.
  • Another way to duplicate a descriptor is with the fcntl function(Section 3.14).
    The call
    dup(fd);
    is equivalent to
    fcntl(fd, F_DUPFD, 0);
    Similarly, the call
    dup2(fd, fd2);
    is equivalent to
    close(fd2);
    fcntl(fd, F_DUPFD, fd2);
  • In this last case, however, dup2 is not exactly the same as a close() followed by an fcntl(). Differences:
    1. dup2 is an atomic operation, whereas the alternate form involves two function calls. It is possible to have a signal catcher called between the close and the fcntl that could modify the file descriptors. The same problem could occur if a different thread changes the file descriptors.
    2. There are some errno differences between dup2 and fcntl.

3.13 sync, fsync, and fdatasync Functions

  • Traditional implementations of the UNIX System have a buffer cache or page cache in the kernel through which most disk I/O passes. When we write data to a file, the data is normally copied by the kernel into one of its buffers and queued for writing to disk at some later time. This is called delayed write. The kernel eventually writes all the delayed-write blocks to disk when it needs to reuse the buffer for some other disk block.
  • The sync, fsync, and fdatasync functions are provided to ensure consistency of the file system on disk with the contents of the buffer cache.
#include <unistd.h>
void sync(void);
int fsync(int fd);
int fdatasync(int fd);
Returns: 0 if OK, −1 on error
  • The sync function queues all the modified block buffers for writing and returns; it does not wait for the disk writes to take place. It is normally called periodically (usually every 30 seconds) from a system daemon called update. This guarantees regular flushing of the kernel’s block buffers.
  • The function fsync refers only to a single file, specified by the file descriptor fd, and waits for the disk writes to complete before returning. This function is used when an application, such as a database, needs to be sure that the modified blocks have been written to the disk.
  • The fdatasync function is similar to fsync, but it affects only the data portions of a file. With fsync, the file’s attributes are also updated synchronously.

3.14 fcntl Function

  • The fcntl function can change the properties of a file that is already open.
#include <unistd.h>
#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* int arg */ );
Returns: depends on cmd if OK (see following), −1 on error
  • The fcntl function is used for five different purposes.
    1. Duplicate an existing descriptor (cmd = F_DUPFD or F_DUPFD_CLOEXEC)
    2. Get/set file descriptor flags (cmd = F_GETFD or F_SETFD)
    3. Get/set file status flags (cmd = F_GETFL or F_SETFL)
    4. Get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN)
    5. Get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW)
  • The cmd values have the following meanings:
    1. F_DUPFD: Duplicate the file descriptor fd. The new file descriptor is returned as the value of the function. It is the lowest-numbered descriptor that is not open, and that is greater than or equal to the third argument (taken as an integer). The new descriptor shares the same file table entry as fd. But the new descriptor has its own set of file descriptor flags, and its FD_CLOEXEC file descriptor flag is cleared. (This means the descriptor is left open across an exec, discussed in Chapter 8.)
    2. F_DUPFD_CLOEXEC: Duplicate the file descriptor and set the FD_CLOEXEC file descriptor flag associated with the new descriptor. Returns the new file descriptor.
    3. F_GETFD: Return the file descriptor flags for fd as the value of the function. Currently, only the FD_CLOEXEC file descriptor flag is defined.
    4. F_SETFD: Set the file descriptor flags for fd. The new flag value is set from the third argument (taken as an integer). Beware that some existing programs that deal with the file descriptor flags don’t use the constant FD_CLOEXEC. Instead, these programs set the flag to either 0 (don’t close-on-exec, the default) or 1 (do close-on-exec).
    5. F_GETFL: Return the file status flags for fd as the value of the function. They are listed in Figure 3.10. However, the first five access-mode flags are not separate bits that can be tested and they are mutually exclusive; a file can have only one of them enabled. The first three have the values 0, 1, and 2, respectively. So, we must first use the O_ACCMODE mask to obtain the access-mode bits and then compare the result against any of the five values.
    6. F_SETFL: Set the file status flags to the value of the third argument (taken as an integer). The only flags that can be changed are O_APPEND, O_NONBLOCK, O_SYNC, O_DSYNC, O_RSYNC, O_FSYNC, and O_ASYNC.
    7. F_GETOWN: Get the process ID or process group ID currently receiving the SIGIO and SIGURG signals. See Section 14.5.2.
    8. F_SETOWN: Set the process ID or process group ID to receive the SIGIO and SIGURG signals. A positive arg specifies a process ID. A negative arg implies a process group ID equal to the absolute value of arg.
  • The return value from fcntl depends on the command. All commands return −1 on an error or some other value if OK. The following four commands have special return values: F_DUPFD, F_GETFD, F_GETFL, and F_GETOWN. The first command returns the new file descriptor, the next two return the corresponding flags, and the final command returns a positive process ID or a negative process group ID.

#include <stdio.h>
#include <stdlib.h>  // exit(), atoi()
#include <unistd.h>
#include <fcntl.h>

void Exit(char *string)
{
    fprintf(stderr, "%s\n", string);  // error messages belong on stderr
    exit(1);                          // nonzero status signals failure
}

int main(int argc, char **argv)
{
    if(argc != 2)
    {
        Exit("usage: a.out <file_descriptor>");
    }

    int val;
    if((val = fcntl(atoi(argv[1]), F_GETFL, 0)) < 0)
    {
        Exit("fcntl error");
    }

    switch(val & O_ACCMODE)
    {
    case O_RDONLY:
        printf("read only");
        break;

    case O_WRONLY:
        printf("write only");
        break;

    case O_RDWR:
        printf("read write");
        break;

    default:
        Exit("unknown access mode");
    }

    if(val & O_APPEND)
    {
        printf(", append");
    }
    if(val & O_NONBLOCK)
    {
        printf(", nonblocking");
    }
    if(val & O_SYNC)
    {
        printf(", synchronous writes");
    }
    printf("\n");

    exit(0);
}
  • The program in Figure 3.11 takes a single command-line argument that specifies a file descriptor and prints a description of selected file flags for that descriptor.
xiang :~/Gao/Notes/OS/APUE/Codes $ gcc 3-11.c 
xiang :~/Gao/Notes/OS/APUE/Codes $ ./a.out 0 < /dev/tty
read only
xiang :~/Gao/Notes/OS/APUE/Codes $ ./a.out 1 > temp.foo
xiang :~/Gao/Notes/OS/APUE/Codes $ cat temp.foo 
write only
xiang :~/Gao/Notes/OS/APUE/Codes $ ./a.out 2 2>>temp.foo 
write only, append
xiang :~/Gao/Notes/OS/APUE/Codes $ ./a.out 5 5<>temp.foo 
read write
  • The clause 5<>temp.foo opens the file temp.foo for reading and writing on file descriptor 5.
  • When we modify either the file descriptor flags or the file status flags, we must fetch the existing flag value, modify it as desired, and then set the new flag value. We can’t simply issue an F_SETFD or an F_SETFL command, as this could turn off flag bits that were previously set.

  • Figure 3.12 shows a function that sets one or more of the file status flags for a descriptor.
  • If we change the middle statement to
    val &= ~flags; /* turn flags off */
    we have a function named clr_fl. This statement logically ANDs the one’s complement of flags with the current val.
  • If we add the line
    set_fl(STDOUT_FILENO, O_SYNC);
    to the beginning of the program in Figure 3.5, we’ll turn on the synchronous-write flag. This causes each write to wait for the data to be written to disk before returning. Normally in UNIX, a write only queues the data for writing; the actual disk write operation can take place sometime later.
  • To test whether the O_SYNC flag increases the system and clock times when the program runs, we run the program in Figure 3.5, copying 492.6 MB of data from one file on disk to another and compare with a version that does the same thing with the O_SYNC flag set.

  • The six rows in Figure 3.13 were all measured with a BUFFSIZE of 4,096 bytes.
    1. The results in Figure 3.6 were measured while reading a disk file and writing to /dev/null, so there was no disk output.
    2. The second row corresponds to reading a disk file and writing to another disk file. The system time increases when we write to a disk file, because the kernel now copies the data from our process and queues the data for writing by the disk driver. The clock time also increases when we write to a disk file.
    3. When we enable synchronous writes, the system and clock times should increase significantly. As the third row shows, the system time for writing synchronously is not much more expensive than when we used delayed writes. In this case, Linux isn’t allowing us to set the O_SYNC flag using fcntl, instead failing without returning an error (but it would have honored the flag if we were able to specify it when the file was opened).
    4. The clock time in the last three rows reflects the extra time needed to wait for all of the writes to be committed to disk. After writing a file synchronously, we expect that a call to fsync will have no effect. This case is supposed to be represented by the last row, but since the O_SYNC flag isn’t having the intended effect, the last row behaves the same way as the fifth row.

  • Figure 3.14 shows timing results for the same tests run on Mac OS X, which uses the HFS file system.
  • The times match our expectations: Synchronous writes are far more expensive than delayed writes. Adding a call to fsync at the end of the delayed writes makes little measurable difference. It is likely that the OS flushed previously written data to disk as we were writing new data to the file, so by the time that we called fsync, very little work was left to be done.
  • The need for fcntl: a program often operates on a descriptor without knowing the name of the file that was opened on that descriptor (for example, standard output set up by the shell). With fcntl, we can modify the properties of a descriptor, knowing only the descriptor for the open file. We’ll see another need for fcntl when we describe nonblocking pipes (Section 15.2), since all we have with a pipe is a descriptor.

3.15 ioctl Function

#include <sys/ioctl.h>
int ioctl(int fd, int request, ...);
Returns: −1 on error, something else if OK
  • We show only the headers required for the function itself. Normally, additional device-specific headers are required. Each device driver can define its own set of ioctl commands. The system provides generic ioctl commands for different classes of devices.

3.16 /dev/fd

  • Newer systems provide a directory named /dev/fd whose entries are files named 0, 1, 2, and so on. Opening the file /dev/fd/n is equivalent to duplicating descriptor n, assuming that descriptor n is open.
  • In the function call
    fd = open("/dev/fd/0", mode);
    most systems ignore the specified mode, whereas others require that it be a subset of the mode used when the referenced file was originally opened.
  • Because the previous open is equivalent to
    fd = dup(0);
    the descriptors 0 and fd share the same file table entry(Figure 3.9). For example, if descriptor 0 was opened read-only, we can only read on fd. Even if the system ignores the open mode and the call
    fd = open("/dev/fd/0", O_RDWR);
    succeeds, we still can’t write to fd.
  • The Linux implementation of /dev/fd is an exception. It maps file descriptors into symbolic links pointing to the underlying physical files. For example, when you open /dev/fd/0, you are really opening the file associated with your standard input. Thus the mode of the new file descriptor returned is unrelated to the mode of the /dev/fd file descriptor.
  • We can call creat with a /dev/fd pathname argument or specify O_CREAT in a call to open. Because the Linux implementation uses symbolic links to the real files, using creat on a /dev/fd file will result in the underlying file being truncated.
  • Some systems provide the pathnames /dev/stdin, /dev/stdout, and /dev/stderr. These pathnames are equivalent to /dev/fd/0, /dev/fd/1, and /dev/fd/2, respectively.
  • The main use of the /dev/fd files is from the shell. It allows programs that use pathname arguments to handle standard input and standard output in the same manner as other pathnames.
  • For example, the cat(1) program specifically looks for an input filename of - and uses it to mean standard input.
    filter file2 | cat file1 - file3 | lpr
    First, cat reads file1, then its standard input (the output of the filter program on file2), and then file3. If /dev/fd is supported, the special handling of - can be removed from cat, and we can enter
    filter file2 | cat file1 /dev/fd/0 file3 | lpr
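  • The same idea can be used from C. The sketch below assumes the system provides /dev/fd. The helper open_via_dev_fd is a made-up name: it builds the pathname /dev/fd/n for an already open descriptor n and opens it like any other pathname. Whether the result shares a file table entry with n (as with dup) or is a fresh open of the underlying file (as on Linux) depends on the implementation, as described above.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: open the /dev/fd entry for descriptor n.
 * Returns a new descriptor, or -1 on error. */
int open_via_dev_fd(int n, int oflag)
{
    char path[32];
    int len = snprintf(path, sizeof(path), "/dev/fd/%d", n);

    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;          /* pathname did not fit in the buffer */
    return open(path, oflag);
}
```

    For instance, open_via_dev_fd(0, O_RDONLY) gives a program a second descriptor for its standard input without any special handling of a - argument.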

3.17 Summary

Exercises 3.1

When reading or writing a disk file, are the functions described in this chapter really unbuffered? Explain.

  • All disk I/O goes through the kernel’s block buffers (also called the kernel’s buffer cache). The exception is I/O on a raw disk device. Some systems also provide a direct I/O option that allows applications to bypass the kernel buffers.
  • Since the data that we read or write is buffered by the kernel, the term unbuffered I/O refers to the lack of automatic buffering in the user process with the read and write functions. Each read or write invokes a single system call.

Exercises 3.3

Assume that a process executes the following three function calls:

fd1 = open(path, oflags);
fd2 = dup(fd1);
fd3 = open(path, oflags);

Draw the resulting picture, similar to Figure 3.9. Which descriptors are affected by an fcntl on fd1 with a command of F_SETFD? Which descriptors are affected by an fcntl on fd1 with a command of F_SETFL?

  • Each open() gives a new file table entry. Since both opens reference the same file, both file table entries point to the same v-node table entry. The dup() references the existing file table entry.
  • An F_SETFD on fd1 affects only the file descriptor flags for fd1, but an F_SETFL on fd1 affects the file table entry that both fd1 and fd2 point to.
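  • The answer can be checked with a short sketch, assuming a POSIX system. The struct flag_report and the function probe_shared_flags are made-up names for this illustration. The function repeats the three calls from the exercise, sets FD_CLOEXEC (a file descriptor flag) and O_NONBLOCK (a file status flag) on fd1 only, and then records which of the three descriptors see each flag.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type: index 0, 1, 2 stands for fd1, fd2, fd3. */
struct flag_report {
    int cloexec[3];    /* 1 if FD_CLOEXEC is set on that descriptor */
    int nonblock[3];   /* 1 if O_NONBLOCK is set on that descriptor */
};

/* Hypothetical helper: returns 0 and fills *out on success, -1 on error. */
int probe_shared_flags(const char *path, struct flag_report *out)
{
    int fd[3] = { -1, -1, -1 };
    int fl;
    int rc = -1;

    fd[0] = open(path, O_RDONLY);          /* fd1: new file table entry */
    if (fd[0] >= 0)
        fd[1] = dup(fd[0]);                /* fd2: shares fd1's entry   */
    if (fd[1] >= 0)
        fd[2] = open(path, O_RDONLY);      /* fd3: second table entry   */
    if (fd[2] < 0)
        goto done;

    /* Change both kinds of flags through fd1 only. */
    if (fcntl(fd[0], F_SETFD, FD_CLOEXEC) < 0)
        goto done;
    fl = fcntl(fd[0], F_GETFL, 0);
    if (fl < 0 || fcntl(fd[0], F_SETFL, fl | O_NONBLOCK) < 0)
        goto done;

    for (int i = 0; i < 3; i++) {
        int fdflags = fcntl(fd[i], F_GETFD, 0);
        int stflags = fcntl(fd[i], F_GETFL, 0);
        if (fdflags < 0 || stflags < 0)
            goto done;
        out->cloexec[i] = (fdflags & FD_CLOEXEC) != 0;
        out->nonblock[i] = (stflags & O_NONBLOCK) != 0;
    }
    rc = 0;

done:
    for (int i = 0; i < 3; i++)
        if (fd[i] >= 0)
            close(fd[i]);
    return rc;
}
```

    Run on any readable file, the report should show FD_CLOEXEC on fd1 only, and O_NONBLOCK on fd1 and fd2 but not on fd3.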

Exercises 3.4

The following sequence of code has been observed in various programs:

dup2(fd, 0);
dup2(fd, 1);
dup2(fd, 2);
if (fd > 2)
    close(fd);

To see why the if test is needed, assume that fd is 1 and draw a picture of what happens to the three descriptor entries and the corresponding file table entry with each call to dup2. Then assume that fd is 3 and draw the same picture.

  • If fd is 1, then dup2(fd, 1) returns 1 without closing file descriptor 1 (Section 3.12). After the three calls to dup2, all three descriptors point to the same file table entry. Nothing needs to be closed.
  • If fd is 3, after the three calls to dup2, four descriptors are pointing to the same file table entry. In this case, we need to close descriptor 3.

Exercises 3.5

The Bourne shell, Bourne-again shell, and Korn shell notation
digit1>&digit2
says to redirect descriptor digit1 to the same file as descriptor digit2. What is the difference between the two commands shown below? (Hint: The shells process their command lines from left to right.)

./a.out > outfile 2>&1
./a.out 2>&1 > outfile
  • Since the shells process their command line from left to right, the command
    ./a.out > outfile 2>&1
    first sets standard output to outfile and then dups standard output onto descriptor 2 (standard error). The result is that standard output and standard error are set to the same file. Descriptors 1 and 2 both point to the same file table entry. With
    ./a.out 2>&1 > outfile
    however, the dup is executed first, causing descriptor 2 to be the terminal (assuming that the command is run interactively). Then standard output is redirected to the file outfile. The result is that descriptor 1 points to the file table entry for outfile, and descriptor 2 points to the file table entry for the terminal.

Exercises 3.6

If you open a file for read–write with the append flag, can you still read from anywhere in the file using lseek? Can you use lseek to replace existing data in the file? Write a program to verify this.

  • You can still lseek and read anywhere in the file, but a write automatically resets the file offset to the end of file before the data is written. This makes it impossible to write anywhere other than at the end of file.
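  • One possible program to verify this is sketched below, assuming a POSIX system with a writable /tmp. The function name append_seek_demo and the scratch file name are made up for this example. It creates a file containing abcdefghij, reopens it with O_RDWR | O_APPEND, reads from offset 0, then seeks back to offset 0 and writes XYZ.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo. head receives the 3 bytes read from offset 0,
 * contents (of the given size, at least 1) receives the final file data.
 * Both are null terminated. Returns the final length, or -1 on error. */
int append_seek_demo(char head[4], char *contents, size_t size)
{
    char path[] = "/tmp/append_demo_XXXXXX";   /* scratch file template */
    ssize_t n = -1;
    int fd;
    int tmp = mkstemp(path);

    if (tmp < 0)
        return -1;
    if (write(tmp, "abcdefghij", 10) != 10) {
        close(tmp);
        unlink(path);
        return -1;
    }
    close(tmp);

    fd = open(path, O_RDWR | O_APPEND);
    unlink(path);              /* the file stays reachable through fd */
    if (fd < 0)
        return -1;

    /* Reading after lseek works anywhere in the file. */
    if (lseek(fd, 0, SEEK_SET) == 0 && read(fd, head, 3) == 3) {
        head[3] = '\0';
        /* Seek to the start and write: O_APPEND moves the offset
         * to end of file before the data is written. */
        if (lseek(fd, 0, SEEK_SET) == 0 &&
            write(fd, "XYZ", 3) == 3 &&
            lseek(fd, 0, SEEK_SET) == 0)
            n = read(fd, contents, size - 1);
    }
    close(fd);
    if (n < 0)
        return -1;
    contents[n] = '\0';
    return (int)n;
}
```

    The read returns abc, and the final contents are abcdefghijXYZ: the existing data is untouched and XYZ lands at the end of the file, even though the offset was 0 just before the write.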

