Standard I/O Library, APUE Chapter 5


猎羽



Article link: https://blog.csdn.net/feather_wch/article/details/50684175


Standard I/O Library

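The standard I/O library is implemented on many different operating systems, so understanding it, and how it is implemented on UNIX systems, is a big help for further study of UNIX.

This post covers all the functions provided by the library, along with some implementation details and efficiency considerations. It also introduces buffering (the ways standard I/O buffers data).

All the key points are summarized in question form here: http://blog.csdn.net/feather_wch/article/details/50722377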

5-1 Introduction

In this chapter, we describe the standard I/O library. This library is specified by the ISO C standard because it has been implemented on many operating systems other than the UNIX System. Additional interfaces are defined as extensions to the ISO C standard by the Single UNIX Specification.

The standard I/O library handles such details as buffer allocation and performing I/O in optimal-sized chunks, obviating our need to worry about using the correct block size (as in Section 3.9). This makes the library easy to use, but at the same time introduces another set of problems if we're not cognizant of what's going on.

The standard I/O library was written by Dennis Ritchie around 1975. It was a major revision of the Portable I/O library written by Mike Lesk. Surprisingly little has changed in the standard I/O library after more than 35 years.

5-2 Streams and FILE Objects

In Chapter 3, all the I/O routines centered on file descriptors. With the standard I/O library, the discussion centers on streams. (Do not confuse the standard I/O term stream with the STREAMS I/O system that is part of System V and was standardized in the XSI STREAMS option in the Single UNIX Specification, but is now marked obsolescent in SUSv4.) When we open or create a file with the standard I/O library, we say that we have associated a stream with the file.

Standard I/O file streams can be used with both single-byte and multibyte ("wide") character sets. A stream's orientation determines whether the characters that are read and written are single byte or multibyte.

(Wide-character I/O) If a multibyte I/O function (see wchar.h) is used on a stream without orientation, the stream's orientation is set to wide oriented.
(Byte I/O) If a byte I/O function is used on a stream without orientation, the stream's orientation is set to byte oriented.

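Only two functions can change the orientation once it is set:

freopen clears a stream's orientation
fwide sets a stream's orientation (it will not change the orientation of a stream that is already oriented)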

fwide

Link: http://blog.csdn.net/feather_wch/article/details/50684244
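A minimal sketch of the orientation rules above, assuming a C99 wchar.h; orientation_demo is a hypothetical helper, not part of any library. It queries the orientation with fwide(fp, 0), which never changes it, before and after a byte I/O function touches the stream, and then shows that asking for wide orientation afterwards has no effect.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical demo: records fwide(fp, 0) at three points.
 * fwide(fp, 0) only queries: > 0 means wide, < 0 byte, 0 not yet oriented.
 * Returns 0 on success, -1 if the scratch stream cannot be created. */
int orientation_demo(int modes[3])
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    modes[0] = fwide(fp, 0);   /* freshly opened stream: no orientation yet */
    fputs("x", fp);            /* a byte I/O function makes it byte oriented */
    modes[1] = fwide(fp, 0);
    fwide(fp, 1);              /* request wide: no effect on an oriented stream */
    modes[2] = fwide(fp, 0);
    fclose(fp);
    return 0;
}
```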

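The rest of APUE deals only with byte-oriented streams.
When we open a stream with fopen, it returns a pointer to a FILE object.
The FILE object holds all the information the standard I/O library needs to manage the stream:
1. the file descriptor used for actual I/O
2. a pointer to the stream's buffer
3. the size of the buffer
4. a count of the characters currently in the buffer
5. an error flag
6. and the like

Application software should never need to examine a FILE object. We usually refer to a pointer to a FILE object (FILE *) as a file pointer.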

5-3 Standard Input, Standard Output, and Standard Error

Three streams are predefined and automatically available to a process: standard input, standard output, and standard error. These streams refer to the same files as the file descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, respectively, which we mentioned in Section 3.2.

These three standard I/O streams are referenced through the predefined file pointers stdin, stdout, and stderr. The file pointers are defined in the stdio.h header.
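A quick check of that correspondence, as a sketch assuming a POSIX system: the hypothetical helper std_streams_match_descriptors compares the descriptor behind each predefined stream (obtained with fileno, covered in 5-12) with the matching constant from unistd.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if stdin, stdout and stderr are backed by
 * STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO respectively, else 0. */
int std_streams_match_descriptors(void)
{
    return fileno(stdin) == STDIN_FILENO &&
           fileno(stdout) == STDOUT_FILENO &&
           fileno(stderr) == STDERR_FILENO;
}
```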

5-4 Buffering

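The goal of the buffering provided by the standard I/O library is to use the minimum number of read and write calls. Also, this library tries to do its buffering automatically for each I/O stream, obviating the need for the application to worry about it. Buffering is the part of the standard I/O library that causes the most confusion.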

The standard I/O library provides three types of buffering:

Fully buffered. In this case, actual I/O takes place when the standard I/O buffer is filled. Files residing on disk are normally fully buffered by the standard I/O library. The buffer is usually obtained by the standard I/O library calling malloc the first time I/O is performed on the stream.
The term flush describes the writing of a standard I/O buffer. A buffer can be flushed automatically by the standard I/O routines, such as when a buffer fills, or we can call the function fflush to flush a stream. Unfortunately, in the UNIX environment, flush means two different things:
1. In terms of the standard I/O library, it means writing out the contents of a buffer, which may be partially filled.
2. In terms of the terminal driver, such as the tcflush function in Chapter 18, it means to discard the data that's already stored in a buffer.
Line buffered. In this case, the standard I/O library performs I/O when a newline character is encountered on input or output. This allows us to output a single character at a time (with the standard I/O fputc function), knowing that actual I/O will take place only when we finish writing each line. Line buffering is typically used on a stream when it refers to a terminal: standard input and standard output, for example.
Line buffering comes with two caveats:
1. First, the size of the buffer that the standard I/O library uses to collect each line is fixed, so I/O might take place if we fill this buffer before writing a newline.
2. Second, whenever input is requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel), all line-buffered output streams are flushed. (For a line-buffered stream, nothing needs to be requested from the kernel if the data is already in the buffer; an unbuffered stream always has to get its data from the kernel.)
Unbuffered. The standard I/O library does not buffer the characters. If we write characters with fputc or fputs, we expect them to be output as soon as possible, probably with the write function. The standard error stream, for example, is normally unbuffered so that any error messages are displayed as quickly as possible, regardless of whether they contain a newline.

ISO C requires the following buffering characteristics:

Standard input and standard output are fully buffered, if and only if they do not refer to an interactive device.
Standard error is never fully buffered.

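Most implementations default to the following for the three standard streams:
1. Standard error is always unbuffered.
2. Standard input and standard output are line buffered when they refer to a terminal device; otherwise they are fully buffered.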

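setbuf, setvbuf

If we don't like a stream's default buffering, we can change it with these calls.
Link: http://blog.csdn.net/feather_wch/article/details/50684175

Usually we let the system choose the buffer size and allocate the buffer; the standard I/O library then frees the buffer automatically when we close the stream.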
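A sketch of the usual setvbuf calls, assuming only ISO C stdio; make_line_buffered and make_unbuffered are hypothetical helper names. (setbuf(fp, NULL) is the older one-call way to turn buffering off.)

```c
#include <stdio.h>

/* Hypothetical helpers around setvbuf. Call them after the stream is opened
 * but before any other operation on it. A caller-supplied buffer must stay
 * valid until the stream is closed (static or heap storage, not a local
 * array of a function that returns first). Each returns 0 on success. */
int make_line_buffered(FILE *fp, char *buf, size_t size)
{
    return setvbuf(fp, buf, _IOLBF, size);
}

int make_unbuffered(FILE *fp)
{
    return setvbuf(fp, NULL, _IONBF, 0);   /* buf and size are ignored */
}
```

Since the buffer passed to make_line_buffered belongs to the caller, the library does not free it on fclose; that is the difference from letting the system allocate the buffer.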

fflush

Forces a stream to be flushed: any unwritten data in the stream's buffer is passed to the kernel.
Link: http://blog.csdn.net/feather_wch/article/details/50684789
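The effect of fflush can be observed from outside the library. In this sketch (POSIX assumed; kernel_file_size and flush_demo are hypothetical helpers), a short string written to a fully buffered temporary file is still sitting in the stdio buffer, so fstat on the underlying descriptor reports size 0 until fflush hands the data to the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file under fp as the kernel sees it (-1 on error). */
long kernel_file_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) < 0)
        return -1;
    return (long)st.st_size;
}

/* Writes msg to a temporary file and records the kernel's view of the file
 * size before and after fflush. Returns 0 on success. */
int flush_demo(const char *msg, long *before, long *after)
{
    FILE *fp = tmpfile();          /* regular file: fully buffered by default */
    if (fp == NULL)
        return -1;
    fputs(msg, fp);                /* data stays in the stdio buffer */
    *before = kernel_file_size(fp);
    if (fflush(fp) == EOF) {       /* pushes the buffer to the kernel with write */
        fclose(fp);
        return -1;
    }
    *after = kernel_file_size(fp);
    fclose(fp);
    return 0;
}
```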

5-5 Opening a Stream

fopen, freopen, fdopen, and fclose

Used to open a stream (fclose closes it).
Link: http://blog.csdn.net/feather_wch/article/details/50696503
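fdopen is the one to use when we already have a descriptor but no pathname, for example a pipe. A minimal sketch, assuming POSIX pipe and a line short enough to fit in the pipe buffer; pipe_roundtrip is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Sends one line through a pipe using streams created with fdopen and reads
 * it back into out. Returns 0 on success, -1 on failure. */
int pipe_roundtrip(const char *line, char *out, int outsize)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    FILE *w = fdopen(fd[1], "w");
    if (w == NULL) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    FILE *r = fdopen(fd[0], "r");
    if (r == NULL) {
        fclose(w);               /* also closes fd[1] */
        close(fd[0]);
        return -1;
    }

    fputs(line, w);
    fclose(w);                   /* flushes the buffer, then closes fd[1] */
    char *p = fgets(out, outsize, r);
    fclose(r);                   /* closes fd[0] */
    return p == NULL ? -1 : 0;
}
```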

5-6 Reading and Writing a Stream

Once we open a stream, we can choose from among three types of unformatted I/O:
1. Character-at-a-time I/O. We can read or write one character at a time, with the standard I/O functions handling all the buffering if the stream is buffered.
2. Line-at-a-time I/O. If we want to read or write a line at a time, we use fgets and fputs. Each line is terminated with a newline character, and we have to specify the maximum line length that we can handle when we call fgets.
3. Direct I/O (binary I/O). This type of I/O is supported by the fread and fwrite functions. These two functions are often used for binary files where we read or write a structure with each operation.

Input Functions: getc, fgetc, getchar

Read one character.
Link: http://blog.csdn.net/feather_wch/article/details/50696710
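A sketch of character-at-a-time input (count_lines is a hypothetical helper). The detail that matters is storing getc's result in an int: getc returns EOF both at end of file and on error, and ferror tells the two apart.

```c
#include <stdio.h>

/* Counts newline characters in fp, one character at a time.
 * Returns the count, or -1 if a read error occurred. */
long count_lines(FILE *fp)
{
    int c;          /* int, not char, so EOF is distinct from every byte */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    if (ferror(fp))
        return -1;
    return n;
}
```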

Output Functions: putc, fputc, putchar

Write one character.
Link: http://blog.csdn.net/feather_wch/article/details/50697914
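The output side mirrors it. This hypothetical filter, upcase_copy, copies one stream to another with getc and putc and upper-cases letters on the way; toupper is applied to the int value that getc returned, which is always in the unsigned char range.

```c
#include <ctype.h>
#include <stdio.h>

/* Copies in to out one character at a time, converting letters to upper case.
 * Returns 0 on success, EOF if a read or write error occurs. */
int upcase_copy(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF)
        if (putc(toupper(c), out) == EOF)
            return EOF;
    return ferror(in) ? EOF : 0;
}
```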

5-7 Line-at-a-Time I/O

Line-at-a-time input is done with two functions, fgets and gets.
Output uses fputs and puts.
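A sketch of a line-at-a-time copy with fgets and fputs (copy_lines and MAXLINE are hypothetical; 4096 is an arbitrary limit). fgets is used rather than gets because gets takes no buffer size and can overflow the caller's buffer; C11 removed it from the standard.

```c
#include <stdio.h>

#define MAXLINE 4096   /* assumed maximum line length passed to fgets */

/* Copies in to out a line at a time. fgets keeps the newline and always
 * null-terminates; a line longer than MAXLINE - 1 bytes is copied in pieces.
 * Returns 0 on success, EOF on error. */
int copy_lines(FILE *in, FILE *out)
{
    char buf[MAXLINE];

    while (fgets(buf, MAXLINE, in) != NULL)
        if (fputs(buf, out) == EOF)
            return EOF;
    return ferror(in) ? EOF : 0;
}
```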

5-8 Standard I/O Efficiency

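A file can be copied in many ways, for example with read/write, getc/putc, or fgets/fputs.

Comparing them, read/write is the fastest when BUFFSIZE is well chosen. fgets/fputs comes next and is considerably faster than getc/putc, the slowest of the three.

If fgets and fputs were implemented with getc and putc, their timing would be similar to the getc/putc version.
In some implementations, the line-at-a-time functions are implemented with memccpy, which is often written in assembly language for efficiency.

However, if BUFFSIZE is 1, the standard I/O versions are far faster than read/write, because they make far fewer system calls.

Moreover, the standard I/O library is not much slower than calling read and write directly.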
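For comparison, a sketch of the read/write version whose speed depends on BUFFSIZE (copy_fd is a hypothetical helper; bufsize plays the role of BUFFSIZE and is capped at 8192 here). With bufsize 1, every byte costs one read and one write system call. The inner loop handles write transferring fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Copies everything from descriptor in to descriptor out with read/write,
 * bufsize bytes per read. Returns 0 on success, -1 on error. */
int copy_fd(int in, int out, size_t bufsize)
{
    char buf[8192];
    ssize_t n;

    if (bufsize == 0 || bufsize > sizeof buf)
        bufsize = sizeof buf;
    while ((n = read(in, buf, bufsize)) > 0) {
        ssize_t off = 0;
        while (off < n) {                /* write may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}
```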

Advantages of standard I/O

One of the main advantages is that we don't need to worry about buffering or about choosing the optimal I/O size.

5-9 Binary I/O

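When doing binary I/O, we often want to read or write an entire structure at once. We could walk through the structure a byte at a time with getc or putc, but that is tedious and slow. We cannot use the line-at-a-time functions either: fputs stops writing at a null byte and fgets stops reading at a newline, and a structure may well contain bytes with either value, which would corrupt the data.

For binary I/O we use fread and fwrite.

Link: http://blog.csdn.net/feather_wch/article/details/50698400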
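A minimal fread/fwrite sketch that writes an array of structures and reads it back (struct item, save_items, and load_items are hypothetical). Because the bytes written are the compiler's in-memory layout, such a file can generally be read back only by a program built the same way on the same kind of system.

```c
#include <stdio.h>

/* Hypothetical record type, for illustration only. */
struct item {
    short count;
    long  total;
    char  name[16];
};

/* Writes n records; returns 0 only if all of them were written. */
int save_items(FILE *fp, const struct item *items, size_t n)
{
    return fwrite(items, sizeof(struct item), n, fp) == n ? 0 : -1;
}

/* Reads up to max records; returns the number of whole records read. */
size_t load_items(FILE *fp, struct item *items, size_t max)
{
    return fread(items, sizeof(struct item), max, fp);
}
```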

5-10 Positioning a Stream

There are three ways to position a standard I/O stream:
1. ftell and fseek
2. ftello and fseeko
3. fgetpos and fsetpos
Detailed explanation: http://blog.csdn.net/feather_wch/article/details/50698592
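A sketch that uses the interfaces together (POSIX assumed for fseeko and off_t; peek_at is a hypothetical helper): fgetpos/fsetpos save and restore the current position as an opaque fpos_t, while fseeko jumps to a byte offset. The test then checks the restored position with ftell.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/* Reads the byte at offset off, then restores the previous position.
 * Returns the byte, or EOF on failure or end of file. */
int peek_at(FILE *fp, off_t off)
{
    fpos_t saved;
    int c;

    if (fgetpos(fp, &saved) != 0)
        return EOF;
    if (fseeko(fp, off, SEEK_SET) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &saved) != 0)
        return EOF;
    return c;
}
```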

5-11 Formatted I/O

Formatted Output: printf, vprintf, etc.

Link: http://blog.csdn.net/feather_wch/article/details/50709141
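As an example of the v variants, this sketch builds an append-style formatter on vsnprintf (append_fmt is a hypothetical helper). snprintf and vsnprintf return the length the complete output would have had, so a return value that does not fit in the remaining space signals truncation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text to the null-terminated string in buf (capacity size).
 * Returns 0, or -1 if the result did not fit (buf then holds a truncated,
 * still null-terminated string). */
int append_fmt(char *buf, size_t size, const char *fmt, ...)
{
    size_t len = strlen(buf);
    va_list ap;
    int n;

    if (len >= size)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(buf + len, size - len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - len)
        return -1;
    return 0;
}
```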

Formatted Input: scanf, vscanf, etc.

Link: http://blog.csdn.net/feather_wch/article/details/50709678
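A small sscanf sketch (parse_time is a hypothetical helper). The scanf family returns the number of input items successfully converted, which is what the check relies on.

```c
#include <stdio.h>

/* Parses "HH:MM:SS". Returns 0 if all three fields were converted, else -1. */
int parse_time(const char *s, int *h, int *m, int *sec)
{
    if (sscanf(s, "%d:%d:%d", h, m, sec) != 3)
        return -1;
    return 0;
}
```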

5-12 Implementation Details

As we've mentioned, under the UNIX System, the standard I/O library ends up calling the I/O routines that we described in Chapter 3. Each standard I/O stream has an associated file descriptor, and we can obtain the descriptor for a stream by calling fileno:

#include <stdio.h>
int fileno(FILE *stream);

//Returns: the file descriptor associated with the stream

We need fileno when we want to call dup or fcntl on a stream's descriptor.

dup (duplicating a file descriptor: http://blog.csdn.net/feather_wch/article/details/50647093)
fcntl (changing the properties of an already open file: http://blog.csdn.net/feather_wch/article/details/50646906)
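As a sketch of the fcntl case (POSIX assumed; stream_access_mode is a hypothetical helper), fileno gives the descriptor under a stream and fcntl(F_GETFL) reports the access mode it was opened with.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>

/* Returns O_RDONLY, O_WRONLY or O_RDWR for the descriptor under fp,
 * or -1 on error. */
int stream_access_mode(FILE *fp)
{
    int flags = fcntl(fileno(fp), F_GETFL);
    if (flags < 0)
        return -1;
    return flags & O_ACCMODE;
}
```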

To look at the implementation of the standard I/O library on your system, start with the header stdio.h. This will show how the FILE object is defined, the definitions of the per-stream flags, and any standard I/O routines, such as getc, that are defined as macros.

Section 8.5 of Kernighan and Ritchie [1988] has a sample implementation that shows the flavor of many implementations on UNIX systems. Chapter 12 of Plauger [1992] provides the complete source code for an implementation of the standard I/O library. The implementation of the GNU standard I/O library is also publicly available.

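Example:

The following program prints the buffering of the three standard streams and of a stream associated with a regular file.
The program (tested) is shown below: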

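#include <stdio.h>
#include <stdlib.h>   /* exit, EXIT_FAILURE */

/*
 * Not portable: newer glibc releases no longer export these flag macros
 * from <stdio.h>. The fallback values are the ones used internally by
 * glibc's libio (an assumption; verify them against your glibc).
 */
#ifndef _IO_UNBUFFERED
#define _IO_UNBUFFERED 0x0002
#endif
#ifndef _IO_LINE_BUF
#define _IO_LINE_BUF 0x0200
#endif

void pt_stdio(const char *name, FILE *fp); /* print the stream's buffering */

int main(void)
{
    FILE *fp;

    /* The standard I/O library allocates a stream's buffer (via malloc)
       only when the first I/O operation is performed on the stream */
    fputs("entry any charcater\n", stdout);
    if (getchar() == EOF)
    {
        fprintf(stderr, "getchar error\n");
    }
    fputs("one line to error\n", stderr);
    /* print the buffering of the standard streams */
    pt_stdio("stdin", stdin);
    pt_stdio("stdout", stdout);
    pt_stdio("stderr", stderr);
    /* open a regular file */
    if ((fp = fopen("/etc/networks", "r")) == NULL)
    {
        fprintf(stderr, "fopen error\n");
        exit(EXIT_FAILURE);
    }
    /* perform one operation so that its buffer gets allocated */
    if (getc(fp) == EOF)
    {
        fprintf(stderr, "getc error\n");
        exit(EXIT_FAILURE);
    }
    /* print its buffering */
    pt_stdio("/etc/networks", fp);

    exit(0);
}

void pt_stdio(const char *name, FILE *fp)
{
    printf("stream = %s", name);
    /*
     * The following code is not portable (GNU C library on Linux only).
     * _flags is the field that the old _IO_file_flags name referred to.
     */
    if (fp->_flags & _IO_UNBUFFERED)        /* unbuffered */
    {
        printf(" unbuffered ");
    }
    else if (fp->_flags & _IO_LINE_BUF)     /* line buffered */
    {
        printf(" line buffered");
    }
    else                                    /* fully buffered */
    {
        printf(" fully buffered");
    }
    /* buffer size: the pointer difference is cast to long to match % ld */
    printf(", buffer size is % ld\n", (long)(fp->_IO_buf_end - fp->_IO_buf_base));
}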

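The FILE fields and flag macros used above (_IO_file_flags or _flags, _IO_UNBUFFERED, _IO_LINE_BUF, _IO_buf_end, _IO_buf_base) are internals of the GNU C library on Linux and are not portable. Newer glibc releases name the flags field _flags and no longer provide _IO_file_flags.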

Test run:

$./StandIO
entry any charcater

one line to error
stream = stdin line buffered, buffer size is  1024
stream = stdout line buffered, buffer size is  1024
stream = stderr unbuffered , buffer size is  1
stream = /etc/networks fully buffered, buffer size is  4096

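When standard input and standard output are connected to a terminal, they are line buffered; standard error is unbuffered.
The line buffer is 1024 bytes. This does not limit us to 1024-byte lines; it is just the size of the buffer. Writing a 2048-byte line, for example, works fine; it simply takes two write system calls.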

$./standardIO < /etc/temp > std.out 2>std.err

entry any charcater
stream = stdin fully buffered, buffer size is  4096
stream = stdout fully buffered, buffer size is  4096
stream = stderr unbuffered , buffer size is  1
stream = /etc/networks fully buffered, buffer size is  4096

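(The input file /etc/temp has to be created first. Since stdout is redirected, the lines above are the contents of std.out.)
The results show that once the standard streams are redirected to files, stdin and stdout become fully buffered, while stderr is still unbuffered. The size of the full buffer equals the preferred I/O size of the file system (st_blksize).
Regular files are fully buffered by default.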
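The preferred I/O size mentioned above can be read from the stream's descriptor. A sketch assuming POSIX fstat; preferred_io_size is a hypothetical helper. On the system in the runs above, the fully buffered streams used 4096-byte buffers, which would be consistent with an st_blksize of 4096, but the value varies by file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Returns st_blksize for the file under fp, or -1 on error. */
long preferred_io_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) < 0)
        return -1;
    return (long)st.st_blksize;
}
```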

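5-13 Temporary Files

Creating temporary files

tmpnam and tempnam are both dangerous: use mkstemp instead, and do not use these two functions!

There is a window between obtaining the unique pathname and creating a file with that name. During that window, another process can create a file with the same name, with unexpected results.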

tmpnam - create a name for a temporary file

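Generates a string that is a valid pathname and does not match the name of any existing file. It generates a different pathname each time it is called, up to TMP_MAX times (TMP_MAX is defined in stdio.h; ISO C requires it to be at least 25, and the Single UNIX Specification requires XSI-conforming systems to support at least 10,000).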

#include <stdio.h>

char *tmpnam(char *s);
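//Returns: pointer to unique pathname

If s is NULL, the generated pathname is stored in a static area and a pointer to it is returned. Subsequent calls can overwrite that static area, so if we want to keep the pathname we must save a copy of it first.
If s is not NULL, it must point to an array of at least L_tmpnam characters (L_tmpnam is defined in stdio.h). The generated pathname is stored in that array, and s is returned.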

tmpfile - create a temporary file

#include <stdio.h>

FILE *tmpfile(void);
//Returns: fp if OK, NULL on error

Creates a temporary binary file (type wb+).
* The temporary file is removed automatically when it is closed or when the program terminates.
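A minimal tmpfile sketch (scratch_roundtrip is a hypothetical helper): write, rewind, read back. The rewind is needed because ISO C requires an fflush or a positioning call between output and subsequent input on an update stream.

```c
#include <stdio.h>

/* Writes text to an anonymous temporary file and reads the first line back
 * into out. Returns 0 on success, -1 on failure. */
int scratch_roundtrip(const char *text, char *out, int outsize)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);                        /* switch from output to input */
    char *p = fgets(out, outsize, fp);
    fclose(fp);                        /* the temporary file is removed here */
    return p == NULL ? -1 : 0;
}
```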

Example

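The technique often used by tmpfile implementations is to obtain a unique pathname by calling tmpnam, create the file with that name, and then immediately unlink it. Recall from Section 4.15 that unlinking a file does not delete its contents until the file is closed, so the file disappears when it is closed or the program terminates.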

tempnam

This one is dangerous too and should no longer be used, so we won't go into it.
Its prototype:

#include <stdio.h>

char *tempnam(const char *dir, const char *pfx);

mkstemp, mkostemp, mkstemps, mkostemps - create a unique temporary file

#include <stdlib.h>

int mkstemp(char *template);
//Returns: file descriptor if OK, -1 on error
int mkostemp(char *template, int flags);
int mkstemps(char *template, int suffixlen);
int mkostemps(char *template, int suffixlen, int flags);

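mkstemp is similar to tmpfile, but it returns an open file descriptor instead of a FILE pointer.
The returned descriptor is already open for reading and writing. The name of the temporary file is built from the template string: its last six characters must be XXXXXX, and mkstemp replaces them with characters that make the name unique. Because the template is modified in place, it must be a writable character array, not a string literal.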

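Differences between tmpfile and mkstemp

The file created by mkstemp is not removed automatically; we have to unlink it ourselves.
The file created by tmpfile is removed automatically.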
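The create-then-unlink technique from the Example above can be applied by hand with mkstemp, which also takes care of the manual unlink that mkstemp requires. A sketch assuming POSIX; anon_tmpfile and the /tmp/apue_XXXXXX template are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* tmpfile-like helper: create a uniquely named file, remove the name at once
 * (the data lives on until the last descriptor is closed), and wrap the
 * descriptor in a stream. Returns NULL on failure. */
FILE *anon_tmpfile(void)
{
    char name[] = "/tmp/apue_XXXXXX";  /* writable array: mkstemp rewrites the Xs */
    int fd = mkstemp(name);
    if (fd < 0)
        return NULL;
    unlink(name);                      /* no other process can open it by name now */
    FILE *fp = fdopen(fd, "w+");
    if (fp == NULL)
        close(fd);
    return fp;
}
```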

5-14 Memory Streams

As we've seen, the standard I/O library buffers data in memory, so operations such as character-at-a-time I/O and line-at-a-time I/O are more efficient. We've also seen that we can provide our own buffer for the library to use by calling setbuf or setvbuf. In Version 4, the Single UNIX Specification added support for memory streams. These are standard I/O streams for which there are no underlying files, although they are still accessed with FILE pointers. All I/O is done by transferring bytes to and from buffers in main memory. Even though these streams look like file streams, several features make them more suited for manipulating character strings.

There are functions for opening memory streams, such as fmemopen.
Detailed explanation: http://blog.csdn.net/feather_wch/article/details/50727607
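A minimal fmemopen sketch, assuming POSIX.1-2008 (sum_ints is a hypothetical helper): a string in memory is opened as a read-only stream and parsed with fscanf exactly as if it came from a file. A return value of -1 is ambiguous here (error or a sum of -1), which is acceptable only for a sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Sums the whitespace-separated integers in text by reading it as a stream.
 * Returns the sum, or -1 if the memory stream cannot be opened. */
long sum_ints(char *text)
{
    long sum = 0, v;
    FILE *fp = fmemopen(text, strlen(text), "r");
    if (fp == NULL)
        return -1;
    while (fscanf(fp, "%ld", &v) == 1)
        sum += v;
    fclose(fp);
    return sum;
}
```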