Standard I/O Library
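The standard I/O library is implemented on many operating systems, so understanding it, and how it is implemented on UNIX systems in particular, is a great help for further study of UNIX.
These notes cover all the functions the library provides, together with some implementation details and efficiency considerations. They also introduce buffering, the way standard I/O buffers its data.
All the key points are summarized in question-and-answer form here: http://blog.csdn.net/feather_wch/article/details/50722377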
5-1 Introduction
In this chapter, we describe the standard I/O library. This library is specified by the ISO C standard because it has been implemented on many operating systems other than the UNIX System. Additional interfaces are defined as extensions to the ISO C standard by the Single UNIX Specification.
The standard I/O library handles such details as buffer allocation and performing I/O in optimal-sized chunks, obviating our need to worry about using the correct block size (as in Section 3.9). This makes the library easy to use, but at the same time introduces another set of problems if we’re not cognizant of what’s going on.
The standard I/O library was written by Dennis Ritchie around 1975. It was a major revision of the Portable I/O library written by Mike Lesk. Surprisingly little has changed in the standard I/O library after more than 35 years.
5-2 Streams and FILE Objects
In Chapter 3, all the I/O routines centered on file descriptors. With the standard I/O library, the discussion centers on streams. (Do not confuse the standard I/O term stream with the STREAMS I/O system that is part of System V and was standardized in the XSI STREAMS option in the Single UNIX Specification, but is now marked obsolescent in SUSv4.) When we open or create a file with the standard I/O library, we say that we have associated a stream with the file.
Standard I/O file streams can be used with both single-byte and multibyte (‘‘wide’’) character sets. A stream’s orientation determines whether the characters that are read and written are single byte or multibyte. A newly created stream has no orientation.
Wide I/O: If a multibyte I/O function (see wchar.h) is used on a stream without orientation, the stream’s orientation is set to wide oriented.
Byte I/O: If a byte I/O function is used on a stream without orientation, the stream’s orientation is set to byte oriented.
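Only two functions can change the orientation once it is set:
- freopen clears a stream’s orientation.
- fwide sets the orientation of a stream that has none; it does not change a stream that is already oriented.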
fwide
Link: http://blog.csdn.net/feather_wch/article/details/50684244
The rest of APUE deals only with byte-oriented streams.
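When we open a stream with fopen, it returns a pointer to a FILE object.
The FILE structure contains all the information the standard I/O library needs to manage the stream:
1. the file descriptor used for the actual I/O
2. a pointer to a buffer for the stream
3. the size of the buffer
4. a count of the number of characters currently in the buffer
5. an error flag
6. and the like
Application software should never need to examine a FILE object. We refer to a pointer to a FILE object, of type FILE *, as a file pointer.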
5-3 Standard Input, Standard Output, and Standard Error
Three streams are predefined and automatically available to a process: standard input, standard output, and standard error. These streams refer to the same files as the file descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, respectively, which we mentioned in Section 3.2.
These three standard I/O streams are referenced through the predefined file pointers stdin, stdout, and stderr. The file pointers are defined in the stdio.h header.
5-4 Buffering
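The standard I/O library buffers data so that it makes as few read and write calls as possible. Also, this library tries to do its buffering automatically for each I/O stream, obviating the need for the application to worry about it. Unfortunately, buffering is the aspect of the standard I/O library that causes the most confusion.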
The standard I/O library provides three types of buffering:
- Fully buffered. In this case, actual I/O takes place when the standard I/O buffer is filled. Files on disk are normally fully buffered by the standard I/O library. The buffer is usually obtained by one of the standard I/O functions calling malloc the first time I/O is performed on the stream.
  The term flush describes the writing of a standard I/O buffer. A buffer can be flushed automatically by the standard I/O routines, such as when a buffer fills, or we can call the function fflush to flush a stream. Unfortunately, in the UNIX environment, flush means two different things:
  - In terms of the standard I/O library, it means writing out the contents of a buffer, which may be partially filled.
  - In terms of the terminal driver, such as the tcflush function in Chapter 18, it means to discard the data that’s already stored in a buffer.
- Line buffered. In this case, the standard I/O library performs I/O when a newline character is encountered on input or output. This allows us to output a single character at a time (with the standard I/O fputc function), knowing that actual I/O will take place only when we finish writing each line. Line buffering is typically used on a stream when it refers to a terminal: standard input and standard output, for example.
  Line buffering comes with two caveats:
  - First, the size of the buffer that the standard I/O library uses to collect each line is fixed, so I/O might take place if we fill this buffer before writing a newline.
  - Second, whenever input is requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel), all line-buffered output streams are flushed. The qualifier on (b) is needed because the requested data may already be in the buffer, in which case nothing has to be requested from the kernel; input from an unbuffered stream always requires data from the kernel.
- Unbuffered. The standard I/O library does not buffer the characters. A call to fputc, say, results in the character being written as soon as possible. The standard error stream, for example, is normally unbuffered so that any error messages are displayed as quickly as possible, regardless of whether they contain a newline.
ISO C requires the following buffering characteristics:
- Standard input and standard output are fully buffered, if and only if they do not refer to an interactive device.
- Standard error is never fully buffered.
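For the three standard streams, most implementations default to the following:
1. Standard error is always unbuffered.
2. Standard input and standard output are line buffered if they refer to a terminal device; otherwise, they are fully buffered.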
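setbuf, setvbuf
If the default buffering of a stream does not suit us, we can change it by calling one of these functions.
Link: http://blog.csdn.net/feather_wch/article/details/50684175
In general, we should let the system choose the buffer size and allocate the buffer automatically. When we do this, the standard I/O library frees the buffer when we close the stream.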
fflush
Forces a stream to be flushed: any unwritten data for the stream is passed to the kernel.
Link: http://blog.csdn.net/feather_wch/article/details/50684789
5-5 Opening a Stream
fopen, freopen, fdopen, and fclose
fopen, freopen, and fdopen open a standard I/O stream; fclose closes one.
Link: http://blog.csdn.net/feather_wch/article/details/50696503
5-6 Reading and Writing a Stream
Once we open a stream, we can choose from among three types of unformatted I/O:
1. Character-at-a-time I/O. We can read or write one character at a time.
2. Line-at-a-time I/O. If we want to read or write a line at a time, we use fgets and fputs. Each line is terminated with a newline character, and we have to specify the maximum line length that we can handle when we call fgets.
3. Direct I/O (binary I/O). This type of I/O is supported by the fread and fwrite functions. These two functions are often used for binary files where we read or write a structure with each operation.
Input functions: getc, fgetc, getchar
These read one character at a time.
Link: http://blog.csdn.net/feather_wch/article/details/50696710
Output functions: putc, fputc, putchar
These write one character at a time.
Link: http://blog.csdn.net/feather_wch/article/details/50697914
5-7 Line-at-a-Time I/O
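Line-at-a-time input is provided by two functions, fgets and gets. gets cannot limit how much it reads and should never be used. Output uses fputs and puts.
Link: http://write.blog.csdn.net/mdeditor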
5-8 Standard I/O Efficiency
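A file can be copied in several ways, for example with read/write, getc/putc, or fgets/fputs.
Comparing their timings, we find that read/write is the fastest when BUFFSIZE is chosen well. fgets/fputs comes next and is much faster than getc/putc, the slowest of the three.
If fgets were implemented using getc, the two would take about the same time.
Sometimes the line-at-a-time functions are implemented using memccpy, and memccpy is often written in assembly language for efficiency.
However, when BUFFSIZE is 1, standard I/O is clearly faster than read/write, because standard I/O makes far fewer system calls.
In addition, the standard I/O library is not much slower than calling read/write directly.
Advantages of standard I/O
One of the main advantages is that we don’t have to worry about buffering or choosing the optimal I/O size.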
5-9 Binary I/O
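When doing binary I/O, we often want to read or write an entire structure at a time. We could walk through the structure one byte at a time with getc or putc, but we cannot use the line-at-a-time functions: fputs stops writing when it hits a null byte, and the structure may contain null bytes or bytes equal to the newline character, which fgets would not handle correctly.
For binary I/O, we use fread and fwrite.
Link: http://blog.csdn.net/feather_wch/article/details/50698400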
5-10 Positioning a Stream
There are three ways to position a standard I/O stream
1. ftell and fseek
2. ftello and fseeko
3. fgetpos and fsetpos
Detailed explanation: http://blog.csdn.net/feather_wch/article/details/50698592
5-11 Formatted I/O
Formatted output: printf, vprintf, and related functions
Link: http://blog.csdn.net/feather_wch/article/details/50709141
Formatted input: scanf, vscanf, and related functions
Link: http://blog.csdn.net/feather_wch/article/details/50709678
5-12 Implementation Details
As we’ve mentioned, under the UNIX System, the standard I/O library ends up calling the I/O routines that we described in Chapter 3 (read, write, and so on). Each standard I/O stream has an associated file descriptor, and we can obtain the descriptor for a stream by calling fileno.
#include <stdio.h>
int fileno(FILE *stream);
//Returns: the file descriptor associated with the stream
We need fileno when we want to call functions such as dup or fcntl on a stream.
dup (duplicates a file descriptor: http://blog.csdn.net/feather_wch/article/details/50647093)
fcntl (changes the properties of an open file: http://blog.csdn.net/feather_wch/article/details/50646906)
To look at the implementation of the standard I/O library on your system, start with the header stdio.h. This will show how the FILE object is defined, the definitions of the per-stream flags, and any standard I/O routines, such as getc, that are defined as macros.
Section 8.5 of Kernighan and Ritchie [1988] has a sample implementation that shows the flavor of many implementations on UNIX systems. Chapter 12 of Plauger [1992] provides the complete source code for an implementation of the standard I/O library. The implementation of the GNU standard I/O library is also publicly available.
Example:
Print the buffering of the three standard streams and of a stream associated with a regular file.
The program is as follows (tested):
#include <stdio.h>
#include <stdlib.h>   /* exit */

void pt_stdio(const char *name, FILE *fp);   /* print the buffering of a stream */

int main(void)
{
    FILE *fp;

    /* The library allocates a stream's buffer only after the first I/O on it */
    fputs("entry any charcater\n", stdout);
    if (getchar() == EOF)
    {
        fprintf(stderr, "getchar error\n");
    }
    fputs("one line to error\n", stderr);

    /* Show the buffering of the three standard streams */
    pt_stdio("stdin", stdin);
    pt_stdio("stdout", stdout);
    pt_stdio("stderr", stderr);

    /* Open a regular file */
    if ((fp = fopen("/etc/networks", "r")) == NULL)
    {
        fprintf(stderr, "fopen error\n");
        exit(1);
    }
    /* Perform one operation so that the buffer gets allocated */
    if (getc(fp) == EOF)
    {
        fprintf(stderr, "getc error\n");
        exit(1);
    }
    /* Show the buffering of the file stream */
    pt_stdio("/etc/networks", fp);
    exit(0);
}

void pt_stdio(const char *name, FILE *fp)
{
    printf("stream = %s", name);
    /*
     * The code below is not portable: it reaches into the GNU C library's
     * FILE internals, and newer glibc headers may no longer expose these names.
     */
    if (fp->_IO_file_flags & _IO_UNBUFFERED)     /* unbuffered */
    {
        printf(" unbuffered ");
    }
    else if (fp->_IO_file_flags & _IO_LINE_BUF)  /* line buffered */
    {
        printf(" line buffered");
    }
    else                                         /* fully buffered */
    {
        printf(" fully buffered");
    }
    /* The pointer difference is a ptrdiff_t, so convert it before printing */
    printf(", buffer size is %ld\n", (long)(fp->_IO_buf_end - fp->_IO_buf_base));
}
The names _IO_file_flags, _IO_UNBUFFERED, _IO_LINE_BUF, _IO_buf_end, and _IO_buf_base used above are internals of the GNU C library on Linux and are not portable.
Test run:
$./StandIO
entry any charcater
one line to error
stream = stdin line buffered, buffer size is 1024
stream = stdout line buffered, buffer size is 1024
stream = stderr unbuffered , buffer size is 1
stream = /etc/networks fully buffered, buffer size is 4096
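When standard input and standard output are connected to a terminal, they are line buffered; standard error is unbuffered.
The buffer size of 1024 bytes does not limit how much data can be written. Writing 2048 bytes, for example, also works; the library simply flushes the buffer with write as often as needed.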
$./standardIO < /etc/temp > std.out 2>std.err
entry any charcater
stream = stdin fully buffered, buffer size is 4096
stream = stdout fully buffered, buffer size is 4096
stream = stderr unbuffered , buffer size is 1
stream = /etc/networks fully buffered, buffer size is 4096
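The file /etc/temp must be created before running this.
The results show that once the standard streams are redirected, stdin and stdout become fully buffered, while stderr is still unbuffered. The size of a fully buffered stream’s buffer equals the preferred I/O block size (st_blksize from the stat structure).
A regular file is fully buffered by default.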
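5-13 Temporary Files
Creating temporary files
tmpnam and tempnam are both dangerous; do not use them, and use mkstemp instead.
There is a window between the time the unique pathname is returned and the time the application creates a file with that name. During this window, another process can create a file of the same name, with unexpected results.
tmpnam - create a name for a temporary file
Generates a string that is a valid pathname and does not match the name of any existing file. Each call produces a different pathname, up to TMP_MAX times (TMP_MAX is defined in stdio.h; ISO C requires it to be at least 25, and the Single UNIX Specification requires at least 10,000).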
#include <stdio.h>
char *tmpnam(char *s);
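//Returns: pointer to unique pathname
- If s is NULL, the pathname is stored in a static area and a pointer to this area is returned. Subsequent calls can overwrite the static area, so if we need the pathname later we must save a copy of it, not just the pointer.
- If s is not NULL, it must point to an array of at least L_tmpnam characters (L_tmpnam is defined in stdio.h). The pathname is stored in s, and s is also returned as the function’s value.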
tmpfile - create a temporary file
#include <stdio.h>
FILE *tmpfile(void);
//Returns: fp if OK, NULL on error
Creates a temporary binary file (type wb+).
* The temporary file is automatically removed when it is closed or when the program terminates.
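Example
The standard technique often used by tmpfile is to create a unique pathname by calling tmpnam, then create the file, and immediately unlink it. Recall from Section 4.15 that unlinking a file does not delete its contents right away; they are deleted only when the file is closed or the program terminates.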
tempnam
This function is dangerous as well and should no longer be used, so it is not covered in detail.
Its prototype is:
#include <stdio.h>
char *tempnam(const char *dir, const char *pfx);
mkstemp, mkostemp, mkstemps, mkostemps - create a unique temporary file
#include <stdlib.h>
int mkstemp(char *template);
//Returns: file descriptor if OK, -1 on error
int mkostemp(char *template, int flags);
int mkstemps(char *template, int suffixlen);
int mkostemps(char *template, int suffixlen, int flags);
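mkstemp is similar to tmpfile, but it returns an open file descriptor instead of a file pointer.
The returned descriptor is open for reading and writing. The name of the temporary file is built from the template string, whose last six characters must be XXXXXX; the function replaces them to make the pathname unique.
Difference between tmpfile and mkstemp
mkstemp: the file is not removed automatically; we have to unlink it ourselves.
tmpfile: the file is removed automatically.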
5-14 Memory Streams
As we’ve seen, the standard I/O library buffers data in memory, so operations such as character-at-a-time I/O and line-at-a-time I/O are more efficient. We’ve also seen that we can provide our own buffer for the library to use by calling setbuf or setvbuf. In Version 4, the Single UNIX Specification added support for memory streams. These are standard I/O streams for which there are no underlying files, although they are still accessed with FILE pointers. All I/O is done by transferring bytes to and from buffers in main memory. As we shall see, even though these streams look like file streams, several features make them more suited for manipulating character strings.
Memory streams are opened with functions such as fmemopen.
Detailed explanation: http://blog.csdn.net/feather_wch/article/details/50727607
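5-15 Alternatives to Standard I/O
The standard I/O library is not perfect. One inefficiency is the amount of data copying that takes place. With fgets and fputs, for example, the data is usually copied twice: once between the kernel and the standard I/O buffer, and once between the standard I/O buffer and our line buffer.
The Fast I/O library (fio) gets around this by having the function that reads a line return a pointer to the line instead of copying the line into another buffer.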