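This post is based on glibc 2.24. The question started with a piece of code from a junior schoolmate:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    int rec = dup(0);                          /* keep a copy of the original stdin */
    int fd = open("./a.txt", O_RDONLY, 0644);
    dup2(fd, 0);                               /* fd 0 now refers to a.txt */
    putchar(getchar());                        /* meant to print the first char of a.txt */
    dup2(rec, 0);                              /* point fd 0 back at the original stdin */
    putchar(getchar());                        /* meant to print six typed characters */
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
    close(fd);
}
```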
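`a.txt` contains `123456`. The idea was to redirect fd 0 (the original stdin) to `a.txt`, read the first character, then restore the real stdin and read six more characters. But when `123456` is typed in, the output is `1234561` rather than the expected `1123456` (or `123456` followed by nothing but EOF). That suggests glibc's input buffering is involved, so I went to the source to see how it works.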
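The buffering hypothesis can be checked without a terminal. Below is a minimal sketch under a few assumptions of mine: two pipes stand in for `a.txt` and for the typed input, a private stream from `fdopen` stands in for `stdin` (so the process's real stdin is left alone), and the names `preloaded_pipe` and `replay` are made up for illustration. The only thing that varies between runs is whether the stream is switched to unbuffered mode with `setvbuf` before the first read.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the read end of a pipe pre-loaded with
   `data`. The write end is already closed. Returns -1 on failure. */
static int preloaded_pipe(const char *data)
{
    int p[2];
    size_t n = strlen(data);

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], data, n) != (ssize_t) n) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);
    return p[0];
}

/* Replays the experiment: one character is read while the stream's fd
   points at the "file", then six more after the fd has been switched to
   the "terminal". The 7 characters read are stored in `out` as a string.
   Returns 0 on success, -1 on any failure or premature EOF. */
int replay(int unbuffered, char out[8])
{
    int file_fd = preloaded_pipe("123456");    /* stands in for a.txt */
    int tty_fd = preloaded_pipe("123456\n");   /* stands in for typed input */
    int slot = -1;                             /* stands in for fd 0 */
    FILE *in = NULL;
    int rc = -1;

    if (file_fd < 0 || tty_fd < 0)
        goto done;
    slot = dup(tty_fd);
    if (slot < 0)
        goto done;
    in = fdopen(slot, "r");
    if (in == NULL)
        goto done;
    if (unbuffered && setvbuf(in, NULL, _IONBF, 0) != 0)
        goto done;

    if (dup2(file_fd, slot) < 0)               /* like dup2(fd, 0) */
        goto done;
    for (int i = 0; i < 7; i++) {
        if (i == 1 && dup2(tty_fd, slot) < 0)  /* like dup2(rec, 0) */
            goto done;
        int c = getc(in);
        if (c == EOF)
            goto done;
        out[i] = (char) c;
    }
    out[7] = '\0';
    rc = 0;

done:
    if (in != NULL)
        fclose(in);                            /* also closes slot */
    else if (slot >= 0)
        close(slot);
    if (file_fd >= 0)
        close(file_fd);
    if (tty_fd >= 0)
        close(tty_fd);
    return rc;
}
```

On a glibc system I would expect `replay(0, out)` to yield `1234561`, the same as the original program, and `replay(1, out)` to yield `1123456`, which is what the code was meant to do. If so, the surprise really does come from the stream's user-space buffer and not from `dup2`.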
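I searched for a long time without finding the relevant code, and the author of http://blog.csdn.net/u012927281/article/details/51932563 did not find it either. I then traced the references to the read pointers of the `_IO_FILE` struct and finally ended up at `_IO_new_file_underflow` in `/libio/fileops.c`: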
```c
int
_IO_new_file_underflow (_IO_FILE *fp)
{
  _IO_ssize_t count;
#if 0
  /* SysV does not make this test; take it out for compatibility */
  if (fp->_flags & _IO_EOF_SEEN)
    return (EOF);
#endif
  if (fp->_flags & _IO_NO_READS)
    {
      fp->_flags |= _IO_ERR_SEEN;
      __set_errno (EBADF);
      return EOF;
    }
  if (fp->_IO_read_ptr < fp->_IO_read_end)
    return *(unsigned char *) fp->_IO_read_ptr;
  if (fp->_IO_buf_base == NULL)
    {
      /* Maybe we already have a push back pointer. */
      if (fp->_IO_save_base != NULL)
        {
          free (fp->_IO_save_base);
          fp->_flags &= ~_IO_IN_BACKUP;
        }
      _IO_doallocbuf (fp);
    }
  /* Flush all line buffered files before reading. */
  /* FIXME This can/should be moved to genops ?? */
  if (fp->_flags & (_IO_LINE_BUF|_IO_UNBUFFERED))
    {
#if 0
      _IO_flush_all_linebuffered ();
#else
      /* We used to flush all line-buffered stream. This really isn't
         required by any standard. My recollection is that
         traditional Unix systems did this for stdout. stderr better
         not be line buffered. So we do just that here
         explicitly. --drepper */
      _IO_acquire_lock (_IO_stdout);
      if ((_IO_stdout->_flags & (_IO_LINKED | _IO_NO_WRITES | _IO_LINE_BUF))
          == (_IO_LINKED | _IO_LINE_BUF))
        _IO_OVERFLOW (_IO_stdout, EOF);
      _IO_release_lock (_IO_stdout);
#endif
    }
  _IO_switch_to_get_mode (fp);
  /* This is very tricky. We have to adjust those
     pointers before we call _IO_SYSREAD () since
     we may longjump () out while waiting for
     input. Those pointers may be screwed up. H.J. */
  fp->_IO_read_base = fp->_IO_read_ptr = fp->_IO_buf_base;
  fp->_IO_read_end = fp->_IO_buf_base;
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_write_end
    = fp->_IO_buf_base;
  count = _IO_SYSREAD (fp, fp->_IO_buf_base,
                       fp->_IO_buf_end - fp->_IO_buf_base);
  if (count <= 0)
    {
      if (count == 0)
        fp->_flags |= _IO_EOF_SEEN;
      else
        fp->_flags |= _IO_ERR_SEEN, count = 0;
    }
  fp->_IO_read_end += count;
  if (count == 0)
    {
      /* If a stream is read to EOF, the calling application may switch active
         handles. As a result, our offset cache would no longer be valid, so
         unset it. */
      fp->_offset = _IO_pos_BAD;
      return EOF;
    }
  if (fp->_offset != _IO_pos_BAD)
    _IO_pos_adjust (fp->_offset, count);
  return *(unsigned char *) fp->_IO_read_ptr;
}
```
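The key part is `count = _IO_SYSREAD (...)`, which fills the buffer. Afterwards only the cases where `count` is zero or negative are checked: zero counts as EOF, but a `count` smaller than the buffer size is not marked as EOF at all. So the first `getchar()` pulls all of `a.txt` into stdin's buffer in a single read, and the stream does not consider the file finished. `dup2(rec, 0)` only changes what fd 0 points to; the bytes already buffered stay where they are, so the next five `getchar()` calls return `23456` straight from the buffer. Once the buffer is drained, the stream reads ahead again and finds that the fd still has content (seriously, the previous read came back smaller than the buffer and you saw nothing wrong with that, and now you go and read again). By then fd 0 is the real stdin, and that is where the trailing `1` comes from.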
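That a short count does not mean EOF can also be seen one level down, at `read(2)` itself. Here is a minimal sketch, assuming a pipe as the data source; the function name `short_read_demo` is made up for illustration.

```c
#include <unistd.h>

/* Writes 6 bytes into a pipe, then calls read() twice with a 4096-byte
   buffer. results[0] receives the first return value and results[1] the
   second. Returns 0 on success, -1 if a system call fails. */
int short_read_demo(long results[2])
{
    int p[2];
    char buf[4096];

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "123456", 6) != 6) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Fewer bytes than requested: a short read, not EOF. */
    results[0] = (long) read(p[0], buf, sizeof buf);

    /* With the last writer gone and the pipe empty, read() reports EOF. */
    close(p[1]);
    results[1] = (long) read(p[0], buf, sizeof buf);

    close(p[0]);
    return 0;
}
```

The first call returns 6 and the second returns 0. Only the zero return is an end-of-file indication, which is the same distinction `_IO_new_file_underflow` makes on `count`.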
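NOTE: I think this is glibc taking a shortcut that keeps the code simpler; otherwise it would have to read one more byte ahead, or compare the offset against `stat`. It probably also has to do with stdin being special: there is no EOF handling for it, and nobody wanted to special-case stdin. To be fair, `read` is allowed to return fewer bytes than requested on terminals and pipes, so a short count by itself cannot be treated as EOF.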