glibc: the EOF-detection pitfall in the file read buffer

This post is based on glibc 2.24. The question started with a piece of code from a junior schoolmate:

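```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void) {
    int rec = dup(0);                    /* keep a copy of the original stdin */
    int fd = open("./a.txt", O_RDONLY);  /* no mode argument needed without O_CREAT */
    dup2(fd, 0);                         /* fd 0 now refers to a.txt */
    putchar(getchar());                  /* read one character "from the file" */
    dup2(rec, 0);                        /* point fd 0 back at the original stdin */
    putchar(getchar());                  /* then read six more characters */
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
    putchar(getchar());
    close(fd);
    return 0;
}
```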

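The content of a.txt is `123456`. The idea was to redirect fd 0 (the original stdin) to the file a.txt, read the first character,
then restore the normal stdin and read 6 more characters. But after typing `123456`, the output is `1234561` rather than the expected `1123456`
(or, alternatively, `123456` followed by nothing but EOF). That points at glibc's input buffering, so I went to the source to see how it works.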

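I searched for quite a while without finding it, and the author of http://blog.csdn.net/u012927281/article/details/51932563 did not find it either.
So I traced the references to the read pointers of the IO_FILE struct, and eventually ended up at `_IO_new_file_underflow` in `/libio/fileops.c`: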

```c
int
_IO_new_file_underflow (_IO_FILE *fp)
{
  _IO_ssize_t count;
#if 0
  /* SysV does not make this test; take it out for compatibility */
  if (fp->_flags & _IO_EOF_SEEN)
    return (EOF);
#endif

  if (fp->_flags & _IO_NO_READS)
    {
      fp->_flags |= _IO_ERR_SEEN;
      __set_errno (EBADF);
      return EOF;
    }
  if (fp->_IO_read_ptr < fp->_IO_read_end)
    return *(unsigned char *) fp->_IO_read_ptr;

  if (fp->_IO_buf_base == NULL)
    {
      /* Maybe we already have a push back pointer.  */
      if (fp->_IO_save_base != NULL)
        {
          free (fp->_IO_save_base);
          fp->_flags &= ~_IO_IN_BACKUP;
        }
      _IO_doallocbuf (fp);
    }

  /* Flush all line buffered files before reading. */
  /* FIXME This can/should be moved to genops ?? */
  if (fp->_flags & (_IO_LINE_BUF|_IO_UNBUFFERED))
    {
#if 0
      _IO_flush_all_linebuffered ();
#else
      /* We used to flush all line-buffered stream.  This really isn't
         required by any standard.  My recollection is that
         traditional Unix systems did this for stdout.  stderr better
         not be line buffered.  So we do just that here
         explicitly.  --drepper */
      _IO_acquire_lock (_IO_stdout);

      if ((_IO_stdout->_flags & (_IO_LINKED | _IO_NO_WRITES | _IO_LINE_BUF))
          == (_IO_LINKED | _IO_LINE_BUF))
        _IO_OVERFLOW (_IO_stdout, EOF);

      _IO_release_lock (_IO_stdout);
#endif
    }

  _IO_switch_to_get_mode (fp);

  /* This is very tricky. We have to adjust those
     pointers before we call _IO_SYSREAD () since
     we may longjump () out while waiting for
     input. Those pointers may be screwed up. H.J. */
  fp->_IO_read_base = fp->_IO_read_ptr = fp->_IO_buf_base;
  fp->_IO_read_end = fp->_IO_buf_base;
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_write_end
    = fp->_IO_buf_base;

  count = _IO_SYSREAD (fp, fp->_IO_buf_base,
                       fp->_IO_buf_end - fp->_IO_buf_base);
  if (count <= 0)
    {
      if (count == 0)
        fp->_flags |= _IO_EOF_SEEN;
      else
        fp->_flags |= _IO_ERR_SEEN, count = 0;
    }
  fp->_IO_read_end += count;
  if (count == 0)
    {
      /* If a stream is read to EOF, the calling application may switch active
         handles.  As a result, our offset cache would no longer be valid, so
         unset it.  */
      fp->_offset = _IO_pos_BAD;
      return EOF;
    }
  if (fp->_offset != _IO_pos_BAD)
    _IO_pos_adjust (fp->_offset, count);
  return *(unsigned char *) fp->_IO_read_ptr;
}
```

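The key part is `count = _IO_SYSREAD (...)`, which reads into the buffer. After that, the code only checks whether `count` is zero or negative:
`count == 0` is treated as EOF, but a `count` smaller than the buffer size is, surprisingly, not marked as EOF...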
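To make the `count` handling easier to see in isolation, here is a stripped-down model of it. This is only a sketch: `toy_stream`, `toy_underflow` and `toy_getc` are names invented for this post, not glibc's `_IO_FILE` or any real libio interface, and the model leaves out locking, flushing and the write pointers entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Toy stand-in for a read stream. This is NOT glibc's _IO_FILE. */
struct toy_stream {
    int fd;
    int eof_seen;            /* plays the role of _IO_EOF_SEEN */
    char buf[16];
    size_t pos, end;         /* play the roles of _IO_read_ptr / _IO_read_end */
};

/* Refill the buffer, mirroring the count checks of the underflow routine. */
static int toy_underflow(struct toy_stream *s)
{
    ssize_t count = read(s->fd, s->buf, sizeof s->buf);
    s->pos = s->end = 0;
    if (count <= 0) {
        if (count == 0)
            s->eof_seen = 1; /* only a zero-byte read marks EOF */
        return EOF;
    }
    s->end = (size_t)count;  /* a short read is accepted without comment */
    return (unsigned char)s->buf[0];
}

/* Return the next byte, refilling from the fd when the buffer is drained. */
int toy_getc(struct toy_stream *s)
{
    assert(s != NULL);
    if (s->pos == s->end && toy_underflow(s) == EOF)
        return EOF;
    return (unsigned char)s->buf[s->pos++];
}
```

With a zero-initialised `struct toy_stream` whose `fd` holds 6 bytes, the first `toy_getc` call fills `end` with 6 and leaves `eof_seen` at 0, even though the buffer has room for 16. The flag only flips on a later call whose `read` returns 0.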

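So when a.txt is buffered for the last time, the stream does not consider the file finished. Once the `123456` in the buffer has been consumed, it goes off to read
ahead again and finds that the fd still has data (seriously: the previous read came back smaller than the buffer, that did not look suspicious, and now it reads
again?). By that point `dup2(rec, 0)` has already pointed fd 0 back at the real stdin, so this read picks up the `1` typed at the terminal, and that is how we end up with `1234561`.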

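NOTE: I think this is glibc taking a shortcut that keeps the code simple; otherwise it would have to read one more byte ahead, or compare the offset against
the size reported by stat. It is probably also there because stdin is special: on a terminal or a pipe a short read is normal and does not mean end of file, and glibc does not want a separate code path just for stdin.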
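For reference, the "compare offset and stat" alternative mentioned in the note could look roughly like the sketch below. The names `at_eof_by_stat` and `stat_check_demo` are hypothetical, and this is not something glibc does; it only illustrates why such a check cannot cover stdin in general, since it has no answer for pipes or terminals.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is a regular file positioned at or past its
   size, 0 if bytes remain, -1 if the question cannot be answered this way
   (pipe, terminal, socket, or an error). */
int at_eof_by_stat(int fd)
{
    struct stat st;
    assert(fd >= 0);
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1)
        return -1;
    return pos >= st.st_size;
}

/* Usage sketch on a scratch file holding "123456": stores the check result
   before and after consuming the 6 bytes. Returns 0 on success, -1 on error. */
int stat_check_demo(int *before, int *after)
{
    char buf[6];
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);                /* raw fd, so stdio buffering is bypassed */
    if (write(fd, "123456", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0) {
        fclose(tmp);
        return -1;
    }
    *before = at_eof_by_stat(fd);        /* offset 0 of 6 */
    if (read(fd, buf, sizeof buf) != 6) {
        fclose(tmp);
        return -1;
    }
    *after = at_eof_by_stat(fd);         /* offset 6 of 6 */
    fclose(tmp);
    return 0;
}
```

The regular-file case is answerable (0 before the read, 1 after it), but for a pipe or a terminal the helper can only return -1, which is the situation stdin is usually in.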
