Linux Basic IO: Redirection and Buffers

茉莉蜜茶v

Last modified: 2023-07-26 14:45:26


First published: 2023-07-21 22:12:08

Article link: https://blog.csdn.net/cw412524/article/details/131861006


Contents

Linux Basic IO: Redirection and Buffers
1. File descriptors
2. Redirection
3. Buffers
4. A mock implementation of C file streams


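The previous article covered file operations on Linux. This one continues with file descriptors, which let us redirect the standard streams to files of our choosing, and with buffers, which batch reads and writes to make IO more efficient.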

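1. File descriptors

As mentioned earlier, a successful call to open() returns a file descriptor. So what exactly is a file descriptor? Let's go through it step by step.

Every process starts with three file streams open by default: standard input, standard output, and standard error. In C/C++ these are stdin/cin, stdout/cout, and stderr/cerr.
At the C language level the three are declared as extern FILE* stdin, extern FILE* stdout, and extern FILE* stderr.

C file operations work through FILE* pointers. How does the OS get from a given FILE* to the right file object?

The FILE type has a file descriptor (fd) among its members, and the OS uses that descriptor to locate the corresponding open file.

We can verify this in /usr/include/libio.h, where the FILE structure has an int _fileno member.

Now let's print the fd of the standard streams and of some streams we open ourselves.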

#include <stdio.h>

int main()
{
  // Open three FILE objects
  FILE* fp1 = fopen("test1.txt", "w");
  FILE* fp2 = fopen("test2.txt", "w");
  FILE* fp3 = fopen("test3.txt", "w");
  if(fp1 == NULL || fp2 == NULL || fp3 == NULL)
  {
    perror("fopen");
    return 1;
  }
  // Print the fds of the standard streams (_fileno is a glibc-internal member)
  printf("stdin->fd: %d\n", stdin->_fileno);
  printf("stdout->fd: %d\n", stdout->_fileno);
  printf("stderr->fd: %d\n", stderr->_fileno);
  // Print the fds of the streams we opened ourselves
  printf("fp1->fd: %d\n", fp1->_fileno);
  printf("fp2->fd: %d\n", fp2->_fileno);
  printf("fp3->fd: %d\n", fp3->_fileno);
  // Close them
  fclose(fp1);
  fclose(fp2);
  fclose(fp3);
  return 0;
}

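This confirms that FILE carries an fd, and shows that standard input, standard output, and standard error use file descriptors 0, 1, and 2.

Because the standard streams take the first three fds by default, the files we open ourselves start at fd 3.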

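1.1 A closer look at file descriptors

File descriptors exist so that the OS can manage open files efficiently.

Following the principle of "describe first, then organize", the OS represents every open file as a file object and stores the file* pointers in a pointer array, fd_array[], for uniform management. The array index is the file descriptor (fd).
Every process starts with three streams open, so their file* pointers occupy indices 0, 1, and 2. The file* of each newly opened file goes into the smallest unused index, which is why the descriptors of our own files usually start at 3.

The descriptor array, together with other per-process file bookkeeping, makes up the files_struct structure, which task_struct refers to through one of its members.

Opening a file does not load its contents into memory right away; the data stays on disk until the process performs IO on it.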

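1.2 File descriptor allocation rule

The rule is: scan the file descriptor array for the smallest index that is not in use, and hand it out as the new file descriptor.

We saw above that newly opened files start at fd 3. What happens if we close standard input first?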

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    // Open text.txt directly
    int fd = open("text.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    printf("text->fd: %d\n", fd);
    close(fd);

    // Close standard input first, then open the file again
    close(0);   // close fd 0 (standard input)
    fd = open("text.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    printf("text->fd: %d\n", fd);
    close(fd);

    return 0;
}

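This confirms the allocation rule: when the program starts with only fds 0, 1, and 2 open, the first open() returns 3, and after close(0) the second open() returns 0.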

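1.3 Understanding "everything is a file"

How should we read the common saying that on Linux everything is a file?

Device drivers implement input, output, and the other operations for each device, and every piece of hardware does these differently. A file object holds function pointers to the matching operations, and through them it controls whatever device sits underneath.
When a program runs it becomes a process, and the process deals only with file objects. It never has to care how differently the underlying devices implement their operations. From the process's point of view, which is also the user's point of view, everything is a file.
To the OS, hardware and software resources alike are file objects: as long as read and write methods are provided, the resource can be driven, and its file* goes into the file descriptor array for management. That is why everything is a file on Linux.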

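2. Redirection

The OS performs IO on the standard streams through their file descriptors 0, 1, and 2. If we swap the file behind one of those descriptors for another file, IO on that stream goes to the new file. That is the essence of redirection.

The three standard streams

Standard input: reads data from the keyboard; backed by the keyboard file
Standard output: writes data to the display; backed by the display file
Standard error: writes error messages to the display; backed by the display file

2.1 Redirecting by hand

Input redirection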

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
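  close(0); // close standard input (fd 0)
  int fd = open("file.txt", O_RDONLY); // gets the lowest free fd, which is now 0
  if(fd < 0)
  {
    perror("open");
    return 1;
  }

  int a = 0;
  int b = 0;
  scanf("%d%d", &a, &b); // reads from file.txt instead of the keyboard
  printf("a = %d, b = %d\n", a, b);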

  return 0;
}

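Standard input is closed first, then file.txt is opened, so file.txt lands at index 0 of the file descriptor array. scanf() then fills a and b, and what would have been a read from the keyboard becomes a read from file.txt.

Output redirection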

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  close(1); // close standard output (fd 1)
  int fd = open("file.txt", O_WRONLY | O_CREAT | O_TRUNC, 0664);

  printf("hello world!\n");
  printf("hello world!\n");
  printf("hello world!\n");
  return 0;
}

Standard output is closed first, so the newly opened file.txt takes fd 1, and the text that would have appeared on screen ends up in file.txt.

Append redirection

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  close(1); // close standard output (fd 1)
  int fd = open("file.txt", O_APPEND | O_WRONLY | O_CREAT, 0664);

  printf("hello world!\n");
  printf("hello world!\n");
  printf("hello world!\n");
  return 0;
}

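2.2 Redirection on the command line

Output redirection: write data to a given file

echo data > filename

Input redirection: read data from a file

cat < filename

Append redirection

echo data >> filename

Redirection can also be applied when launching our own program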

#include <iostream>
using namespace std;

int main()
{
    cout << "hello cout" << endl;
    cerr << "hello cerr" << endl;
    return 0;
}

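Redirecting only standard output (for example ./a.out > file.txt, assuming the program was built as a.out) leaves file.txt with only the text from standard output.

Redirecting both standard output and standard error (./a.out > file.txt 2>&1) sends everything to file.txt.

Redirection can also send standard output and standard error to two different files (./a.out > file.txt 2> error.txt).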

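2.3 The redirection function

In real code, redirection is usually done with the dup2 function.

The int dup2(int oldfd, int newfd) interface

Purpose: make newfd refer to the same open file as oldfd
oldfd: the descriptor to copy from, i.e. the file we want to redirect to
newfd: the descriptor to overwrite, i.e. the stream being redirected; if it is open, it is closed first
After the call, newfd is a copy of oldfd, and both indices refer to the file that oldfd referred to
Return value: newfd on success; -1 on failure, with errno set accordingly

The parameter order is easy to get backwards, so remember it as dup2(source, target): the new file goes first, the standard stream second.

For example, suppose a newly opened file has fd 3 and we want output to go into it, that is, output redirection. The matching call is dup2(3, 1).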

Redirection lets us send normal output and error messages to different files.

#include <iostream>
#include <cstdlib>
#include <cerrno>
#include <cstring>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
using namespace std;

int main()
{
    // Open the two target files
    int fd1 = open("file.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    int fd2 = open("error.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);

    // Redirect
    int ret = dup2(fd1, 1); // standard output now goes to file.txt
    ret = dup2(fd2, 2);     // standard error now goes to error.txt

    cout << "hello world!" << endl;
    cout << "hello world!" << endl;
    cout << "hello world!" << endl;

    int fd = open("test.txt", O_RDONLY); // open a file that does not exist
    if(fd == -1)
    {
        // Print the error details
        cerr << "open fail! errno: " << errno << endl;
        cerr << "error message: " << strerror(errno) << endl;
        exit(-1);
    }

    close(fd);

    return 0;
}

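3. Buffers

3.1 Understanding buffers

An analogy helps here.

Suppose you have a puppy, and at feeding time you can only drop a few pieces of kibble into its mouth at a time. Feeding takes forever. So you buy a bowl: you pour the food in once and let the dog eat at its own pace, which is far more efficient. The bowl is the buffer.

At its core a buffer is just an array which, combined with a flush policy, makes IO more efficient.

The CPU computes very quickly, and disk access is extremely slow by comparison, so a busy CPU cannot afford to wait on the disk for every operation. With a buffer, data is written into the buffer first and then flushed to the kernel buffer according to the flush policy. The caller returns as soon as the copy is done, which saves the caller time and improves throughput.

Let's compare how much work the CPU gets done with and without IO.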

#include <iostream>
#include <unistd.h>
#include <signal.h>
using namespace std;

int count = 0;

int main()
{
    // Set a 1-second alarm
    alarm(1);
    while(true)
    {
        cout << count++ << endl;
    }
    return 0;
}

With IO in the loop, it ran a little over 110,000 iterations.

#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <signal.h>
using namespace std;

int count = 0;
// SIGALRM handler: report the final count and quit
void Print(int signo)
{
    cout << "count: " << count << endl;
    exit(1);
}

int main()
{
    // Set a 1-second alarm
    signal(SIGALRM, Print); // SIGALRM is signal 14 on Linux
    alarm(1);
    while(true) count++;
    return 0;
}

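Without IO it ran more than 500 million iterations. Frequent IO clearly cuts deeply into the CPU's computing time, and without buffers even more time would go to IO.

Reads and writes commonly go through a buffer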

#include <iostream>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
using namespace std;

int main()
{
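    int fd = open("file.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    assert(fd >= 0);

    char buffer[256] = { 0 };   // the buffer
    // Read from standard input into the buffer, leaving room for '\0'
    ssize_t n = read(0, buffer, sizeof(buffer) - 1);
    if(n < 0) n = 0;            // treat a failed read as empty input
    buffer[n] = '\0';

    // Then write the buffered data to the file
    write(fd, buffer, n);

    close(fd);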
    return 0;
}

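3.2 Buffer flush policies

There are three flush policies

Unbuffered: data is not held in the buffer at all and goes straight to the operating system
Line buffered: the buffer is flushed whenever a \n is written, one line at a time
Fully buffered: the buffer is flushed only when it is full

The display is normally line buffered, and regular files are normally fully buffered.

Let's watch it happen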

#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;

int main()
{
    while(true)
    {
        printf("%s\n", "happy"); // on a terminal the '\n' triggers a flush; without it, output waits until the buffer fills
        sleep(1);
    }
    return 0;
}

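One line is printed every second

Summary

When no flush happens, no system call is made: the data has not reached the file and sits in the buffer inside the FILE structure. That is why a call to fwrite() returns very quickly.
Several writes can pile up in the buffer and be flushed together. In essence, one IO operation carries more data, which makes IO more efficient.
fflush takes the data in the structure's buffer, i.e. the user-level buffer, and hands it to the operating system through a system call.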

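3.3 Countdown and progress bar programs

\r is a carriage return, which moves the cursor to the start of the current line; \n is a line feed.

Using the buffering rules, we can write two small programs

Countdown program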

#include <unistd.h>
#include <stdio.h>

int main()
{
  int i = 10; // count down from 10
  for(; i >= 0; i--)
  {
    printf("%2d\r",i);
    fflush(stdout);
    sleep(1);
  }
  printf("\n");
  return 0;
}

Progress bar program

#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SIZE 101 // 100 characters plus the terminating '\0'
#define STYLE '#'

void process()
{
  const char* clockWise = "|/-\\";
  char buf[SIZE];
  memset(buf, '\0', sizeof(buf));
  int i = 0;
  while(i <= 100)
  {
    printf("[%-100s][%d%%][%c]\r",buf, i, clockWise[i%4]);
    fflush(stdout); // flush the buffer
    buf[i++] = STYLE;
    usleep(100000); // usleep takes microseconds, so this is 0.1 seconds
  }
  printf("\n");
}

int main()
{
  process();
  return 0;
}

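3.4 The kernel-level buffer

Understanding the kernel-level buffer

Every FILE object has its own user-level buffer and flush policy. The system also has a kernel-level buffer, which is where IO with the device actually takes place.
On output, data is first flushed from the user-level buffer into the kernel-level buffer, and the kernel then decides when to write it out to the device. On input the path runs the other way: data arrives in the kernel-level buffer and is copied from there into the user-level buffer.

Here is an interesting case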

#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main()
{
  fprintf(stdout, "hello world!\n");
  const char* message = "Are you OK?\n";
  write(1, message, strlen(message));

  fork(); // create a child process

  return 0;
}

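When the output is redirected to a file, hello world! shows up twice, while Are you OK? shows up only once. Why?

The display is line buffered, and regular files are fully buffered
Run directly, the program prints to the display, and since both strings end in \n, both are flushed right away
fprintf goes through the C library's buffer. After redirection, hello world! is still sitting in that buffer when fork creates the child, and the child gets its own copy of the parent's data, buffer included. Both processes then flush their buffers when they exit, so hello world! is written twice

Note: system-call interfaces have no user-level buffer of their own and hand data straight to the kernel-level buffer. The write above goes directly to the kernel's file buffer without passing through the C library buffer, so it has already left the process before fork and the child has nothing to repeat.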

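4. A mock implementation of C file streams

The FILE structure behind C file streams holds many file attributes along with the file descriptor fd. The C file functions such as fopen, fclose, fwrite, and fread are built on it, and they are essentially wrappers around system calls. Using the system calls and the buffering ideas above, we can sketch a simple C file stream of our own.

Source code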

mystdio.h

#pragma once
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <malloc.h>
#include <unistd.h>
#include <assert.h>
#include <stdlib.h>

#define N 1024
#define BUFFER_NONE 0x1 // unbuffered
#define BUFFER_LINE 0x2 // line buffered
#define BUFFER_ALL  0x4 // fully buffered

typedef struct MY_FILE
{
  int fileDescriptor;
  char outputBuffer[N];
  int flags; // flush policy
  int current; // next write position in outputBuffer
}MY_FILE;

MY_FILE* my_fopen(const char* path, const char* mode); // counterpart of C's fopen()
size_t my_fwrite(const void* ptr, size_t size, size_t nmemb, MY_FILE* stream); // counterpart of C's fwrite()
int my_fflush(MY_FILE* fp); // counterpart of C's fflush()
int my_fclose(MY_FILE* fp); // counterpart of C's fclose()

mystdio.c

#include "mystdio.h"

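int my_fflush(MY_FILE* fp)
{
  assert(fp);
  if(fp->current > 0)
  {
    // Hand the buffered bytes to the kernel; report failure the way fflush() does
    if(write(fp->fileDescriptor, fp->outputBuffer, fp->current) < 0) return EOF;
    fp->current = 0;
  }
  fsync(fp->fileDescriptor); // push the kernel's copy of the file out to the storage device
  return 0;
}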

MY_FILE* my_fopen(const char* path, const char* mode) // counterpart of C's fopen()
{
  //1. Work out the open flags from the mode string
  int flag = 0;
  if(strcmp(mode, "r") == 0) flag |= O_RDONLY;
  else if(strcmp(mode, "w") == 0) flag |= (O_WRONLY | O_CREAT | O_TRUNC);
  else if(strcmp(mode, "a") == 0) flag |= (O_WRONLY | O_CREAT | O_APPEND);
  else return NULL; // other modes are not handled in this sketch
  
  //2. Default permission bits for newly created files
  mode_t m = 0664;
  
  //3. Call the open() system call
  int fd = 0;
  if(flag & O_CREAT) fd = open(path, flag, m);
  else fd = open(path, flag);
  
  //4. open() failed
  if(fd < 0) return NULL;

  //5. open() succeeded: build and return a MY_FILE
  MY_FILE* my_file = (MY_FILE*)malloc(sizeof(MY_FILE));
  if(my_file == NULL) // malloc() failed
  {
    close(fd); // close the file so the descriptor does not leak
    return NULL;
  } 
  my_file->fileDescriptor = fd;
  my_file->flags = 0;
  my_file->flags |= BUFFER_LINE; // line buffered by default
  my_file->current = 0;
  memset(my_file->outputBuffer, '\0', sizeof(my_file->outputBuffer));
  return my_file;
}

size_t my_fwrite(const void* ptr, size_t size, size_t nmemb, MY_FILE* stream)
{
  assert(stream);
  const char* src = (const char*)ptr;
  size_t total = size * nmemb; // bytes requested
  size_t done = 0;             // bytes copied into the buffer so far

  while(done < total)
  {
    //1. Buffer full: flush it to make room, and stop if the flush fails
    if(stream->current == N && my_fflush(stream) != 0) break;

    //2. Copy as much as fits and advance the write position
    size_t have_sz = N - stream->current;
    size_t chunk = (total - done < have_sz) ? (total - done) : have_sz;
    memcpy(stream->outputBuffer + stream->current, src + done, chunk);
    stream->current += chunk;
    done += chunk;
  }

  //3. Flush according to the policy
  if(stream->flags & BUFFER_NONE) my_fflush(stream);
  else if(stream->flags & BUFFER_LINE)
  {
    if(stream->current > 0 && stream->outputBuffer[stream->current - 1] == '\n') my_fflush(stream);
  }
  else if(stream->flags & BUFFER_ALL)
  {
    if(stream->current == N) my_fflush(stream);
  }
  return done; // number of bytes accepted
}

int my_fclose(MY_FILE* fp)
{
  assert(fp);
  //1. Flush the buffer
  if(fp->current > 0) my_fflush(fp); 
  
  //2. Close the file
  close(fp->fileDescriptor);

  //3. Free the memory
  free(fp);
  fp = NULL;
  return 0; // success
}

main.c

#include "mystdio.h"

int main()
{
  MY_FILE* fp = my_fopen("log.txt", "w");
  if(fp == NULL) exit(1);
  
  const char* message = "hello linux";
  
  int cnt = 5;
  while(cnt--)
  {
    char buffer[1024];
    snprintf(buffer, sizeof(buffer), "%s, %d\n", message, cnt);
    size_t size = my_fwrite(buffer, strlen(buffer), 1, fp);
    sleep(1);
    printf("Bytes written this time: %zu\n", size);
  }
    
  my_fclose(fp);

  return 0;
}