Cracking It: Redirection && Buffers

无双@

Published 2024-08-25 16:36:32

Category: Linux. Tags: linux, file redirection, buffers, dup2, operating system, file system

Article link: https://blog.csdn.net/weixin_72917087/article/details/141531001

Table of contents

Preface:
Getting to know read
Getting to know redirection && buffers

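Preface:

In the previous chapter we started on file IO: we learned what a file descriptor is and how the operating system manages so many files internally, and we finally explained why people say "everything is a file" on Linux. Now we keep going and look at input/output redirection and the concept of buffers.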

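Getting to know read

First we need to know: file = attributes + content.
So operating on a file essentially means operating on its attributes or on its content.
For content, whether we use the system call interface or language-level functions, we can operate on what the file holds. For attributes, we are not going to add new ones for now, but we can look them up with a function: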

int stat(const char* path, struct stat* buf);

The struct stat structure holds all kinds of information about a file, such as its size. buf is an output parameter: stat fills it in for us.

As for read, we first change the flag passed to open to O_RDONLY, and then write the read call:

ssize_t read
(
	int fd, 		// read from the file referred to by file descriptor fd
	void* buf, 		// store the bytes that were read in buf
	size_t count	// the number of bytes we would like to read
);

So we can first use write to put "hello linux file" into log.txt, then get the file size with stat, and finally use read to load the data into a buffer. That gives us a complete file read. The code:

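#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
    // Look up the file's attributes; st is filled in as an output parameter
    struct stat st;
    if(stat("log.txt", &st) < 0)
    {
        perror("stat");
        return 4;
    }

    printf("file size: %ld\n", (long)st.st_size);

    int fd = open("log.txt", O_RDONLY);
    if(fd < 0)
    {
        perror("open");
        return 1;
    }

    // One extra byte so the data can be '\0'-terminated before printing with %s
    char* file_buffer = (char*)malloc(st.st_size + 1);
    if(file_buffer == NULL)
    {
        perror("malloc");
        close(fd);
        return 2;
    }

    ssize_t n = read(fd, file_buffer, st.st_size);
    if(n < 0)
    {
        perror("read");
        free(file_buffer);
        close(fd);
        return 3;
    }
    file_buffer[n] = '\0';

    printf("file_buffer: %s\n", file_buffer);

    free(file_buffer);
    close(fd);
    return 0;
}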

Getting to know redirection && buffers

The redirection phenomenon and its analysis:

Look at the code first:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
    close(1); // first close the default stdout, i.e. close the display
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if(fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    fprintf(stdout, "%s\n", "this is fprintf");
    
    const char* msg = "This is write\n";
    int n = write(fd, msg, strlen(msg));
    if(n < 0)
    {
        perror("write");
        return 2;
    }
    return 0;
}

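Here is the problem! I clearly printf the file descriptor fd in the code, so why is nothing displayed?
Why does the line I fprintf directly to stdout not show up on the display either?
And when I look inside log.txt, why has the newly created file descriptor become 1? Isn't 1 the display file???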

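Analyzing the phenomenon:

printf and fprintf(stdout, ...) always write to file descriptor 1, which normally refers to the display. Because we closed descriptor 1 first, open handed descriptor 1 to log.txt, and the output was redirected. So in the end the data is actually printed into log.txt!

From this we also learn the allocation rule for file descriptors:
the process searches its own file descriptor table (the struct file* array[] array in the figure above) and allocates the smallest descriptor fd that is not currently in use.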

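Introducing dup2:

Now I more or less know what redirection is: data that used to go to the display file now goes to an ordinary file. But doing it this way means closing the display file first. Can we redirect without closing it ourselves? For that we need a new function: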

int dup2(int oldfd, int newfd);

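In essence, this function copies the content stored at a file descriptor table index: the entry of the old file descriptor is copied into the slot of the new one (if newfd is already open, it is closed first).
For example, to get the same effect as above by copying descriptor 3 onto descriptor 1 (what gets copied is the content of the slot, not the index!!!), we write: dup2(fd, 1); (meaning: copy the content of fd's slot into descriptor 1)

Code demonstration: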

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if(fd < 0)
    {
        perror("open");
        return 1;
    }
    
    dup2(fd, 1);
    
    // C
    printf("fd: %d\n", fd);
    fprintf(stdout, "%s\n", "this is fprintf");
    
    // system call
    const char* msg = "This is write\n";
    int n = write(fd, msg, strlen(msg));
    if(n < 0)
    {
        perror("write");
        return 2;
    }
    
    return 0;
}

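This time we find that with redirection done through dup2, the original descriptor 3 does not change. The reason: dup2 only overwrites the table entry at index 1 with a copy of the entry at index 3. Slot 3 itself is untouched, so afterwards both 1 and 3 refer to the same open log.txt.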

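Introducing the buffer:

Actually, the code above is not complete. A file opened with the system call open is supposed to be closed at the end, but we never called close. Next we take the version that redirects by calling close(1) first and use it to analyze the buffer: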

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
    close(1); // first close the default stdout, i.e. close the display
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if(fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    fprintf(stdout, "%s\n", "this is fprintf");
    
    const char* msg = "This is write\n";
    int n = write(fd, msg, strlen(msg));
    if(n < 0)
    {
        perror("write");
        return 2;
    }
    
    close(fd); // the difference is here: I close fd at the end
    return 0;
}

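Again nothing shows up on the display, and we already know why: redirection happened. But this time log.txt is missing a lot of data. The printf / fprintf output is gone and only the system call's data is there. Why?

printf and fprintf do not write to the file directly. They first store the data in stdout's user-level (language-level) buffer, which is normally flushed when the process exits. Here close(fd) closes descriptor 1 before that happens, so the flush at exit has nowhere to write and the data is lost. write is a system call and bypasses this buffer, so its data arrives.

If we flush the buffer before close(fd) (fd is 1 here), the data shows up:
call fflush(stdout);

With that, we find the data has been written.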

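Understanding the buffer:

From the above, buffers come in two kinds: "user-level buffers" and "kernel-level buffers".
They bring two major benefits:
1. Decoupling (the upper layer need not care how the lower layer is implemented)
2. Higher efficiency

Why higher efficiency?

The goal is to improve efficiency for the user.
First of all, making a system call has a cost!
Fewer calls means lower cost and higher efficiency.
Data is staged first and flushed at the end (which makes flushing IO more efficient).
What is it?

A region of memory (every file has its own buffer).
Why?

To give the upper layer an efficient IO experience, which indirectly improves overall efficiency.
How is it flushed?
1. Immediate flush (unbuffered; data is pushed out right away) -> fflush(stdout) pushes the user-level buffer to the kernel, and int fsync(int fd); pushes the kernel-level buffer to the device
2. Line flush -> the display (to fit the user's habits)
3. Full buffering -> flush only when the buffer is full (used for ordinary files)
4. On normal process exit, the buffers are flushed automatically (a forced flush)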

Let's look at some code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
    printf("hello printf\n");
    fprintf(stdout, "hello fprintf\n");
    
    const char* msg = "hello write\n";
    write(1, msg, strlen(msg));

    fork();

    return 0;
}

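At first glance this code looks fine: when stdout is the display, every \n triggers a flush from the language-level buffer to the kernel-level buffer. Now let's run it with redirection: ./myfile > log.txt

Then we see something magical: the language-level output ends up printed twice. The cause is fork().

When the program runs without redirection, stdout faces the display, that is, file descriptor 1 refers to the display. For the display, the language-level buffer is flushed line by line, so we see the output appear on screen one line at a time at each '\n'. But after running ./myfile > log.txt, stdout faces an ordinary file. For ordinary files the language-level buffer uses full buffering: it is flushed only when it fills up or when the process exits.

The problem lies with the processes. With full buffering, the printf / fprintf data is still sitting in the language-level buffer when fork() runs. fork() creates a child whose memory is a copy of the parent's, buffer included. When the same code finishes in both processes, each one flushes its own copy of the buffer at exit, so that output is printed twice. The data from write went straight to the kernel and appears only once.
Next, using process waiting, let's watch the buffer being released: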

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

int main()
{
    printf("hello printf/ ");
    fprintf(stdout, "hello fprintf/ ");
    
    const char* msg = "hello write/ ";
    write(1, msg, strlen(msg));
    

    pid_t id = fork();
    if(id == 0)
    {
        // child
        sleep(3);
        printf("<- child process quit ->/ ");
        return 0;
    }
 
    // father
    int status = 0;
    pid_t rid = waitpid(id, &status, 0);
    
    sleep(3);
    printf("<- father process quit ->/ ");
    return 0;
}