Comparing file copy with mmap() against file copy with read()/write()

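This article implements file copy in two ways, with mmap() memory mapping and with read()/write(), and compares their running time under different BUFFER_SIZE settings. The results show that when BUFFER_SIZE is small, the mmap approach is significantly faster than the read/write approach.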

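Recently my teacher said in class that mmap() memory mapping can be used to copy a file, and that it is noticeably faster than an ordinary file copy. So I tried implementing both kinds of copy and comparing the time each one takes. First, the code: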

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/time.h>
#include <string.h>

#define BUFFER_SIZE 1

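/* Copy file.in to file.out by mapping both files and calling memcpy(). */
void my_copy1(void)
{
    int fin, fout;
    void *start;
    void *end;
    struct stat sb;

    if ((fin = open("file.in", O_RDONLY)) < 0) {
        perror("open file.in");
        exit(EXIT_FAILURE);
    }
    /* A MAP_SHARED mapping with PROT_WRITE needs the file opened O_RDWR. */
    if ((fout = open("file.out", O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0) {
        perror("open file.out");
        exit(EXIT_FAILURE);
    }

    if (fstat(fin, &sb) < 0) {
        perror("fstat");
        exit(EXIT_FAILURE);
    }
    /* mmap() rejects a zero length, so an empty source needs no copying. */
    if (sb.st_size == 0) {
        close(fin);
        close(fout);
        return;
    }

    /* Extend the output file to its final size by writing one byte at the
       last offset; the mapping cannot grow the file by itself. */
    if (lseek(fout, sb.st_size - 1, SEEK_SET) < 0) {
        perror("lseek");
        exit(EXIT_FAILURE);
    }
    if (write(fout, "", 1) != 1) {
        perror("write");
        exit(EXIT_FAILURE);
    }

    start = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fin, 0);
    if (start == MAP_FAILED) {
        perror("mmap source");
        exit(EXIT_FAILURE);
    }

    end = mmap(NULL, (size_t)sb.st_size, PROT_WRITE, MAP_SHARED, fout, 0);
    if (end == MAP_FAILED) {
        perror("mmap target");
        exit(EXIT_FAILURE);
    }

    memcpy(end, start, (size_t)sb.st_size);

    /* Remove both mappings, not only the source one. */
    munmap(start, (size_t)sb.st_size);
    munmap(end, (size_t)sb.st_size);
    close(fin);
    close(fout);
}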

/* Copy file.in to file.out with read() and write(), BUFFER_SIZE bytes at a time. */
void my_copy2(void)
{
    int fin, fout;
    ssize_t bytes_read, bytes_write;
    char buffer[BUFFER_SIZE];
    char *ptr;

    if ((fin = open("file.in", O_RDONLY)) < 0) {
        perror("open file.in");
        exit(EXIT_FAILURE);
    }
    if ((fout = open("file.out", O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0) {
        perror("open file.out");
        exit(EXIT_FAILURE);
    }

    for (;;) {
        bytes_read = read(fin, buffer, BUFFER_SIZE);
        if (bytes_read == 0)
            break;                  /* end of file */
        if (bytes_read < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            perror("read");
            exit(EXIT_FAILURE);
        }

        /* write() may accept fewer bytes than requested, so loop until
           the whole chunk has been written. */
        ptr = buffer;
        while (bytes_read > 0) {
            bytes_write = write(fout, ptr, (size_t)bytes_read);
            if (bytes_write < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted by a signal: retry */
                perror("write");
                exit(EXIT_FAILURE);
            }
            ptr += bytes_write;
            bytes_read -= bytes_write;
        }
    }

    close(fin);
    close(fout);
}

int main(void)
{
    struct timeval tv_start, tv_end;
    long elapsed;

    gettimeofday(&tv_start, NULL);
    my_copy1();
    gettimeofday(&tv_end, NULL);
    /* tv_usec wraps around every second, so tv_sec must be part of the
       difference; otherwise the result can be wrong or even negative. */
    elapsed = (long)(tv_end.tv_sec - tv_start.tv_sec) * 1000000L
            + (long)(tv_end.tv_usec - tv_start.tv_usec);
    printf("\ndone.\n\n");
    printf("using \"mmap()\" to copy costs %ld microseconds \n", elapsed);

    gettimeofday(&tv_start, NULL);
    my_copy2();
    gettimeofday(&tv_end, NULL);
    elapsed = (long)(tv_end.tv_sec - tv_start.tv_sec) * 1000000L
            + (long)(tv_end.tv_usec - tv_start.tv_usec);
    printf("using \"read() and write()\" to copy costs %ld microseconds \n", elapsed);

    return 0;
}
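
As an alternative sketch for the timing in main(), the elapsed time can be taken from a monotonic clock, which is not affected by changes to the system time. The helper names elapsed_us and time_call_us are hypothetical and not part of the original program; the only system call assumed is POSIX clock_gettime() with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: difference end - start in whole microseconds.
   Both the seconds and the nanoseconds fields are used, so the result
   stays correct when the interval crosses a one-second boundary. */
int64_t elapsed_us(struct timespec start, struct timespec end)
{
    int64_t sec  = (int64_t)end.tv_sec  - (int64_t)start.tv_sec;
    int64_t nsec = (int64_t)end.tv_nsec - (int64_t)start.tv_nsec;
    return (sec * 1000000000LL + nsec) / 1000;
}

/* Hypothetical helper: run fn once and return how long it took,
   in microseconds, measured with the monotonic clock. */
int64_t time_call_us(void (*fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(t0, t1);
}
```

With these helpers, a call such as time_call_us(my_copy1) would return the duration of one mmap copy, and the result can be printed with %lld after a cast to long long.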

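The code is not difficult. It uses a few Linux C functions; if any are unfamiliar, look them up in the relevant documentation. What I mainly want to do here is compare the time taken by the two copy implementations and draw some conclusions of my own. When experimenting with the program you can change BUFFER_SIZE to any number; it is the number of bytes that read() requests from the file in one call. I emphasize this for a reason: if BUFFER_SIZE is very small, the two results differ enormously. For example, with BUFFER_SIZE=1 my output is as follows: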
zhou@zhou:~/LinuxC/file/mmcopy$ ./mmap

done.

using "mmap()" to copy costs 591 microseconds
using "read() and write()" to copy costs 505337 microseconds
zhou@zhou:~/LinuxC/file/mmcopy$
The two are not even close to the same order of magnitude. Now let's try a different number. With BUFFER_SIZE=10000 my output is as follows:
zhou@zhou:~/LinuxC/file/mmcopy$ ./mmap

done.

using "mmap()" to copy costs 594 microseconds
using "read() and write()" to copy costs 585 microseconds
zhou@zhou:~/LinuxC/file/mmcopy$
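This time the two take almost the same time, as you might expect. If BUFFER_SIZE is defined to be very large, the read()/write() method becomes very fast. But what if the file to copy is very small, say only 100 bytes, while you reserve a 10000-byte buffer every time? Isn't that a waste of memory? This is where mmap() has the advantage: the mapping is sized from the file itself rather than from a fixed constant, and the copy is still quite fast.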
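
For comparison, the wasted-buffer concern can also be reduced on the read()/write() side by sizing the buffer from the file. The following is only a sketch under assumptions: pick_buffer_size is a hypothetical helper, not part of the original program, and the file size is assumed to come from fstat() (the st_size field), as in my_copy1.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: choose a buffer size for a read()/write() copy.
   Returns the file size when the file is smaller than cap, otherwise cap.
   The result is never 0, so the copy loop can still call read() and
   detect end of file. */
size_t pick_buffer_size(off_t file_size, size_t cap)
{
    if (cap == 0)
        cap = 1;
    if (file_size <= 0)
        return 1;
    if ((unsigned long long)file_size < (unsigned long long)cap)
        return (size_t)file_size;
    return cap;
}
```

A 100-byte file with a cap of 10000 would then get a 100-byte buffer, which would have to be allocated at run time (for example with malloc()) instead of being declared as a fixed char array.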
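So why is this? My understanding is as follows. mmap maps the whole content of the source file into memory and then writes it to the destination file, so the copy itself is a single memcpy() with no system call per chunk. read()/write() is different: depending on the BUFFER_SIZE you define, it performs about (total number of bytes in the file / BUFFER_SIZE) * 2 calls, and each call switches into the kernel and copies data between the kernel's cache and the user buffer. Strictly speaking these are system calls rather than physical disk operations, since the kernel caches file data in memory. With a small BUFFER_SIZE, a large amount of time is wasted on this per-call overhead.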
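
To make the (file size / BUFFER_SIZE) * 2 estimate concrete, here is a small sketch that counts the read() and write() calls made by a copy loop like my_copy2. The function name copy_syscalls is hypothetical, and the count assumes that every read() fills the buffer except possibly the last one, and that every write() writes its whole chunk in one call. It also counts the final read() that returns 0 at end of file, which the formula above leaves out.

```c
#include <stdint.h>

/* Hypothetical helper: estimated number of read() + write() calls needed
   to copy file_size bytes with a buffer of buffer_size bytes.
   Returns 0 for the invalid case buffer_size == 0. */
uint64_t copy_syscalls(uint64_t file_size, uint64_t buffer_size)
{
    uint64_t chunks;

    if (buffer_size == 0)
        return 0;
    /* Number of chunks, rounded up for a partial last chunk. */
    chunks = file_size / buffer_size + (file_size % buffer_size != 0);
    /* One read() and one write() per chunk, plus the last read()
       that returns 0 to signal end of file. */
    return chunks * 2 + 1;
}
```

For an assumed 100000-byte file (the size of the file used in the measurements is not stated), this gives 200001 calls with BUFFER_SIZE=1 and 21 calls with BUFFER_SIZE=10000, which matches the direction of the timings shown earlier.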

OK, that's all for now. If you have any questions, feel free to leave a comment so we can discuss. Thanks.