Copy a file in a sane, safe and efficient way

Translated from: Copy a file in a sane, safe and efficient way

I am looking for a good way to copy a file (binary or text). I've written several samples, and all of them work. But I'd like to hear the opinion of seasoned programmers.

I am missing good examples and am looking for a way that works with C++.

ANSI-C-WAY

#include <iostream>
#include <cstdio>    // fopen, fclose, fread, fwrite, BUFSIZ
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ is 8192 bytes by default here
    // a buffer size of 1 means one character at a time
    // good values should fit the block size, like 1024 or 4096
    // higher values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;

    char buf[BUFSIZ];
    size_t size;

    FILE* source = fopen("from.ogv", "rb");
    FILE* dest = fopen("to.ogv", "wb");

    // clean and more secure
    // feof(FILE* stream) returns non-zero if the end of file indicator for stream is set

    while (size = fread(buf, 1, BUFSIZ, source)) {
        fwrite(buf, 1, size, dest);
    }

    fclose(source);
    fclose(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

POSIX-WAY (K&R use this in "The C Programming Language", more low-level)

#include <iostream>
#include <fcntl.h>   // open
#include <unistd.h>  // read, write, close
#include <cstdio>    // BUFSIZ
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ defaults to 8192 here
    // a buffer size of 1 means one character at a time
    // good values should fit the block size, like 1024 or 4096
    // higher values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;

    char buf[BUFSIZ];
    ssize_t size;  // read() returns ssize_t; -1 signals an error and must not wrap to a huge size_t

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /*| O_TRUNC/**/, 0644);

    while ((size = read(source, buf, BUFSIZ)) > 0) {
        write(dest, buf, size);
    }

    close(source);
    close(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

KISS-C++-Streambuffer-WAY

#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    dest << source.rdbuf();

    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

COPY-ALGORITHM-C++-WAY

#include <iostream>
#include <fstream>
#include <ctime>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    istreambuf_iterator<char> begin_source(source);
    istreambuf_iterator<char> end_source;
    ostreambuf_iterator<char> begin_dest(dest); 
    copy(begin_source, end_source, begin_dest);

    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

OWN-BUFFER-C++-WAY

#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    // file size
    source.seekg(0, ios::end);
    ifstream::pos_type size = source.tellg();
    source.seekg(0);
    // allocate memory for buffer
    char* buffer = new char[size];

    // copy file    
    source.read(buffer, size);
    dest.write(buffer, size);

    // clean up
    delete[] buffer;
    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

LINUX-WAY // requires kernel >= 2.6.33

#include <iostream>
#include <sys/sendfile.h>  // sendfile
#include <fcntl.h>         // open
#include <unistd.h>        // close
#include <sys/stat.h>      // fstat
#include <sys/types.h>     // fstat
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /*| O_TRUNC/**/, 0644);

    // struct required, rationale: function stat() exists also
    struct stat stat_source;
    fstat(source, &stat_source);

    sendfile(dest, source, 0, stat_source.st_size);

    close(source);
    close(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

Environment

  • GNU/LINUX (Archlinux)
  • Kernel 3.3
  • GLIBC-2.15, LIBSTDC++ 4.7 (GCC-LIBS), GCC 4.7, Coreutils 8.16
  • Using RUNLEVEL 3 (Multiuser, Network, Terminal, no GUI)
  • INTEL SSD-Postville 80 GB, filled up to 50%
  • Copy a 270 MB OGG-VIDEO-FILE

Steps to reproduce

 1. $ rm from.ogg
 2. $ reboot                           # kernel and filesystem buffers are in their regular state
 3. $ (time ./program) &>> report.txt  # runs the program and appends its output to the file
 4. $ sha256sum *.ogv                  # checksum
 5. $ rm to.ogg                        # remove the copy, but no sync; kernel and filesystem buffers are used
 6. $ (time ./program) &>> report.txt  # runs the program and appends its output to the file

Results (CPU TIME used)

Program  Description                 UNBUFFERED|BUFFERED
ANSI C   (fread/fwrite)                 490,000|260,000
POSIX    (K&R, read/write)              450,000|230,000
FSTREAM  (KISS, Streambuffer)           500,000|270,000
FSTREAM  (Algorithm, copy)              500,000|270,000
FSTREAM  (OWN-BUFFER)                   500,000|340,000
SENDFILE (native LINUX, sendfile)       410,000|200,000

Filesize doesn't change.
sha256sum prints the same results.
The video file is still playable.

Questions

  • What method would you prefer?
  • Do you know better solutions?
  • Do you see any mistakes in my code?
  • Do you know a reason to avoid a solution?

  • FSTREAM (KISS, Streambuffer)
    I really like this one, because it is really short and simple. As far as I know, operator << is overloaded for rdbuf() and doesn't convert anything. Correct?
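One way to check the "doesn't convert anything" assumption is to push awkward bytes through the same idiom using in-memory streams. This is only a sketch: the helper name copy_through_rdbuf is made up for illustration, and it uses string streams instead of files so that it runs anywhere.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the samples above): sends `bytes` through
// the `out << in.rdbuf()` idiom and returns what arrived on the output side.
std::string copy_through_rdbuf(const std::string& bytes) {
    std::istringstream in(bytes);
    std::ostringstream out;
    // The streambuf overload of operator<< transfers characters as they are;
    // no formatting or conversion is applied by the stream layer.
    out << in.rdbuf();
    return out.str();
}
```

If NUL bytes, "\r\n" pairs and 0xFF come back unchanged, the stream layer itself did no conversion. For files, std::ios::binary is still needed so that the platform does no newline translation underneath.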

Thanks

Update 1
I changed the source in all samples so that opening and closing the file descriptors is included in the clock() measurement. There are no other significant changes in the source code. The results didn't change! I also used time to double-check my results.
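Since clock() reports processor time and time additionally reports elapsed time, the two can be captured side by side inside the program. The sketch below is not the measurement code used for the results; the names Timing and measure are hypothetical.

```cpp
#include <chrono>
#include <ctime>

// Hypothetical result type: CPU time as reported by clock(), plus elapsed
// wall-clock time from a monotonic clock.
struct Timing {
    double cpu_seconds;
    double wall_seconds;
};

// Runs `work` once and reports both timings.
template <class F>
Timing measure(F&& work) {
    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();
    work();
    std::clock_t cpu_end = std::clock();
    auto wall_end = std::chrono::steady_clock::now();
    return { static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
             std::chrono::duration<double>(wall_end - wall_start).count() };
}
```

For an I/O-bound copy, time spent waiting for the disk does not count as CPU time, so wall_seconds can be noticeably larger than cpu_seconds.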

Update 2
ANSI C sample changed: The condition of the while loop no longer calls feof(); instead I moved fread() into the condition. It looks like the code now runs 10,000 clocks faster.
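A sketch of the same fread()-in-the-condition loop with error checks added, assuming a hypothetical helper name stdio_copy. Because fread() returns 0 both at end of file and on error, ferror() is consulted after the loop.

```cpp
#include <cstdio>

// Hypothetical helper: copies `from` to `to` with buffered C stdio.
// Returns true only if every read, write and close succeeded.
bool stdio_copy(const char* from, const char* to) {
    std::FILE* src = std::fopen(from, "rb");
    if (!src) return false;
    std::FILE* dst = std::fopen(to, "wb");
    if (!dst) { std::fclose(src); return false; }

    char buf[BUFSIZ];
    std::size_t n;
    bool ok = true;
    // The loop ends when fread() returns 0: end of file or a read error.
    while ((n = std::fread(buf, 1, sizeof buf, src)) > 0) {
        if (std::fwrite(buf, 1, n, dst) != n) { ok = false; break; }
    }
    if (std::ferror(src)) ok = false;        // tell a read error apart from EOF
    std::fclose(src);
    if (std::fclose(dst) != 0) ok = false;   // fclose() flushes; failure means lost data
    return ok;
}
```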

Measurement changed: The former results were always buffered, because I repeated the old command line rm to.ogv && sync && time ./program a few times for each program. Now I reboot the system for every program. The unbuffered results are new and show no surprise. The unbuffered results didn't really change.

If I don't delete the old copy, the programs react differently. Overwriting an existing file buffered is faster with POSIX and SENDFILE; all other programs are slower. Maybe the truncate or create options have an impact on this behaviour. But overwriting existing files with the same copy is not a real-world use case.
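The truncate option matters for correctness as well as speed: without O_TRUNC, copying a shorter file over a longer one leaves the old tail in place. Below is a sketch of the POSIX variant with O_TRUNC enabled and with short writes handled; the helper name posix_copy and the 64 KiB buffer size are assumptions for illustration, not the measured sample.

```cpp
#include <fcntl.h>    // open
#include <unistd.h>   // read, write, close
#include <cerrno>

// Hypothetical helper: copies `from` to `to` with read()/write().
// Returns true only if all calls succeeded.
bool posix_copy(const char* from, const char* to) {
    int src = open(from, O_RDONLY);
    if (src < 0) return false;
    // O_TRUNC discards any previous content of the destination.
    int dst = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) { close(src); return false; }

    char buf[65536];
    bool ok = true;
    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n == 0) break;                                   // end of file
        if (n < 0) { if (errno == EINTR) continue; ok = false; break; }
        // write() may accept fewer bytes than requested, so loop until done.
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst, buf + off, static_cast<size_t>(n - off));
            if (w < 0) { if (errno == EINTR) continue; ok = false; break; }
            off += w;
        }
        if (!ok) break;
    }
    close(src);
    if (close(dst) != 0) ok = false;
    return ok;
}
```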

Performing the copy with cp takes 0.44 seconds unbuffered and 0.30 seconds buffered. So cp is a little bit slower than the POSIX sample. Looks fine to me.

Maybe I will also add samples and results of mmap() and copy_file() from boost::filesystem.

Update 3
I've also put this on a blog page and extended it a little bit, including splice(), which is a low-level function from the Linux kernel. Maybe more samples with Java will follow. http://www.ttyhoney.com/blog/?page_id=69


Reply #1

Reference: https://stackoom.com/question/GMh1/以合理-安全和有效的方式复制文件


Reply #2

Copy a file in a sane way:

#include <fstream>

int main()
{
    std::ifstream  src("from.ogv", std::ios::binary);
    std::ofstream  dst("to.ogv",   std::ios::binary);

    dst << src.rdbuf();
}

This is so simple and intuitive to read that it is worth the extra cost. If we were doing it a lot, it would be better to fall back on OS calls to the file system. I am sure boost has a copy file method in its filesystem class.
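If the result of the copy has to be checked, the idiom can be wrapped as in the sketch below. The helper name stream_copy is hypothetical. One detail worth knowing: the streambuf overload of operator<< sets failbit on the destination when it inserts no characters, so an empty source file is handled separately here.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: the `dst << src.rdbuf()` idiom with state checks.
// Returns true if the destination holds a complete copy.
bool stream_copy(const std::string& from, const std::string& to) {
    std::ifstream src(from, std::ios::binary);
    if (!src) return false;
    std::ofstream dst(to, std::ios::binary | std::ios::trunc);
    if (!dst) return false;

    // An empty source is a valid input, but copying zero characters would set
    // failbit on dst, so treat it as success unless the read itself failed.
    if (src.peek() == std::ifstream::traits_type::eof()) return !src.bad();

    dst << src.rdbuf();
    dst.flush();
    return static_cast<bool>(dst);
}
```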

There is a C method for interacting with the file system (copyfile, available on macOS):

#include <copyfile.h>

int
copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags);

Reply #3

Too many!

The "ANSI C" way's buffer is redundant, since a FILE is already buffered. (The size of this internal buffer is what BUFSIZ actually defines.)

The "OWN-BUFFER-C++-WAY" will be slow as it goes through fstream, which does a lot of virtual dispatching, and again maintains internal buffers for each stream object. (The "COPY-ALGORITHM-C++-WAY" does not suffer this, as the streambuf_iterator class bypasses the stream layer.)

I prefer the "COPY-ALGORITHM-C++-WAY", but without constructing an fstream; just create bare std::filebuf instances when no actual formatting is needed.
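A sketch of what that could look like, with the helper name filebuf_copy chosen for illustration. The stream buffer iterators are attached directly to two std::filebuf objects, so no fstream is constructed.

```cpp
#include <algorithm>
#include <fstream>    // std::filebuf
#include <iterator>

// Hypothetical helper: std::copy over bare std::filebuf objects.
// Returns true if the whole file was written and the destination closed cleanly.
bool filebuf_copy(const char* from, const char* to) {
    std::filebuf src, dst;
    if (!src.open(from, std::ios::in | std::ios::binary)) return false;
    if (!dst.open(to, std::ios::out | std::ios::binary | std::ios::trunc)) return false;

    std::istreambuf_iterator<char> first(&src), last;
    std::ostreambuf_iterator<char> out(&dst);
    out = std::copy(first, last, out);

    bool ok = !out.failed();   // failed() is true if a write to dst hit an error
    src.close();
    // close() flushes the buffer and returns a null pointer on failure.
    return dst.close() != nullptr && ok;
}
```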

For raw performance, you can't beat POSIX file descriptors. It's ugly but portable and fast on any platform.

The Linux way appears to be incredibly fast; perhaps the OS let the function return before I/O was finished? In any case, that's not portable enough for many applications.

EDIT: Ah, "native Linux" may be improving performance by interleaving reads and writes with asynchronous I/O. Letting commands pile up can help the disk driver decide when it is best to seek. You might try Boost Asio or pthreads for comparison. As for "can't beat POSIX file descriptors"… well, that's true if you're doing anything with the data, not just blindly copying.


Reply #4

I want to make the very important note that the LINUX method using sendfile() has a major problem in that it cannot copy files more than 2GB in size! I had implemented it following this question and was hitting problems because I was using it to copy HDF5 files that were many GB in size.

http://man7.org/linux/man-pages/man2/sendfile.2.html

sendfile() will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)
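Because sendfile() returns the number of bytes it actually transferred, the limit can be worked around by calling it in a loop until nothing remains. This is a Linux-only sketch with a hypothetical helper name, sendfile_copy. It passes a null offset so the kernel advances the source file position itself.

```cpp
#include <sys/sendfile.h>  // sendfile
#include <sys/stat.h>      // fstat
#include <fcntl.h>         // open
#include <unistd.h>        // close
#include <cerrno>

// Hypothetical helper (Linux only): repeats sendfile() until the whole file
// has been transferred, so files above the per-call limit are copied too.
bool sendfile_copy(const char* from, const char* to) {
    int src = open(from, O_RDONLY);
    if (src < 0) return false;
    struct stat st;
    if (fstat(src, &st) != 0) { close(src); return false; }
    int dst = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) { close(src); return false; }

    bool ok = true;
    off_t remaining = st.st_size;
    while (remaining > 0) {
        // With a null offset, sendfile() reads from the current position of
        // src and advances it by the number of bytes sent.
        ssize_t sent = sendfile(dst, src, nullptr, static_cast<size_t>(remaining));
        if (sent < 0) { if (errno == EINTR) continue; ok = false; break; }
        if (sent == 0) break;   // source ended early, e.g. it was truncated
        remaining -= sent;
    }
    close(src);
    if (close(dst) != 0) ok = false;
    return ok && remaining == 0;
}
```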


Reply #5

With C++17 the standard way to copy a file is to include the <filesystem> header and use:

bool copy_file( const std::filesystem::path& from,
                const std::filesystem::path& to);

bool copy_file( const std::filesystem::path& from,
                const std::filesystem::path& to,
                std::filesystem::copy_options options);

The first form is equivalent to the second one with copy_options::none used as options (see also copy_file).

The filesystem library was originally developed as boost.filesystem and was finally merged into ISO C++ as of C++17.
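A short usage sketch, assuming a C++17 compiler. The wrapper name copy_overwriting is hypothetical. It uses the overload that takes a std::error_code, so failures are reported through the return value instead of an exception, and it asks for an existing destination to be replaced.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical wrapper: copies `from` to `to`, replacing an existing
// destination. Returns true only if a file was actually copied.
bool copy_overwriting(const fs::path& from, const fs::path& to) {
    std::error_code ec;
    bool copied = fs::copy_file(from, to, fs::copy_options::overwrite_existing, ec);
    return copied && !ec;
}
```

With copy_options::none, an already existing destination is reported as an error instead of being overwritten.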


Reply #6

Qt has a method for copying files:

#include <QFile>
QFile::copy("originalFile.example","copiedFile.example");

Note that to use this you have to install Qt (instructions here) and include it in your project (if you're using Windows and you're not an administrator, you can download Qt here instead). Also see this answer.
