Is single-threaded memory copy really slower than writing to disk?

Hardware environment

CPU: Intel® Xeon® Gold 6346 CPU @ 3.10GHz

MEM: 256 GB

OS: Ubuntu 22.04

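Single-threaded sequential copy

10 GB source buffer -> 10 GB destination buffer

The copy is performed with block sizes of 4 KB, 4 MB, 64 MB, and 1 GB.

The buffers are allocated from huge pages with a 2 MB page size to keep the memory contiguous.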
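The article does not show how the huge-page buffers were allocated. One possible approach on Linux, shown here only as a sketch, is an anonymous `mmap` with `MAP_HUGETLB`. The helper names `alloc_buffer` and `free_buffer` are hypothetical, and the fallback to normal pages is an added assumption for machines with no huge pages reserved.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not from the original program): map `size` bytes
 * backed by explicit huge pages if possible, otherwise by normal pages.
 * `size` should be a multiple of the huge page size (2 MB in the article).
 * *used_huge is set to 1 if huge pages were used, 0 otherwise.
 * Returns NULL on failure. */
void *alloc_buffer(size_t size, int *used_huge)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = MAP_FAILED;
#ifdef MAP_HUGETLB
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
#endif
    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED) {
        /* No huge pages available: fall back to the default page size. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    }
    return p == MAP_FAILED ? NULL : p;
}

/* Release a buffer obtained from alloc_buffer(). */
void free_buffer(void *p, size_t size)
{
    if (p != NULL)
        munmap(p, size);
}
```

Huge pages usually have to be reserved beforehand (for example through `/proc/sys/vm/nr_hugepages`); checking `used_huge` shows whether the request was actually honored.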

Single-threaded sequential copy code

#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* MEM_SIZE, buf, dest_buf, format_Bps() and format_Bytes() are defined
 * elsewhere in the test program (not shown here). */

/* Copy MEM_SIZE bytes from buf to dest_buf in chunks of block_size bytes
 * and print the elapsed time and throughput. */
int test_block2block(unsigned long block_size)
{
    unsigned long i, cnt = 0;
    unsigned long block_num = MEM_SIZE / block_size;
    struct timeval start_time, end_time;
    double elapsed_time;

    gettimeofday(&start_time, NULL);
    for (i = 0; i < block_num; ++i) {
        /* Cast to char* so the pointer arithmetic is done in bytes. */
        void *src = (char *)buf + block_size * i;
        void *dest = (char *)dest_buf + block_size * i;
        memcpy(dest, src, block_size);
        cnt++;
    }
    gettimeofday(&end_time, NULL);

    /* Elapsed wall-clock time in seconds. */
    elapsed_time = (end_time.tv_sec - start_time.tv_sec) +
                (end_time.tv_usec - start_time.tv_usec) / 1000000.0;
    unsigned long long pps = block_num / elapsed_time;  /* blocks per second */
    unsigned long long Bps = MEM_SIZE / elapsed_time;   /* bytes per second */
    char formatbuf[32] = {0};
    char Bytesbuf[32] = {0};
    format_Bps(Bps, formatbuf);
    format_Bytes(block_size, Bytesbuf);
    printf("Blocksize: %s Num: %lu, Time cost:  %.3f s  Rate: %llu Blocks/s,  %s\n", Bytesbuf, cnt, elapsed_time, pps, formatbuf);
    return 0;
}
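The listing calls `format_Bps` and `format_Bytes`, which the article does not show. The sketch below is a guess at what such helpers might look like, under the assumption that they use binary units (1 KB = 1024 bytes); that assumption matches the logs, since 10 GB copied in 3.418 s prints as 2.93 GB/s. The names `format_bytes_sketch` and `format_rate_sketch` are hypothetical and are not the author's functions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in: write `value` scaled to binary units,
 * e.g. 4096 -> "4.00 KB". */
void format_bytes_sketch(unsigned long long value, char *out, size_t out_len)
{
    static const char *units[] = {"B", "KB", "MB", "GB", "TB"};
    double v = (double)value;
    int u = 0;

    /* Divide by 1024 until the value fits the next unit. */
    while (v >= 1024.0 && u < 4) {
        v /= 1024.0;
        u++;
    }
    snprintf(out, out_len, "%.2f %s", v, units[u]);
}

/* Hypothetical stand-in: same scaling, with "/s" appended for a rate. */
void format_rate_sketch(unsigned long long bytes_per_sec, char *out, size_t out_len)
{
    char tmp[32];

    format_bytes_sketch(bytes_per_sec, tmp, sizeof tmp);
    snprintf(out, out_len, "%s/s", tmp);
}
```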

Test results

Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost:  3.418 s  Rate: 767049 Blocks/s,  2.93 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.602 s  Rate: 1636428 Blocks/s,  6.24 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.598 s  Rate: 1640112 Blocks/s,  6.26 GB/s
4 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.628 s  Rate: 1610288 Blocks/s,  6.14 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost:  3.522 s  Rate: 726 Blocks/s,  2.84 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost:  0.844 s  Rate: 3033 Blocks/s,  11.85 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost:  0.830 s  Rate: 3084 Blocks/s,  12.05 GB/s
4 Blocksize: 4.00 MB Num: 2560, Time cost:  0.828 s  Rate: 3090 Blocks/s,  12.07 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 64.00 MB
1 Blocksize: 64.00 MB Num: 160, Time cost:  3.544 s  Rate: 45 Blocks/s,  2.82 GB/s
2 Blocksize: 64.00 MB Num: 160, Time cost:  0.792 s  Rate: 202 Blocks/s,  12.63 GB/s
3 Blocksize: 64.00 MB Num: 160, Time cost:  0.789 s  Rate: 202 Blocks/s,  12.68 GB/s
4 Blocksize: 64.00 MB Num: 160, Time cost:  0.797 s  Rate: 200 Blocks/s,  12.54 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 1.00 GB
1 Blocksize: 1.00 GB Num: 10, Time cost:  3.626 s  Rate: 2 Blocks/s,  2.76 GB/s
2 Blocksize: 1.00 GB Num: 10, Time cost:  0.873 s  Rate: 11 Blocks/s,  11.45 GB/s
3 Blocksize: 1.00 GB Num: 10, Time cost:  0.871 s  Rate: 11 Blocks/s,  11.48 GB/s
4 Blocksize: 1.00 GB Num: 10, Time cost:  0.873 s  Rate: 11 Blocks/s,  11.45 GB/s

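The first copy always runs at roughly 2.8 to 2.9 GB/s, slower than the sequential write speed of many SSDs! Surprising, isn't it?

From the second run onward the speed improves: about 6.2 GB/s with 4 KB blocks, and a stable 11 GB/s or more with block sizes of 4 MB and above.

Why does this happen?

On the first copy, the freshly allocated memory is still cold and hasn't loosened up yet, so it is slow. Once the first copy is done, it is fully warmed up and the speed goes up. (doge)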

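Copying with warmed-up memory

So, before copying, assign a value to every element of the whole buffer to warm it up, and only then copy.

Memory warm-up code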

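#include <stdio.h>

/* Warm up the buffer the copy reads from. The name refers to the buffer's
 * role in the copy; the warm-up itself writes 0 to every word, which
 * forces every page of the buffer to be mapped. */
void heat_mem_read(unsigned char* buf, unsigned long size)
{
    printf("Warm up read memory\n");
    unsigned long num = size / sizeof(unsigned long);
    unsigned long i = 0, tmp = 0;
    unsigned long* a = (unsigned long*)buf;
    for (i = 0; i < num; i++) {
        a[i] = tmp;
    }
}

/* Warm up the buffer the copy writes to, by writing 1 to every word. */
void heat_mem_write(unsigned char* buf, unsigned long size)
{
    printf("Warm up write memory\n");
    unsigned long num = size / sizeof(unsigned long);
    unsigned long i = 0, tmp = 1;
    unsigned long* a = (unsigned long*)buf;
    for (i = 0; i < num; i++) {
        a[i] = tmp;
    }
}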
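The warm-up above writes every word of the buffer. Since the mapping is created one page at a time, writing a single byte per page should be enough to trigger it, which is cheaper for a 10 GB buffer. The following is a minimal sketch of that idea; `touch_pages` and its `page_size` parameter are hypothetical names, and the caller is assumed to pass the real page size (2 MB for the huge pages used here, or `sysconf(_SC_PAGESIZE)` for normal pages).

```c
#include <stddef.h>

/* Sketch: write one byte in every page of `buf` so that each page gets
 * mapped. Returns the number of pages touched.
 * Note: this overwrites one byte per page, so call it before the buffer
 * is filled with real data. */
size_t touch_pages(volatile unsigned char *buf, size_t size, size_t page_size)
{
    size_t touched = 0;

    if (page_size == 0)
        return 0;   /* avoid an endless loop on a bad argument */

    for (size_t off = 0; off < size; off += page_size) {
        buf[off] = 0;   /* the first write to a page triggers the mapping */
        touched++;
    }
    return touched;
}
```

For example, `touch_pages(dest_buf, MEM_SIZE, 2UL << 20)` would touch each 2 MB page of the destination buffer once.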

Test results

Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost:  3.279 s  Rate: 799555 Blocks/s,  3.05 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.554 s  Rate: 1686673 Blocks/s,  6.43 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.559 s  Rate: 1681604 Blocks/s,  6.41 GB/s
-----------------
Warm up read memory
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost:  2.300 s  Rate: 1139620 Blocks/s,  4.35 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.567 s  Rate: 1673100 Blocks/s,  6.38 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.564 s  Rate: 1676087 Blocks/s,  6.39 GB/s
-----------------
Warm up write memory
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost:  2.861 s  Rate: 916351 Blocks/s,  3.50 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.947 s  Rate: 1346201 Blocks/s,  5.14 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.937 s  Rate: 1353486 Blocks/s,  5.16 GB/s
-----------------
Warm up read memory
Warm up write memory
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.596 s  Rate: 1642803 Blocks/s,  6.27 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.601 s  Rate: 1637784 Blocks/s,  6.25 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost:  1.598 s  Rate: 1640718 Blocks/s,  6.26 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost:  3.516 s  Rate: 728 Blocks/s,  2.84 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost:  0.857 s  Rate: 2987 Blocks/s,  11.67 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost:  0.845 s  Rate: 3029 Blocks/s,  11.83 GB/s
-----------------
Warm up read memory
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost:  2.278 s  Rate: 1123 Blocks/s,  4.39 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost:  1.225 s  Rate: 2090 Blocks/s,  8.16 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost:  1.091 s  Rate: 2346 Blocks/s,  9.16 GB/s
-----------------
Warm up write memory
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost:  1.812 s  Rate: 1412 Blocks/s,  5.52 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost:  0.843 s  Rate: 3036 Blocks/s,  11.86 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost:  0.844 s  Rate: 3033 Blocks/s,  11.85 GB/s
-----------------
Warm up read memory
Warm up write memory
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost:  0.814 s  Rate: 3145 Blocks/s,  12.29 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost:  0.813 s  Rate: 3149 Blocks/s,  12.30 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost:  0.815 s  Rate: 3139 Blocks/s,  12.27 GB/s

Warmed-up memory really does deliver: once both buffers are warmed up, the speed is stable from the very first run.

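Cause analysis

The mapping between virtual memory and physical memory is not established when the memory is allocated. A page's virtual address is mapped to a physical address only when that page is first accessed, through a page fault.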
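One way to observe this lazy mapping, offered as a sketch and not as part of the original test, is to read the process's minor page fault counter before and after touching a buffer. `getrusage` and its `ru_minflt` field are standard; the helper names `minor_faults` and `faults_during_touch` are hypothetical.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Write one byte every `step` bytes of `buf` and return how many minor
 * page faults were recorded during the pass. */
long faults_during_touch(volatile unsigned char *buf, size_t size, size_t step)
{
    long before = minor_faults();

    if (step != 0) {
        for (size_t off = 0; off < size; off += step)
            buf[off] = 1;
    }
    return minor_faults() - before;
}
```

Calling `faults_during_touch` twice on a freshly allocated buffer would be expected to report many faults on the first pass and close to none on the second; the exact counts depend on the kernel and the page size.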

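Cold memory is virtual memory that has not yet been mapped to physical memory.

Hot memory is virtual memory that has already been mapped to physical memory.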

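During the first copy, the virtual-to-physical mappings are being set up while the data is being copied, and that is what slows it down!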
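If the goal is to pay the mapping cost at allocation time instead of during the first copy, Linux `mmap` offers the `MAP_POPULATE` flag, which asks the kernel to pre-fault the pages of the mapping. The sketch below assumes the buffer is allocated with `mmap`, which the article does not confirm; `alloc_prefaulted` is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Sketch: anonymous private mapping whose pages are requested to be
 * faulted in up front. Returns NULL on failure; release with munmap(). */
void *alloc_prefaulted(size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    flags |= MAP_POPULATE;   /* Linux-specific: populate page tables now */
#endif
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

With this flag the allocation call itself takes longer, so the cost moves out of the timed copy but does not disappear.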

The MMU address translation flow is shown in the figure below (reference: https://www.cnblogs.com/cdaniu/p/15614214.html)
[Figure: MMU address translation flow]
