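Hardware environment
CPU: Intel® Xeon® Gold 6346 CPU @ 3.10GHz
MEM: 256 GB
OS: Ubuntu 22.04
Single-threaded sequential copy
10 GB source buffer -> 10 GB destination buffer
The copy is performed with block sizes of 4 KB, 4 MB, 64 MB, and 1 GB.
The buffers are backed by huge pages with a 2 MB page size, so the memory is physically contiguous (within each 2 MB page).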
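The allocation code is not shown in this post, so here is a minimal sketch of one way to get a huge-page-backed buffer on Linux. It assumes mmap with MAP_HUGETLB and falls back to normal pages when no huge pages are reserved. The names alloc_buffer and free_buffer are made up for illustration, and size is assumed to be a multiple of the 2 MB huge page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS and MAP_HUGETLB */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes of anonymous memory, preferring
 * explicit huge pages and falling back to normal pages if none are available.
 * *used_huge is set to 1 when the huge-page mapping succeeded, 0 otherwise.
 * Returns NULL if both attempts fail. */
static void *alloc_buffer(size_t size, int *used_huge)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = MAP_FAILED;

#ifdef MAP_HUGETLB
    /* Needs huge pages reserved beforehand (e.g. /proc/sys/vm/nr_hugepages). */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
#endif
    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED) {
        /* Fallback: ordinary pages (4 KB on x86-64). */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    }
    return p == MAP_FAILED ? NULL : p;
}

/* Release a buffer obtained from alloc_buffer(). */
static void free_buffer(void *p, size_t size)
{
    if (p != NULL)
        munmap(p, size);
}
```

Note that mmap only reserves address space here; the pages are still "cold" until they are first touched.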
Single-threaded sequential copy code
/* Requires <stdio.h>, <string.h> and <sys/time.h>. buf, dest_buf, MEM_SIZE,
   format_Bps() and format_Bytes() are defined elsewhere in the test program. */
int test_block2block(unsigned long block_size)
{
    unsigned long i, cnt = 0;
    unsigned long block_num = MEM_SIZE / block_size;
    struct timeval start_time, end_time;
    double elapsed_time;

    gettimeofday(&start_time, NULL);
    for (i = 0; i < block_num; ++i) {
        /* Cast to a byte pointer: arithmetic on void* is not standard C. */
        unsigned char *src = (unsigned char *)buf + block_size * i;
        unsigned char *dest = (unsigned char *)dest_buf + block_size * i;
        memcpy(dest, src, block_size);
        cnt++;
    }
    gettimeofday(&end_time, NULL);

    /* Elapsed wall-clock time in seconds. */
    elapsed_time = (end_time.tv_sec - start_time.tv_sec) +
                   (end_time.tv_usec - start_time.tv_usec) / 1000000.0;
    unsigned long long pps = (unsigned long long)(block_num / elapsed_time); /* blocks per second */
    unsigned long long Bps = (unsigned long long)(MEM_SIZE / elapsed_time);  /* bytes per second */

    char formatbuf[32] = {0};
    char Bytesbuf[32] = {0};
    format_Bps(Bps, formatbuf);
    format_Bytes(block_size, Bytesbuf);
    printf("Blocksize: %s Num: %lu, Time cost: %.3lf s Rate: %llu Blocks/s, %s\n",
           Bytesbuf, cnt, elapsed_time, pps, formatbuf);
    return 0;
}
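As a side note, gettimeofday follows the wall clock, which can jump if the system time is adjusted. Below is a minimal sketch of the same measurement using clock_gettime(CLOCK_MONOTONIC). The helper names elapsed_seconds and timed_block_copy are hypothetical and not part of the original test program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed between two clock samples. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: copy `total` bytes in chunks of `block_size` and
 * return the throughput in bytes per second (0.0 if the interval was too
 * short to measure). A trailing partial block is not copied. */
static double timed_block_copy(unsigned char *dst, const unsigned char *src,
                               size_t total, size_t block_size)
{
    struct timespec t0, t1;
    size_t blocks = total / block_size;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < blocks; ++i)
        memcpy(dst + i * block_size, src + i * block_size, block_size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = elapsed_seconds(t0, t1);
    return secs > 0.0 ? (double)(blocks * block_size) / secs : 0.0;
}
```

The loop is the same block-by-block memcpy as above; only the clock source differs.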
Test results
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost: 3.418 s Rate: 767049 Blocks/s, 2.93 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.602 s Rate: 1636428 Blocks/s, 6.24 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.598 s Rate: 1640112 Blocks/s, 6.26 GB/s
4 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.628 s Rate: 1610288 Blocks/s, 6.14 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost: 3.522 s Rate: 726 Blocks/s, 2.84 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost: 0.844 s Rate: 3033 Blocks/s, 11.85 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost: 0.830 s Rate: 3084 Blocks/s, 12.05 GB/s
4 Blocksize: 4.00 MB Num: 2560, Time cost: 0.828 s Rate: 3090 Blocks/s, 12.07 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 64.00 MB
1 Blocksize: 64.00 MB Num: 160, Time cost: 3.544 s Rate: 45 Blocks/s, 2.82 GB/s
2 Blocksize: 64.00 MB Num: 160, Time cost: 0.792 s Rate: 202 Blocks/s, 12.63 GB/s
3 Blocksize: 64.00 MB Num: 160, Time cost: 0.789 s Rate: 202 Blocks/s, 12.68 GB/s
4 Blocksize: 64.00 MB Num: 160, Time cost: 0.797 s Rate: 200 Blocks/s, 12.54 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 1.00 GB
1 Blocksize: 1.00 GB Num: 10, Time cost: 3.626 s Rate: 2 Blocks/s, 2.76 GB/s
2 Blocksize: 1.00 GB Num: 10, Time cost: 0.873 s Rate: 11 Blocks/s, 11.45 GB/s
3 Blocksize: 1.00 GB Num: 10, Time cost: 0.871 s Rate: 11 Blocks/s, 11.48 GB/s
4 Blocksize: 1.00 GB Num: 10, Time cost: 0.873 s Rate: 11 Blocks/s, 11.45 GB/s
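The first copy always runs at roughly 2.8 to 2.9 GB/s, slower than the sequential write speed of many SSDs! Bet you didn't expect that.
From the second run onward the speed goes up: about 6.2 GB/s with 4 KB blocks, and a stable 11 GB/s or more with BlockSize >= 4 MB.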
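As a quick sanity check, the reported rate is just the buffer size divided by the elapsed time. Taking the first and second 4 KB runs from the output above, and assuming the program uses binary units (10 GB = 10 × 2^30 bytes, which matches Num × Blocksize):

```latex
\frac{10\ \text{GB}}{3.418\ \text{s}} \approx 2.93\ \text{GB/s},
\qquad
\frac{10\ \text{GB}}{1.602\ \text{s}} \approx 6.24\ \text{GB/s}
```

Both values agree with the rates printed by the program.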
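Why is that?
On the first copy the freshly allocated memory is still cold and hasn't loosened up yet, so it is slow; once the first copy is done, it is all warmed up and the speed picks up. (doge)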
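Copying with warmed-up memory
So, before copying, write a value to every element of the whole buffer to warm it up, and then copy.
Memory warm-up code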
/* Warm up a buffer by storing 0 into every word.
   Note: despite the "read" in its name, this function writes to the buffer. */
void heat_mem_read(unsigned char* buf, unsigned long size)
{
    printf("Warming up read memory\n");
    unsigned long num = size / sizeof(unsigned long);
    unsigned long i = 0, tmp = 0;
    unsigned long* a = (unsigned long*)buf;
    for (i = 0; i < num; i++) {
        a[i] = tmp;
    }
}

/* Warm up a buffer by storing 1 into every word. */
void heat_mem_write(unsigned char* buf, unsigned long size)
{
    printf("Warming up write memory\n");
    unsigned long num = size / sizeof(unsigned long);
    unsigned long i = 0, tmp = 1;
    unsigned long* a = (unsigned long*)buf;
    for (i = 0; i < num; i++) {
        a[i] = tmp;
    }
}
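The functions above write every word of the buffer. If the goal is only to get each page mapped, touching one byte per page may be enough. Here is a rough sketch of that idea. The name prefault_pages is hypothetical, page_size is an assumption supplied by the caller (2 MB for the huge pages used here, 4096 for normal x86-64 pages), and this variant was not benchmarked in this post.

```c
#include <stddef.h>

/* Hypothetical helper: write one byte in every page of `buf` so that each
 * page gets faulted in. Returns the number of pages touched.
 * Caution: it overwrites the first byte of every page with 0, so call it
 * before the buffer is filled with real data. */
static size_t prefault_pages(volatile unsigned char *buf, size_t size,
                             size_t page_size)
{
    size_t touched = 0;

    if (page_size == 0)
        return 0;   /* avoid an endless loop on a bad argument */

    for (size_t off = 0; off < size; off += page_size) {
        buf[off] = 0;   /* the store makes the kernel map this page */
        ++touched;
    }
    return touched;
}
```

The volatile qualifier is there so the compiler does not drop the stores as dead writes.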
Test results
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost: 3.279 s Rate: 799555 Blocks/s, 3.05 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.554 s Rate: 1686673 Blocks/s, 6.43 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.559 s Rate: 1681604 Blocks/s, 6.41 GB/s
-----------------
Warming up read memory
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost: 2.300 s Rate: 1139620 Blocks/s, 4.35 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.567 s Rate: 1673100 Blocks/s, 6.38 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.564 s Rate: 1676087 Blocks/s, 6.39 GB/s
-----------------
Warming up write memory
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost: 2.861 s Rate: 916351 Blocks/s, 3.50 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.947 s Rate: 1346201 Blocks/s, 5.14 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.937 s Rate: 1353486 Blocks/s, 5.16 GB/s
-----------------
Warming up read memory
Warming up write memory
Large buffer to large buffer copy, BlockSize 4.00 KB
1 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.596 s Rate: 1642803 Blocks/s, 6.27 GB/s
2 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.601 s Rate: 1637784 Blocks/s, 6.25 GB/s
3 Blocksize: 4.00 KB Num: 2621440, Time cost: 1.598 s Rate: 1640718 Blocks/s, 6.26 GB/s
-----------------
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost: 3.516 s Rate: 728 Blocks/s, 2.84 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost: 0.857 s Rate: 2987 Blocks/s, 11.67 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost: 0.845 s Rate: 3029 Blocks/s, 11.83 GB/s
-----------------
Warming up read memory
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost: 2.278 s Rate: 1123 Blocks/s, 4.39 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost: 1.225 s Rate: 2090 Blocks/s, 8.16 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost: 1.091 s Rate: 2346 Blocks/s, 9.16 GB/s
-----------------
Warming up write memory
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost: 1.812 s Rate: 1412 Blocks/s, 5.52 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost: 0.843 s Rate: 3036 Blocks/s, 11.86 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost: 0.844 s Rate: 3033 Blocks/s, 11.85 GB/s
-----------------
Warming up read memory
Warming up write memory
Large buffer to large buffer copy, BlockSize 4.00 MB
1 Blocksize: 4.00 MB Num: 2560, Time cost: 0.814 s Rate: 3145 Blocks/s, 12.29 GB/s
2 Blocksize: 4.00 MB Num: 2560, Time cost: 0.813 s Rate: 3149 Blocks/s, 12.30 GB/s
3 Blocksize: 4.00 MB Num: 2560, Time cost: 0.815 s Rate: 3139 Blocks/s, 12.27 GB/s
Warmed-up memory really does deliver: with both warm-ups applied, the speed is stable from the very first run.
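Root cause analysis
The mapping between virtual memory and physical memory is not set up when the memory is allocated. A page's virtual address is mapped to a physical address only when that page is first accessed (page fault).
"Cold" memory is virtual memory that has not yet been mapped to a physical address.
"Hot" memory is virtual memory that has already been mapped to a physical address.
During the first copy, the virtual-to-physical mappings are built while the data is being copied, and that is why it is slow!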
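One way to see this effect directly is to count page faults around the first and second touch of a fresh mapping. The sketch below assumes Linux and uses getrusage and its ru_minflt counter. The helper names are made up for illustration, and it uses ordinary anonymous pages rather than the huge pages from the benchmark.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far (0 if getrusage fails). */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;
    return ru.ru_minflt;
}

/* Hypothetical helper: memset the buffer and return how many minor page
 * faults that pass caused. */
static long faults_during_touch(void *buf, size_t size)
{
    long before = minor_faults();
    memset(buf, 1, size);
    return minor_faults() - before;
}

/* Hypothetical helper: map `size` bytes of fresh anonymous memory, touch it
 * twice, and report the faults of the first (cold) and second (warm) pass.
 * Returns 0 on success, -1 if the mapping failed. */
static int measure_cold_vs_warm(size_t size, long *cold, long *warm)
{
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    *cold = faults_during_touch(buf, size);
    *warm = faults_during_touch(buf, size);
    munmap(buf, size);
    return 0;
}
```

Calling measure_cold_vs_warm(64UL << 20, &cold, &warm) and printing both counters should show far more faults in the cold pass than in the warm one. The exact numbers depend on the page size and on whether transparent huge pages are enabled.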
The MMU address translation flow is as follows (reference: https://www.cnblogs.com/cdaniu/p/15614214.html):