STREAM memory bandwidth test: Docker performance loss

hkNaruto

Original article: https://blog.csdn.net/hknaruto/article/details/109291259

1. Test on the host

Download the source code

[yeqiang@harbor STREAM]$ git clone https://github.com/jeffhammond/STREAM

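Single-threaded test. Note that STREAM takes its iteration count from the NTIMES macro, so the -DNTIME=20 flag below is not recognized and each kernel runs the default 10 times, as the output reports.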

[yeqiang@harbor STREAM]$ gcc -O  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[yeqiang@harbor STREAM]$ ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 566061 microseconds.
   (= 566061 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           20774.8     0.774718     0.770165     0.783492
Scale:          20689.9     0.778666     0.773325     0.790584
Add:            23589.8     1.023929     1.017391     1.042266
Triad:          23338.4     1.032000     1.028349     1.039814
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays

Multi-threaded test

[yeqiang@harbor STREAM]$ gcc -O -fopenmp  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[yeqiang@harbor STREAM]$ ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 6
Number of Threads counted = 6
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 498106 microseconds.
   (= 498106 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           21930.7     0.748195     0.729571     0.782713
Scale:          22101.5     0.738321     0.723932     0.757996
Add:            24832.2     0.978239     0.966488     0.999065
Triad:          24901.5     0.975178     0.963797     0.987780
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

2. STREAM test inside Docker

Single-threaded

[root@c6f0d2296598 STREAM]# gcc -O  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[root@c6f0d2296598 STREAM]# ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 555674 microseconds.
   (= 555674 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           21290.9     0.756061     0.751496     0.763196
Scale:          20938.2     0.772817     0.764154     0.785505
Add:            24033.4     1.004074     0.998611     1.014896
Triad:          23819.8     1.013162     1.007565     1.021040
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

Multi-threaded

[root@c6f0d2296598 STREAM]# gcc -O -fopenmp  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[root@c6f0d2296598 STREAM]# ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 6
Number of Threads counted = 6
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 494348 microseconds.
   (= 494348 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           22041.1     0.728465     0.725915     0.733348
Scale:          22164.6     0.727715     0.721871     0.744801
Add:            25010.1     0.967849     0.959611     0.979415
Triad:          25066.3     0.964457     0.957460     0.972716
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------