1. Host machine test
Download the source code
[yeqiang@harbor STREAM]$ git clone https://github.com/jeffhammond/STREAM
Single-threaded test
[yeqiang@harbor STREAM]$ gcc -O -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[yeqiang@harbor STREAM]$ ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 566061 microseconds.
(= 566061 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           20774.8     0.774718     0.770165     0.783492
Scale:          20689.9     0.778666     0.773325     0.790584
Add:            23589.8     1.023929     1.017391     1.042266
Triad:          23338.4     1.032000     1.028349     1.039814
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
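The figures in the output above can be checked by hand. The following is a minimal sketch, assuming the usual STREAM byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and 1 MB = 10^6 bytes; the function and variable names are made up for illustration and do not come from stream.c.

```python
# Recompute STREAM's reported sizes and rates from the array size and the
# best (minimum) time of the single-threaded host run above.
N = 1_000_000_000        # STREAM_ARRAY_SIZE from the compile command
BYTES_PER_ELEMENT = 8    # "This system uses 8 bytes per array element."

# Assumption: number of arrays each kernel reads or writes per iteration.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column of the single-threaded host run.
MIN_TIME = {"Copy": 0.770165, "Scale": 0.773325,
            "Add": 1.017391, "Triad": 1.028349}


def memory_per_array_mib(n=N, width=BYTES_PER_ELEMENT):
    """Size of one array in MiB (2**20 bytes)."""
    return n * width / 2**20


def best_rate_mb_s(kernel, min_time, n=N, width=BYTES_PER_ELEMENT):
    """Bandwidth in MB/s (10**6 bytes) for one kernel, given its best time."""
    return 1e-6 * ARRAYS_TOUCHED[kernel] * width * n / min_time


if __name__ == "__main__":
    print(f"Memory per array: {memory_per_array_mib():.1f} MiB")
    print(f"Total (3 arrays): {3 * memory_per_array_mib():.1f} MiB")
    for kernel, t in MIN_TIME.items():
        print(f"{kernel:6s} {best_rate_mb_s(kernel, t):10.1f} MB/s")
```

The recomputed rates agree with the "Best Rate" column to within rounding of the printed times. Two side notes: the three static arrays total about 22.4 GiB, which is why the compile command needs -mcmodel=large, and stream.c reads the iteration count from the macro NTIMES, so -DNTIME=20 leaves the default of 10 in place, as the "executed 10 times" line shows.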
Multi-threaded test
[yeqiang@harbor STREAM]$ gcc -O -fopenmp -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[yeqiang@harbor STREAM]$ ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 6
Number of Threads counted = 6
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 498106 microseconds.
(= 498106 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           21930.7     0.748195     0.729571     0.782713
Scale:          22101.5     0.738321     0.723932     0.757996
Add:            24832.2     0.978239     0.966488     0.999065
Triad:          24901.5     0.975178     0.963797     0.987780
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
2. STREAM test inside Docker
Single-threaded
[root@c6f0d2296598 STREAM]# gcc -O -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[root@c6f0d2296598 STREAM]# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 555674 microseconds.
(= 555674 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           21290.9     0.756061     0.751496     0.763196
Scale:          20938.2     0.772817     0.764154     0.785505
Add:            24033.4     1.004074     0.998611     1.014896
Triad:          23819.8     1.013162     1.007565     1.021040
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Multi-threaded
[root@c6f0d2296598 STREAM]# gcc -O -fopenmp -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[root@c6f0d2296598 STREAM]# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 6
Number of Threads counted = 6
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 494348 microseconds.
(= 494348 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           22041.1     0.728465     0.725915     0.733348
Scale:          22164.6     0.727715     0.721871     0.744801
Add:            25010.1     0.967849     0.959611     0.979415
Triad:          25066.3     0.964457     0.957460     0.972716
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
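Summary
Docker has no noticeable impact on memory performance: in both the single-threaded and multi-threaded runs, the rates measured inside the container are within 3% of the rates measured on the host.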
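To put a number on the comparison, here is a small sketch that computes the relative difference between the host and Docker "Best Rate" values from the four runs above. The dictionary layout and function names are illustrative choices, not part of STREAM.

```python
# Best Rate (MB/s) copied from the four STREAM runs above.
HOST = {
    "single": {"Copy": 20774.8, "Scale": 20689.9, "Add": 23589.8, "Triad": 23338.4},
    "multi":  {"Copy": 21930.7, "Scale": 22101.5, "Add": 24832.2, "Triad": 24901.5},
}
DOCKER = {
    "single": {"Copy": 21290.9, "Scale": 20938.2, "Add": 24033.4, "Triad": 23819.8},
    "multi":  {"Copy": 22041.1, "Scale": 22164.6, "Add": 25010.1, "Triad": 25066.3},
}


def relative_diff_percent(host_rate, docker_rate):
    """Docker rate relative to the host rate, in percent (positive = Docker higher)."""
    return 100.0 * (docker_rate - host_rate) / host_rate


def compare(host=HOST, docker=DOCKER):
    """Per-mode, per-kernel relative difference between Docker and host."""
    return {
        mode: {
            kernel: relative_diff_percent(host[mode][kernel], docker[mode][kernel])
            for kernel in host[mode]
        }
        for mode in host
    }


if __name__ == "__main__":
    for mode, kernels in compare().items():
        for kernel, diff in kernels.items():
            print(f"{mode:6s} {kernel:6s} {diff:+.2f}%")
```

With these numbers every difference is under 3%, and the Docker figures are in fact slightly higher than the host figures. Differences this small are probably run-to-run variation rather than a real effect, but a single run per configuration cannot confirm that.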