STREAM memory bandwidth test: Docker performance overhead

1. Host test

Download the source

[yeqiang@harbor STREAM]$ git clone https://github.com/jeffhammond/STREAM

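Single-threaded test

Note: stream.c reads the iteration count from the NTIMES macro, so -DNTIME=20 in the commands in this post has no effect. Every run uses the default of 10 iterations, as the "Each kernel will be executed 10 times" line in the output confirms.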

[yeqiang@harbor STREAM]$ gcc -O  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[yeqiang@harbor STREAM]$ ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 566061 microseconds.
   (= 566061 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           20774.8     0.774718     0.770165     0.783492
Scale:          20689.9     0.778666     0.773325     0.790584
Add:            23589.8     1.023929     1.017391     1.042266
Triad:          23338.4     1.032000     1.028349     1.039814
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
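
The figures in this output can be cross-checked by hand. The sketch below recomputes the memory footprint and the "Best Rate MB/s" column from the "Min time" column, assuming the byte counts used in stream.c (Copy and Scale touch two arrays per element, Add and Triad touch three) and that MB means 10^6 bytes. The constant and function names are made up for this illustration.

```python
# Values copied from the host single-threaded run above.
ARRAY_ELEMENTS = 1_000_000_000   # -DSTREAM_ARRAY_SIZE
BYTES_PER_ELEMENT = 8            # "This system uses 8 bytes per array element."

# Arrays read or written per element by each kernel (assumed from stream.c).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column, in seconds.
MIN_TIME_S = {
    "Copy": 0.770165,
    "Scale": 0.773325,
    "Add": 1.017391,
    "Triad": 1.028349,
}


def array_mib(elements: int = ARRAY_ELEMENTS) -> float:
    """Size of one array in MiB (2**20 bytes)."""
    return elements * BYTES_PER_ELEMENT / 2**20


def best_rate_mb_s(kernel: str) -> float:
    """Bandwidth in MB/s (10**6 bytes per second) from the minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return bytes_moved / MIN_TIME_S[kernel] / 1e6


if __name__ == "__main__":
    print(f"Memory per array: {array_mib():.1f} MiB")
    print(f"Total memory:     {3 * array_mib():.1f} MiB")
    for kernel in MIN_TIME_S:
        print(f"{kernel + ':':<8}{best_rate_mb_s(kernel):>10.1f} MB/s")
```

The recomputed rates agree with the reported column to within rounding of the six-digit times, which is a quick way to confirm that the table is read correctly.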

Multi-threaded test

[yeqiang@harbor STREAM]$ gcc -O -fopenmp  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[yeqiang@harbor STREAM]$ ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 6
Number of Threads counted = 6
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 498106 microseconds.
   (= 498106 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           21930.7     0.748195     0.729571     0.782713
Scale:          22101.5     0.738321     0.723932     0.757996
Add:            24832.2     0.978239     0.966488     0.999065
Triad:          24901.5     0.975178     0.963797     0.987780
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
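
To put the two host runs side by side, the following sketch divides the 6-thread "Best Rate" by the single-threaded one for each kernel. The numbers are copied from the outputs above and the names are illustrative only. One possible reading, which this post does not verify, is that a single thread already comes close to saturating the memory bandwidth of this machine.

```python
# "Best Rate MB/s" on the host, copied from the two runs above.
HOST_SINGLE = {"Copy": 20774.8, "Scale": 20689.9, "Add": 23589.8, "Triad": 23338.4}
HOST_MULTI = {"Copy": 21930.7, "Scale": 22101.5, "Add": 24832.2, "Triad": 24901.5}


def thread_scaling() -> dict:
    """Ratio of 6-thread bandwidth to single-thread bandwidth per kernel."""
    return {k: HOST_MULTI[k] / HOST_SINGLE[k] for k in HOST_SINGLE}


if __name__ == "__main__":
    for kernel, ratio in thread_scaling().items():
        print(f"{kernel + ':':<8}{ratio:.3f}x")
```

On these numbers every ratio falls between 1.05 and 1.07, so six threads add roughly 5 to 7 percent over one.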

2. STREAM test inside Docker

Single-threaded

[root@c6f0d2296598 STREAM]# gcc -O  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[root@c6f0d2296598 STREAM]# ./stream 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 555674 microseconds.
   (= 555674 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           21290.9     0.756061     0.751496     0.763196
Scale:          20938.2     0.772817     0.764154     0.785505
Add:            24033.4     1.004074     0.998611     1.014896
Triad:          23819.8     1.013162     1.007565     1.021040
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

Multi-threaded

[root@c6f0d2296598 STREAM]# gcc -O -fopenmp  -DSTREAM_ARRAY_SIZE=1000000000 -DNTIME=20 -mcmodel=large stream.c -o stream
[root@c6f0d2296598 STREAM]# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 6
Number of Threads counted = 6
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 494348 microseconds.
   (= 494348 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           22041.1     0.728465     0.725915     0.733348
Scale:          22164.6     0.727715     0.721871     0.744801
Add:            25010.1     0.967849     0.959611     0.979415
Triad:          25066.3     0.964457     0.957460     0.972716
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

Summary

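Docker has no noticeable impact on memory bandwidth. In these runs every Docker result is within 2.5% of the corresponding host result (and slightly higher in each case), a difference small enough to be run-to-run variation.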
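
The comparison can be made explicit with a few lines of arithmetic. This is a minimal sketch that uses only the "Best Rate MB/s" values printed above; the dictionary layout and function names are chosen for this illustration. A positive percentage means the Docker run reported the higher bandwidth.

```python
# "Best Rate MB/s" values copied from the four runs above.
HOST = {
    "single": {"Copy": 20774.8, "Scale": 20689.9, "Add": 23589.8, "Triad": 23338.4},
    "multi": {"Copy": 21930.7, "Scale": 22101.5, "Add": 24832.2, "Triad": 24901.5},
}
DOCKER = {
    "single": {"Copy": 21290.9, "Scale": 20938.2, "Add": 24033.4, "Triad": 23819.8},
    "multi": {"Copy": 22041.1, "Scale": 22164.6, "Add": 25010.1, "Triad": 25066.3},
}


def relative_diff_percent(host: float, docker: float) -> float:
    """Docker bandwidth relative to the host, in percent of the host value."""
    return (docker - host) / host * 100.0


def compare() -> dict:
    """Map (mode, kernel) to the Docker-vs-host difference in percent."""
    return {
        (mode, kernel): relative_diff_percent(HOST[mode][kernel], DOCKER[mode][kernel])
        for mode in HOST
        for kernel in HOST[mode]
    }


if __name__ == "__main__":
    for (mode, kernel), pct in compare().items():
        print(f"{mode:<7}{kernel + ':':<8}{pct:+.2f}%")
```

With a single run per configuration, differences of this size say little about which environment is faster. Repeating each run several times would be needed to separate a real effect from noise.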
