Memory Performance Testing Tools

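Commonly used memory performance testing tools include stream (the most widely used), sysbench, and others.

1. A quick memory read/write speed test with dd

dd is rarely used for memory performance testing. It ships with Linux and needs no installation, though, so it is handy for a rough check of memory performance: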

 

```shell
# Read from the Linux zero device and write to the null device.
$ dd if=/dev/zero of=/dev/null bs=4096 count=1048576
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 2.69363 s, 1.6 GB/s
```

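The data copy rate gives a rough basis for comparing memory performance across machines. With a 4 KiB block the test mostly exercises the kernel's read/write path rather than raw DRAM bandwidth, so treat the number as a coarse indicator only.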
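The rate dd prints is simply the bytes copied divided by the elapsed time. As a small sketch, the figures from the run above can be checked with awk. The function name dd_rate_gbs is made up for illustration, and dd rounds its own display, so small differences are expected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert dd's "bytes copied" and "seconds" into GB/s
# (decimal gigabytes, 1 GB = 10^9 bytes, the unit dd prints).
dd_rate_gbs() {
    local bytes="$1" seconds="$2"
    awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.2f\n", b / s / 1e9 }'
}

# Figures taken from the dd run shown above.
dd_rate_gbs 4294967296 2.69363
```

When comparing machines, keep bs and count identical so that the rates are computed over the same amount of data.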

2. Testing memory performance with stream

2.1 Installation

 

```shell
$ mkdir stream
$ cd stream/
# Download from the upstream site
$ wget https://www.cs.virginia.edu/stream/FTP/Code/stream.c
# Or clone from a mirror hosted in China
$ git clone https://gitee.com/lldhsds/stream.git
$ cd stream/
# Compile
$ gcc stream.c -O3 -fopenmp -DSTREAM_ARRAY_SIZE=1024*1024*1024 -DNTIMES=20 -mcmodel=medium -o stream.1g.20
```

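Compiler options explained:

  • stream.c: the source file to compile; the latest version is 5.10.
  • -O3: the compiler optimization level.
  • -fopenmp: enables OpenMP for multiprocessor systems, which gets closer to the actual peak memory bandwidth. When enabled, the program runs with as many threads as there are CPU threads by default.
  • -DSTREAM_ARRAY_SIZE: sets the size (number of elements) of the test arrays a[], b[], and c[]. This value has a large effect on the results.

The stream.c source recommends that each array be at least 4 times the size of the highest-level cache (L3 cache). The arrays hold double values, so each element takes 8 bytes. The recommended array size is calculated as follows, rounded to an integer:

L3 cache size (MB) × 1024 × 1024 × 4.1 × number of CPU sockets / 8, or equivalently L3 cache size (bytes) × 4.1 × number of CPU sockets / 8

For example, on a dual-socket machine with a 32 MB L3 cache, the result is 32 × 1024 × 1024 × 4.1 × 2 / 8 ≈ 34393292.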

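  • -mcmodel=medium: needed when a single memory array is larger than 2 GB. Other code models such as small and large also exist; which ones are accepted depends on the target architecture and the GCC version (tiny, for example, is an AArch64 model).
  • -o stream.1g.20: the name of the output executable; choose any name you like.
  • -mtune=native -march=native: optimizes for the CPU's instruction set (optional; not used in the command above). native is appropriate when the build machine is also the machine under test.
  • -DOFFSET=4096: an offset for the arrays; it can usually be left undefined.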
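The array-size formula described above is easy to get wrong by hand, so here is a minimal sketch of it in shell. The function name stream_array_size is hypothetical, and the factor 4.1 is written as 41/10 so that plain integer arithmetic can be used (the result is truncated rather than rounded).

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum STREAM_ARRAY_SIZE (in elements) from the
# L3 cache size in MB and the number of CPU sockets.
# Integer math only: 4.1 is expressed as 41/10, and each double takes 8 bytes.
stream_array_size() {
    local cache_mb="$1" sockets="$2"
    echo $(( cache_mb * 1024 * 1024 * 41 * sockets / (10 * 8) ))
}

stream_array_size 32 2   # dual-socket machine, 32 MB L3 cache
stream_array_size 16 1   # single-socket machine, 16 MB L3 cache
```

The value printed is a lower bound; a larger size can be passed to -DSTREAM_ARRAY_SIZE as long as the arrays still fit in memory.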

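Additional notes:

  • In stream 5.9 the array size is set with -DN=2000000. In 5.10 the parameter is named -DSTREAM_ARRAY_SIZE, with a default of 10000000.
  • Take the memory capacity into account. As a rough estimate, STREAM_ARRAY_SIZE × 8 (double precision) × 3 (three arrays) <= 0.6 × M, where M is the memory available to the user.
  • The test arrays must be much larger than the CPU's highest-level cache (usually the L3 cache); otherwise the test measures cache throughput rather than memory throughput.
  • The test has to run long enough for memory bandwidth to reach its peak, or the real maximum will not be observed. If any of the four kernels finishes in under 20 microseconds (STREAM's own guideline is at least 20 clock ticks per test), increase STREAM_ARRAY_SIZE.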
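The memory estimate in the notes above can be checked before compiling. The following is only a sketch of that rule of thumb: the function name stream_fits_memory is made up, and both values are taken in bytes so that integer arithmetic is enough.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if three arrays of doubles fit within 60% of
# the given available memory (both sides in bytes, integer math only).
stream_fits_memory() {
    local elements="$1" avail_bytes="$2"
    local need_bytes=$(( elements * 8 * 3 ))
    (( need_bytes * 10 <= avail_bytes * 6 ))
}

# Example: 104857600 elements (2400 MiB in total) against 8 GiB of memory.
if stream_fits_memory 104857600 $(( 8 * 1024 * 1024 * 1024 )); then
    echo "array size fits"
else
    echo "array size too large"
fi
```

On Linux the available memory could be taken from the MemAvailable field of /proc/meminfo, which is reported in kB and would need to be multiplied by 1024 first.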

2.2 Running the test

 

```shell
# The machine has a single CPU socket with a 16 MB L3 cache.
$ lscpu | grep -i "L3 cache\|Socket\|core"
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
L3 cache:              16384K
# The formula gives a minimum STREAM_ARRAY_SIZE of 16384*1024*4.1*1/8 = 8,598,323.2.
# This run uses about 100 million elements (1024*1024*100). Compile and run:
$ gcc stream.c -O3 -fopenmp -DSTREAM_ARRAY_SIZE=1024*1024*100 -DNTIMES=20 -mcmodel=medium -o stream.100M
$ ./stream.100M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 104857600 (elements), Offset = 0 (elements)
Memory per array = 800.0 MiB (= 0.8 GiB).
Total memory required = 2400.0 MiB (= 2.3 GiB).
Each kernel will be executed 20 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 29659 microseconds.
   (= 29659 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           34701.7     0.066024     0.048347     0.084770
Scale:          39939.1     0.056470     0.042007     0.070259
Add:            41795.4     0.079521     0.060212     0.102223
Triad:          41073.6     0.079864     0.061270     0.102483
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
```

Note

If STREAM_ARRAY_SIZE is set too large at compile time, stream needs more memory than the machine can provide and crashes with a segmentation fault. In that case, add memory or reduce STREAM_ARRAY_SIZE.

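2.3 Analyzing the results

Record the Copy, Scale (multiplication by a scalar), Add, and Triad (combined multiply-add) values from the output. Run the test several times and take the average, then compare the figures across machines.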
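Averaging over several runs can be scripted. The sketch below assumes the output format shown in section 2.2, where each kernel line starts with its name and the second column is the best rate. The function name stream_average is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of one or more STREAM runs on stdin
# and print the mean "Best Rate MB/s" for each kernel.
stream_average() {
    awk '
        $1 == "Copy:" || $1 == "Scale:" || $1 == "Add:" || $1 == "Triad:" {
            sum[$1] += $2; runs[$1]++
        }
        END {
            n = split("Copy: Scale: Add: Triad:", order, " ")
            for (i = 1; i <= n; i++) {
                k = order[i]
                if (runs[k] > 0) printf "%s %.1f\n", k, sum[k] / runs[k]
            }
        }
    '
}

# Usage (assumes the stream.100M binary built earlier):
#   for i in 1 2 3; do ./stream.100M; done | stream_average
```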

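How stream works:

An Add operation accesses memory three times (two reads and one write), and Triad also accesses memory three times. Copy and Scale each access memory twice. The more memory accesses an operation performs, the better memory latency can be hidden, and the higher the measured bandwidth tends to be.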
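The access counts above are also what turns a kernel's time into a bandwidth figure: bytes moved equals the number of arrays touched × 8 bytes × array length, divided by the kernel's minimum time. A rough sketch follows, using the array length and Min time values from the run in section 2.2. The function name stream_rate is hypothetical, and because the printed times are rounded the results should only come out close to the reported rates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bandwidth in MB/s (1 MB = 10^6 bytes) from the number
# of arrays a kernel touches, the array length, and the kernel's minimum time.
stream_rate() {
    local arrays="$1" elements="$2" min_time="$3"
    awk -v a="$arrays" -v n="$elements" -v t="$min_time" \
        'BEGIN { printf "%.1f\n", a * 8 * n / t / 1e6 }'
}

stream_rate 2 104857600 0.048347   # Copy: one read + one write
stream_rate 3 104857600 0.060212   # Add: two reads + one write
```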

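In a single-core stream test, the result depends not only on the memory controller but also on the core's ROB (reorder buffer) and load/store units, so it is not a pure memory bandwidth test. A multi-core stream test issues a large number of memory requests from several cores at once, saturates memory more fully, and therefore gets closer to the peak memory bandwidth.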
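One way to see the single-core versus multi-core difference is to run the same binary at several thread counts. The sketch below relies on OMP_NUM_THREADS, the standard OpenMP environment variable, and on the binary having been built with -fopenmp. The function name stream_thread_sweep and the choice to report only Triad are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a STREAM binary once per thread count and print
# the Triad best rate for each run.
# Arguments: path to the binary, then one or more thread counts.
stream_thread_sweep() {
    local binary="$1"; shift
    local threads rate
    for threads in "$@"; do
        # OMP_NUM_THREADS limits the number of OpenMP threads for this run only.
        rate=$(OMP_NUM_THREADS="$threads" "$binary" | awk '$1 == "Triad:" { print $2 }')
        printf '%s threads: Triad %s MB/s\n' "$threads" "$rate"
    done
}

# Usage (assumes the stream.100M binary built earlier):
#   stream_thread_sweep ./stream.100M 1 2 4
```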

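3. Testing memory performance with sysbench

3.1 Installing sysbench

3.1.1 Installing from a package repository

On CentOS, the EPEL repository must be configured before sysbench can be installed.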

 

```shell
# CentOS
sudo yum -y install sysbench
# Ubuntu
sudo apt -y install sysbench
```

3.1.2 Building from source
Download, build, and install:
 

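```shell
# Download and unpack; the GitHub archive extracts to sysbench-master/.
# cd is a shell builtin, so it cannot be run through sudo.
$ wget https://github.com/akopytov/sysbench/archive/master.zip
$ unzip master.zip
$ cd sysbench-master/
# Or clone from a mirror hosted in China
$ git clone https://gitee.com/mirrors/sysbench.git
$ cd sysbench/
$ ./autogen.sh
# If you only test memory and do not need MySQL, add the option below; otherwise configure fails.
$ ./configure --without-mysql
# Build and install (only the install step needs root)
$ make && sudo make install
```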

Option reference:

 

```shell
$ sysbench --help
Usage:
  sysbench [options]... [testname] [command]

Commands implemented by most tests: prepare run cleanup help

General options:
  --threads=N                     number of threads to use [1]
  --events=N                      limit for total number of events [0]
  --time=N                        limit for total execution time in seconds [10]
  --warmup-time=N                 execute events for this many seconds with statistics disabled before the actual benchmark run with statistics enabled [0]
  --forced-shutdown=STRING        number of seconds to wait after the --time limit before forcing shutdown, or 'off' to disable [off]
  --thread-stack-size=SIZE        size of stack per thread [64K]
  --thread-init-timeout=N         wait time in seconds for worker threads to initialize [30]
  --rate=N                        average transactions rate. 0 for unlimited rate [0]
  --report-interval=N             periodically report intermediate statistics with a specified interval in seconds. 0 disables intermediate reports [0]
  --report-checkpoints=[LIST,...] dump full statistics and reset all counters at specified points in time. The argument is a list of comma-separated values representing the amount of time in seconds elapsed from start of test when report checkpoint(s) must be performed. Report checkpoints are off by default. []
  --debug[=on|off]                print more debugging info [off]
  --validate[=on|off]             perform validation checks where possible [off]
  --help[=on|off]                 print help and exit [off]
  --version[=on|off]              print version and exit [off]
  --config-file=FILENAME          File containing command line options
  --luajit-cmd=STRING             perform LuaJIT control command. This option is equivalent to 'luajit -j'. See LuaJIT documentation for more information

Pseudo-Random Numbers Generator options:
  --rand-type=STRING   random numbers distribution {uniform, gaussian, pareto, zipfian} to use by default [uniform]
  --rand-seed=N        seed for random number generator. When 0, the current time is used as an RNG seed. [0]
  --rand-pareto-h=N    shape parameter for the Pareto distribution [0.2]
  --rand-zipfian-exp=N shape parameter (exponent, theta) for the Zipfian distribution [0.8]

Log options:
  --verbosity=N        verbosity level {5 - debug, 0 - only critical messages} [3]
  --percentile=N       percentile to calculate in latency statistics (1-100). Use the special value of 0 to disable percentile calculations [95]
  --histogram[=on|off] print latency histogram in report [off]

General database options:
  --db-driver=STRING  specifies database driver to use ('help' to get list of available drivers)
  --db-ps-mode=STRING prepared statements usage mode {auto, disable} [auto]
  --db-debug[=on|off] print database-specific debug information [off]

Compiled-in database drivers:

Compiled-in tests:
  fileio - File I/O test
  cpu - CPU performance test
  memory - Memory functions speed test
  threads - Threads subsystem performance test
  mutex - Mutex performance test

See 'sysbench <testname> help' for a list of options for each test.
```

3.1.3 Running the test
 

```shell
# Show the help for the memory test
$ sysbench memory help
sysbench 1.1.0-2ca9e3f (using bundled LuaJIT 2.1.0-beta3)

memory options:
  --memory-block-size=SIZE    size of memory block for test [1K]
  --memory-total-size=SIZE    total size of data to transfer [100G]
  --memory-scope=STRING       memory access scope {global,local} [global]
  --memory-hugetlb[=on|off]   allocate memory from HugeTLB pool [off]
  --memory-oper=STRING        type of memory operations {read, write, none} [write]
  --memory-access-mode=STRING memory access mode {seq,rnd} [seq]

# Memory read test: sequential reads, 100G of data in total, 8K block size, progress printed every second.
# Note: the output captured below reports "operation: write". The command as originally run had no
# space before --memory-oper, so the read option was most likely not applied in that run.
$ sysbench memory --threads=4 --time=60 --report-interval=1 --memory-block-size=8K --memory-total-size=100G --memory-oper=read --memory-access-mode=seq run
sysbench 1.1.0-2ca9e3f (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 8KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

[ 1s ] 7663.72 MiB/sec
[ 2s ] 3820.58 MiB/sec
[ 3s ] 2627.22 MiB/sec
[ 4s ] 2616.21 MiB/sec
...
[ 31s ] 2542.26 MiB/sec
[ 32s ] 2532.57 MiB/sec
[ 33s ] 2474.34 MiB/sec
[ 34s ] 2760.10 MiB/sec
Total operations: 13107200 (375099.39 per second)

102400.00 MiB transferred (2930.46 MiB/sec)    # average read/write speed

Throughput:
    events/s (eps):          375099.3918
    time elapsed:            34.9433s
    total number of events:  13107200          # one event reads or writes one memory block

Latency (ms):
    min:              0.00
    avg:              0.01
    max:              16.04
    95th percentile:  0.02
    sum:              130907.47

Threads fairness:
    events (avg/stddev):          3276800.0000/0.00
    execution time (avg/stddev):  32.7269/0.09

# Memory write test: sequential writes, 100G of data in total, 8K block size, progress printed every second.
$ sysbench memory --threads=4 --time=60 --report-interval=1 --memory-block-size=8K --memory-total-size=100G --memory-oper=write --memory-access-mode=seq run
sysbench 1.1.0-2ca9e3f (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 1 second(s)
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 8KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

[ 1s ] 2745.40 MiB/sec
[ 2s ] 2692.62 MiB/sec
[ 3s ] 2712.13 MiB/sec
...
[ 28s ] 2806.32 MiB/sec
[ 29s ] 2747.49 MiB/sec
[ 30s ] 2721.71 MiB/sec
[ 31s ] 5733.25 MiB/sec
Total operations: 13107200 (420671.73 per second)

102400.00 MiB transferred (3286.50 MiB/sec)

Throughput:
    events/s (eps):          420671.7259
    time elapsed:            31.1578s
    total number of events:  13107200

Latency (ms):
    min:              0.00
    avg:              0.01
    max:              20.13
    95th percentile:  0.02
    sum:              115533.04

Threads fairness:
    events (avg/stddev):          3276800.0000/0.00
    execution time (avg/stddev):  28.8833/0.31
```

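3.1.4 Analyzing the results

Record the average memory read and write speeds. Vary the test parameters, repeat the test several times, and take the average.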
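The averaging step can be done with a short awk filter over saved sysbench output. This is only a sketch: it assumes the summary line has the form "… MiB transferred (… MiB/sec)" as in the runs above, and the function name sysbench_mem_average is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sysbench memory output from one or more runs on
# stdin and print the mean of the "MiB transferred (... MiB/sec)" figures.
sysbench_mem_average() {
    awk '
        /MiB transferred/ {
            rate = $4
            sub(/^\(/, "", rate)   # strip the leading "(" from "(2930.46"
            sum += rate; runs++
        }
        END { if (runs > 0) printf "%.2f\n", sum / runs }
    '
}

# The two summary lines from the runs shown above.
printf '%s\n' \
    '102400.00 MiB transferred (2930.46 MiB/sec)' \
    '102400.00 MiB transferred (3286.50 MiB/sec)' | sysbench_mem_average
```

Only average runs that used the same options; mixing block sizes or thread counts makes the mean hard to interpret.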

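4. Testing memory with memtester

memtester is a utility for testing memory correctness, aimed mainly at hardware developers. Since version 4.1.0 it can test starting from a specified physical memory address.

It can also be used to create a high memory load scenario.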

 

```shell
# Download, build, and install
$ wget https://pyropus.ca./software/memtester/old-versions/memtester-4.6.0.tar.gz
$ tar xf memtester-4.6.0.tar.gz
$ cd memtester-4.6.0/
$ make && sudo make install

# Usage
Usage: memtester [-p physaddrbase [-d device]] <mem>[B|K|M|G] [loops]

# Give the amount of memory to test and the number of loops. The tests include random values,
# XOR comparison, subtraction, multiplication, division, OR and AND operations, and more.
$ sudo memtester 1G 3
memtester version 4.6.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1024MB (1073741824 bytes)
got  1024MB (1073741824 bytes), trying mlock ...locked.
Loop 1/3:
...
Loop 3/3:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Done.
```
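For the high-load scenario mentioned above, the test size is often chosen relative to the memory currently available. The sketch below derives a size argument from the MemAvailable field of /proc/meminfo (reported in kB). The function name memtester_size_mb and the 80% figure in the usage line are assumptions for illustration, not recommendations from memtester.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a memtester size argument (in MB) equal to a
# given percentage of MemAvailable, read from a meminfo-style file.
# Arguments: percentage, optional path (defaults to /proc/meminfo).
memtester_size_mb() {
    local percent="$1" meminfo="${2:-/proc/meminfo}"
    awk -v p="$percent" \
        '$1 == "MemAvailable:" { printf "%dM\n", $2 / 1024 * p / 100 }' "$meminfo"
}

# Usage (80% is an arbitrary choice; leave headroom for the rest of the system):
#   sudo memtester "$(memtester_size_mb 80)" 1
```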

5. Testing memory performance with mbw

 

```shell
# Install on Ubuntu
$ sudo apt install -y mbw
# Build from source on CentOS
$ git clone https://github.com/raas/mbw.git
$ cd mbw
$ make

# Help
$ mbw -h
Usage: mbw [options] array_size_in_MiB
Options:
    -n: number of runs per test
    -a: Don't display average
    -t0: memcpy test                                  # memory copy
    -t1: dumb (b[i]=a[i] style) test                  # element-by-element copy loop
    -t2 : memcpy test with fixed block size           # block-wise memory copy
    -b <size>: block size in bytes for -t2 (default: 262144)
    -q: quiet (print statistics only)
(will then use two arrays, watch out for swapping)
'Bandwidth' is amount of data copied over the time this operation took.

The default is to run all tests available.

# Run the test: -q hides the log output, -n 10 runs each test 10 times, 256 is the array size in MiB
$ ./mbw -q -n 10 256
0    Method: MEMCPY   Elapsed: 0.04187  MiB: 256.00000  Copy: 6114.455 MiB/s
1    Method: MEMCPY   Elapsed: 0.04571  MiB: 256.00000  Copy: 5600.525 MiB/s
2    Method: MEMCPY   Elapsed: 0.05306  MiB: 256.00000  Copy: 4824.727 MiB/s
3    Method: MEMCPY   Elapsed: 0.05574  MiB: 256.00000  Copy: 4592.999 MiB/s
4    Method: MEMCPY   Elapsed: 0.06371  MiB: 256.00000  Copy: 4018.460 MiB/s
5    Method: MEMCPY   Elapsed: 0.05230  MiB: 256.00000  Copy: 4894.744 MiB/s
6    Method: MEMCPY   Elapsed: 0.05222  MiB: 256.00000  Copy: 4902.336 MiB/s
7    Method: MEMCPY   Elapsed: 0.05833  MiB: 256.00000  Copy: 4388.446 MiB/s
8    Method: MEMCPY   Elapsed: 0.05498  MiB: 256.00000  Copy: 4656.662 MiB/s
9    Method: MEMCPY   Elapsed: 0.05776  MiB: 256.00000  Copy: 4431.903 MiB/s
AVG  Method: MEMCPY   Elapsed: 0.05357  MiB: 256.00000  Copy: 4779.017 MiB/s
0    Method: DUMB     Elapsed: 0.04523  MiB: 256.00000  Copy: 5659.585 MiB/s
1    Method: DUMB     Elapsed: 0.04219  MiB: 256.00000  Copy: 6067.357 MiB/s
2    Method: DUMB     Elapsed: 0.03677  MiB: 256.00000  Copy: 6962.197 MiB/s
3    Method: DUMB     Elapsed: 0.04211  MiB: 256.00000  Copy: 6078.739 MiB/s
4    Method: DUMB     Elapsed: 0.04162  MiB: 256.00000  Copy: 6150.446 MiB/s
5    Method: DUMB     Elapsed: 0.04325  MiB: 256.00000  Copy: 5919.075 MiB/s
6    Method: DUMB     Elapsed: 0.04290  MiB: 256.00000  Copy: 5966.671 MiB/s
7    Method: DUMB     Elapsed: 0.03596  MiB: 256.00000  Copy: 7120.011 MiB/s
8    Method: DUMB     Elapsed: 0.03747  MiB: 256.00000  Copy: 6831.950 MiB/s
9    Method: DUMB     Elapsed: 0.03587  MiB: 256.00000  Copy: 7137.281 MiB/s
AVG  Method: DUMB     Elapsed: 0.04034  MiB: 256.00000  Copy: 6346.342 MiB/s
0    Method: MCBLOCK  Elapsed: 0.03189  MiB: 256.00000  Copy: 8026.336 MiB/s
1    Method: MCBLOCK  Elapsed: 0.03841  MiB: 256.00000  Copy: 6664.931 MiB/s
2    Method: MCBLOCK  Elapsed: 0.03263  MiB: 256.00000  Copy: 7846.503 MiB/s
3    Method: MCBLOCK  Elapsed: 0.03469  MiB: 256.00000  Copy: 7379.648 MiB/s
4    Method: MCBLOCK  Elapsed: 0.03270  MiB: 256.00000  Copy: 7828.986 MiB/s
5    Method: MCBLOCK  Elapsed: 0.03393  MiB: 256.00000  Copy: 7544.056 MiB/s
6    Method: MCBLOCK  Elapsed: 0.03700  MiB: 256.00000  Copy: 6919.293 MiB/s
7    Method: MCBLOCK  Elapsed: 0.03924  MiB: 256.00000  Copy: 6523.623 MiB/s
8    Method: MCBLOCK  Elapsed: 0.04240  MiB: 256.00000  Copy: 6037.736 MiB/s
9    Method: MCBLOCK  Elapsed: 0.03011  MiB: 256.00000  Copy: 8503.288 MiB/s
AVG  Method: MCBLOCK  Elapsed: 0.03530  MiB: 256.00000  Copy: 7252.125 MiB/s
# Higher values mean better performance
```
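With ten runs per method, the AVG lines are usually the only ones worth recording. As a small sketch, assuming the output layout shown above, they can be pulled out with awk. The function name mbw_averages is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read mbw output on stdin and print the average copy
# rate reported for each method (the lines that start with "AVG").
mbw_averages() {
    awk '$1 == "AVG" {
        for (i = 2; i < NF; i++) {
            if ($i == "Method:") method = $(i + 1)
            if ($i == "Copy:")   rate = $(i + 1)
        }
        printf "%s %s MiB/s\n", method, rate
    }'
}

# Usage (assumes mbw was built in the current directory):
#   ./mbw -q -n 10 256 | mbw_averages
```

Looking the fields up by label rather than by fixed position keeps the filter working whether the columns are separated by tabs or by spaces.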
