


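UnixBench is a system-level benchmark suite, not purely a CPU, memory, or disk test. Its results depend not only on the hardware but also on the operating system, the libraries, and even the compiler.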




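Whetstone test

This test measures the speed and efficiency of floating-point operations. It contains several modules typical of scientific computing and exercises a wide range of C functions, including sin, cos, sqrt, exp, and log, along with integer and floating-point arithmetic, array accesses, conditional branches, and procedure calls.

Execl Throughput test (execl here is an important function on Unix-like systems, not the Excel office software)

This test measures the number of execl calls that can be performed per second. execl is part of the exec family of functions, which replace the current process image with a new process image. Like several similar functions, it is a front end for execve().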
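
To make "replacing the process image" concrete, here is a minimal sketch in Python. It is not how UnixBench implements its execl test; the helper name run_via_execl is made up for illustration, and the sketch assumes a POSIX system where os.fork is available.

```python
import os
import sys


def run_via_execl(exit_code):
    """Fork a child that replaces itself via execl; return the child's exit status.

    Hypothetical helper for illustration only (POSIX systems).
    """
    pid = os.fork()
    if pid == 0:
        # Child: on success execl never returns, because the current
        # process image is replaced by a fresh Python interpreter.
        try:
            os.execl(sys.executable, sys.executable, "-c",
                     f"import sys; sys.exit({exit_code})")
        finally:
            os._exit(127)  # only reached if execl itself failed
    # Parent: wait for the child and extract its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_via_execl(7))
```

The code after the execl call in the child only runs if the call fails, which is exactly the "replace, do not return" behavior the benchmark exercises.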

File Copy test


Pipe Throughput test



Pipe-based Context Switching test


Process Creation test


Shell Scripts test

The shell scripts test measures how many times per minute a process can start and reap a set of concurrent copies of a shell script, typically 1, 2, 4, or 8 copies. The script applies a series of transformations to a data file.

System Call Overhead test


Graphical Tests



wget http://soft.vpser.net/test/unixbench/unixbench-5.1.2.tar.gz
tar zxvf unixbench-5.1.2.tar.gz
cd unixbench-5.1.2

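According to the README, if you do not need the graphics tests, or you are not running in a graphical environment, comment out the line GRAPHICS_TEST = defined in the Makefile.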
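
If you script your benchmark setup, the Makefile edit can be automated. The sketch below is only an illustration: the function name disable_graphics_test is made up, and the variable name defaults to the spelling used in the text above. Check your own Makefile first, since the spelling may differ between UnixBench versions (GRAPHIC_TESTS, for example).

```python
import re


def disable_graphics_test(makefile_text, variable="GRAPHICS_TEST"):
    """Comment out an active '<variable> = defined' line in Makefile text.

    Hypothetical helper; 'variable' is an assumption taken from the post.
    Lines that are already commented out are left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(variable) + r"[ \t]*=[ \t]*defined[ \t]*)$",
        re.MULTILINE,
    )
    return pattern.sub(r"# \1", makefile_text)


if __name__ == "__main__":
    sample = "CC=gcc\n\nGRAPHICS_TEST = defined\nGL_LIBS = -lGL -lXext -lX11\n"
    print(disable_graphics_test(sample))
```

To apply it for real, read the Makefile into a string, pass it through the function, and write the result back, keeping a backup copy of the original.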




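While benchmarking a Kingsoft Cloud server today, the run aborted with the error "Can't locate Time/HiRes.pm in @INC". The cause turned out to be a missing Perl Time::HiRes module. Not every UnixBench run hits this problem.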


【yum -y install perl-Time-HiRes】



gcc -o ./pgms/ubgears -DTIME -Wall -pedantic -ansi -O2 -fomit-frame-pointer -fforce-addr -ffast-math -Wall ./src/ubgears.c -lGL -lXext -lX11
./src/ubgears.c:51:19: error: GL/gl.h: No such file or directory
./src/ubgears.c:52:20: error: GL/glx.h: No such file or directory
./src/ubgears.c:129: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'view_rotx'
./src/ubgears.c:632: error: 'GL_RENDERER' undeclared (first use in this function)
./src/ubgears.c:633: error: 'GL_VERSION' undeclared (first use in this function)
./src/ubgears.c:634: error: 'GL_VENDOR' undeclared (first use in this function)
./src/ubgears.c:635: error: 'GL_EXTENSIONS' undeclared (first use in this function)
./src/ubgears.c:643: warning: implicit declaration of function 'glXDestroyContext'
make: *** [pgms/ubgears] Error 1
**********************************************
Run: "make all" failed; aborting


apt-get install libxext-dev libgl1-mesa-dev
After some research: ubgears.c uses math functions, and they cannot be resolved unless the math library is linked explicitly. Appending -lm to GL_LIBS in the Makefile fixes this.


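Once the Run file is present, enter ./Run to start benchmarking the VPS. When the run finishes, a score is printed at the bottom of the output. As a rough rule of thumb, a VPS that scores above 1000 performs well.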



   BYTE UNIX Benchmarks (Version 5.1.2)

   System: VM-0-8-ubuntu: GNU/Linux
   OS: GNU/Linux -- 4.4.0-91-generic -- #114-Ubuntu SMP Tue Aug 8 11:56:56 UTC 2017
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 2: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 3: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 4: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 5: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 6: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 7: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   11:26:56 up 22 min,  1 user,  load average: 0.07, 0.07, 0.17; runlevel 5

Benchmark Run: Mon Apr 16 2018 11:26:56 - 11:55:17
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       33444509.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2702.2 MWIPS (10.0 s, 7 samples)
Execl Throughput                               4647.2 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1131210.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          306139.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3477545.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2197189.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 159896.2 lps   (10.0 s, 7 samples)
Process Creation                              11912.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  12619.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   5086.8 lpm   (60.0 s, 2 samples)
System Call Overhead                        3928781.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   33444509.7   2865.9
Double-Precision Whetstone                       55.0       2702.2    491.3
Execl Throughput                                 43.0       4647.2   1080.7
File Copy 1024 bufsize 2000 maxblocks          3960.0    1131210.2   2856.6
File Copy 256 bufsize 500 maxblocks            1655.0     306139.4   1849.8
File Copy 4096 bufsize 8000 maxblocks          5800.0    3477545.6   5995.8
Pipe Throughput                               12440.0    2197189.4   1766.2
Pipe-based Context Switching                   4000.0     159896.2    399.7
Process Creation                                126.0      11912.9    945.5
Shell Scripts (1 concurrent)                     42.4      12619.4   2976.3
Shell Scripts (8 concurrent)                      6.0       5086.8   8478.1
System Call Overhead                          15000.0    3928781.6   2619.2
System Benchmarks Index Score                                        1893.7

Benchmark Run: Mon Apr 16 2018 11:55:17 - 12:23:39
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables      263391605.6 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    21623.4 MWIPS (10.0 s, 7 samples)
Execl Throughput                              32726.1 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1117467.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          304340.2 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3570594.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                            17497194.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                1783119.9 lps   (10.0 s, 7 samples)
Process Creation                              58313.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  60188.9 lpm   (60.2 s, 2 samples)
Shell Scripts (8 concurrent)                   8246.3 lpm   (60.2 s, 2 samples)
System Call Overhead                        6898602.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  263391605.6  22570.0
Double-Precision Whetstone                       55.0      21623.4   3931.5
Execl Throughput                                 43.0      32726.1   7610.7
File Copy 1024 bufsize 2000 maxblocks          3960.0    1117467.1   2821.9
File Copy 256 bufsize 500 maxblocks            1655.0     304340.2   1838.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    3570594.5   6156.2
Pipe Throughput                               12440.0   17497194.7  14065.3
Pipe-based Context Switching                   4000.0    1783119.9   4457.8
Process Creation                                126.0      58313.8   4628.1
Shell Scripts (1 concurrent)                     42.4      60188.9  14195.5
Shell Scripts (8 concurrent)                      6.0       8246.3  13743.8
System Call Overhead                          15000.0    6898602.7   4599.1
System Benchmarks Index Score                                        6493.2

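Note: the output above contains two sets of results: one from running 1 parallel copy of the tests, the other from running 8 parallel copies, one per CPU on this machine (see the HTML report for details). The first is a single-process run, the second a multi-process run.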
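
The INDEX column and the final score in the tables above can be reproduced by hand. Each index is the result divided by its baseline, times 10, and the overall score is the geometric mean of the individual indices. The sketch below checks this against the single-copy run; the numbers are copied from the output above, and the function names are made up for illustration.

```python
import math

# (baseline, result) pairs copied from the single-copy run above.
SINGLE_COPY = {
    "Dhrystone 2 using register variables": (116700.0, 33444509.7),
    "Double-Precision Whetstone": (55.0, 2702.2),
    "Execl Throughput": (43.0, 4647.2),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 1131210.2),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 306139.4),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 3477545.6),
    "Pipe Throughput": (12440.0, 2197189.4),
    "Pipe-based Context Switching": (4000.0, 159896.2),
    "Process Creation": (126.0, 11912.9),
    "Shell Scripts (1 concurrent)": (42.4, 12619.4),
    "Shell Scripts (8 concurrent)": (6.0, 5086.8),
    "System Call Overhead": (15000.0, 3928781.6),
}


def index(baseline, result):
    """Index of one test: the result relative to its baseline, scaled by 10."""
    return result / baseline * 10.0


def overall_score(rows):
    """Geometric mean of the per-test indices."""
    logs = [math.log(index(b, r)) for b, r in rows.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in SINGLE_COPY.items():
        print(f"{name:40s} {index(baseline, result):8.1f}")
    print(f"{'System Benchmarks Index Score':40s} {overall_score(SINGLE_COPY):8.1f}")
```

Because the score is a geometric mean, one very weak test (Pipe-based Context Switching at 399.7 here) pulls the total down less than it would in an arithmetic mean, but it still matters.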


root@VM-16-16-ubuntu:/home/ubuntu/unixbench-5.1.2/results# pwd
root@VM-16-16-ubuntu:/home/ubuntu/unixbench-5.1.2/results# ls
VM-16-16-ubuntu-2018-04-16-01  VM-16-16-ubuntu-2018-04-16-01.html  VM-16-16-ubuntu-2018-04-16-01.log


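While the suite runs, each test name is followed by the digits 1 2 3 4 5 6 7 8 9 10, meaning that 10 iterations of that test were performed. Some of the tests are explained below: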

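  1. Dhrystone 2 using register variables 1 2 3 4 5 6 7 8 9 10
    This benchmark dates from 1984 and tests string handling. Because it contains no floating-point operations, its results are strongly influenced by hardware and software design, compiler and linker options, code optimization, cache memory, wait states, and integer data types.
  2. Double-Precision Whetstone 1 2 3 4 5 6 7 8 9 10
    This test measures the speed and efficiency of floating-point operations. It consists of several modules, each containing a mix of operations typical of scientific computing. A wide range of C functions (sin, cos, sqrt, exp, log) is used, along with integer and floating-point math, array accesses, conditional branches, and procedure calls. The test therefore exercises both integer and floating-point arithmetic.

  3. System Call Overhead 1 2 3 4 5 6 7 8 9 10
    Measures the cost of entering and leaving the operating system kernel, that is, the cost of one system call. It does this with a small program that repeatedly calls getpid.

  4. Pipe Throughput 1 2 3 4 5 6 7 8 9 10
    A pipe is the simplest form of inter-process communication. Pipe throughput here is the number of times per second a process can write 512 bytes to a pipe and read them back. Note that pipe throughput has no direct counterpart in real-world programming.

  5. Pipe-based Context Switching 1 2 3 4 5 6 7 8 9 10

  6. Process Creation 1 2 3
    Measures how many times per second a process can create and then reap a child process that exits immediately. Process creation is about allocating the new process's process control block and memory, so the result relates directly to memory bandwidth. The test is typically used to compare different implementations of the operating system's process-creation call.

  7. Execl Throughput 1 2 3
    Measures the number of execl calls that can be performed per second. execl is a member of the exec family of functions. Like several similar functions, it is a front end for execve().

  8. File copy
    Measures the rate at which data can be transferred from one file to another, with a different buffer size in each run. The file read, write, and copy tests count how many characters can be read, written, and copied in a specified time (10 s by default).

Filesystem Throughput 1024 bufsize 2000 maxblocks 1 2 3

Filesystem Throughput 256 bufsize 500 maxblocks 1 2 3

Filesystem Throughput 4096 bufsize 8000 maxblocks 1 2 3

  9. Shell Scripts
    Measures how many times per minute a process can start and reap a set of n concurrent copies of a shell script, where n is usually 1, 2, 4, or 8 (1, 8, and 16 on my system). The script applies a series of transformations to a data file.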

Shell Scripts (1 concurrent) 1 2 3
Shell Scripts (8 concurrent) 1 2 3
Shell Scripts (16 concurrent) 1 2 3
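
To get a feel for what a test such as Pipe Throughput actually does, here is a rough Python sketch of the loop described above: write 512 bytes to a pipe, read them back, and count round trips per second. This is an illustration under stated assumptions, not the UnixBench implementation. The function name pipe_throughput is made up, and because of interpreter overhead the numbers are not comparable with the lps values UnixBench reports.

```python
import os
import time


def pipe_throughput(duration=1.0, size=512):
    """Return write+read round trips per second through a single pipe.

    Hypothetical helper: one process writes `size` bytes into a pipe
    and immediately reads them back, for roughly `duration` seconds.
    """
    read_fd, write_fd = os.pipe()
    payload = b"x" * size
    loops = 0
    start = time.perf_counter()
    deadline = start + duration
    try:
        while time.perf_counter() < deadline:
            os.write(write_fd, payload)  # 512 bytes fit in the pipe buffer
            os.read(read_fd, size)       # read the same bytes back
            loops += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    elapsed = time.perf_counter() - start
    return loops / elapsed


if __name__ == "__main__":
    print(f"{pipe_throughput(0.5):.1f} loops per second")
```

Every iteration costs two system calls plus a copy into and out of the kernel's pipe buffer, which is why this test mostly reflects system call and kernel copy overhead rather than raw CPU speed.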

【Run -c 1 -c 4】 runs the suite twice: first with a single copy of each test, then with 4 parallel copies.


【System Benchmarks Index Score 171.3】

【System Benchmarks Index Score 395.7】
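
A quick way to read such a pair of scores is to look at how well the system scales. The sketch below assumes that 171.3 and 395.7 are the scores from the 1-copy and 4-copy passes of the command above (the post does not say so explicitly), and the function name parallel_scaling is made up for illustration.

```python
def parallel_scaling(single_score, multi_score, copies):
    """Return (speedup, efficiency) of a multi-copy run over a single-copy run."""
    speedup = multi_score / single_score
    efficiency = speedup / copies  # 1.0 would mean perfect linear scaling
    return speedup, efficiency


if __name__ == "__main__":
    # Assumed pairing: 171.3 from the 1-copy pass, 395.7 from the 4-copy pass.
    speedup, efficiency = parallel_scaling(171.3, 395.7, copies=4)
    print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Under that assumption the 4-copy pass is about 2.3 times faster than the single-copy pass, well short of 4x, which is common when several copies compete for shared resources such as disk and memory bandwidth.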



root@VM-0-15-ubuntu:/home/ubuntu# apt install -y mbw
root@VM-0-15-ubuntu:/home/ubuntu# mbw -q -n 10 256
Long uses 8 bytes. Allocating 2*4194304 elements = 67108864 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0       Method: MEMCPY  Elapsed: 0.00646        MiB: 32.00000   Copy: 4955.094 MiB/s
1       Method: MEMCPY  Elapsed: 0.00662        MiB: 32.00000   Copy: 4833.107 MiB/s
2       Method: MEMCPY  Elapsed: 0.00655        MiB: 32.00000   Copy: 4882.514 MiB/s
3       Method: MEMCPY  Elapsed: 0.00652        MiB: 32.00000   Copy: 4910.988 MiB/s
4       Method: MEMCPY  Elapsed: 0.00683        MiB: 32.00000   Copy: 4685.898 MiB/s
5       Method: MEMCPY  Elapsed: 0.00651        MiB: 32.00000   Copy: 4918.537 MiB/s
6       Method: MEMCPY  Elapsed: 0.00652        MiB: 32.00000   Copy: 4909.481 MiB/s
7       Method: MEMCPY  Elapsed: 0.00654        MiB: 32.00000   Copy: 4891.470 MiB/s
8       Method: MEMCPY  Elapsed: 0.00657        MiB: 32.00000   Copy: 4870.624 MiB/s
9       Method: MEMCPY  Elapsed: 0.00653        MiB: 32.00000   Copy: 4901.961 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.00656        MiB: 32.00000   Copy: 4874.928 MiB/s
0       Method: DUMB    Elapsed: 0.00400        MiB: 32.00000   Copy: 8004.002 MiB/s
1       Method: DUMB    Elapsed: 0.00278        MiB: 32.00000   Copy: 11510.791 MiB/s
2       Method: DUMB    Elapsed: 0.00280        MiB: 32.00000   Copy: 11444.921 MiB/s
3       Method: DUMB    Elapsed: 0.00287        MiB: 32.00000   Copy: 11145.942 MiB/s
4       Method: DUMB    Elapsed: 0.00286        MiB: 32.00000   Copy: 11180.992 MiB/s
5       Method: DUMB    Elapsed: 0.00290        MiB: 32.00000   Copy: 11045.910 MiB/s
6       Method: DUMB    Elapsed: 0.00286        MiB: 32.00000   Copy: 11192.725 MiB/s
7       Method: DUMB    Elapsed: 0.00278        MiB: 32.00000   Copy: 11527.378 MiB/s
8       Method: DUMB    Elapsed: 0.00277        MiB: 32.00000   Copy: 11569.053 MiB/s
9       Method: DUMB    Elapsed: 0.00278        MiB: 32.00000   Copy: 11527.378 MiB/s
AVG     Method: DUMB    Elapsed: 0.00294        MiB: 32.00000   Copy: 10891.392 MiB/s
0       Method: MCBLOCK Elapsed: 0.00585        MiB: 32.00000   Copy: 5465.414 MiB/s
1       Method: MCBLOCK Elapsed: 0.00369        MiB: 32.00000   Copy: 8674.438 MiB/s
2       Method: MCBLOCK Elapsed: 0.00294        MiB: 32.00000   Copy: 10902.896 MiB/s
3       Method: MCBLOCK Elapsed: 0.00284        MiB: 32.00000   Copy: 11275.546 MiB/s
4       Method: MCBLOCK Elapsed: 0.00283        MiB: 32.00000   Copy: 11299.435 MiB/s
5       Method: MCBLOCK Elapsed: 0.00264        MiB: 32.00000   Copy: 12107.454 MiB/s
6       Method: MCBLOCK Elapsed: 0.00270        MiB: 32.00000   Copy: 11847.464 MiB/s
7       Method: MCBLOCK Elapsed: 0.00283        MiB: 32.00000   Copy: 11311.417 MiB/s
8       Method: MCBLOCK Elapsed: 0.00273        MiB: 32.00000   Copy: 11717.320 MiB/s
9       Method: MCBLOCK Elapsed: 0.00271        MiB: 32.00000   Copy: 11808.118 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.00318        MiB: 32.00000   Copy: 10074.615 MiB/s



git clone http://github.com/raas/mbw
cd mbw
make
./mbw -q -n 10 256

-q  quiet mode (hide log output)
-n  number of runs per test
256 array size (in MiB)
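
When comparing several machines it is handy to pull just the AVG lines out of the mbw output. The following sketch parses lines in the format shown in the sample output above. The function name average_rates is made up, and the regular expression is written against that sample only, so other mbw versions may print a different layout.

```python
import re

# Matches lines such as:
# AVG     Method: MEMCPY  Elapsed: 0.00656        MiB: 32.00000   Copy: 4874.928 MiB/s
LINE = re.compile(
    r"^(?P<run>\d+|AVG)\s+Method:\s+(?P<method>\w+)\s+"
    r"Elapsed:\s+(?P<elapsed>[\d.]+)\s+MiB:\s+(?P<mib>[\d.]+)\s+"
    r"Copy:\s+(?P<rate>[\d.]+)\s+MiB/s"
)


def average_rates(output):
    """Return {method: average copy rate in MiB/s} taken from the AVG lines."""
    rates = {}
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match and match.group("run") == "AVG":
            rates[match.group("method")] = float(match.group("rate"))
    return rates


if __name__ == "__main__":
    sample = (
        "9       Method: MEMCPY  Elapsed: 0.00653        MiB: 32.00000   Copy: 4901.961 MiB/s\n"
        "AVG     Method: MEMCPY  Elapsed: 0.00656        MiB: 32.00000   Copy: 4874.928 MiB/s\n"
        "AVG     Method: DUMB    Elapsed: 0.00294        MiB: 32.00000   Copy: 10891.392 MiB/s\n"
    )
    print(average_rates(sample))
```

The per-run lines are deliberately skipped; if you want the spread between runs, collect the numeric run entries into a list per method instead.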




Environment: CentOS 6.8, 64-bit

git clone https://github.com/jeffhammond/STREAM.git
cd STREAM
gcc -O -fopenmp -DSTREAM_ARRAY_SIZE=100000000 -DNTIMES=20 stream.c -o stream

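Key compile-time parameters:
STREAM_ARRAY_SIZE sets the array size in elements. For example, for 100M elements:
gcc -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream.100M
NTIMES sets how many times each kernel is run; the best run is reported. The macro must be spelled NTIMES: a misspelled -DNTIME=20 is silently ignored and the default of 10 applies, which is what the sample output below shows ("Each kernel will be executed 10 times").
On multi-core machines, adding -fopenmp enables OpenMP multithreading.
Full example: gcc -O -fopenmp -DSTREAM_ARRAY_SIZE=100000000 -DNTIMES=20 stream.c -o stream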

[root@vm192-168-80-2 STREAM]# ls
HISTORY.txt  LICENSE.txt  Makefile  mysecond.c  README  stream  stream.c  stream.f
[root@vm192-168-80-2 STREAM]# ./stream
STREAM version $Revision: 5.10 $
This system uses 8 bytes per array element.
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
Number of Threads requested = 8
Number of Threads counted = 8
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 30910 microseconds.
   (= 30910 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           36977.8     0.044884     0.043269     0.052954
Scale:          36797.8     0.044087     0.043481     0.044937
Add:            41868.7     0.058432     0.057322     0.060968
Triad:          42085.3     0.058550     0.057027     0.060215
Solution Validates: avg error less than 1.000000e-13 on all three arrays

Triad: combines the three operations above (Copy, Scale, and Add) into a single multiply-add of the form a + factor * b, and writes the result to another memory location.
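
The four kernels and the "Best Rate" column can be sketched in a few lines of Python. This is an illustration, not the STREAM source: the kernels follow the order used in stream.c (Copy, Scale, Add, Triad), the bandwidth formula counts the arrays each kernel reads and writes, and the function names are made up. The array size and minimum times are taken from the sample output above.

```python
BYTES_PER_ELEMENT = 8  # "This system uses 8 bytes per array element."

# Number of arrays each kernel touches (reads plus writes).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_kernels(a, b, c, scalar):
    """Run the four STREAM kernels once, in place, on plain Python lists."""
    n = len(a)
    for j in range(n):
        c[j] = a[j]                  # Copy
    for j in range(n):
        b[j] = scalar * c[j]         # Scale
    for j in range(n):
        c[j] = a[j] + b[j]           # Add
    for j in range(n):
        a[j] = b[j] + scalar * c[j]  # Triad: one multiply-add per element


def best_rate_mb_s(kernel, n_elements, min_time_s):
    """Bandwidth in MB/s (1 MB = 10^6 bytes) for the fastest run of a kernel."""
    moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements
    return moved / min_time_s / 1e6


if __name__ == "__main__":
    n = 100_000_000  # Array size from the output above
    for kernel, min_time in [("Copy", 0.043269), ("Scale", 0.043481),
                             ("Add", 0.057322), ("Triad", 0.057027)]:
        print(f"{kernel:6s} {best_rate_mb_s(kernel, n, min_time):10.1f} MB/s")
```

Add and Triad touch three arrays instead of two, which is why their times are longer even though their reported rates are higher.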

