How to benchmark disk I/O

So you have purchased a new VPS (whether it is with Binary Lane or another provider), logged in with SSH and are now staring at your root shell. For many of us, the first question that comes to mind is "How fast is my server?", followed quickly by "How do I measure its performance?"


In this article I will look at some specific methods of measuring the disk performance of your VPS.


What not to do


Chances are, you have seen this test before; perhaps even used it yourself. It is the obligatory dd test - here is one popular variety:


dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync


This test is popular because dd is pre-installed on almost all Linux servers. While it is a simple way of seeing if something is broken (for example, if you see 10MB/sec then your server is overloaded), it has a number of problems:

  • This is a single-threaded, sequential-write test. If you are running the typical web+database server on your VPS, the number is meaningless because typical services do not do long-running sequential writes.
  • The amount of data written (1GB) is small, and hence can be strongly influenced by caching on the host server, or the host's RAID controller. (The conv=fdatasync only applies to the VPS, not the host.)
  • It executes for a very short period of time; just a few seconds on faster I/O subsystems. This isn't enough to get a consistent result.
  • There's no read performance testing at all.
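
That said, if dd is all you have, a somewhat less misleading variant (my own suggestion; it does not fix the problems above) is to bypass the guest's page cache with oflag=direct and write more data, so that host-side caching has less influence:

dd if=/dev/zero of=test bs=64k count=64k oflag=direct conv=fdatasync

This writes 4GB instead of 1GB, but it is still a single-threaded sequential write - treat it as a smoke test only.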

So if you accept my argument that dd is not a great way to benchmark disk performance, let's walk through a few tools that I think are worth using:

Measuring random IOPS with FIO


When running a website or similar workload, in general the best measurement of the disk subsystem is known as IOPS: Input/Output Operations per Second. In particular, the specific test we want to do is:
  1. Random reads, random writes, or a combination of both. Databases in particular will pull data from all over your disk - known as random access.
  2. 4 kilobyte blocks. Again, databases and many other programs will read very small chunks of data - 4 kilobytes is a good working estimate.
  3. Multiple threads. If your website has multiple visitors, your website will serve them all at the same time. We want our benchmark to simulate this behaviour of multiple things accessing the disk at once.
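
To make the random-access pattern concrete, here is a toy bash illustration (not a benchmark) of the kind of request a database issues: a single 4KB read at a random offset within a large file. The file name test is just an example - any multi-gigabyte file will do.

# Read one random 4KB block from a 4GB file named "test".
# 4GB / 4KB = 1048576 blocks; bash's RANDOM only spans 0-32767, so combine two.
block=$(( (RANDOM * 32768 + RANDOM) % 1048576 ))
dd if=test of=/dev/null bs=4k count=1 skip=$block iflag=direct 2>/dev/null
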
FIO is a popular tool for measuring IOPS on a Linux server. It's very configurable (perhaps even to its detriment), but with the following Bash snippets it is easy enough to use. To start with, here is how to download and compile it - just paste straight into the root shell of your CentOS/Debian/Ubuntu server:

  

cd /root
yum install -y make gcc libaio-devel || ( apt-get update && apt-get install -y make gcc libaio-dev  </dev/null )
wget https://github.com/Crowd9/Benchmark/raw/master/fio-2.0.9.tar.gz ; tar xf fio*
cd fio*
make

  

With FIO compiled, we can now run some tests.
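
Before benchmarking, a quick sanity check that the build succeeded:

./fio --version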


Random read/write performance


If you simply want a way to compare different providers' disk performance, then this is the test I suggest. Run the following command:

   

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

   

This will create a 4 GB file, and perform 4KB reads and writes using a 75%/25% split (i.e. 3 reads are performed for every 1 write) within the file, with 64 operations running at a time. The 3:1 ratio is a rough approximation of your typical database.
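
If you are comparing several servers and only want the two iops figures, one quick-and-dirty approach (a sketch that relies on the fio 2.x output format shown below) is to capture the output and grep for them:

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 > fio-result.txt
grep iops= fio-result.txt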


FIO will now tick along, printing a progress summary as it goes that looks like this:

  

Jobs: 1 (f=1): [m] [6.5% done] [39613K/13099K /s] [9903 /3274  iops] [eta 01m:12s]
 

  

And eventually a full result output like this; the numbers we want are the iops figures on the read and write lines:


fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [m] [100.0% done] [43496K/14671K /s] [10.9K/3667 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31214: Fri May 9 16:01:53 2014
  read : io=3071.1MB, bw=39492KB/s, iops=9873 , runt= 79653msec
  write: io=1024.7MB, bw=13165KB/s, iops=3291 , runt= 79653msec
  cpu : usr=16.26%, sys=71.94%, ctx=25916, majf=0, minf=25
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued : total=r=786416/w=262160/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
   READ: io=3071.1MB, aggrb=39492KB/s, minb=39492KB/s, maxb=39492KB/s, mint=79653msec, maxt=79653msec
  WRITE: io=1024.7MB, aggrb=13165KB/s, minb=13165KB/s, maxb=13165KB/s, mint=79653msec, maxt=79653msec
Disk stats (read/write):
  vda: ios=786003/262081, merge=0/22, ticks=3883392/667236, in_queue=4550412, util=99.97%


This test shows:

  • Binary Lane's network SSD performing 9873 read operations per second and 3291 write operations per second. 
  • A VPS using local SSD might reach 40,000 and 10,000 respectively if the system is lightly loaded.
  • A VPS using local non-SSD will probably get somewhere around 500 read / 200 write.
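
As an aside: if you want to watch the disk while fio runs, iostat from the sysstat package (assuming it is installed) prints a rolling per-second view in a second shell, and its r/s and w/s columns should roughly track fio's read and write iops:

iostat -dx 1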

If you want to know what contributed to the above numbers, read on.

Random read performance


To measure random reads, use a slightly altered FIO command:

 

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread

  

Again, we pull the iops from the result:


fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [62135K/0K /s] [15.6K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31181: Fri May 9 15:38:57 2014
  read : io=1024.0MB, bw=62748KB/s, iops=15686 , runt= 16711msec
  cpu : usr=5.94%, sys=90.13%, ctx=1885, majf=0, minf=89
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued : total=r=262144/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
   READ: io=1024.0MB, aggrb=62747KB/s, minb=62747KB/s, maxb=62747KB/s, mint=16711msec, maxt=16711msec
Disk stats (read/write):
  vda: ios=259063/2, merge=0/1, ticks=951356/20, in_queue=951308, util=96.83%


This test shows Binary Lane's network storage scoring 15686 read operations per second. By comparison, a local SSD may give 50,000 while a good non-SSD may give around 2000.


Random write performance


Again, just modify the FIO command slightly so we perform randwrite instead of randread:


./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

 

Again, we pull the iops from the result:


fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/26326K /s] [0 /6581 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31235: Fri May 9 16:16:21 2014
  write: io=1024.0MB, bw=29195KB/s, iops=7298 , runt= 35916msec
  cpu : usr=77.42%, sys=13.74%, ctx=2306, majf=0, minf=24
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=29195KB/s, minb=29195KB/s, maxb=29195KB/s, mint=35916msec, maxt=35916msec
Disk stats (read/write):
  vda: ios=0/260938, merge=0/11, ticks=0/2315104, in_queue=2316372, util=98.87%


This test shows Binary Lane's network storage scoring 7298 write operations per second. By comparison, a local SSD may give around 10,000 while a good non-SSD may give around 200.


Measuring latency with IOPing


The final aspect of evaluating disk performance is to measure the latency on individual requests. One way to do so is to install ioping using the following command:

 

cd /root
yum install -y make gcc libaio-devel || ( apt-get update && apt-get install -y make gcc libaio-dev  </dev/null )
wget https://ioping.googlecode.com/files/ioping-0.6.tar.gz ; tar xf ioping*
cd ioping*
make

 

We can now run ioping to get an idea of the per-request latency:


 

./ioping -c 10 .

 

4096 bytes from . (ext3 /dev/vda1): request=1 time=0.2 ms
4096 bytes from . (ext3 /dev/vda1): request=2 time=0.3 ms
4096 bytes from . (ext3 /dev/vda1): request=3 time=0.3 ms
4096 bytes from . (ext3 /dev/vda1): request=4 time=0.3 ms
4096 bytes from . (ext3 /dev/vda1): request=5 time=0.3 ms
4096 bytes from . (ext3 /dev/vda1): request=6 time=0.3 ms
4096 bytes from . (ext3 /dev/vda1): request=7 time=0.3 ms
4096 bytes from . (ext3 /dev/vda1): request=8 time=0.2 ms
4096 bytes from . (ext3 /dev/vda1): request=9 time=0.4 ms
4096 bytes from . (ext3 /dev/vda1): request=10 time=0.3 ms

--- . (ext3 /dev/vda1) ioping statistics ---
10 requests completed in 9006.8 ms, 3505 iops, 13.7 mb/s
min/avg/max/mdev = 0.2/0.3/0.4/0.1 ms


Here you can see the average was 0.3 ms. On a healthy system, you should see relatively low variation and an average below 1.0 ms.
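
ioping also has a seek-rate mode that issues requests back-to-back instead of once per second; in the versions I have used this is the -R flag (check ./ioping -h on your build):

./ioping -R .

After a few seconds it reports an IOPS-style seek rate, which complements the fio random-read numbers above.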


Final Thoughts


I hope this article has added a few more tools to your toolbelt and, if nothing else, helped to dispel the idea that dd should be used for benchmarking.


Of course this guide only scratches the surface of the software available for disk benchmarking. If you have your own favourite tool that you think I should have covered, please send us a tweet: @BinaryLane.
