Lmbench
About
lmbench is a suite of simple, portable, ANSI C microbenchmarks for UNIX/POSIX systems. In general, it measures two key features: latency and bandwidth. lmbench is intended to give system developers insight into the basic costs of key operations. It supports:
- Bandwidth benchmarks
  - Cached file read
  - Memory copy (bcopy)
  - Memory read
  - Memory write
  - Pipe
  - TCP
- Latency benchmarks
  - Context switching
  - Networking: connection establishment, pipe, TCP, UDP, and RPC hot potato
  - File system creates and deletes
  - Process creation
  - Signal handling
  - System call overhead
  - Memory read latency
- Miscellaneous
  - Processor clock rate calculation
Visit the lmbench web page for more information.
Source Download Location
- Visit the "Download lmbench" page.
Cross compiling
- Cross compilation command: make CC=${TOOL_CHAIN_PREFIX}-gcc
- TOOL_CHAIN_PREFIX corresponds to the toolchain in use; set it to match your toolchain. Also make sure the path to the toolchain is exported as part of $PATH.
Test setup
- EVM booted up with NFS configuration.
Execution with logs
1. BANDWIDTH MEASUREMENTS
----------------------------------------

bw_file_rd
--------------
bw_file_rd times the read of the specified file in 64KB blocks. Results are reported in megabytes read per second. The data is not accessed in the user program; the benchmark relies on the operating system's read interface to have actually moved the data. The size specification may end with "k" or "m" to mean kilobytes (* 1024) or megabytes (* 1024 * 1024).

./bw_file_rd 7M open2close ../new.ppt
7.00 154.21

The above command benchmarks file read performance. 7M is the amount to read, and open2close means the measurement also includes the cost of opening and closing the file. To time only the reads, use io_only instead:

[root@beagleboard arm-none-linux-gnueabi]# ./bw_file_rd 7M io_only ../new.ppt
7.00 155.41

The first number is the amount read in megabytes; the second is the bandwidth in megabytes/sec.

bw_mem
--------------
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M rd
1.00 244.56

bw_mem rd allocates the specified amount of memory, zeros it, and then times the reading of that memory.

[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M wr
1.00 432.77

bw_mem wr allocates the specified amount of memory, zeros it, and then times the writing of that memory as a series of 4-byte integer stores and increments.
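The rd and wr modes described above can be illustrated with the following minimal C sketch. It is not lmbench's bw_mem source: the function name mem_bandwidth, the repeat count passes, the use of a volatile pointer to force one 4-byte access per element, and the 1 MB = 1024 x 1024 bytes convention are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Current time in seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper in the spirit of "bw_mem rd" / "bw_mem wr":
 * allocate `bytes`, zero them, then time `passes` sweeps of 4-byte
 * integer loads (do_write == 0) or 4-byte integer stores (do_write != 0).
 * Returns MB/sec with 1 MB = 1024 * 1024 bytes, or -1.0 on error. */
double mem_bandwidth(size_t bytes, int passes, int do_write)
{
    size_t n = bytes / sizeof(int32_t);
    if (n == 0 || passes <= 0)
        return -1.0;

    int32_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0, n * sizeof *buf);   /* zero the buffer before timing */

    /* volatile forces one real 4-byte access per element, so the
     * compiler cannot skip or merge the loads/stores being timed. */
    volatile int32_t *p = buf;
    int32_t sum = 0;

    double start = now_sec();
    if (do_write) {
        for (int pass = 0; pass < passes; pass++)
            for (size_t i = 0; i < n; i++)
                p[i] = pass;           /* one 4-byte store per element */
    } else {
        for (int pass = 0; pass < passes; pass++)
            for (size_t i = 0; i < n; i++)
                sum += p[i];           /* one 4-byte load per element */
    }
    double elapsed = now_sec() - start;

    free(buf);
    (void)sum;
    if (elapsed <= 0.0)
        return -1.0;
    return (double)n * sizeof *buf * passes / (1024.0 * 1024.0) / elapsed;
}
```

For example, mem_bandwidth(1 << 20, 100, 0) times 100 read sweeps over a 1M buffer, roughly the shape of ./bw_mem 1M rd. Expect the absolute figures to differ from lmbench's, since lmbench's own inner loops and repetition control are not reproduced here.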
The remaining bw_mem modes were run the same way:

[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M rdwr
1.00 208.38
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M cp
1.00 205.25
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M fwr
1.00 433.46
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M frd
1.00 235.77
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M fcp
1.00 192.34
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M bzero
1.00 430.71
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mem 1M bcopy
1.00 189.02

bw_mmap_rd
------------------
bw_mmap_rd creates a memory mapping to the file and then reads the mapping.

[root@beagleboard arm-none-linux-gnueabi]# ./bw_mmap_rd 1M open2close ../new.ppt
1.00 185.85
[root@beagleboard arm-none-linux-gnueabi]# ./bw_mmap_rd 1M mmap_only ../new.ppt
1.00 237.99

bw_pipe
-------------------
bw_pipe creates a Unix pipe between two processes and moves 50MB through the pipe in 64KB chunks.

[root@beagleboard arm-none-linux-gnueabi]# ./bw_pipe
Pipe bandwidth: 119.08 MB/sec

bw_tcp
------------
bw_tcp is a client/server program that moves data over a TCP/IP socket. Nothing is done with the data on either side; the data is moved in 48KB chunks.

[root@beagleboard arm-none-linux-gnueabi]# ./bw_tcp localhost
0.065536 81.71 MB/sec

bw_unix
-----------------
bw_unix measures the bandwidth of data streamed over AF_UNIX stream sockets.

[root@beagleboard arm-none-linux-gnueabi]# ./bw_unix
AF_UNIX sock stream bandwidth: 120.00 MB/sec

2. LATENCY MEASUREMENTS
-----------------------------------------

lat_cmd
------------
Measures the latency of running a command.

[root@beagleboard arm-none-linux-gnueabi]# ./lat_cmd ls
lat_cmd: 599.5294 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_cmd ps
lat_cmd: 2624.3333 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_cmd cp ../usbtree.txt ../new.txt
lat_cmd: 4852.0000 microseconds

lat_connect
-----------------
Measures interprocess connection latencies.
The benchmark times the creation and connection of an AF_INET (aka TCP/IP) socket to a server.

[root@beagleboard arm-none-linux-gnueabi]# ./lat_connect localhost
TCP/IP connection cost to localhost: 92.2975 microseconds

lat_ctx
----------------------
Measures context switching time for any reasonable number of processes of any reasonable size. The output spans multiple lines: the first line is a title that gives the size and the non-context-switching overhead of the test, and each subsequent line is a pair of numbers giving the number of processes and the cost of a context switch.

[root@beagleboard arm-none-linux-gnueabi]# ./lat_ctx -s 128K processes 2
"size=128k ovr=191.70
2 118.50
[root@beagleboard arm-none-linux-gnueabi]# ./lat_ctx -s 128K processes 4
"size=128k ovr=234.81
4 400.60

lat_dram_page
-------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_dram_page -M 1M
60.517793

lat_fcntl
-------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_fcntl
Fcntl lock latency: 6.7451 microseconds

lat_fs
-------------
lat_fs creates a number of small files in the current working directory and then removes them. Both the creation and the removal of the files are timed.

[root@beagleboard arm-none-linux-gnueabi]# ./lat_fs
0k 67 4968 16342
1k 50 4016 9992
4k 42 3792 10021
10k 30 2770 7453

The results are in terms of creates per second and deletes per second as a function of file size.

lat_mem_rd
----------------------
lat_mem_rd measures memory read latency for varying memory sizes and strides. The results are reported in nanoseconds per load; each line gives the memory size in megabytes followed by the load latency.

[root@beagleboard arm-none-linux-gnueabi]# ./lat_mem_rd 1M
"stride=128
0.00049 6.276
0.00098 6.597
0.00195 6.194
0.00293 7.013
0.00391 6.202
0.00586 6.280
0.00781 6.314
0.01172 6.192
0.01562 6.313
0.02344 38.590
0.03125 46.596
0.04688 52.612
0.06250 54.662
0.09375 57.368
0.12500 63.414
0.18750 78.652
0.25000 104.024
0.37500 178.904
0.50000 226.348
0.75000 252.026
1.00000 259.944

lat_mmap
---------------
lat_mmap times how fast a mapping can be made and unmade.

[root@beagleboard arm-none-linux-gnueabi]# ./lat_mmap 1M ../new.ppt
1.000000 99

The result is the mapping size in megabytes followed by the time in microseconds.

lat_ops
---------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_ops
integer bit: 2.06 nanoseconds
integer add: 3.19 nanoseconds
integer mul: 1.25 nanoseconds
integer div: 119.11 nanoseconds
integer mod: 45.14 nanoseconds
int64 bit: 2.11 nanoseconds
uint64 add: 2.59 nanoseconds
int64 mul: 2.69 nanoseconds
int64 div: 543.74 nanoseconds
int64 mod: 403.95 nanoseconds
float add: 41.92 nanoseconds
float mul: 33.08 nanoseconds
float div: 172.20 nanoseconds
double add: 62.99 nanoseconds
double mul: 51.67 nanoseconds
double div: 931.12 nanoseconds
float bogomflops: 358.26 nanoseconds
double bogomflops: 1271.33 nanoseconds

lat_pipe
-------------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_pipe
Pipe latency: 30.9000 microseconds

lat_pagefault
---------------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_pagefault ../new.ppt
Pagefaults on ../new.ppt: 4.3572 microseconds

lat_proc
---------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_proc fork
Process fork+exit: 958.5833 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_proc exec
Process fork+execve: 1149.6545 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_proc shell
Process fork+/bin/sh -c: 9566.5000 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_proc procedure
Procedure call: 0.0349 microseconds

lat_rand
--------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_rand
drand48 latency: 397.43 nanoseconds
lrand48 latency: 156.03 nanoseconds

lat_select
---------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_select tcp
Select on 200 tcp fd's: 74.4737 microseconds

lat_sem
-------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_sem
Semaphore latency: 7.5261 microseconds

lat_sig
--------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_sig install
Signal handler installation: 1.7298 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_sig catch
Signal handler overhead: 5.0362 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_sig prot
Usage: ./lat_sig [-P <parallelism>] [-W <warmup>] [-N <repetitions>] install|catch|prot [file]
[root@beagleboard arm-none-linux-gnueabi]# ./lat_sig prot ../new.ppt
Protection fault: 1.0195 microseconds

lat_syscall
-------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall fstat ../new.ppt
Simple fstat: 1.8831 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall open ../new.ppt
Simple open/close: 9.2837 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall stat ../new.ppt
Simple stat: 5.4106 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall write ../new.ppt
Simple write: 0.9867 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall read ../new.ppt
Simple read: 1.1014 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall null ../new.ppt
[root@beagleboard arm-none-linux-gnueabi]# ./lat_syscall null
Simple syscall: 0.5209 microseconds

lat_tcp
-----------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_tcp localhost
TCP latency using localhost: 1.8949 microseconds

lat_udp
--------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_udp localhost
UDP latency using localhost: 67.1474 microseconds

lat_unix
---------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_unix
AF_UNIX sock stream latency: 47.2522 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_unix_connect
connect: No such file or directory
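The lat_pipe result above comes from a "hot potato" style test in which two processes pass a token back and forth through pipes. A minimal POSIX C sketch of that idea follows; it is not lmbench's lat_pipe source, and the function name pipe_round_trip_us, the one-byte token and the choice to report the mean round-trip time are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Current time in seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical hot-potato helper: the parent and a forked child bounce a
 * one-byte token over two pipes. Returns the mean round-trip time in
 * microseconds, or -1.0 on error. */
double pipe_round_trip_us(int iterations)
{
    int to_child[2], to_parent[2];
    char token = 'x';

    if (iterations <= 0 || pipe(to_child) < 0)
        return -1.0;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: echo each byte back until the parent closes its write end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &token, 1) == 1 &&
               write(to_parent[1], &token, 1) == 1)
            ;
        _exit(0);
    }

    /* Parent: keep only the pipe ends it uses. */
    close(to_child[0]);
    close(to_parent[1]);

    int ok = 1;
    double start = now_sec();
    for (int i = 0; ok && i < iterations; i++)
        ok = write(to_child[1], &token, 1) == 1 &&
             read(to_parent[0], &token, 1) == 1;
    double elapsed = now_sec() - start;

    close(to_child[1]);   /* child reads EOF and exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    return ok ? elapsed * 1e6 / iterations : -1.0;
}
```

pipe_round_trip_us(10000) returns the average microseconds per round trip. Replacing the two pipes with one socketpair(AF_UNIX, SOCK_STREAM, 0, sv) would give a similar sketch of the AF_UNIX latency reported by lat_unix.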
lat_usleep
--------------
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u usleep 100
usleep 100 microseconds: 585.6283 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u nanosleep 100
nanosleep 100 microseconds: 575.0153 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u select 100
select 100 microseconds: 7781.5000 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u pselect 100
Usage: ./lat_usleep [-r] [-u <method>] [-P <parallelism>] [-W <warmup>] [-N <repetitions>] usecs
method=usleep|nanosleep|select|pselect|itimer
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u itimer 100
itimer 100 microseconds: 1558.2442 microseconds
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u pselect 100
Usage: ./lat_usleep [-r] [-u <method>] [-P <parallelism>] [-W <warmup>] [-N <repetitions>] usecs
method=usleep|nanosleep|select|pselect|itimer
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep -u pselect
Usage: ./lat_usleep [-r] [-u <method>] [-P <parallelism>] [-W <warmup>] [-N <repetitions>] usecs
method=usleep|nanosleep|select|pselect|itimer
[root@beagleboard arm-none-linux-gnueabi]# ./lat_usleep pselect
usleep 0 microseconds: 4.9160 microseconds

3. OTHER MEASUREMENTS
-------------------------------------

cache
---------------
cache first attempts to determine the number and size of caches by measuring the memory latency for various memory sizes.
[root@beagleboard arm-none-linux-gnueabi]# ./cache -M 128K
Memory latency: 6.16 nanoseconds 2.99 parallelism

line
----------
[root@beagleboard arm-none-linux-gnueabi]# ./line
64

lmdd
------------
[root@beagleboard arm-none-linux-gnueabi]# ./lmdd if=internal of=/tmp/file count=1000 fsync=1
8.1920 MB in 3.4707 secs, 2.3603 MB/sec
[root@beagleboard arm-none-linux-gnueabi]# ./lmdd if=/tmp/usbtree.txt of=/tmp/file count=1000 fsync=1
0.0047 MB in 0.1649 secs, 0.0284 MB/sec

memsize
------------
[root@beagleboard arm-none-linux-gnueabi]# ./memsize
64MB OK
64

mhz
------------
[root@beagleboard arm-none-linux-gnueabi]# ./mhz
491 MHz, 2.0367 nanosec clock

msleep
------------
[root@beagleboard arm-none-linux-gnueabi]# ./msleep
Segmentation fault
[root@beagleboard arm-none-linux-gnueabi]# ./msleep --help
[root@beagleboard arm-none-linux-gnueabi]# ./msleep 100
[root@beagleboard arm-none-linux-gnueabi]# ./msleep 1000
[root@beagleboard arm-none-linux-gnueabi]# ./msleep 10000
[root@beagleboard arm-none-linux-gnueabi]# ./msleep 5000
[root@beagleboard arm-none-linux-gnueabi]# ./msleep 3000

par_mem
------------
par_mem measures the available parallelism in the memory hierarchy, up to the size given with -M.

[root@beagleboard arm-none-linux-gnueabi]# ./par_mem -M 1M
0.004096 3.06
0.008192 3.00
0.016384 3.44
0.032768 1.25
0.065536 1.08
0.131072 1.05
0.262144 1.14
0.524288 1.03

par_ops
------------
[root@beagleboard arm-none-linux-gnueabi]# ./par_ops
integer bit parallelism: 1.51
integer add parallelism: 1.91
integer mul parallelism: 2.10
integer div parallelism: 1.25
integer mod parallelism: 1.03
int64 bit parallelism: 1.00
int64 add parallelism: 1.08
int64 mul parallelism: 1.00
int64 div parallelism: 1.00
int64 mod parallelism: 1.00
float add parallelism: 1.00
float mul parallelism: 1.04
float div parallelism: 1.00
double add parallelism: 1.00
double mul parallelism: 1.00
double div parallelism: 1.00

stream
------------
[root@beagleboard arm-none-linux-gnueabi]# ./stream -M 128K
STREAM copy latency: 18.79 nanoseconds
STREAM copy bandwidth: 851.53 MB/sec
STREAM scale latency: 121.15 nanoseconds
STREAM scale bandwidth: 132.06 MB/sec
STREAM add latency: 126.25 nanoseconds
STREAM add bandwidth: 190.11 MB/sec
STREAM triad latency: 271.05 nanoseconds
STREAM triad bandwidth: 88.55 MB/sec

tlb
------------
[root@beagleboard arm-none-linux-gnueabi]# ./tlb -M 1M
tlb: 31 pages

disk
-----------
./disk ../new.ppt
1.0 1.01
1.0 1.01
1.0 1.01
0.9 1.01
0.9 1.01
0.9 1.01
0.9 1.01
0.9 1.01
0.9 1.01
0.8 1.01
0.8 1.01
0.8 1.01
0.8 1.22
0.7 1.01
0.7 1.01
0.7 1.01
0.7 1.01
0.7 1.01
0.6 1.01
0.6 1.01
0.6 1.01
0.6 1.01
0.6 1.01
0.6 1.01
0.6 1.01
0.5 1.01
0.5 1.01
0.5 1.01
0.5 1.01
0.5 1.01
0.4 1.01
0.4 1.01
0.4 1.01
0.4 1.16
0.4 1.01
0.4 1.01
0.4 1.01
0.4 1.01
0.3 1.01
0.3 1.01
0.3 1.01
0.3 1.01
0.3 1.01
0.3 1.01
0.2 1.01
0.2 1.01
0.2 1.22
0.2 1.01
0.2 1.01
0.1 1.01
0.1 1.01
0.1 1.01
0.1 1.01
0.1 1.01
0.0 1.01
0.0 1.01
0.0 1.13
"Zone bandwidth for ../new.ppt
0.3 30.14
0.8 25.57
1.3 22.72
1.8 21.42
2.4 24.68
2.9 25.41
3.4 23.15
3.9 21.42
4.5 24.97
5.0 22.72
5.5 33.95
6.0 25.57
6.6 23.60
7.1 22.20
7.6 34.50
8.1 25.41

enough
-------------
[root@beagleboard arm-none-linux-gnueabi]# ./enough
10000
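The four STREAM figures reported by ./stream above correspond to the classic STREAM kernels defined by John McCalpin: copy (c = a), scale (b = q * c), add (c = a + b) and triad (a = b + q * c). The following C sketch times one pass of each kernel over arrays of doubles. It is not lmbench's stream source; the function name stream_kernels, the initial array values, the scalar q = 3.0, the single pass per kernel and the 1 MB = 1024 x 1024 bytes convention are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

enum { COPY, SCALE, ADD, TRIAD, NKERNELS };

/* Current time in seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: runs each STREAM kernel once over n doubles and
 * stores its bandwidth in mbps[] (1 MB = 1024 * 1024 bytes).
 * Returns 0 on success, -1 on bad input or allocation failure. */
int stream_kernels(size_t n, double mbps[NKERNELS])
{
    /* Bytes moved per element: copy/scale read one array and write one,
     * add/triad read two arrays and write one (8-byte doubles). */
    const double bytes_per_elem[NKERNELS] = { 16.0, 16.0, 24.0, 24.0 };
    const double q = 3.0;

    if (n == 0)
        return -1;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    for (int k = 0; k < NKERNELS; k++) {
        double start = now_sec();
        switch (k) {
        case COPY:  for (size_t i = 0; i < n; i++) c[i] = a[i];            break;
        case SCALE: for (size_t i = 0; i < n; i++) b[i] = q * c[i];        break;
        case ADD:   for (size_t i = 0; i < n; i++) c[i] = a[i] + b[i];     break;
        case TRIAD: for (size_t i = 0; i < n; i++) a[i] = b[i] + q * c[i]; break;
        }
        double elapsed = now_sec() - start;
        mbps[k] = elapsed > 0.0
                ? bytes_per_elem[k] * (double)n / (1024.0 * 1024.0) / elapsed
                : 0.0;
    }

    /* Read the results back so the compiler cannot discard the kernels. */
    volatile double sink = a[n - 1] + b[n - 1] + c[n - 1];
    (void)sink;

    free(a);
    free(b);
    free(c);
    return 0;
}
```

For example, stream_kernels(1 << 18, mbps) uses three 2 MB arrays. Counting 16 bytes per element for copy and scale but 24 for add and triad follows the usual STREAM convention of counting each array read or written once.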
Download the lmbench script: File:Lmbench script.zip