This evaluation was done on a Qualcomm Snapdragon 660 platform with a 64-bit Arm processor: 8 cores, 4 little cores at 1.8 GHz and 4 big cores at 2.2 GHz. The goal is to compare the performance of the two mainstream zram compression algorithms. While working on this platform recently I noticed that it only enables lzo by default, even though the lz4 algorithm has long been available upstream and is claimed to perform about 3x better than lzo for reads. I searched for real-world numbers for lz4 on the Arm platform, but found little material, and some evaluation reports even said lz4 fell short of expectations. So I decided to test it myself and find out why this platform enables only lzo by default.
Enabling LZ4
The following kernel config options need to be enabled:
CONFIG_ZRAM_LZ4_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=y
After rebuilding the kernel and booting, the zram compression algorithm can be switched by writing lz4 to the /sys/block/zram0/comp_algorithm node.
Note: the zram device must be reset before the algorithm can be switched. The correct sequence is:
echo 1 > /sys/class/block/zram0/reset
echo lz4 > /sys/block/zram0/comp_algorithm
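The reset-then-switch sequence can be wrapped in a small helper with a read-back check. This is only a sketch: the function name `switch_zram_algo` and its directory parameter are my own (the parameter exists so the helper can be exercised against a fake sysfs tree; on the device it would be /sys/block/zram0).

```shell
#!/bin/sh
# Sketch: switch the zram compression algorithm safely.
# $1 - zram sysfs directory (e.g. /sys/block/zram0)
# $2 - algorithm name (e.g. lz4 or lzo)
switch_zram_algo() {
    zram_dir="$1"; algo="$2"
    # The device must be reset before the algorithm can be changed.
    echo 1 > "$zram_dir/reset" || return 1
    echo "$algo" > "$zram_dir/comp_algorithm" || return 1
    # Read back to confirm; on a real device the active algorithm
    # is shown in square brackets in comp_algorithm.
    cat "$zram_dir/comp_algorithm"
}
```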
Test Method
Since zram is designed as a block device, its performance can be evaluated through raw block-device reads and writes, with no filesystem mounted.
To keep compute capacity consistent, all CPU frequencies are locked at their maximum for the tests.
Since sdm660 has no dynamic CPU hotplug, all 8 cores remain online throughout.
Both lz4 and lzo are measured for reads and writes, in single-threaded and multi-threaded (8-thread) configurations.
The initialization commands are as follows:
swapoff /dev/block/zram0
echo 1 > /sys/class/block/zram0/reset
echo lzo/lz4 > /sys/class/block/zram0/comp_algorithm
echo 8 > /sys/class/block/zram0/max_comp_streams # set the maximum number of compression streams to 8
echo 1610612736 > /sys/class/block/zram0/disksize # set the block device size to 1.5 GiB
echo 1843200 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1843200 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
echo 2208000 > /sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq
echo 2208000 > /sys/devices/system/cpu/cpufreq/policy4/scaling_min_freq
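For reference, the disksize value above is just 1.5 GiB written out in bytes; a quick sanity check:

```shell
# 1.5 GiB = 1536 MiB = 1536 * 1024 * 1024 bytes
echo $((1536 * 1024 * 1024))   # prints 1610612736
```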
The I/O test is then run with the fio tool: 500 MB of total I/O, with a single I/O size of roughly 32 MB.
fio -filename=/dev/block/zram0 -thread -rw=write -bs=32468k -size=500M -group_reporting -numjobs=1 -name=mytest
-rw: selects read or write
-numjobs: number of threads
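To compare runs, the bandwidth figure can be pulled out of the fio result lines mechanically. A minimal sketch, assuming the fio 2.2.6 output format shown below (lines like "write: io=519488KB, bw=660926KB/s, ..."); the helper name `extract_bw` is my own:

```shell
#!/bin/sh
# Sketch: extract the bandwidth value (in KB/s) from fio output on stdin.
# Prints the number between "bw=" and "KB/s" for each matching line.
extract_bw() {
    sed -n 's/.*bw=\([0-9]*\)KB\/s.*/\1/p'
}
```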
LZO Test Results
Single thread
Write:
First run:
mytest: (g=0): rw=write, bs=32468K-32468K/32468K-32468K/32468K-32468K, ioengine=sync, iodepth=1
fio-2.2.6
Starting 1 thread
Jobs: 1 (f=1)
mytest: (groupid=0, jobs=1): err= 0: pid=5149: Thu Dec 28 16:32:03 2017
write: io=519488KB, bw=660926KB/s, iops=20, runt= 786msec
clat (msec): min=17, max=81, avg=46.84, stdev=28.01
lat (msec): min=20, max=84, avg=49.17, stdev=27.91
clat percentiles (usec):
| 1.00th=[17536], 5.00th=[17536], 10.00th=[17792], 20.00th=[17792],
| 30.00th=[18304], 40.00th=[18816], 50.00th=[43264], 60.00th=[71168],
| 70.00th=[74240], 80.00th=[74240], 90.00th=[81408], 95.00th=[81408],
| 99.00th=[81408], 99.50th=[81408], 99.90th=[81408], 99.95th=[81408],
| 99.99th=[81408]
bw (KB /s): min=777318, max=777318, per=100.00%, avg=777318.00, stdev= 0.00
lat (msec) : 20=43.75%, 50=12.50%, 100=43.75%
cpu : usr=6.35%, sys=60.99%, ctx=106, majf=0, minf=3
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=16/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=519488KB, aggrb=660926KB/s, minb=660926KB/s, maxb=660926KB/s, mint=786msec, maxt=786msec
Disk stats (read/write):
zram0: ios=0/53289, merge=0/0, ticks=0/360, in_queue=360, util=48.78%
Second run:
mytest: (g=0): rw=write, bs=32468K-32468K/32468K-32468K/32468K-32468K, ioengine=sync, iodepth=1
fio-2.2.6
Starting 1 thread
Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
mytest: (groupid=0, jobs=1): err= 0: pid=5212: Thu Dec 28 16:34:49 2017
write: io=519488KB, bw=619914KB/s, iops=19, runt= 838msec
clat (msec): min=17, max=89, avg=50.01, stdev=31.28
lat (msec): min=19, max=92, avg=52.40, stdev=31.06
clat percentiles (usec):
| 1.00th=[17024], 5.00th=[17024], 10.00th=[17280], 20.00th=[17536],
| 30.00th=[17536], 40.00th=[18048], 50.00th=[43776], 60.00th=[72192],
| 70.00th=[79360], 80.00th=[79360], 90.00th=[87552], 95.00th=[89600],
| 99.00th=[89600], 99.50th=[89600], 99.90th=[89600], 99.95th=[89600],
| 99.99th=[89600]
bw (KB /s): min=728987, max=728987, per=100.00%, avg=728987.00, stdev= 0.00
lat (msec) : 20=43.75%, 50=6.25%, 100=50.00%
cpu : usr=21.48%, sys=41.77%, ctx=71, majf=0, minf=3
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=16/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=519488KB, aggrb=619914KB/s, minb=619914KB/s, maxb=619914KB/s, mint=838msec, maxt=838msec
Disk stats (read/write):
zram0: ios=0/50866, merge=0/0, ticks=0/400, in_queue=400, util=53.98%
Read:
First run:
mytest: (g=0): rw=read, bs=32468K-32468K/32468K-32468K/32468K-32468K, ioengine=sync, iodepth=1
fio-2.2.6
Starting 1 thread
mytest: (groupid=0, jobs=1): err= 0: pid=5139: Thu Dec 28 16:49:24 2017
read : io=519488KB, bw=628160KB/s, iops=19, runt= 827msec
clat (msec): min=36, max=78, avg=51.71, stdev= 9.23
lat (msec): min=36, max=78, avg=51.71, stdev= 9.23
clat percentiles (usec):
| 1.00th=[36608], 5.00th=[36608], 10.00th=[39168], 20.00th=[47360],
| 30.00th=[49920], 40.00th=[51968], 50.00th=[52480], 60.00th=[52992],
| 70.00th=[54016], 80.00th=[55040], 90.00th=[56064], 95.00th=[78336],
| 99.00th=[78336], 99.50th=[78336], 99.90th=[78336], 99.95th=[78336],
| 99.99th=[78336]
bw (KB /s): min=617262, max=617262, per=98.27%, avg=617262.00, stdev= 0.00
lat (msec) : 50=31.25%, 100=68.75%
cpu : usr=0.00%, sys=97.94%, ctx=14, majf=0, minf=8121
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=16/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group