Foreword
How can you observe disk IO utilization and saturation? This article has the answer!
Translation team: 知数堂藏经阁 project - 菜鸟盟
Team members: 菜鸟盟-hades, 菜鸟盟-bruce, 菜鸟盟-冰焰
Proofreading: 叶师傅
Original post: https://www.percona.com/blog/2017/08/28/looking-disk-utilization-and-saturation/
Original author: Peter Zaitsev (Percona CEO)
In this blog post, I will look at disk utilization and saturation.
In my previous blog post, I wrote about CPU utilization and saturation, the practical difference between them, and how CPU utilization and saturation impact response times in different ways.
Now we will look at another critical component of database performance: the storage subsystem. In this post, I will refer to the storage subsystem as “disk” (as a casual catch-all).
The most common command-line tool for IO performance monitoring is iostat, which shows information like this:
root@ts140i:~# iostat -x nvme0n1 5
Linux 4.4.0-89-generic (ts140i) 08/05/2017 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.51 0.00 2.00 9.45 0.00 88.04
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 3555.57 5887.81 52804.15 87440.73 29.70 0.53 0.06 0.13 0.01 0.05 50.71
avg-cpu: %user %nice %system %iowait %steal %idle
0.60 0.00 1.06 20.77 0.00 77.57
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 7612.80 0.00 113507.20 0.00 29.82 0.97 0.13 0.13 0.00 0.12 93.68
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.26 6.08 0.00 92.16
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 7653.20 0.00 113497.60 0.00 29.66 0.99 0.13 0.13 0.00 0.12 93.52
The first block shows the average performance since system start. In some cases, it is useful to compare the current load to this long-term average. In this case, as it is a test system, it can be safely ignored. The following blocks show the current performance metrics over five-second intervals (as specified on the command line).
The iostat command reports utilization information in the %util column. You can look at saturation by either checking the average request queue size (the avgqu-sz column) or the r_await and w_await columns (which show the average wait for read and write operations). If these go well above “normal”, the device is over-saturated.
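As a quick illustration (not part of the original post), the columns discussed above can be pulled out of an extended iostat device line with a small shell helper. The field positions match the sysstat output format shown earlier; the 90% utilization and 1.0 queue-size thresholds are assumed rules of thumb for demonstration, not values from the article:

```shell
# Extract the saturation-related columns from one `iostat -x` device line.
# Field layout assumed (sysstat format shown above):
# device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
iostat_check() {
  echo "$1" | awk '{
    printf "device=%s util=%.2f avgqu-sz=%.2f r_await=%.2f w_await=%.2f\n",
           $1, $14, $9, $11, $12
    # Hypothetical rule of thumb: near-100% utilization combined with an
    # average queue of one or more requests suggests the device is saturated.
    if ($14 > 90 && $9 >= 1) print "WARNING: device looks saturated"
  }'
}

# Example using one of the sample lines from the iostat output above:
iostat_check "nvme0n1 0.00 0.00 7612.80 0.00 113507.20 0.00 29.82 0.97 0.13 0.13 0.00 0.12 93.68"
```

Here utilization is high (93.68%) but the queue has not yet built up (0.97), so by this rough rule the device is busy but not clearly over-saturated.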
As in my previous blog post, we’ll perform some Sysbench runs and observe how the iostat command-line tool and Percona Monitoring and Management (PMM) graphs behave.
To focus specifically on the disk, we’re using the Sysbench fileio test. I’m using just one 100GB file, with DirectIO so that all requests hit the disk directly. I will also use the “sync” flush mode so I have better control over the IO request concurrency.
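A run along these lines could be set up with sysbench 1.0’s fileio test. The flags below are an illustrative sketch of the setup described above (one 100GB file, DirectIO, sync IO mode), not the author’s exact command, and the thread count and duration are assumptions:

```shell
# Illustrative sysbench fileio invocation (assumed flags, not the
# author's exact command). Requires sysbench 1.0+ and ~100GB free space.

# Create a single 100GB test file:
sysbench fileio --file-num=1 --file-total-size=100G prepare

# Random-read workload with O_DIRECT (bypassing the page cache) and
# synchronous IO mode; the thread count controls IO concurrency:
sysbench fileio --file-num=1 --file-total-size=100G \
  --file-test-mode=rndrd --file-extra-flags=direct \
  --file-io-mode=sync --threads=16 --time=60 run

# Remove the test file afterwards:
sysbench fileio --file-num=1 --file-total-size=100G cleanup
```

With `--file-extra-flags=direct`, every request bypasses the OS cache and lands on the device, which is what makes iostat’s per-device numbers meaningful for this experiment.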