Troubleshooting High I/O Wait in Linux --- A walkthrough on how to find processes that are causing h

文章出处:http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/


Linux has many tools available for troubleshooting some are easy to use, some are more advanced.

I/O Wait is an issue that requires use of some of the more advanced tools as well as an advanced usage of some of the basic tools. The reason I/O Wait is difficult to troubleshoot is due to the fact that by default there are plenty of tools to tell you that your system is I/O bound, but not as many that can narrow the problem to a specific process or processes.

Answering whether or not I/O is causing system slowness

To identify whether I/O is causing system slowness you can use several commands but the easiest is the unix command top.

 # top
 top - 14:31:20 up 35 min, 4 users, load average: 2.25, 1.74, 1.68
 Tasks: 71 total, 1 running, 70 sleeping, 0 stopped, 0 zombie
 Cpu(s): 2.3%us, 1.7%sy, 0.0%ni, 0.0%id, 96.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Mem: 245440k total, 241004k used, 4436k free, 496k buffers
 Swap: 409596k total, 5436k used, 404160k free, 182812k cached

From the CPU(s) line you can see the current percentage of CPU in I/O Wait; The higher the number the more cpu resources are waiting for I/O access.

wa -- iowait
 Amount of time the CPU has been waiting for I/O to complete.

Finding which disk is being written to

The above top command shows I/O Wait from the system as a whole but it does not tell you what disk is being affected; for this we will use the iostat command.

 $ iostat -x 2 5
 avg-cpu: %user %nice %system %iowait %steal %idle
  3.66 0.00 47.64 48.69 0.00 0.00

 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
 sda 44.50 39.27 117.28 29.32 11220.94 13126.70 332.17 65.77 462.79 9.80 2274.71 7.60 111.41
 dm-0 0.00 0.00 83.25 9.95 10515.18 4295.29 317.84 57.01 648.54 16.73 5935.79 11.48 107.02
 dm-1 0.00 0.00 57.07 40.84 228.27 163.35 8.00 93.84 979.61 13.94 2329.08 10.93 107.02

The iostat command in the example will print a report every 2 seconds for 5 intervals; the -x tells iostat to print out an extended report.

The 1st report from iostat will print statistics based on the last time the system was booted; for this reason in most circumstances the first report from iostat should be ignored. Every sub-sequential report printed will be based on the time since the previous interval. For example in our command we will print a report 5 times, the 2nd report are disk statistics gathered since the 1st run of the report, the 3rd is based from the 2nd and so on.

In the above example the %utilized for sda is 111.41% this is a good indicator that our problem lies with processes writing to sda. While the test system in my example only has 1 disk this type of information is extremely helpful when the server has multiple disks as this can narrow down the search for which process is utilizing I/O.

Aside from %utilized there is a wealth of information in the output of iostat; items such as read and write requests per millisecond(rrqm/s & wrqm/s), reads and writes per second (r/s & w/s) and plenty more. In our example our program seems to be read and write heavy this information will be helpful when trying to identify the offending process.

Finding the processes that are causing high I/O

iotop
 # iotop
 Total DISK READ: 8.00 M/s | Total DISK WRITE: 20.36 M/s
  TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
 15758 be/4 root 7.99 M/s 8.01 M/s 0.00 % 61.97 % bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp

The simplest method of finding which process is utilizing storage the most is to use the command iotop. After looking at the statistics it is easy to identify bonnie++ as the process causing the most I/O utilization on this machine.

While iotop is a great command and easy to use, it is not installed on all (or the main) Linux distributions by default; and I personally prefer not to rely on commands that are not installed by default. A systems administrator may find themselves on a system where they simply cannot install the non-defualt packages until a scheduled time which may be far too late depending on the issue.

If iotop is not available the below steps will also allow you to narrow down the offending process/processes.

Process list "state"

The ps command has statistics for memory and cpu but it does not have a statistic for disk I/O. While it may not have a statistic for I/O it does show the processes state which can be used to indicate whether or not a process is waiting for I/O.

The ps state field provides the processes current state; below is a list of states from the man page.

PROCESS STATE CODES
 D uninterruptible sleep (<span style="color:#FF0000;">usually IO</span>)
 R running or runnable (on run queue)
 S interruptible sleep (waiting for an event to complete)
 T stopped, either by a job control signal or because it is being traced.
 W paging (not valid since the 2.6.xx kernel)
 X dead (should never be seen)
 Z defunct ("zombie") process, terminated but not reaped by its parent.

Processes that are waiting for I/O are commonly in an "uninterruptible sleep" state or "D"; given this information we can simply find the processes that are constantly in a wait state.

Example:

 # for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
 D 248 [jbd2/dm-0-8]
 D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
 ----
 D 22 [kswapd0]
 D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
 ----
 D 22 [kswapd0]
 D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
 ----
 D 22 [kswapd0]
 D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
 ----
 D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
 ----

The above for loop will print the processes in a "D" state every 5 seconds for 10 intervals.

From the output above the bonnie++ process with a pid of 16528 is waiting for I/O more often than any other process. At this point the bonnie++ seems likely to be causing the I/O Wait, but just because the process is in an uninterruptible sleep state does not necessarily prove that it is the cause of I/O wait.

To help confirm our suspicions we can use the /proc file system. Within each processes directory there is a file called "io" which holds the same I/O statistics that iotop is utilizing.

 # cat /proc/16528/io
 rchar: 48752567
 wchar: 549961789
 syscr: 5967
 syscw: 67138
 read_bytes: 49020928
 write_bytes: 549961728
 cancelled_write_bytes: 0

The read_bytes and write_bytes are the number of bytes that this specific process has written and read from the storage layer. In this case the bonnie++ process has read 46 MB and written 524 MB to disk. While for some processes this may not be a lot, in our example this is enough write and reads to cause the high i/o wait that this system is seeing.

Finding what files are being written too heavily

The lsof command will show you all of the files open by a specific process or all processes depending on the options provided. From this list one can make an educated guess as to what files are likely being written to often based on the size of the file and the amounts present in the "io" file within /proc.

To narrow down the output we will use the -p <pid> options to print only files open by the specific process id.

 # lsof -p 16528
 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
 bonnie++ 16528 root cwd DIR 252,0 4096 130597 /tmp
 <truncated>
 bonnie++ 16528 root 8u REG 252,0 501219328 131869 /tmp/Bonnie.16528
 bonnie++ 16528 root 9u REG 252,0 501219328 131869 /tmp/Bonnie.16528
 bonnie++ 16528 root 10u REG 252,0 501219328 131869 /tmp/Bonnie.16528
 bonnie++ 16528 root 11u REG 252,0 501219328 131869 /tmp/Bonnie.16528
 bonnie++ 16528 root 12u REG 252,0 501219328 131869 <strong>/tmp/Bonnie.16528</strong>

To even further confirm that these files are being written to the heavily we can see if the /tmp filesystem is part of sda.
 # df /tmp
 Filesystem 1K-blocks Used Available Use% Mounted on
 /dev/mapper/workstation-root 7667140 2628608 4653920 37% /

From the output of df we can determine that /tmp is part of the root logical volume in the workstation volume group.

# pvdisplay
  --- Physical volume ---
  PV Name /dev/sda5
  VG Name workstation
  PV Size 7.76 GiB / not usable 2.00 MiB
  Allocatable yes
  PE Size 4.00 MiB
  Total PE 1986
  Free PE 8
  Allocated PE 1978
  PV UUID CLbABb-GcLB-l5z3-TCj3-IOK3-SQ2p-RDPW5S


Using pvdisplay we can see that the /dev/sda5 partition part of the sda disk is the partition that the workstation volume group is using and in turn is where /tmp exists. Given this information it is safe to say that the large files listed in the lsof above are likely the files being read & written to frequently.



  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
本系统的研发具有重大的意义,在安全性方面,用户使用浏览器访问网站时,采用注册和密码等相关的保护措施,提高系统的可靠性,维护用户的个人信息和财产的安全。在方便性方面,促进了校园失物招领网站的信息化建设,极大的方便了相关的工作人员对校园失物招领网站信息进行管理。 本系统主要通过使用Java语言编码设计系统功能,MySQL数据库管理数据,AJAX技术设计简洁的、友好的网址页面,然后在IDEA开发平台中,编写相关的Java代码文件,接着通过连接语言完成与数据库的搭建工作,再通过平台提供的Tomcat插件完成信息的交互,最后在浏览器中打开系统网址便可使用本系统。本系统的使用角色可以被分为用户和管理员,用户具有注册、查看信息、留言信息等功能,管理员具有修改用户信息,发布寻物启事等功能。 管理员可以选择任一浏览器打开网址,输入信息无误后,以管理员的身份行使相关的管理权限。管理员可以通过选择失物招领管理,管理相关的失物招领信息记录,比如进行查看失物招领信息标题,修改失物招领信息来源等操作。管理员可以通过选择公告管理,管理相关的公告信息记录,比如进行查看公告详情,删除错误的公告信息,发布公告等操作。管理员可以通过选择公告类型管理,管理相关的公告类型信息,比如查看所有公告类型,删除无用公告类型,修改公告类型,添加公告类型等操作。寻物启事管理页面,此页面提供给管理员的功能有:新增寻物启事,修改寻物启事,删除寻物启事。物品类型管理页面,此页面提供给管理员的功能有:新增物品类型,修改物品类型,删除物品类型。
### 回答1: 无法验证您的验证码响应。请访问https://docs.github.com/articles/troubleshooting-connectivity-problems/#troubleshooting-the-captcha获取故障排除信息。 ### 回答2: “无法确认您的验证码回应,请访问 https://docs.github.com/articles/troubleshooting-connectivity-problems/” 是GitHub上使用某些功能时可能会遇到的错误提示信息。它意味着您提交的验证码未通过验证,这可能是因为您的网络环境或设备出现了问题。 如果您遇到这个问题,可以从以下几个方面解决: 1. 确认您正在使用支持的浏览器和版本,例如Google Chrome、Firefox、Microsoft Edge等。如果您使用的是旧版浏览器或移动端浏览器,验证码可能无法在其中正常工作。 2. 检查您的网络连接是否正常。如果您的网络连接不稳定或受阻,则可能无法验证验证码。请尝试重启您的路由器或调整您的网络设置。 3. 清除浏览器缓存和cookie。与验证码相关的信息可能被存储在浏览器缓存或cookie中,清除它们后再次尝试可能有助于解决问题。 4. 如果以上方法无法解决问题,则可以访问 https://docs.github.com/articles/troubleshooting-connectivity-problems/ 获取更多帮助。该文档包含了关于GitHub连接问题的常见故障排除方法,如DNS设置、网络代理设置等,通过遵循步骤进行排除可能会解决问题。 总之,如果您遇到无法验证验证码回应的问题,请先检查您的网络环境和设备,然后按照上述方法进行排除。如果问题仍然存在,可以访问GitHub的帮助文档或联系技术支持获取进一步的帮助。 ### 回答3: 当你在使用GitHub时,可能会遇到一些连接问题。其中一种常见的问题是“unable to verify your captcha response”(无法验证您的验证码响应)。当您收到此错误消息时,这意味着您要么没有正确输入验证码,要么您的网络连接存在问题。这篇文章介绍了一些排查和解决这个问题的方法。 首先,您需要确保已正确输入验证码。验证码通常是一组数字或字母,用于防止自动化程序攻击。确保您输入的是正确的验证码,而不是其它字符或文字。如果您没有完全确定验证码是什么,可以点击刷新按钮以获取新的验证码。 然后,您需要检查您的网络连接。这可能是因为您的网络连接速度太慢或您的网络配置存在问题。要解决这个问题,您可以尝试重启路由器或尝试连接一个更稳定的网络。 如果这些方法都没有解决问题,您可以尝试通过重启计算机或清除浏览器缓存来解决该问题。有时候,缓存可能会导致问题出现,因此清除缓存可能会有所帮助。 最后,如果您尝试了以上所有方法仍无法解决问题,您可以访问https://docs.github.com/articles/troubleshooting-connectivity-problems/获取更多相关信息和帮助。在这个网站上,您可以找到详细的解决方案和一些常见问题的解决方式,以确保您可以顺利地使用GitHub。 总的来说,无法验证您的验证码响应可能是一个非常烦人的问题,但是您可以尝试以上方法来解决该问题。如果一切失败,GitHub提供了很多支持和解决方案,您可以通过他们的网站找到更多相关信息和帮助。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值