Environment
cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
sudo docker version
Client:
 Version:      17.10.0-ce
 API version:  1.33
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:04:05 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.10.0-ce
 API version:  1.33 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:05:38 2017
 OS/Arch:      linux/amd64
 Experimental: false
kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
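Problem
CPU usage on the host was very high, and CPU resources appeared to be nearly exhausted. The top command showed that more than 80% of CPU time was spent in wa (the percentage of CPU time spent waiting on I/O), while user space (us) and kernel space (sy) together accounted for less than 20%. The idle value (id) was close to 0, and the load average was also high, as shown below: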
top - 17:29:08 up 10 days, 19:20, 1 user, load average: 14.31, 9.34, 9.08
Tasks: 351 total, 1 running, 350 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.9 us, 7.7 sy, 0.3 ni, 2.9 id, 81.7 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem : 8010196 total, 735612 free, 5450964 used, 1823620 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1636516 avail Mem
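
The wa reading in the top summary above can also be checked programmatically. Here is a minimal sketch in Python, assuming the `%Cpu(s)` line format shown above. The helper names `parse_top_cpu_line` and `is_io_bound` and the 50% threshold are illustrative assumptions, not part of top or of the original investigation.

```python
import re


def parse_top_cpu_line(line):
    """Parse a top '%Cpu(s):' summary line into {field: percent}."""
    # Everything after the first colon holds the "<value> <name>" pairs.
    _, _, rest = line.partition(":")
    fields = {}
    for value, name in re.findall(r"([\d.]+)\s+([a-z]{2})", rest):
        fields[name] = float(value)
    return fields


def is_io_bound(fields, wa_threshold=50.0):
    """Return True when iowait (wa) meets or exceeds the threshold.

    The 50% default is an arbitrary illustrative value, not a standard.
    """
    return fields.get("wa", 0.0) >= wa_threshold


# The summary line copied from the top output above.
line = "%Cpu(s): 6.9 us, 7.7 sy, 0.3 ni, 2.9 id, 81.7 wa, 0.0 hi, 0.5 si, 0.0 st"
cpu = parse_top_cpu_line(line)
print("wa =", cpu["wa"], "io-bound:", is_io_bound(cpu))
```

With the line captured above, us and sy together come to 14.6%, while wa alone is 81.7%, which matches the observation that the CPU was mostly waiting on I/O rather than doing work.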
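Troubleshooting
This problem had also occurred some time earlier, but the two incidents turned out to have different root causes. The previous time, the PV was mounted from Alibaba Cloud NFS and es (Elasticsearch) happened to be scheduled onto this host; es exchanged so much data with NFS that network I/O on the NFS connection became excessive. This time, the problem was caused by disk I/O.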
Use the iotop command to check I/O on the machine
Total DISK READ :       9.79 M/s | Total DISK WRITE :      24.66 K/s
Actual DISK READ:       8.21 M/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 5762 be/7 root     1743.49 K/s    0.00 B/s  0.00 % 95.26 % du -s /var/lib/docker/overlay/e~61f203cd85d130b629268078be9c07f4
 5721 be/7 root      750.23 K/s    0.00 B/s  0.00 % 94.64 % du -s /var/lib/docker/overlay/c~dd81ce0684eed62e983301a2bd67e694
   41 be/4 root        0.00 B/s    0.00 B/s  0.00 % 52.87 % [kswapd0]
 5758 be/7 root      200.77 K/s    0.00 B/s  0.00 %