Linux System and Performance Monitoring(总结篇)

Linux System and Performance Monitoring(总结篇) 
Date: 2009.07.21 
Author: Darren Hoch 
译: Tonnyom[AT]hotmail.com
结束语: 这是该译文的最后一篇,在这篇中,作者提供了一个案例环境,用之前几篇所阐述的理论以及涉及到的工具,对其进行一个整体的系统性能检查.对大家更好理解系统性能监控,进行一次实战演习.
 
BTW:在中文技术网站上,类似内容的文章,大体是来自该作者06-07年所著论文,此译文是建立在作者为OSCON 2009重写基础上的.所以部分内容可能会存在重复雷同,特此说明下.
附录 A: 案例学习 - 性能监控之循序渐进
某一天,一个客户打电话来需要技术帮助,并抱怨平常15秒就可以打开的网页现在需要20分钟才可以打开.
具体系统配置如下:
RedHat Enterprise Linux 3 update 7 
Dell 1850 Dual Core Xenon Processors, 2 GB RAM, 75GB 15K Drives 
Custom LAMP software stack(译注:Llinux+apache+mysql+php 环境)
性能分析之步骤
1. 首先使用vmstat 查看大致的系统性能情况:
# vmstat 1 10 
procs memory swap io system cpu 
r b swpd free buff cache si so bi bo in cs us sy id wa 
1 0 249844 19144 18532 1221212 0 0 7 3 22 17 25 8 17 18 
0 1 249844 17828 18528 1222696 0 0 40448 8 1384 1138 13 7 65 14 
0 1 249844 18004 18528 1222756 0 0 13568 4 623 534 3 4 56 37 
2 0 249844 17840 18528 1223200 0 0 35200 0 1285 1017 17 7 56 20 
1 0 249844 22488 18528 1218608 0 0 38656 0 1294 1034 17 7 58 18 
0 1 249844 21228 18544 1219908 0 0 13696 484 609 559 5 3 54 38 
0 1 249844 17752 18544 1223376 0 0 36224 4 1469 1035 10 6 67 17 
1 1 249844 17856 18544 1208520 0 0 28724 0 950 941 33 12 49 7 
1 0 249844 17748 18544 1222468 0 0 40968 8 1266 1164 17 9 59 16 
1 0 249844 17912 18544 1222572 0 0 41344 12 1237 1080 13 8 65 13
分析: 
1,不会是内存不足导致,因为swapping 始终没变化(si 和 so).尽管空闲内存不多(free),但swpd 也没有变化. 
2,CPU 方面也没有太大问题,尽管有一些运行队列(procs r),但处理器还始终有50% 多的idle(CPU id). 
3,有太多的上下文切换(cs)以及disk block从RAM中被读入(bo). 
4,CPU 还有平均20% 的I/O 等待情况.
结论: 
从以上总结出,这是一个I/O 瓶颈.
2. 然后使用iostat 检查是谁在发出IO 请求:
# iostat -x 1 
Linux 2.4.21-40.ELsmp (mail.example.com) 03/26/2007
avg-cpu: %user %nice %sys %idle 
30.00 0.00 9.33 60.67
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
/dev/sda 7929.01 30.34 1180.91 14.23 7929.01 357.84 3964.50 178.92 6.93 0.39 0.03 0.06 6.69 
/dev/sda1 2.67 5.46 0.40 1.76 24.62 57.77 12.31 28.88 38.11 0.06 2.78 1.77 0.38 
/dev/sda2 0.00 0.30 0.07 0.02 0.57 2.57 0.29 1.28 32.86 0.00 3.81 2.64 0.03 
/dev/sda3 7929.01 24.58 1180.44 12.45 7929.01 297.50 3964.50 148.75 6.90 0.32 0.03 0.06 6.68
avg-cpu: %user %nice %sys %idle 
9.50 0.00 10.68 79.82
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
/dev/sda 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86 
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda3 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86
avg-cpu: %user %nice %sys %idle 
9.23 0.00 10.55 79.22
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
/dev/sda 0.00 0.00 1200.37 0.00 0.00 0.00 0.00 0.00 0.00 41.65 2.12 0.99 112.51 
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
/dev/sda3 0.00 0.00 1200.37 0.00 0.00 0.00 0.00 0.00 0.00 41.65 2.12 0.99 112.51
分析: 
1,看上去只有/dev/sda3 分区很活跃,其他分区都很空闲. 
2,差不多有1200 读IOPS,磁盘本身是支持200 IOPS左右(译注:参考之前的IOPS 计算公式). 
3,有超过2秒,实际上没有一个读磁盘(rkb/s).这和在vmstat 看到有大量I/O wait是有关系的. 
4,大量的read IOPS(r/s)和在vmstat 中大量的上下文是匹配的.这说明很多读操作都是失败的.
结论: 
从以上总结出,部分应用程序带来的读请求,已经超出了I/O 子系统可处理的范围.
3. 使用top 来查找系统最活跃的应用程序
# top -d 1 
11:46:11 up 3 days, 19:13, 1 user, load average: 1.72, 1.87, 1.80 
176 processes: 174 sleeping, 2 running, 0 zombie, 0 stopped 
CPU states: cpu user nice system irq softirq iowait idle 
total 12.8% 0.0% 4.6% 0.2% 0.2% 18.7% 63.2% 
cpu00 23.3% 0.0% 7.7% 0.0% 0.0% 36.8% 32.0% 
cpu01 28.4% 0.0% 10.7% 0.0% 0.0% 38.2% 22.5% 
cpu02 0.0% 0.0% 0.0% 0.9% 0.9% 0.0% 98.0% 
cpu03 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 100.0% 
Mem: 2055244k av, 2032692k used, 22552k free, 0k shrd, 18256k buff 
1216212k actv, 513216k in_d, 25520k in_c 
Swap: 4192956k av, 249844k used, 3943112k free 1218304k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
14939 mysql 25 0 379M 224M 1117 R 38.2 25.7% 15:17.78 mysqld 
4023 root 15 0 2120 972 784 R 2.0 0.3 0:00.06 top 
1 root 15 0 2008 688 592 S 0.0 0.2 0:01.30 init 
2 root 34 19 0 0 0 S 0.0 0.0 0:22.59 ksoftirqd/0 
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 
4 root 10 -5 0 0 0 S 0.0 0.0 0:00.05 events/0
分析: 
1,占用资源最多的好像就是mysql 进程,其他都处于完全idle 状态. 
2,在top(wa) 看到的数值,和在vmstat 看到的wio 数值是有关联的.
结论: 
从以上总结出,似乎就只有mysql 进程在请求资源,因此可以推论它就是导致问题的关键.
4. 现在已经确定是mysql 在发出读请求,使用strace 来检查它在读请求什么.
# strace -p 14939
Process 14939 attached - interrupt to quit 
read(29, "\3\1\237\1\366\337\1\222%\4\2\0\0\0\0\0012P/d", 20) = 20 
read(29, "ata1/strongmail/log/strongmail-d"..., 399) = 399 
_llseek(29, 2877621036, [2877621036], SEEK_SET) = 0 
read(29, "\1\1\241\366\337\1\223%\4\2\0\0\0\0\0012P/da", 20) = 20 
read(29, "ta1/strongmail/log/strongmail-de"..., 400) = 400 
_llseek(29, 2877621456, [2877621456], SEEK_SET) = 0 
read(29, "\1\1\235\366\337\1\224%\4\2\0\0\0\0\0012P/da", 20) = 20 
read(29, "ta1/strongmail/log/strongmail-de"..., 396) = 396 
_llseek(29, 2877621872, [2877621872], SEEK_SET) = 0 
read(29, "\1\1\245\366\337\1\225%\4\2\0\0\0\0\0012P/da", 20) = 20 
read(29, "ta1/strongmail/log/strongmail-de"..., 404) = 404 
_llseek(29, 2877622296, [2877622296], SEEK_SET) = 0 
read(29, "\3\1\236\2\366\337\1\226%\4\2\0\0\0\0\0012P/d", 20) = 20
分析: 
1,大量的读操作都在不断寻道中,说明mysql 进程产生的是随机IO. 
2,看上去似乎是,某一sql 查询导致读操作.
结论: 
从以上总结出,所有的读IOPS 都是mysql 进程在执行某些读查询时产生的.
5. 使用mysqladmin 命令,来查找是哪个慢查询导致的.
# ./mysqladmin -pstrongmail processlist
+----+------+-----------+------------+---------+------+----------+---------------------------------------- 
| Id | User | Host | db | Command | Time | State | Info 
+----+------+-----------+------------+---------+------+----------+---------------------------------------- 
| 1 | root | localhost | strongmail | Sleep | 10 | | 
| 2 | root | localhost | strongmail | Sleep | 8 | | 
| 3 | root | localhost | root | Query | 94 | Updating | update `failures` set 
`update_datasource`='Y' where database_id='32' and update_datasource='N' and | 
| 14 | root | localhost | | Query | 0 | | show processlist
分析: 
1,MySQL 数据库里,似乎在不断的运行table update查询. 
2,基于这个update 查询,数据库是对所有的table 进行索引.
结论: 
从以上总结出,MySQL里这些update 查询问题,都是在尝试对所有table 进行索引.这些产生的读请求正是导致系统性能下降的原因.
后续
把以上这些性能信息移交给了相关开发人员,用于分析他们的PHP 代码.一个开发人员对代码进行了临时性优化.某个查询如果出错了,也最多到100K记录.数据库本身考虑最多存在4百万记录.最后,这个查询不会再给数据库带来负担了.
References 
• Ezlot, Phillip – Optimizing Linux Performance, Prentice Hall, Princeton NJ 2005 ISBN – 0131486829 
• Johnson, Sandra K., Huizenga, Gerrit – Performance Tuning for Linux Servers, IBM Press, Upper Saddle River NJ 2005 ISBN 013144753X 
• Bovet, Daniel Cesati, Marco – Understanding the Linux Kernel, O’Reilly Media, Sebastoppl CA 2006, ISBN 0596005652 
• Blum, Richard – Network Performance Open Source Toolkit, Wiley, Indianapolis IN 2003, ISBN 0-471-43301-2 
• Understanding Virtual Memory in RedHat 4, Neil Horman, 12/05  http://people.redhat.com/nhorman/papers/rhel4_vm.pdf 
• IBM, Inside the Linux Scheduler,  http://www.ibm.com/developerworks/linux/library/l-scheduler/ 
• Aas, Josh, Understanding the Linux 2.6.8.1 CPU Scheduler,  http://josh.trancesoftware.com/linux/linux_cpu_scheduler.pdf 
• Wieers, Dag, Dstat: Versatile Resource Statistics Tool,  http://dag.wieers.com/home-made/dstat/
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Proven techniques for optimizing web, file, print, database, and application servers Practical introductions to open source performance management tools Easy-to-understand examples based on Red Hat Enterprise Linux and Novell SUSE Linux Enterprise ServerAn indispensable guide to maximizing Linux system and application performanceFrom Wall Street to Hollywood, Linux runs many of the world's most businesscritical systems. Linux performance now impacts the entire enterprise. In Performance Tuning for Linux Servers, a team of IBM's most-experienced Linux performance specialists shows you how to find bottlenecks, measure performance, and identify effective optimizations.This book doesn't just cover kernel tuning: it shows how to maximize the end-to-end performance of real-world applications and databases running on Linux. Throughout, the authors present realistic examples based on today's most popular enterprise Linux platforms, Intel-based Red Hat Enterprise Linux, and Novell SUSE Linux Enterprise Server. These examples are designed for simplicity, clarity, and easy adaptation to any contemporary Linux environment. You'll discover how to Install and configure Linux for maximum performance from the outset Evaluate and choose the right hardware architecture for your Linux environment Understand Linux kernels 2.4 through 2.6: components, performance issues, and optimization opportunities Master core Linux performance tuning principles and strategies Utilize free, open source tools for measurement, monitoring, system tracing, and benchmarking Interpret performance data to analyze your Linux server's real-world behavior Optimize Linux system schedulers, memory, I/O, file systems, and networking Tune web, file, database, and application servers running commercial workloads Predict the impact of changes in tuning parameters or configurations Tune Linux code: optimize design, timing, sockets, threads, synchronization, and more Architect for maximum performance: SMP scaling, clu

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值