Reposted from: http://hi.baidu.com/springwu/blog/item/267ec345cd9d4628879473ce.html
1. CPU
2. Memory
3. I/O
4. Network
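1. CPU
You should understand the main CPU metrics: context switches, the run queue, CPU utilization, and load average.
(1) Context switches
1) When the CPU switches from executing one process (or thread) to another, this is called a context switch.
2) When a switch happens, the kernel saves the current CPU state of the outgoing process in memory.
3) The kernel then retrieves the previously saved state of the incoming process from memory and loads it into the CPU.
4) Context switching is essential for multitasking on the CPU.
5) However, a very high rate of context switching can cause performance problems.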
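As a rough illustration of how the context-switch rate can be measured, the sketch below parses the cumulative `ctxt` counter that Linux exposes in `/proc/stat` and turns two samples into a per-second rate. The function names and the sample values are made up for this example; on a real system you would read `/proc/stat` twice with a known delay in between.

```python
def parse_context_switches(stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat-style text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(first_text: str, second_text: str, interval_s: float) -> float:
    """Context switches per second between two samples taken interval_s apart."""
    delta = parse_context_switches(second_text) - parse_context_switches(first_text)
    return delta / interval_s


# Hypothetical samples taken 2 seconds apart (values are invented).
SAMPLE_A = "cpu  100 0 50 1000 10 0 0 0 0 0\nctxt 500000\nprocs_running 2\n"
SAMPLE_B = "cpu  160 0 70 1110 20 0 0 0 0 0\nctxt 503000\nprocs_running 3\n"

if __name__ == "__main__":
    print(switches_per_second(SAMPLE_A, SAMPLE_B, 2.0))  # prints 1500.0
```

The `cs` column of `vmstat` reports the same kind of per-second figure without any scripting.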
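(2) Run queue
1) The run queue is the set of active (runnable) processes currently queued for the CPU; its length is the total number of such processes.
2) When the CPU is ready to execute a process, it picks one from the run queue based on process priority.
3) Note that processes that are sleeping or in an I/O wait state are not in the run queue.
4) Note: a long run queue can cause performance problems.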
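The run queue can be approximated from the `procs_running` and `procs_blocked` lines of `/proc/stat`. The sketch below is only an illustration: the function names are invented, the sample text is made up, and the "more runnable processes than CPUs" check is an assumed rule of thumb rather than something stated in the article.

```python
def parse_proc_counts(stat_text: str) -> tuple:
    """Return (procs_running, procs_blocked) from /proc/stat-style text."""
    values = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("procs_running", "procs_blocked"):
            values[fields[0]] = int(fields[1])
    return values["procs_running"], values["procs_blocked"]


def run_queue_is_long(running: int, cpu_count: int) -> bool:
    """Assumed heuristic: flag the queue when runnable processes exceed CPUs."""
    return running > cpu_count


# Hypothetical /proc/stat excerpt (values are invented).
SAMPLE_STAT = "ctxt 500000\nprocs_running 6\nprocs_blocked 1\n"

if __name__ == "__main__":
    running, blocked = parse_proc_counts(SAMPLE_STAT)
    # Blocked (I/O wait) processes are reported separately from runnable ones.
    print(running, blocked, run_queue_is_long(running, cpu_count=4))
```

Processes blocked on I/O are counted separately from runnable ones, which matches point 3) above. The `r` and `b` columns of `vmstat` show comparable numbers.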
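(3) CPU utilization
1) This indicates how much of the CPU is currently in use.
2) This is fairly straightforward; you can view CPU utilization with the top command.
3) 100% CPU utilization means the system is fully loaded.
4) Sustained high CPU utilization can therefore cause performance problems.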
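To make the utilization figure concrete, here is a minimal sketch that derives it from two samples of the aggregate `cpu` line in `/proc/stat`. It assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and treats idle plus iowait as "not busy", which is one common convention and not the only one. Names and sample values are invented for illustration.

```python
def cpu_times(stat_text: str) -> list:
    """Return the first eight counters of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(first_text: str, second_text: str) -> float:
    """Percentage of CPU time spent busy between two samples."""
    before, after = cpu_times(first_text), cpu_times(second_text)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Assumption: idle (index 3) and iowait (index 4) both count as not busy.
    not_busy = deltas[3] + deltas[4]
    return 100.0 * (total - not_busy) / total


# Hypothetical samples (values are invented).
FIRST = "cpu  100 0 50 1000 10 0 0 0 0 0\n"
SECOND = "cpu  160 0 70 1110 20 0 0 0 0 0\n"

if __name__ == "__main__":
    print(cpu_utilization(FIRST, SECOND))  # prints 40.0
```

In the sample, 200 ticks elapse in total, 120 of them idle or waiting on I/O, so 80 of 200 ticks (40%) were busy.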
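(4) Load average
1) This is the average CPU load over a specific period of time.
2) On Linux, the load average is displayed for the last 1, 5, and 15 minutes. This is useful for seeing whether the overall system load is rising or falling.
3) For example, a load average of “0.75 1.70 2.10” indicates that the load has been coming down: 0.75 is the load average over the last 1 minute, 1.70 over the last 5 minutes, and 2.10 over the last 15 minutes.
4) Note that the load average is calculated from the total number of processes in the run queue plus the number of processes in the uninterruptible state.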
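The reasoning in the example above can be written down as a small sketch: compare the 1, 5, and 15 minute values and report the direction. The function names and the "mixed" label are made up for this illustration; the sample string follows the layout of `/proc/loadavg`, with invented trailing fields.

```python
def parse_loadavg(text: str) -> tuple:
    """Return the (1 min, 5 min, 15 min) load averages from /proc/loadavg-style text."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return one, five, fifteen


def load_trend(one: float, five: float, fifteen: float) -> str:
    """Classify the trend by comparing recent load with older load."""
    if one < five < fifteen:
        return "falling"   # recent load is lower than older load
    if one > five > fifteen:
        return "rising"    # recent load is higher than older load
    return "mixed"


# The article's example values, followed by hypothetical extra fields.
SAMPLE_LOADAVG = "0.75 1.70 2.10 2/345 12345"

if __name__ == "__main__":
    print(load_trend(*parse_loadavg(SAMPLE_LOADAVG)))  # prints falling
```

On a live system, Python's `os.getloadavg()` returns the same three numbers directly.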
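2. Network
(1) A good understanding of TCP/IP concepts helps a great deal when analyzing any network issue. This will be discussed further in future articles.
(2) For network interfaces, you should monitor the total number of packets, including packets sent, packets received, and packets dropped.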
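As an illustration of the per-interface counters mentioned above, the sketch below parses text in the layout of `/proc/net/dev`, where each interface line carries receive counters followed by transmit counters. The function name, dictionary keys, and sample numbers are invented for this example.

```python
def parse_net_dev(text: str) -> dict:
    """Return per-interface packet, byte, and drop counters from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(v) for v in data.split()]
        stats[name.strip()] = {
            "rx_bytes": f[0], "rx_packets": f[1], "rx_dropped": f[3],
            "tx_bytes": f[8], "tx_packets": f[9], "tx_dropped": f[11],
        }
    return stats


# Hypothetical sample (values are invented).
SAMPLE_NET_DEV = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  eth0:  987654    4321    0    5    0     0          0         0   123456    2100    0    1    0     0       0          0
"""

if __name__ == "__main__":
    for name, counters in parse_net_dev(SAMPLE_NET_DEV).items():
        print(name, counters)
```

These counters are cumulative, so a steadily growing drop count between two samples is usually more informative than the absolute value.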
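3. I/O
(1) I/O wait is the time the CPU spends waiting for I/O operations to complete. If you see consistently high I/O wait on a system, it points to a likely problem in the disk subsystem.
(2) You should also monitor reads and writes per second. These are measured in blocks and are also referred to as bi (blocks in) and bo (blocks out).
(3) TPS (transactions per second) is the total number of transfers per second, that is, the sum of rtps (read transactions per second) and wtps (write transactions per second).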
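The relation tps = rtps + wtps can be sketched with two samples in the layout of `/proc/diskstats`, where each device line lists completed reads, sectors read, completed writes, and sectors written among its counters. This is a simplified illustration: the function names, device name, and sample values are assumptions, and tools such as `sar` or `iostat` may count additional request types on newer kernels.

```python
def disk_counters(diskstats_text: str, device: str) -> dict:
    """Return cumulative read/write counters for one device."""
    for line in diskstats_text.splitlines():
        f = line.split()
        if len(f) >= 10 and f[2] == device:
            return {
                "reads": int(f[3]), "sectors_read": int(f[5]),
                "writes": int(f[7]), "sectors_written": int(f[9]),
            }
    raise ValueError(f"device {device!r} not found")


def io_rates(first_text: str, second_text: str, device: str, interval_s: float) -> dict:
    """Per-second rates between two samples taken interval_s apart."""
    a, b = disk_counters(first_text, device), disk_counters(second_text, device)
    rtps = (b["reads"] - a["reads"]) / interval_s
    wtps = (b["writes"] - a["writes"]) / interval_s
    return {
        "rtps": rtps,
        "wtps": wtps,
        "tps": rtps + wtps,  # total transfers per second
        "read_sectors_per_s": (b["sectors_read"] - a["sectors_read"]) / interval_s,
        "write_sectors_per_s": (b["sectors_written"] - a["sectors_written"]) / interval_s,
    }


# Hypothetical samples for a device named sda, taken 5 seconds apart.
DISK_A = "   8       0 sda 1000 50 80000 400 2000 70 160000 900 0 700 1300\n"
DISK_B = "   8       0 sda 1100 55 88000 440 2300 80 184000 1000 0 760 1440\n"

if __name__ == "__main__":
    print(io_rates(DISK_A, DISK_B, "sda", 5.0))
```

With these samples the device completed 100 reads and 300 writes in 5 seconds, giving rtps = 20.0, wtps = 60.0, and tps = 80.0.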
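4. Memory
(1) As you know, RAM is physical memory. If you have 4 GB of RAM installed on your system, you have 4 GB of physical memory. (Translator's note: I take this to mean that if one 4 GB memory module is installed, the machine reports 4 GB; a 32-bit system typically shows only about 4 GB even with 8 GB installed.)
(2) Virtual memory = swap space available on disk + physical memory. Virtual memory contains both user space and kernel space.
(3) Whether the system is 32-bit or 64-bit makes a big difference in how much memory a single process can use.
(4) Unused RAM is used by the kernel as file system cache.
(5) Linux swaps when it needs more memory than is physically available. When it swaps, it writes the least-used memory pages from physical memory to the swap space on disk.
(6) Heavy swapping causes performance problems, because disk is much slower than physical memory and moving pages between RAM and disk takes time.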
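The memory figures above can be illustrated with a small sketch that parses text in the layout of `/proc/meminfo` and applies the article's definition of virtual memory (swap plus physical memory). The function names, the summary keys, and the sample values are invented for this example; values are in kB as in `/proc/meminfo`.

```python
def parse_meminfo(text: str) -> dict:
    """Return a mapping of field name to integer value from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info: dict) -> dict:
    """Summarize memory using the article's definitions."""
    return {
        # Virtual memory = swap space + physical memory.
        "virtual_total_kb": info["MemTotal"] + info["SwapTotal"],
        # Swap currently in use; steady growth suggests memory pressure.
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
        # RAM the kernel is using as buffers and file system cache.
        "cache_kb": info["Buffers"] + info["Cached"],
    }


# Hypothetical sample (values are invented).
SAMPLE_MEMINFO = """MemTotal:        4039724 kB
MemFree:          512000 kB
Buffers:          102400 kB
Cached:          1536000 kB
SwapTotal:       2097148 kB
SwapFree:        2000000 kB
"""

if __name__ == "__main__":
    print(memory_summary(parse_meminfo(SAMPLE_MEMINFO)))
```

The `free` command presents the same underlying numbers, and the `si` and `so` columns of `vmstat` show how actively the system is swapping.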
(I translated part of the content based on my own understanding; the rest below is left in the original English.)
All four of the above subsystems are interrelated. Just because you see high reads/second, writes/second, or I/O wait doesn’t mean the issue lies in the I/O subsystem. It also depends on what the application is doing. In most cases, the performance issue might be caused by the application that is running on the Linux system.
Remember the 80/20 rule: 80% of the performance improvement comes from tuning the application, and the remaining 20% comes from tuning the infrastructure components.
Some Linux monitoring tools: top, free, ps, iostat, vmstat, mpstat, sar, tcpdump, netstat, iozone
We’ll be discussing more about these tools and how to use them in the upcoming articles in this series.
Problem-solving approach:
Step 1 – Understand (and reproduce) the problem: Half of the problem is solved when you clearly understand what the problem is. Before trying to solve the performance issue, first work on clearly defining the problem. The more time you spend understanding and defining the problem, the more details you will have to look for answers in the right place. If possible, try to reproduce the problem, or at least simulate a situation that you think closely resembles it. This will later help you validate the solution you come up with to fix the performance issue.
Step 2 – Monitor and collect data: After defining the problem clearly, monitor the system and try to collect as much data as possible on the various subsystems. Based on this data, come up with a list of potential issues.
Step 3 – Eliminate and narrow down issues: Once you have a list of potential issues, dive into each one and eliminate any non-issues. Narrow it down further to see whether it is an application issue or an infrastructure issue. Drill down further and narrow it down to a specific component. For example, if it is an infrastructure issue, identify the subsystem that is causing it. If it is an I/O subsystem issue, narrow it down to a specific partition, RAID group, LUN, or disk. Basically, keep drilling down until you put your finger on the root cause of the issue.
Step 4 – One change at a time: Once you’ve narrowed down to a small list of potential issues, don’t try to make multiple changes at one time. If you make multiple changes, you won’t know which one fixed the original issue. Multiple changes at one time might also cause new issues, which you’ll end up chasing instead of fixing the original issue. So, make one change at a time, and see if it fixes the original problem.
In the upcoming articles of the performance series, we’ll discuss more about how to monitor and address performance issues in the CPU, Memory, I/O and Network subsystems using various Linux performance monitoring tools.