Checking the System Load Average on Linux

weixin_34177064

Tags: operations, databases, operating systems

Original link: http://blog.51cto.com/yamig/1110741

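Why monitor the system load average?

   Sometimes the system feels slow for no obvious reason. When that happens, check the load average to see whether a large number of processes are queued up waiting to run.
1. What is the load average?
         It is the average number of processes in the run queue over a given time interval. Put more simply, it is the length of the process queue: how many processes are running or waiting for their turn to run.
2. What is the "process queue"?
         A process is in the process queue when it meets all of the following conditions:
               1. It is not waiting for the result of an I/O operation
               2. It has not voluntarily entered a wait state (that is, it has not called wait)
               3. It has not been stopped
         Note that on Linux the reported load average also counts tasks in uninterruptible sleep (usually waiting on disk I/O), so heavy I/O can raise the figure even when the CPU is mostly idle.
3. How do you check the load average?
  The simplest command is uptime.
  Example:
  [lhd@localhost ~]$ uptime
  00:44:22 up  1:17,  3 users,  load average: 8.13, 5.90, 4.94
4. What does the output mean?
      load average: 8.13, 5.90, 4.94
      These are the average numbers of processes in the process queue over the past 1, 5, and 15 minutes.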
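5. How do you tell whether the system is overloaded?
      Divide the load average by the number of CPUs (counting CPU cores) to get the number of active processes per CPU:
      no more than 3: system performance is good
      no more than 4: acceptable
      more than 5: a serious performance problem
      In the example above, with 2 CPU cores, 8.13/2 = 4.065. That sits just above the "acceptable" limit of 4 and well below 5, so performance is borderline but still tolerable.
      For a strict alert threshold, use the number of CPU cores.
      For example, with 2 CPU cores, set the alert threshold to 2.
      (This is a reasonable setting, since not every application can take advantage of multiple CPUs or multiple cores.)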
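
The per-core rule of thumb above can be written down as a small function. The Python sketch below is only an illustration: the name `classify_load` and the "borderline" label for values between 4 and 5 (a range the rule does not cover) are assumptions, not part of the original article.

```python
def classify_load(load_avg: float, cpu_cores: int) -> str:
    """Classify a load average with the per-core rule of thumb from the article."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    per_core = load_avg / cpu_cores
    if per_core <= 3:
        return "good"
    if per_core <= 4:
        return "acceptable"
    if per_core > 5:
        return "serious"
    # Assumption: the article says nothing about values between 4 and 5,
    # so this sketch reports them as "borderline".
    return "borderline"


# The example from the article: load 8.13 on 2 cores gives 4.065 per core.
print(classify_load(8.13, 2))
```

On a live system the two inputs could come from `os.getloadavg()[0]` and `os.cpu_count()`.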
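6. Commands for viewing the load average
               Five are available:
               tload              draws a graph of how the load changes
               uptime             also shows the time since boot
               w                  also shows the logged-in users
               top                relatively resource-hungry, so not recommended just for this
               cat /proc/loadavg  reads the load average from the /proc system information
Note: to watch the load average continuously, use watch uptime or watch cat /proc/loadavg.
Remark: watch runs the given program at a fixed interval and shows the result full-screen. The default interval is 2 seconds.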
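
For scripts, /proc/loadavg is the easiest of these sources to parse. The sketch below assumes the usual five-field layout of that file: the three averages, a runnable/total task count, and the most recently created PID. The names `LoadAvg` and `parse_loadavg` are illustrative, and the sample string reuses the values that follow "Load average:" in the procinfo output later in this article.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float        # 1-minute average
    five: float       # 5-minute average
    fifteen: float    # 15-minute average
    runnable: int     # tasks currently runnable
    total: int        # tasks that currently exist
    last_pid: int     # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the contents of /proc/loadavg."""
    fields = text.split()
    if len(fields) < 5:
        raise ValueError("expected five whitespace-separated fields")
    runnable, total = fields[3].split("/")
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(runnable), int(total), int(fields[4]))


sample = "0.15 0.03 0.01 2/58 8557"
print(parse_loadavg(sample))
```

On a Linux machine the same function can be fed with `open("/proc/loadavg").read()`.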

--------------------------------------------------------------------

uptime

The uptime shell command produces the following output:

[pax:~]% uptime
9:40am  up 9 days, 10:36,  4 users,  load average: 0.02, 0.01, 0.00

It shows how long the system has been running since the last reboot, the number of logged-in users, and the so-called load average.

procinfo

On Linux systems, the procinfo command produces the following output:

[pax:~]% procinfo
Linux 2.0.36 (root@pax) (gcc 2.7.2.3) #1 Wed Jul 25 21:40:16 EST 2001 [pax]

Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:         95564       90252        5312       31412       33104       26412
Swap:        68508           0       68508

Bootup: Sun Jul 21 15:21:15 2002    Load average: 0.15 0.03 0.01 2/58 8557
...

The load average appears on the last line of this output, after the boot time.

w

The w(ho) command produces the following output:

[pax:~]% w
  9:40am  up 9 days, 10:35,  4 users,  load average: 0.02, 0.01, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
mir      ttyp0    :0.0             Fri10pm  3days  0.09s  0.09s  bash
neil     ttyp2    12-35-86-1.ea.co  9:40am  0.00s  0.29s  0.15s  w
...

Note that the first line of the output is the same as the output of the uptime command.

top

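The top command is a relatively recent addition to the UNIX command set. It ranks processes by the amount of CPU time they consume, and it produces the following output: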

  4:09am  up 12:48,  1 user,  load average: 0.02, 0.27, 0.17

58 processes: 57 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.5% user,  0.9% system,  0.0% nice, 98.5% idle
Mem:   95564K av,  78704K used,  16860K free,  32836K shrd,  40132K buff
Swap:  68508K av,      0K used,  68508K free                 14508K cched

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
 5909 neil      13   0   720  720   552 R       0  1.5  0.7   0:01 top
    1 root       0   0   396  396   328 S       0  0.0  0.4   0:02 init
    2 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kflushd
    3 root     -12 -12     0    0     0 SW<     0  0.0  0.0   0:00 kswapd
...

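Note that every one of these commands reports the load average as three numbers. Quite commonly the numbers decrease from left to right. Sometimes, though, the pattern is reversed and the first number is the smallest, as in the top output above.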
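
Because the first number covers the most recent minute and the last covers fifteen minutes, comparing them gives a rough idea of whether load is building up or easing off. The following sketch illustrates that reading; the function name and the `tolerance` value are assumptions chosen for the example, not something the commands themselves define.

```python
def load_trend(one: float, five: float, fifteen: float,
               tolerance: float = 0.05) -> str:
    """Compare the 1-minute and 15-minute averages to guess the trend."""
    # The 5-minute value is accepted for completeness but not used here.
    if one > fifteen + tolerance:
        return "rising"    # recent load is higher than the long-term average
    if one < fifteen - tolerance:
        return "falling"   # recent load is lower than the long-term average
    return "steady"


print(load_trend(8.13, 5.90, 4.94))   # the first uptime example
print(load_trend(0.02, 0.27, 0.17))   # the top example above
```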

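Appendix:

1. The concept of load average on Linux

Sometimes the system feels slow for no obvious reason. That is the time to check the load average and see whether a large number of processes are queued up waiting. The average number of processes in the run queue over a given interval reflects how busy the system is, so when a site or system slows down, the first thing we usually check is the system load, that is, the CPU load average.

2. Checking the load average

So how do you check the load average? The simplest command is uptime, as shown below:

[root@localhost ~]# uptime
11:31:11 up 11 days, 19:01,  2 users,  load average: 0.02, 0.01, 0.00

Today's mainstream servers are dual-socket quad-core machines with very capable CPUs, so for ordinary application services the Linux load is rarely something to worry about.

You can also use the w command, which additionally shows which users are currently on the system and which terminals they occupy:

[root@localhost ~]# w
11:33:00 up 11 days, 19:03,  2 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/1    113.57.224.3     09:03    2:11m  0.04s  0.04s -bash
root     pts/2    113.57.224.3     11:31    0.00s  0.02s  0.00s w

There is also the dynamic top command, which reflects the system load as well. In the output below we only care about the load average figures on the first line (shown in bold in the original).

[root@localhost ~]# top
top - 11:37:47 up 11 days, 19:08,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 122 total,   1 running, 121 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4044136k total,  1435504k used,  2608632k free,   274740k buffers
Swap:  8193140k total,        0k used,  8193140k free,   941884k cached

What do those load average figures mean? Check again with uptime:

[root@localhost ~]# uptime
11:39:36 up 11 days, 19:16,  1 user,  load average: 0.09, 0.03, 0.01

They are the average numbers of processes in the process queue over the past 1, 5, and 15 minutes.

How, then, do you tell whether the system is currently overloaded? Consider the following points.

If the number of active processes per CPU (counting CPU cores) is no more than 3, system performance is good.

If the number of active processes per CPU is no more than 4, performance is acceptable.

If the number of active processes per CPU is more than 5, there is a serious performance problem.

You can also use the vmstat command to judge whether the system is too busy. If it clearly is, consider replacing the server or adding CPUs. In summary:

If the r column is frequently above 3 or 4 and the id column is frequently below 50, the CPU is heavily loaded.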
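
The vmstat rule can be checked mechanically once the output is parsed by column name. The sketch below is one possible reading of it: "frequently" is interpreted as "in more than half of the samples", which is an assumption, and the sample figures are made up purely for illustration.

```python
def parse_vmstat(text: str) -> list:
    """Turn vmstat output into a list of {column name: value} rows."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":                  # the column-name line
            header = fields
        elif header and fields[0].isdigit():  # a data line
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def cpu_heavily_loaded(rows, r_limit=3, idle_limit=50, fraction=0.5) -> bool:
    """True when r > r_limit and id < idle_limit in most of the samples."""
    if not rows:
        return False
    hits = sum(1 for row in rows
               if row["r"] > r_limit and row["id"] < idle_limit)
    return hits / len(rows) > fraction


# Illustrative sample in the layout printed by "vmstat 1 3".
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0      0 564500 109680 848712    0    0     2    10  900 2500 60 30 10  0  0
 5  0      0 564100 109680 848800    0    0     0     8  950 2600 55 25 20  0  0
 1  0      0 563900 109680 848820    0    0     0     4  400  900 20 10 70  0  0
"""

rows = parse_vmstat(SAMPLE)
print(cpu_heavily_loaded(rows))
```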

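In the example above my server is a PowerEdge 2850 whose CPUs are dual-core with two threads each, so 0.09/2 = 0.045 (load value divided by the number of physical CPUs), and the CPU load on this system is negligible. In fact, mainstream server CPUs are now so capable that the load is usually very small unless you are dealing with a special case such as virtualization.

Following the formula above, I configure the Nagios CPU load alert threshold as the number of CPU cores (physical CPUs × cores per CPU). Taking my PowerEdge 2850 as the example again, it has 2 × 2 = 4 cores, so the alert threshold is 4. This is a reasonable setting, because not every application server has multi-core CPUs, and some of the weaker servers on the site are only used for backups.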
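
A check along these lines could look like the sketch below. It is not the author's Nagios configuration: it only reuses the threshold formula (physical CPUs × cores per CPU) and the conventional Nagios plugin exit codes (0 for OK, 2 for CRITICAL). Applying the threshold to the 1-minute average is an assumption, as are the function and parameter names.

```python
OK, CRITICAL = 0, 2  # exit codes conventionally used by Nagios plugins


def check_load(load_1min: float, physical_cpus: int, cores_per_cpu: int):
    """Return (exit_code, message) for a load check against the core count."""
    threshold = physical_cpus * cores_per_cpu
    if load_1min >= threshold:
        return CRITICAL, f"CRITICAL - load {load_1min:.2f} >= {threshold}"
    return OK, f"OK - load {load_1min:.2f} < {threshold}"


# The PowerEdge 2850 example: 2 CPUs x 2 cores, 1-minute load of 0.09.
print(check_load(0.09, 2, 2))
```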

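(A note on using top:)

top is a dynamic display whose contents can be refreshed and changed with key presses. When run in the foreground it takes over the terminal until the user quits. More precisely, top provides real-time monitoring of processor activity and lists the most CPU-intensive tasks on the system. It can sort tasks by CPU usage, memory usage, or run time, and many of its features can be set through interactive commands or a personal configuration file.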

top - 12:38:33 up 50 days, 23:15, 7 users, load average: 60.58, 61.14, 61.22
Tasks: 203 total, 60 running, 139 sleeping, 4 stopped, 0 zombie
Cpu(s): 27.0%us, 73.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1939780k total, 1375280k used, 564500k free, 109680k buffers
Swap: 4401800k total, 497456k used, 3904344k free, 848712k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4338 oracle 25 0 627m 209m 207m R 0 11.0 297:14.76 oracle
4267 oracle 25 0 626m 144m 143m R 6 7.6 89:16.62 oracle
3458 oracle 25 0 672m 133m 124m R 0 7.1 1283:08 oracle
3478 oracle 25 0 672m 124m 115m R 0 6.6 1272:30 oracle
3395 oracle 25 0 672m 122m 113m R 0 6.5 1270:03 oracle
3480 oracle 25 0 672m 122m 109m R 8 6.4 1274:13 oracle
3399 oracle 25 0 672m 121m 110m R 0 6.4 1279:37 oracle
4261 oracle 25 0 634m 100m 99m R 0 5.3 86:13.90 oracle
25737 oracle 25 0 632m 81m 74m R 0 4.3 272:35.42 oracle
7072 oracle 25 0 626m 72m 71m R 0 3.8 6:35.68 oracle
16073 oracle 25 0 630m 68m 63m R 8 3.6 175:20.36 oracle
16140 oracle 25 0 630m 66m 60m R 0 3.5 175:13.42 oracle
16122 oracle 25 0 630m 66m 60m R 0 3.5 176:47.73 oracle
786 oracle 25 0 627m 63m 63m R 0 3.4 1:54.93 oracle
4271 oracle 25 0 627m 59m 58m R 8 3.1 86:09.64 oracle
4273 oracle 25 0 627m 57m 56m R 8 3.0 84:38.20 oracle
22670 oracle 25 0 626m 50m 49m R 0 2.7 84:55.82 oracle

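I. The five summary lines at the top of top

The first five lines of the summary area are system-wide statistics.

1. The first line shows task queue information

It is the same as the output of the uptime command:

[root@localhost ~]# uptime
13:22:30 up 8 min, 4 users, load average: 0.14, 0.38, 0.25

Its fields are as follows:

12:38:33	Current time
up 50 days	System uptime; any part beyond whole days is shown as hours:minutes
1 user	Number of users currently logged in
load average: 0.06, 0.60, 0.48	System load, that is, the average length of the task queue. The three values are the averages over the last 1, 5, and 15 minutes.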
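
Since uptime, w, and the first line of top share this layout, one regular expression can pull the user count and the three averages out of any of them. This is a minimal sketch assuming the English "N users, load average: a, b, c" wording shown in this article; `parse_summary` is a name made up for the example.

```python
import re

_SUMMARY = re.compile(
    r"(\d+)\s+users?,\s+load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)"
)


def parse_summary(line: str):
    """Return (users, (load1, load5, load15)) from an uptime-style line."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError("no 'users, load average' section found")
    loads = tuple(float(match.group(i)) for i in (2, 3, 4))
    return int(match.group(1)), loads


line = ("top - 12:38:33 up 50 days, 23:15, 7 users, "
        "load average: 60.58, 61.14, 61.22")
print(parse_summary(line))
```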

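2. The second and third lines show task and CPU information

With multiple CPUs, this information may take more than two lines. The fields are:

Tasks: 29 total	Total number of processes
1 running	Number of running processes
28 sleeping	Number of sleeping processes
0 stopped	Number of stopped processes
0 zombie	Number of zombie processes
Cpu(s): 0.3% us	Percentage of CPU time spent in user space
1.0% sy	Percentage of CPU time spent in kernel space
0.0% ni	Percentage of CPU time used by user processes whose priority has been changed (niced)
98.7% id	Percentage of CPU time spent idle
0.0% wa	Percentage of CPU time spent waiting for I/O
0.0% hi	Percentage of CPU time spent servicing hardware interrupts
0.0% si	Percentage of CPU time spent servicing software interrupts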
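
A CPU summary line in this format can be turned into a dictionary keyed by the two-letter state names, from which overall utilisation follows as 100 minus the idle share. The sketch below assumes the "27.0%us" style shown in this article; other top versions format the line differently, and the function names are illustrative.

```python
import re


def parse_cpu_line(line: str) -> dict:
    """Map each two-letter CPU state (us, sy, ni, id, ...) to its percentage."""
    pairs = re.findall(r"([\d.]+)\s*%\s*([a-z]{2})", line)
    return {name: float(value) for value, name in pairs}


def busy_percent(states: dict) -> float:
    """Everything that is not idle time counts as busy."""
    return 100.0 - states["id"]


cpu_line = ("Cpu(s): 27.0%us, 73.0%sy, 0.0%ni, 0.0%id, "
            "0.0%wa, 0.0%hi, 0.0%si, 0.0%st")
states = parse_cpu_line(cpu_line)
print(states["us"], states["sy"], busy_percent(states))
```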

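3. The fourth and fifth lines show memory information

The fields are:

Mem: 191272k total	Total physical memory
173656k used	Physical memory in use
17616k free	Free physical memory
22052k buffers	Memory used for kernel buffers
Swap: 192772k total	Total swap space
0k used	Swap space in use
192772k free	Free swap space
123988k cached	Memory used as page cache for file data. Although top prints this figure on the Swap line, it describes physical memory rather than swap; like buffers, it can be reclaimed when applications need the memory.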
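
Because buffers and cache can be reclaimed, a rough estimate of the memory actually held by applications is used minus buffers minus cached, the same idea as the "-/+ buffers/cache" line printed by older versions of free. The sketch below applies that estimate to the Mem and Swap lines from the top sample earlier in this article; the helper names are hypothetical.

```python
import re


def parse_kb_fields(line: str) -> dict:
    """Parse '1939780k total, 1375280k used, ...' into {name: kilobytes}."""
    return {name: int(value)
            for value, name in re.findall(r"(\d+)k\s+(\w+)", line)}


def application_memory_kb(mem: dict, swap: dict) -> int:
    """Estimate memory used by applications, excluding buffers and cache."""
    return mem["used"] - mem["buffers"] - swap["cached"]


mem = parse_kb_fields(
    "Mem: 1939780k total, 1375280k used, 564500k free, 109680k buffers")
swap = parse_kb_fields(
    "Swap: 4401800k total, 497456k used, 3904344k free, 848712k cached")
print(application_memory_kb(mem, swap))
```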

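II. Process information

Column	Meaning
PID	Process ID
PPID	Parent process ID
RUSER	Real user name
UID	User ID of the process owner
USER	User name of the process owner
GROUP	Group name of the process owner
TTY	Name of the terminal that started the process. Processes not started from a terminal show ?
PR	Priority
NI	Nice value. Negative values mean higher priority, positive values mean lower priority
P	Last used CPU; only meaningful on multi-CPU systems
%CPU	Share of CPU time used since the last update
TIME	Total CPU time used by the process, in seconds
TIME+	Total CPU time used by the process, in hundredths of a second
%MEM	Share of physical memory used by the process
VIRT	Total virtual memory used by the process, in KB. VIRT = SWAP + RES
SWAP	Portion of the process's virtual memory that has been swapped out, in KB
RES	Physical memory used by the process that has not been swapped out, in KB. RES = CODE + DATA
CODE	Physical memory occupied by executable code, in KB
DATA	Physical memory occupied by everything other than executable code (data segment plus stack), in KB
SHR	Shared memory size, in KB
nFLT	Number of page faults
nDRT	Number of pages modified since they were last written to disk
S	Process state. D = uninterruptible sleep, R = running, S = sleeping, T = traced or stopped, Z = zombie
COMMAND	Command name or command line
WCHAN	If the process is sleeping, the name of the kernel function it is sleeping in
Flags	Task flags; see sched.h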
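
To work with the process rows in a script, each line can be split against the default column list and the state letter looked up in a table. This is a sketch under the assumption that the twelve default columns are displayed in their default order; `parse_row` and `STATE_NAMES` are names invented for the example.

```python
from collections import Counter

HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND".split()

STATE_NAMES = {
    "D": "uninterruptible sleep",
    "R": "running",
    "S": "sleeping",
    "T": "traced or stopped",
    "Z": "zombie",
}


def parse_row(line: str) -> dict:
    """Split one process line of top output into {column: text}."""
    # COMMAND is last and may contain spaces, so limit the number of splits.
    fields = line.split(None, len(HEADER) - 1)
    if len(fields) != len(HEADER):
        raise ValueError("unexpected number of columns")
    return dict(zip(HEADER, fields))


sample_rows = [
    "4338 oracle 25 0 627m 209m 207m R 0 11.0 297:14.76 oracle",
    "4267 oracle 25 0 626m 144m 143m R 6 7.6 89:16.62 oracle",
]
parsed = [parse_row(row) for row in sample_rows]
state_counts = Counter(row["S"] for row in parsed)
for state, count in state_counts.items():
    print(count, STATE_NAMES[state])
```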

The top man page describes these fields as follows:

Listed below are top's available fields. They are always associated with the letter shown, regardless of the position you may have established for them with the 'o' (Order fields) interactive command. Any field is selectable as the sort field, and you control whether they are sorted high-to-low or low-to-high. For additional information on sort provisions see topic 3c. TASK Area Commands.

a: PID -- Process Id

The task's unique process ID, which periodically wraps, though never restarting at zero.

b: PPID -- Parent Process Pid

The process ID of a task's parent.

c: RUSER -- Real User Name

The real user name of the task's owner.

d: UID -- User Id

The effective user ID of the task's owner.

e: USER -- User Name

The effective user name of the task's owner.

f: GROUP -- Group Name

The effective group name of the task's owner.

g: TTY -- Controlling Tty

The name of the controlling terminal. This is usually the device (serial port, pty, etc.) from which the process was started, and which it uses for input or output. However, a task need not be associated with a terminal, in which case you'll see '?' displayed.

h: PR -- Priority

The priority of the task.

i: NI -- Nice value

The nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task's dispatchability.

j: P -- Last used CPU (SMP)

A number representing the last used processor. In a true SMP environment this will likely change frequently since the kernel intentionally uses weak affinity. Also, the very act of running top may break this weak affinity and cause more processes to change CPUs more often (because of the extra demand for cpu time).

k: %CPU -- CPU usage

The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. In a true SMP environment, if 'Irix mode' is Off, top will operate in 'Solaris mode' where a task's cpu usage will be divided by the total number of CPUs. You toggle 'Irix/Solaris' modes with the 'I' interactive command.

l: TIME -- CPU Time

Total CPU time the task has used since it started. When 'Cumulative mode' is On, each process is listed with the cpu time that it and its dead children has used. You toggle 'Cumulative mode' with 'S', which is a command-line option and an interactive command. See the 'S' interactive command for additional information regarding this mode.

m: TIME+ -- CPU Time, hundredths

The same as 'TIME', but reflecting more granularity through hundredths of a second.

n: %MEM -- Memory usage (RES)

A task's currently used share of available physical memory.

o: VIRT -- Virtual Image (kb)

The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out. (Note: you can define the STATSIZE=1 environment variable and the VIRT will be calculated from the /proc/#/state VmSize field.)

VIRT = SWAP + RES.

p: SWAP -- Swapped size (kb)

The swapped out portion of a task's total virtual memory image.

q: RES -- Resident size (kb)

The non-swapped physical memory a task has used.

RES = CODE + DATA.

r: CODE -- Code size (kb)

The amount of physical memory devoted to executable code, also known as the 'text resident set' size or TRS.

s: DATA -- Data+Stack size (kb)

The amount of physical memory devoted to other than executable code, also known as the 'data resident set' size or DRS.

t: SHR -- Shared Mem size (kb)

The amount of shared memory used by a task. It simply reflects memory that could be potentially shared with other processes.

u: nFLT -- Page Fault count

The number of major page faults that have occurred for a task. A page fault occurs when a process attempts to read from or write to a virtual page that is not currently present in its address space. A major page fault is when disk access is involved in making that page available.

v: nDRT -- Dirty Pages count

The number of pages that have been modified since they were last written to disk. Dirty pages must be written to disk before the corresponding physical memory location can be used for some other virtual page.

w: S -- Process Status

The status of the task which can be one of:

'D' = uninterruptible sleep

'R' = running

'S' = sleeping

'T' = traced or stopped

'Z' = zombie

Tasks shown as running should be more properly thought of as 'ready to run' -- their task_struct is simply represented on the Linux run-queue. Even without a true SMP machine, you may see numerous tasks in this state depending on top's delay interval and nice value.

x: Command -- Command line or Program name

Display the command line used to start a task or the name of the associated program. You toggle between command line and name with 'c', which is both a command-line option and an interactive command. When you've chosen to display command lines, processes without a command line (like kernel threads) will be shown with only the program name in parentheses, as in this example: ( mdrecoveryd ) Either form of display is subject to potential truncation if it's too long to fit in this field's current width. That width depends upon other fields selected, their order and the current screen width.

Note: The 'Command' field/column is unique, in that it is not fixed-width. When displayed, this column will be allocated all remaining screen width (up to the maximum 512 characters) to provide for the potential growth of program names into command lines.

y: WCHAN -- Sleeping in Function

Depending on the availability of the kernel link map ('System.map'), this field will show the name or the address of the kernel function in which the task is currently sleeping. Running tasks will display a dash ('-') in this column.

Note: By displaying this field, top's own working set will be increased by over 700Kb. Your only means of reducing that overhead will be to stop and restart top.

z: Flags -- Task Flags

This column represents the task's current scheduling flags which are expressed in hexadecimal notation and with zeros suppressed. These flags are officially documented in <linux/sched.h>. Less formal documentation can also be found on the 'Fields select' and 'Order fields' screens.

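By default only the more important columns are shown: PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, and COMMAND.

2.1 Changing what is displayed with shortcut keys
(1) Press f to choose which columns are displayed.

After pressing f, a list of columns appears. Press a-z to show or hide the corresponding column, then press Enter to confirm.

(2) Press o to change the order of the columns.

Press a lowercase a-z to move the corresponding column to the right, or an uppercase A-Z to move it to the left. Press Enter to confirm.

Press uppercase F or O, then a-z, to sort processes by the corresponding column. Uppercase R reverses the current sort order.

When you are done, press Enter to return to the main display.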

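III. Using the command

See the man page for full details. Some of the options are listed here.

Command format:

top [-] [d] [p] [q] [c] [C] [S] [n]

Options:

d: Sets the delay between screen refreshes. It can also be changed interactively with the s command.

p: Monitors only the process with the given process ID.

q: Makes top refresh without any delay. If the caller has superuser privileges, top runs at the highest possible priority.

S: Enables cumulative mode.

s: Runs top in secure mode, which removes the potential dangers of the interactive commands.

i: Makes top hide idle and zombie processes.

c: Shows the full command line instead of just the command name.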
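
When top is driven from a script rather than a terminal, these options are usually combined with batch mode. The sketch below only assembles the argument list. It relies on two procps options that the list above does not describe, -b (batch mode) and -n (number of iterations), and the helper name `build_top_command` is hypothetical.

```python
from typing import List, Optional, Sequence


def build_top_command(pids: Optional[Sequence[int]] = None,
                      delay: Optional[float] = None,
                      iterations: int = 1,
                      full_command: bool = False) -> List[str]:
    """Assemble a non-interactive top invocation as an argument list."""
    cmd = ["top", "-b", "-n", str(iterations)]   # batch mode, fixed run count
    if delay is not None:
        cmd += ["-d", str(delay)]                # seconds between refreshes
    if pids:
        cmd += ["-p", ",".join(str(pid) for pid in pids)]  # only these PIDs
    if full_command:
        cmd.append("-c")                         # show full command lines
    return cmd


print(build_top_command(pids=[4338, 4267], delay=2, full_command=True))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to capture a single snapshot.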

While top is running, you can also type the following letters to interact with it.

The built-in help screen reads as follows:

Help for Interactive Commands - procps version 3.2.7

Window 1:Def: Cumulative mode Off. System: Delay 4.0 secs; Secure mode Off.

Z,B Global: 'Z' change color mappings; 'B' disable/enable bold

l,t,m Toggle Summaries: 'l' load avg; 't' task/cpu stats; 'm' mem info

1,I Toggle SMP view: '1' single/separate states; 'I' Irix/Solaris mode

f,o . Fields/Columns: 'f' add or remove; 'o' change display order

F or O . Select sort field

<,> . Move sort field: '<' next col left; '>' next col right

R,H . Toggle: 'R' normal/reverse sort; 'H' show threads

c,i,S . Toggle: 'c' cmd name/line; 'i' idle tasks; 'S' cumulative time

x,y . Toggle highlights: 'x' sort field; 'y' running tasks

z,b . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')

u . Show specific user only

n or # . Set maximum tasks displayed

k,r Manipulate tasks: 'k' kill; 'r' renice

d or s Set update interval

W Write configuration file

q Quit

( commands shown with '.' require a visible task display window )

Press 'h' or '?' for help with Windows,

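h or ?: Shows the help screen with a short summary of the commands.

k: Kills a process. top prompts for the PID of the process to terminate and for the signal to send. Signal 15 is normally used to terminate a process; if the process does not exit cleanly, use signal 9 to force it. The default is signal 15. This command is disabled in secure mode.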
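
The same "signal 15 first, signal 9 only if needed" approach can be scripted. The sketch below is an illustration for POSIX systems, not something top does internally: it sends SIGTERM to a child process, waits for a grace period, and falls back to SIGKILL. The grace period and the function name are assumptions.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 3.0) -> int:
    """Ask a child to exit with SIGTERM (15); force it with SIGKILL (9) if needed."""
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()          # sends SIGKILL on POSIX
        return proc.wait()


# Start a throwaway child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
code = stop_process(child)
print(code)  # on POSIX, a negative value is the number of the terminating signal
```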

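i: Ignores idle and zombie processes. This is a toggle.

q: Quits the program.

r: Renices a process. top prompts for the PID and the new nice value. A positive value lowers the priority; a negative value gives the process higher priority. The default is 10.

S: Toggles cumulative mode.

s: Changes the delay between refreshes. top prompts for the new delay in seconds; a fractional value is interpreted as milliseconds. Entering 0 makes top refresh continuously. The default is given as 5 s, although it varies between top versions. Note that a very small delay makes the screen refresh too quickly to read and also adds considerably to the system load.

f or F: Adds or removes fields from the display. (In the procps 3.2.7 help shown above, lowercase f does this, while F or O selects the sort field.)

o or O: Changes the order of the displayed fields. (In procps 3.2.7 this is lowercase o.)

l: Toggles the load average and uptime information, that is, shows or hides the first line.

m: Toggles the memory information, that is, shows or hides the memory lines.

t: Toggles the task and CPU state information, that is, shows or hides the task and CPU lines.

c: Toggles between the command name and the full command line. Seeing the full command is often very useful.

M: Sorts by resident memory size.

P: Sorts by CPU usage percentage.

T: Sorts by time or cumulative time.

W: Writes the current settings to ~/.toprc. This is the recommended way to write a top configuration file.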

Reposted from: https://blog.51cto.com/yamig/1110741
