Understanding the "Load Average"

Latest recommended article published 2022-05-21 03:20:04

一笑杯莫停

Tags: operating system, linux

Article link: https://blog.csdn.net/merryxuan/article/details/108151955

Preface

When the system feels slow, we usually check resource usage with one of two commands: top or uptime.

Running uptime prints something like this:


$ uptime
02:34:03 up 2 days, 20:14,  1 user,  load average: 0.63, 0.83, 0.88

What does this line mean?


02:34:03              // current time
up 2 days, 20:14      // how long the system has been running
1 user                // number of logged-in users
load average: 0.63, 0.83, 0.88 // load averages over the past 1, 5 and 15 minutes
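The same three numbers can also be read from a program. Below is a minimal Python sketch, assuming a Linux system: os.getloadavg() is in Python's standard library on Unix, and on Linux the kernel exposes the averages in /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute values. The helper parse_loadavg is hypothetical, written only for illustration.

```python
import os


def parse_loadavg(text: str) -> dict:
    """Parse the contents of /proc/loadavg, e.g. "0.63 0.83 0.88 2/345 12345".

    Fields: 1-, 5- and 15-minute load averages, then
    runnable/total scheduling entities, then the most recently created PID.
    """
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "1m": one,
        "5m": five,
        "15m": fifteen,
        "runnable": runnable,
        "total": total,
        "last_pid": int(fields[4]),
    }


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):  # available on Unix only
        print(os.getloadavg())  # (1m, 5m, 15m) as floats
    if os.path.exists("/proc/loadavg"):  # Linux-specific file
        with open("/proc/loadavg") as f:
            print(parse_loadavg(f.read()))
```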

Load average

What is the load average (Load Averages)?

Running man uptime gives the following explanation:

       uptime - Tell how long the system has been running.

SYNOPSIS
       uptime [options]

DESCRIPTION
       uptime gives a one line display of the following information.  The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.

       This is the same information contained in the header line displayed by w(1).

       System  load  averages  is the average number of processes that are either in a runnable or uninterruptable state.  A process in a runnable state is either using the CPU or waiting to use the CPU.  A process in uninterruptable state is waiting for some I/O
       access, eg waiting for disk.  The averages are taken over the three time intervals.  Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a  4  CPU  system  it
       means it was idle 75% of the time.

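This explains both what uptime prints and what the load average is. Roughly: the load average is the average number of processes in the runnable or uninterruptible state. A runnable process is either using the CPU or waiting to use it; an uninterruptible process is waiting for I/O, for example disk I/O. The average is taken over three time windows (the past 1, 5 and 15 minutes). It is not normalized by the number of CPUs: a load average of 1 means a single-CPU system was fully busy the whole time, while on a 4-CPU system it means the CPUs were idle 75% of the time.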
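The man page's single-CPU versus 4-CPU example is just a division by the number of CPUs. A minimal Python sketch follows; the function names are hypothetical, and the "idle fraction" reading assumes every counted task wants a CPU. On Linux, tasks waiting on I/O are counted too, so treat the result as an estimate.

```python
def per_cpu_load(load: float, cpus: int) -> float:
    """Load average divided by the number of CPUs (1.0 = one active task per CPU)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


def rough_idle_fraction(load: float, cpus: int) -> float:
    """Rough share of CPU time left idle.

    Assumes every task counted in the load is using or waiting for a CPU;
    tasks blocked on I/O also count on Linux, so this is only an estimate.
    """
    return max(0.0, 1.0 - per_cpu_load(load, cpus))


print(rough_idle_fraction(1.0, 1))  # 0.0  (a single CPU busy all the time)
print(rough_idle_fraction(1.0, 4))  # 0.75 (four CPUs idle 75% of the time)
```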

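Runnable: probably familiar already. The process is either running on a CPU or ready and waiting for the scheduler to give it one (state R in ps).
Uninterruptible: for example, a process in the middle of certain I/O operations cannot be interrupted, not even by a signal, so that the data stays consistent (state D in ps).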
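To see which processes are in these two states right now, you can read the state letter from each /proc/<pid>/stat file. A minimal Python sketch, assuming Linux; it is a one-off snapshot rather than the kernel's own calculation, and it counts processes rather than individual threads. The helpers task_state and count_active are hypothetical names.

```python
import os


def task_state(stat_line: str) -> str:
    """Return the state letter from a /proc/<pid>/stat line.

    Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    or ')', so split after the last ')' and take the next field.
    """
    return stat_line.rpartition(")")[2].split()[0]


def count_active(proc: str = "/proc") -> dict:
    """Count processes currently in state R (runnable) or D (uninterruptible)."""
    counts = {"R": 0, "D": 0}
    for name in os.listdir(proc):
        if not name.isdigit():  # only numeric entries are processes
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                state = task_state(f.read())
        except OSError:  # the process exited while we were scanning
            continue
        if state in counts:
            counts[state] += 1
    return counts


if __name__ == "__main__" and os.path.isdir("/proc"):
    # The scanning process itself will usually be counted as R.
    print(count_active())
```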

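So the load average is the average number of active processes. It is tempting to read it as "active processes per unit of time", but strictly speaking it is an exponentially decaying average of the number of active processes: recent samples weigh more than older ones.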
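A minimal Python sketch of the exponentially decaying average, assuming the Linux model: the kernel samples the number of active (R plus D) tasks roughly every 5 seconds and updates each average as load = load * e + n * (1 - e), with e = exp(-5/60), exp(-5/300) and exp(-5/900) for the 1-, 5- and 15-minute values. The kernel does this in fixed-point arithmetic; floating point is used here for readability, and all names are illustrative.

```python
import math

SAMPLE_INTERVAL = 5.0           # seconds between samples (about 5 s on Linux)
WINDOWS = (60.0, 300.0, 900.0)  # time constants for the 1-, 5- and 15-minute averages


def decay_factor(window: float, interval: float = SAMPLE_INTERVAL) -> float:
    """Weight kept by the previous average at each sample: exp(-interval / window)."""
    return math.exp(-interval / window)


def update(load: float, active: int, window: float) -> float:
    """One step of the exponentially decaying moving average."""
    e = decay_factor(window)
    return load * e + active * (1.0 - e)


def simulate(active_counts, start=(0.0, 0.0, 0.0)):
    """Feed per-sample active-task counts; return the (1m, 5m, 15m) averages."""
    loads = list(start)
    for n in active_counts:
        loads = [update(old, n, w) for old, w in zip(loads, WINDOWS)]
    return tuple(loads)


# One CPU-bound process for 60 s (12 samples) on an otherwise idle system
one, five, fifteen = simulate([1] * 12)
print(f"{one:.2f} {five:.2f} {fifteen:.2f}")  # 0.63 0.18 0.06
```

This is why a single busy process started on an idle machine shows a 1-minute load of about 0.63 after one minute rather than 1.0: the averages lag behind the instantaneous count, and the 15-minute value lags the most.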

What is a reasonable load average?

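My translation of that English passage may not be perfectly precise, but the meaning is clear: ideally, the load average equals the number of CPUs, so every CPU is busy and no task is left waiting. So how do we find out how many CPUs there are?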

Read it from the /proc/cpuinfo file, or from top (press 1 inside top to list each CPU on its own line).


# For details on grep and wc, see their man pages or search online
$ grep 'model name' /proc/cpuinfo | wc -l
2
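The same count from Python, as a minimal sketch assuming Linux. os.cpu_count() and os.sched_getaffinity() are standard-library calls; counting lines that start with "processor" is an alternative to the grep above, since the "model name" field may be missing on some architectures. The helper count_processors is hypothetical.

```python
import os


def count_processors(lines) -> int:
    """Count 'processor' entries in /proc/cpuinfo text (one per logical CPU)."""
    return sum(1 for line in lines if line.startswith("processor"))


if __name__ == "__main__":
    print(os.cpu_count())  # logical CPUs in the system (None if unknown)
    if hasattr(os, "sched_getaffinity"):  # Linux: CPUs this process may run on
        print(len(os.sched_getaffinity(0)))
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print(count_processors(f))
```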

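If the load average is greater than the number of CPUs, the system is clearly overloaded. But there are three load averages. They are effectively three samples that show how the load has evolved over the last 15 minutes. So how should we read them?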

Reading the three values

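If the three values are close to each other, the load has been steady over the past 15 minutes.
If the 15-minute value is much larger than the 1-minute value, the load has been falling recently, and there was heavy load at some point within the last 15 minutes.
Conversely, if the 1-minute value is much larger than the 15-minute value, the load has risen over the last minute. The rise may be temporary or may keep growing, so keep watching. Once the 1-minute load average approaches or exceeds the number of CPUs, the system is overloaded: find out what is causing it and look for ways to optimize.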
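The three reading rules above can be written as a small classifier. This is a minimal Python sketch; describe_trend is a hypothetical helper, and its tolerance of 0.2 (what counts as "close to each other") is an assumed cut-off, not something the rules define.

```python
def describe_trend(one: float, five: float, fifteen: float,
                   tolerance: float = 0.2) -> str:
    """Classify the 1/5/15-minute load averages.

    tolerance is an assumed cut-off: if the spread between the largest and
    smallest value is within this fraction of the largest, call it stable.
    """
    hi, lo = max(one, five, fifteen), min(one, five, fifteen)
    if hi == 0 or (hi - lo) / hi <= tolerance:
        return "stable"
    if fifteen > one:
        return "falling"      # heavy load earlier, easing off now
    if one > fifteen:
        return "rising"       # keep watching: may be temporary or may keep growing
    return "fluctuating"      # 1m equals 15m but the 5m value differs


print(describe_trend(0.63, 0.83, 0.88))  # falling
```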

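As a rule of thumb, start investigating once the load average exceeds 70% of the number of CPUs. Excessive load can slow down process response and, in turn, disrupt normal service. The 70% figure is not absolute, of course.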
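The 70% rule of thumb as a quick check, in a minimal Python sketch assuming a Unix system where os.getloadavg() is available. The helper load_alert is hypothetical, and its default ratio of 0.7 simply mirrors the rule above; adjust it to your workload.

```python
import os


def load_alert(load: float, cpus: int, ratio: float = 0.7) -> bool:
    """True when the load average exceeds ratio * number of CPUs."""
    return load > ratio * cpus


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    cpus = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    one, five, fifteen = os.getloadavg()
    for label, value in (("1m", one), ("5m", five), ("15m", fifteen)):
        flag = "investigate" if load_alert(value, cpus) else "ok"
        print(f"{label}: {value:.2f} on {cpus} CPUs -> {flag}")
```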