[Linux] Viewing system CPU information and monitoring system load with Telegraf

Alexbyy

Category: Operations. Tags: monitoring, cpu, operations, telegraf, lma

Article link: https://blog.csdn.net/Alexbyy/article/details/108992362

I. The difference between CPU load and CPU utilization

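Load and utilization are the two metrics most commonly used to check the state of a system's CPU, and they are not the same thing. High load does not necessarily mean high utilization, and high utilization does not necessarily mean high load; neither implies the other. Here is an easy-to-follow example.
Take phone calls as an illustration of the difference (this is my own understanding). At a public phone booth, one person is on the phone and four are waiting. Each person may use the phone for one minute; anyone who has not finished within that minute must hang up and rejoin the queue for the next round. The phone corresponds to the CPU, and the people who are calling or waiting to call correspond to the number of tasks. While the booth is in use, some people finish and leave, some do not finish and queue again, and new people join the queue; this change in head count corresponds to tasks being added and removed. To measure the average load, we count the people every 5 seconds and average those counts at the 1-, 5-, and 15-minute marks, which gives the 1-, 5-, and 15-minute load averages.
Some people pick up the phone and talk for the whole minute. Others spend the first thirty seconds looking up the number or hesitating over whether to call, and only actually talk during the last thirty seconds. With the phone as the CPU and the people as tasks, we say that the first person (task) has high CPU utilization and the second has low CPU utilization. Of course, a CPU does not literally work for thirty seconds and then rest for thirty; the point is only that some programs involve heavy computation and so have high CPU utilization, while others do very little computation and naturally have low CPU utilization. Either way, whether CPU utilization is high or low has no necessary connection to how many tasks are waiting in the queue.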
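The sampling in the phone booth analogy can be sketched in a few lines of Python. This is only an illustration of the analogy: the function name `average_load` and the sample head counts are made up here, and it takes a plain mean over each window. As far as I know, the Linux kernel instead computes its load averages as exponentially damped moving averages, so real values will not match this simple mean.

```python
def average_load(samples, window_seconds, interval_seconds=5):
    """Plain mean of the most recent head counts covering window_seconds.

    samples holds one head count per sampling interval, oldest first.
    This mirrors the analogy only; it is not the kernel's algorithm.
    """
    if not samples:
        raise ValueError("need at least one sample")
    count = max(1, window_seconds // interval_seconds)
    recent = samples[-count:]
    return sum(recent) / len(recent)


# Hypothetical head counts taken every 5 seconds:
# a busy first minute (5 people), then a quiet second minute (1 person).
head_counts = [5] * 12 + [1] * 12

one_minute = average_load(head_counts, 60)    # covers only the quiet minute
two_minutes = average_load(head_counts, 120)  # covers both minutes
print(one_minute, two_minutes)
```

The longer window still remembers the busy minute, which is why the 5- and 15-minute load averages react more slowly than the 1-minute one.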

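II. Ways to view system CPU information

1. Key points

CPU information is recorded in /proc/cpuinfo.
On Linux, the top command shows CPU usage, similar to the Task Manager on Windows.
Total CPU cores = number of physical CPUs * cores per physical CPU
Total logical CPUs = number of physical CPUs * cores per physical CPU * hyper-threads per core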
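As a quick worked example of the two formulas above, here is a small Python sketch. The function name `cpu_counts` and the machine used (2 sockets, 8 cores each, 2 hyper-threads per core) are hypothetical values chosen for illustration.

```python
def cpu_counts(sockets, cores_per_socket, threads_per_core):
    """Apply the two formulas: total cores, then total logical CPUs."""
    total_cores = sockets * cores_per_socket
    logical_cpus = total_cores * threads_per_core
    return total_cores, logical_cpus


# Hypothetical machine: 2 physical CPUs, 8 cores each, 2 hyper-threads per core.
total_cores, logical_cpus = cpu_counts(sockets=2, cores_per_socket=8, threads_per_core=2)
print(total_cores, logical_cpus)
```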

2. Query methods

(1) Query from the contents of /proc/cpuinfo

cat /proc/cpuinfo | grep "model name" | sort | uniq   # CPU model
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l   # number of physical CPUs: one per distinct physical id
cat /proc/cpuinfo | grep "cpu cores" | uniq   # number of cores per physical CPU
cat /proc/cpuinfo | grep "^processor" | wc -l   # number of logical CPUs
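The same four queries can be done in one pass over the file. The following is a minimal Python sketch, assuming the x86-style field names used above (`processor`, `model name`, `physical id`, `cpu cores`); other architectures may not expose all of them. The function name `summarize_cpuinfo` and the sample text are invented for illustration.

```python
def summarize_cpuinfo(text):
    """Summarize /proc/cpuinfo-style text (assumes x86-style field names)."""
    models = set()
    physical_ids = set()
    cores_per_socket = None
    logical_cpus = 0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical_cpus += 1
        elif key == "model name":
            models.add(value)
        elif key == "physical id":
            physical_ids.add(value)
        elif key == "cpu cores" and cores_per_socket is None:
            cores_per_socket = int(value)
    return {
        "models": sorted(models),
        "physical_cpus": len(physical_ids),
        "cores_per_socket": cores_per_socket,
        "logical_cpus": logical_cpus,
    }


# Made-up sample: 2 sockets, 2 cores each, no hyper-threading.
SAMPLE = "\n\n".join(
    f"processor : {n}\nmodel name : Example CPU @ 2.00GHz\n"
    f"physical id : {n // 2}\ncpu cores : 2"
    for n in range(4)
)
summary = summarize_cpuinfo(SAMPLE)
print(summary)
```

On a real Linux machine, the input would be the contents of /proc/cpuinfo, for example `summarize_cpuinfo(open("/proc/cpuinfo").read())`.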

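(2) View directly with lscpu: CPU(s) is the number of logical CPUs, Thread(s) per core is the number of hyper-threads per core, Core(s) per socket is the number of cores per physical CPU, and Socket(s) is the number of physical CPUs.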
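To check that these four lscpu fields are consistent with each other, the output can be parsed and multiplied out. This is a rough Python sketch that assumes English (untranslated) lscpu labels; the function name `parse_lscpu` and the sample output are hypothetical.

```python
def parse_lscpu(text):
    """Pick the four topology fields out of lscpu-style text."""
    wanted = {
        "CPU(s)": "logical_cpus",
        "Thread(s) per core": "threads_per_core",
        "Core(s) per socket": "cores_per_socket",
        "Socket(s)": "sockets",
    }
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        name = wanted.get(key.strip())  # exact match, so "On-line CPU(s) list" is skipped
        if sep and name:
            result[name] = int(value.strip())
    return result


# Made-up lscpu output for a 2-socket, 8-core, hyper-threaded machine.
SAMPLE_LSCPU = """\
Architecture:        x86_64
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
"""
topology = parse_lscpu(SAMPLE_LSCPU)
expected_logical = (
    topology["sockets"] * topology["cores_per_socket"] * topology["threads_per_core"]
)
print(topology, expected_logical)
```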
(3) Use top to view CPU load and utilization
CPU load has to be interpreted together with the number of logical CPUs.
(To be completed.)
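One way to relate load to the number of logical CPUs is to divide each load average by that number. The sketch below uses Python's standard `os.getloadavg()` and `os.cpu_count()`; the helper names `load_per_cpu` and `current_load_per_cpu` are my own, and reading a per-CPU value near or above 1.0 as "saturated" is a common rule of thumb rather than a hard limit.

```python
import os


def load_per_cpu(load_averages, logical_cpus):
    """Divide each load average by the number of logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return tuple(load / logical_cpus for load in load_averages)


def current_load_per_cpu():
    """Per-CPU 1-, 5-, and 15-minute load averages for this machine (Unix only)."""
    return load_per_cpu(os.getloadavg(), os.cpu_count() or 1)


# Hypothetical reading: load averages 2.0, 1.0, 0.5 on a machine with 4 logical CPUs.
normalized = load_per_cpu((2.0, 1.0, 0.5), 4)
print(normalized)
```

A load of 2.0 looks high on a single-CPU machine but corresponds to only half the capacity of a machine with 4 logical CPUs.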

III. Monitoring system CPU load

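The CPU state of a system can be monitored with Telegraf's cpu plugin. This plugin reports the time spent in each CPU state (that is, utilization); the load averages themselves (load1, load5, load15) are reported by Telegraf's system plugin.
The CPU plugin collects standard CPU metrics as defined in man proc. Not all architectures support all of these metrics.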

cpu  3357 0 4313 1362393
    The amount of time, measured in units of USER_HZ (1/100ths of a second on
    most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value),
    that the system spent in various states:

    user   (1) Time spent in user mode.

    nice   (2) Time spent in user mode with low priority (nice).

    system (3) Time spent in system mode.

    idle   (4) Time spent in the idle task.  This value should be USER_HZ times
    the second entry in the /proc/uptime pseudo-file.

    iowait (since Linux 2.5.41)
           (5) Time waiting for I/O to complete.

    irq (since Linux 2.6.0-test4)
           (6) Time servicing interrupts.

    softirq (since Linux 2.6.0-test4)
           (7) Time servicing softirqs.

    steal (since Linux 2.6.11)
           (8) Stolen time, which is the time spent in other operating systems
           when running in a virtualized environment

    guest (since Linux 2.6.24)
           (9) Time spent running a virtual CPU for guest operating systems
           under the control of the Linux kernel.

    guest_nice (since Linux 2.6.33)
           (10) Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel).
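Because these counters are cumulative, a utilization percentage comes from the difference between two readings of the `cpu` line. The sketch below is a minimal illustration, not Telegraf's implementation: the function names are invented, the second reading is made up, only `idle` is treated as idle time, and only the first eight counters are summed on the assumption that guest time is already included in user and nice.

```python
def parse_cpu_times(line):
    """Turn a /proc/stat 'cpu ...' line into a list of integer counters."""
    fields = line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    return [int(value) for value in fields[1:]]


def busy_percent(previous, current):
    """Percentage of non-idle time between two readings of the same cpu line."""
    # Sum user..steal only (assumption: guest times are already counted in user/nice).
    deltas = [c - p for p, c in zip(previous[:8], current[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3]  # 4th counter is idle
    return 100.0 * (total - idle) / total


first = parse_cpu_times("cpu  3357 0 4313 1362393")   # line from the excerpt above
second = parse_cpu_times("cpu  3457 0 4413 1363193")  # hypothetical later reading
usage = busy_percent(first, second)
print(usage)
```

Between the two readings, user and system each advance by 100 ticks and idle by 800, so 200 of 1000 ticks were non-idle.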

Configuration file

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false