Linux CPU Usage: Principles and Accuracy Analysis


lmjssjj


Category: linux    Tags: linux, cpu


1 How CPU Usage Is Calculated

1.1 Related Concepts

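On Linux/Unix, CPU time is divided into user mode, system (kernel) mode and idle, which represent, respectively, the time the CPU spends executing in user mode, the time it spends executing the kernel, and the time it spends running the system's idle process.
The following concepts are related to CPU usage: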

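CPU utilization

How the CPU is being used.

User time (User time)

Time the CPU spends executing user processes. In the broad sense this includes niced processes, although /proc/stat reports nice time in a separate field. A high user-space share is usually desirable.

System time (System time)

Time the CPU spends running in the kernel. In the broad sense this includes hard IRQ and softirq handling, although /proc/stat reports those in separate fields. A high system share suggests a bottleneck somewhere in the system; lower is usually better.

Waiting time (Waiting time)

Time the CPU spends idle waiting for I/O operations to complete. The system should not spend much time waiting for I/O; if it does, I/O is the bottleneck.

Idle time (Idle time)

 Time the system is idle, waiting for processes to run.

Nice time (Nice time)

 Time the CPU spends running niced (lower-priority) processes in user mode.

Hard IRQ time (Hard Irq time)

 Time the system spends handling hardware interrupts.

SoftIRQ time (SoftIrq time)

 Time the system spends handling software interrupts.

Steal time (Steal time)

 Time a virtual CPU spends in involuntary wait while the hypervisor is servicing another virtual processor.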

The CPU usage line shown by the top command, and the meaning of each field:
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.2%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
us: User time
sy: System time
ni: Nice time
id: Idle time
wa: Waiting time
hi: Hard Irq time
si: SoftIrq time
st: Steal time

1.2 Calculating CPU Usage

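Linux CPU usage is computed from the contents of /proc/stat. Below is a sample of this file; its contents differ slightly between kernel versions but are essentially the same.

CPU information: the cpu line holds the totals, and cpu0 … cpuN hold the values of each individual CPU.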
cpu 661733 468 503925 233055573 548835 14244 15849 0

 The line above contains 8 values (unit: clock ticks), in this order:
 User time: 661733
 Nice time: 468
 System time: 503925
 Idle time: 233055573
 Waiting time: 548835
 Hard IRQ time: 14244
 SoftIRQ time: 15849
 Steal time: 0
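As a sketch of how a userspace program might read these eight values, the C function below parses the aggregate cpu line with sscanf. The struct cpu_times type and the parse_cpu_line() name are hypothetical, and the code assumes the first eight fields appear in the order listed above; any later fields (such as guest) are ignored.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* The first eight values of the aggregate "cpu" line, in /proc/stat order. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu" line. Returns the number of values read
 * (8 on success, fewer on malformed input), or 0 if the line is not the
 * aggregate line (for example a per-CPU "cpu0" line). */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
    /* Require "cpu" followed by whitespace so "cpu0 ..." is not misread. */
    if (strncmp(line, "cpu", 3) != 0 || !isspace((unsigned char)line[3]))
        return 0;
    return sscanf(line + 3, "%llu %llu %llu %llu %llu %llu %llu %llu",
                  &t->user, &t->nice, &t->system, &t->idle,
                  &t->iowait, &t->irq, &t->softirq, &t->steal);
}
```

The whitespace check matters because "cpu%llu"-style parsing would otherwise read the CPU number of a cpuN line as its user time.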

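 CPU usage is calculated as follows. The counters are cumulative since boot, so tools such as top sample them twice and apply the formulas to the differences over the refresh interval:
 CPU time = user + nice + system + idle + iowait + irq + softirq + steal
 User-mode share = (User time + Nice time) / CPU time * 100%
 Kernel-mode share = (System time + Hard IRQ time + SoftIRQ time) / CPU time * 100%

The per-field percentages shown by top are:
%us = (User time) / CPU time * 100%
%ni = (Nice time) / CPU time * 100%
%sy = (System time) / CPU time * 100%
%id = (Idle time) / CPU time * 100%
%wa = (Waiting time) / CPU time * 100%
%hi = (Hard IRQ time) / CPU time * 100%
%si = (SoftIRQ time) / CPU time * 100%
%st = (Steal time) / CPU time * 100%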
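Because the /proc/stat counters are cumulative since boot, a monitor typically reads them twice and applies the formulas to the differences. A minimal sketch under that assumption (the enum and function names are hypothetical), computing each field as a share of the elapsed CPU time in the same way as top's per-field columns:

```c
/* Field order of the aggregate "cpu" line in /proc/stat. */
enum { CPU_USER, CPU_NICE, CPU_SYSTEM, CPU_IDLE, CPU_IOWAIT,
       CPU_IRQ, CPU_SOFTIRQ, CPU_STEAL, CPU_NFIELDS };

/* Turn two cumulative samples (prev read before cur) into per-field
 * percentages of the CPU time that elapsed between them.
 * Returns 0 on success, -1 if no time elapsed. */
int cpu_percentages(const unsigned long long prev[CPU_NFIELDS],
                    const unsigned long long cur[CPU_NFIELDS],
                    double pct[CPU_NFIELDS])
{
    unsigned long long delta[CPU_NFIELDS], total = 0;

    for (int i = 0; i < CPU_NFIELDS; i++) {
        delta[i] = cur[i] - prev[i];   /* counters only grow between samples */
        total += delta[i];
    }
    if (total == 0)
        return -1;
    for (int i = 0; i < CPU_NFIELDS; i++)
        pct[i] = 100.0 * (double)delta[i] / (double)total;
    return 0;
}
```

With top's default 3-second interval, prev would be read, the program would sleep 3 seconds, and cur would be read again; the eight percentages then add up to 100%.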

2 Kernel Implementation of CPU Usage Accounting

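The following uses the RHEL6 kernel source, version 2.6.32-220.el6 x86_64, as an example of how this is implemented.
/proc/stat is created by proc_stat_init() in fs/proc/stat.c, which is called during kernel initialization. All functions related to /proc/stat are implemented in stat.c.
The file operations for /proc/stat are proc_stat_operations: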

static const struct file_operations proc_stat_operations = {
	.open    = stat_open,
	.read    = seq_read,
	.llseek  = seq_lseek,
	.release = single_release,
};

The open function stat_open() first allocates a buffer of size bytes to hold the generated text (which is the final content we see when reading /proc/stat).

static int stat_open(struct inode *inode, struct file *file)
{
	unsigned size = 4096 * (1 + num_possible_cpus() / 32);
	char *buf;
	struct seq_file *m;
	int res;

	/* don't ask for more than the kmalloc() max size, currently 128 KB */
	if (size > 128 * 1024)
		size = 128 * 1024;
	buf = kmalloc(size, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	res = single_open(file, show_stat, NULL);
	if (!res) {
		m = file->private_data;
		m->buf = buf;
		m->size = size;
	} else
		kfree(buf);
	return res;
}
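The buffer-size rule in stat_open() is plain arithmetic and can be checked in user space. The helper below (hypothetical name stat_buf_size) restates it: one extra 4 KB page per 32 possible CPUs, capped at 128 KB, which means the cap is reached at 992 possible CPUs.

```c
#include <stddef.h>

/* Userspace restatement of the size computed in stat_open():
 * 4096 * (1 + possible_cpus / 32), capped at 128 KB.
 * Integer division: every full group of 32 CPUs adds one 4 KB page. */
size_t stat_buf_size(unsigned int possible_cpus)
{
    size_t size = (size_t)4096 * (1 + possible_cpus / 32);

    if (size > 128 * 1024)
        size = 128 * 1024;
    return size;
}
```

For example, a machine with 64 possible CPUs gets a 12 KB buffer.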

The data in /proc/stat is filled in by show_stat(). Note the for_each_possible_cpu(i) loop: it accumulates the counters of all CPUs, producing the first cpu line of /proc/stat seen in the earlier example:
cpu 661733 468 503925 233055573 548835 14244 15849 0

static int show_stat(struct seq_file *p, void *v)
{
	int i, j;
	unsigned long jif;
	cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
	cputime64_t guest;
	u64 sum = 0;
	u64 sum_softirq = 0;
	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
	struct timespec boottime;

	user = nice = system = idle = iowait =
		irq = softirq = steal = cputime64_zero;
	guest = cputime64_zero;
	getboottime(&boottime);
	jif = boottime.tv_sec;

	for_each_possible_cpu(i) {
		user = cputime64_add(user, kstat_cpu(i).cpustat.user);
		nice = cputime64_add(nice, kstat_cpu(i).cpustat.nice);
		system = cputime64_add(system, kstat_cpu(i).cpustat.system);
		idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
		idle = cputime64_add(idle, arch_idle_time(i));
		iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
		irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
		steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
		sum += kstat_cpu_irqs_sum(i);
		sum += arch_irq_stat_cpu(i);

		for (j = 0; j < NR_SOFTIRQS; j++) {
			unsigned int softirq_stat = kstat_softirqs_cpu(j, i);

			per_softirq_sums[j] += softirq_stat;
			sum_softirq += softirq_stat;
		}
	}
	sum += arch_irq_stat();

	seq_printf(p, "cpu  %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
		(unsigned long long)cputime64_to_clock_t(user),
		(unsigned long long)cputime64_to_clock_t(nice),
		(unsigned long long)cputime64_to_clock_t(system),
		(unsigned long long)cputime64_to_clock_t(idle),
		(unsigned long long)cputime64_to_clock_t(iowait),
		(unsigned long long)cputime64_to_clock_t(irq),
		(unsigned long long)cputime64_to_clock_t(softirq),
		(unsigned long long)cputime64_to_clock_t(steal),
		(unsigned long long)cputime64_to_clock_t(guest));

After the totals user, nice, system, idle, iowait, irq, softirq and steal have been computed, the for_each_online_cpu(i) loop prints the values of each individual CPU:

	for_each_online_cpu(i) {

		/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
		user = kstat_cpu(i).cpustat.user;
		nice = kstat_cpu(i).cpustat.nice;
		system = kstat_cpu(i).cpustat.system;
		idle = kstat_cpu(i).cpustat.idle;
		idle = cputime64_add(idle, arch_idle_time(i));
		iowait = kstat_cpu(i).cpustat.iowait;
		irq = kstat_cpu(i).cpustat.irq;
		softirq = kstat_cpu(i).cpustat.softirq;
		steal = kstat_cpu(i).cpustat.steal;
		guest = kstat_cpu(i).cpustat.guest;
		seq_printf(p,
			"cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
			i,
			(unsigned long long)cputime64_to_clock_t(user),
			(unsigned long long)cputime64_to_clock_t(nice),
			(unsigned long long)cputime64_to_clock_t(system),
			(unsigned long long)cputime64_to_clock_t(idle),
			(unsigned long long)cputime64_to_clock_t(iowait),
			(unsigned long long)cputime64_to_clock_t(irq),
			(unsigned long long)cputime64_to_clock_t(softirq),
			(unsigned long long)cputime64_to_clock_t(steal),
			(unsigned long long)cputime64_to_clock_t(guest));
	}

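The intr line first prints sum, the total number of interrupts on all CPUs, and then the for_each_irq_nr(j) loop prints the count of each individual interrupt vector. /proc/stat records counters for all NR_IRQS possible interrupt vectors, but a typical machine uses only a few of them, which is why most of the values after intr are 0.
Finally, show_stat() prints the number of context switches (ctxt), the boot time (btime, in seconds since the Epoch), the total number of processes created since boot (processes), the number of running processes (procs_running), and the number of processes blocked waiting for I/O (procs_blocked, taken from nr_iowait()), followed by the softirq totals.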

	seq_printf(p, "intr %llu", (unsigned long long)sum);

	/* sum again ? it could be updated? */
	for_each_irq_nr(j)
		seq_printf(p, " %u", kstat_irqs(j));

	seq_printf(p,
		"\nctxt %llu\n"
		"btime %lu\n"
		"processes %lu\n"
		"procs_running %lu\n"
		"procs_blocked %lu\n",
		nr_context_switches(),
		(unsigned long)jif,
		total_forks,
		nr_running(),
		nr_iowait());

	seq_printf(p, "softirq %llu", (unsigned long long)sum_softirq);

	for (i = 0; i < NR_SOFTIRQS; i++)
		seq_printf(p, " %u", per_softirq_sums[i]);
	seq_printf(p, "\n");

	return 0;
}
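Because the aggregate cpu line is a sum over CPUs, the per-CPU lines can be cross-checked from user space. The sketch below (hypothetical helper sum_per_cpu_lines) adds up the first eight fields of every cpuN line in a /proc/stat-formatted buffer. Note that show_stat() sums over all possible CPUs but prints per-CPU lines only for online ones, so on a machine with offline CPUs the two need not match exactly.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 8  /* user nice system idle iowait irq softirq steal */

/* Add up the first NCOLS fields of every per-CPU "cpuN" line in buf.
 * The aggregate "cpu" line (no digit after "cpu") is skipped.
 * Returns the number of cpuN lines that were summed. */
int sum_per_cpu_lines(const char *buf, unsigned long long sum[NCOLS])
{
    int ncpus = 0;

    memset(sum, 0, NCOLS * sizeof sum[0]);
    for (const char *line = buf; line && *line; ) {
        if (strncmp(line, "cpu", 3) == 0 && isdigit((unsigned char)line[3])) {
            int id;
            unsigned long long v[NCOLS];

            if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu",
                       &id, &v[0], &v[1], &v[2], &v[3],
                       &v[4], &v[5], &v[6], &v[7]) == NCOLS + 1) {
                for (int i = 0; i < NCOLS; i++)
                    sum[i] += v[i];
                ncpus++;
            }
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return ncpus;
}
```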

3 Accuracy of Linux CPU Usage

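When observing system and per-process CPU usage with tools such as top, you can set the refresh interval so that CPU usage is updated and watched in near real time.
By default, top refreshes every 3 seconds. The interval can be set with top -d <interval>, e.g. top -d 0.1 or top -d 0.01, and it can also be changed while top is running by pressing the "s" key.
We can set a very short refresh interval, such as 0.01 seconds. But does such a high refresh rate actually show CPU usage more accurately? Is the CPU usage information Linux provides precise enough?
As shown above, Linux computes CPU usage from the contents of /proc/stat, so its accuracy is bounded by the accuracy of the data in /proc/stat. This raises two questions:
(1) What is the unit of the values in /proc/stat?
(2) How often is the data in /proc/stat updated?
cpu 926 0 4160 5894903 2028 0 7 0 0
cpu0 80 0 473 367723 658 0 3 0 0
cpu1 13 0 200 368639 63 0 0 0 0

3.1 Unit and Precision of /proc/stat Data

The CPU values in /proc/stat are counted in ticks. The kernel keeps a global variable, jiffies, that records the number of ticks since the system booted. Strictly speaking, show_stat() converts its internal counters with cputime64_to_clock_t() before printing them, so user space sees them in USER_HZ units (clock_t ticks, whose rate sysconf(_SC_CLK_TCK) reports and which is 100 per second on x86); the underlying accounting, however, advances once per kernel tick.

A tick is the interval between two system timer interrupts and is determined by the kernel's HZ value: one tick = 1/HZ second. HZ is chosen when the kernel is compiled. On one machine running a RHEL6.1 kernel, HZ is configured as 1000: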
[root@ssd boot]# uname -a
Linux ssd 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@ssd boot]# cat config-2.6.32-131.0.15.el6.x86_64 |grep CONFIG_HZ
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
[root@ssd boot]#
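show_stat() converts its counters with cputime64_to_clock_t(), so user space sees them in clock_t ticks (USER_HZ), whose rate is reported by sysconf(_SC_CLK_TCK) and can differ from CONFIG_HZ. A program converting /proc/stat values to seconds should therefore ask for that rate rather than assume the kernel's HZ. A minimal sketch; the helper names are hypothetical, and the fallback of 100 when sysconf() fails is an assumption:

```c
#include <unistd.h>

/* Convert a /proc/stat CPU counter to seconds, given the clock_t tick rate. */
double stat_ticks_to_seconds(unsigned long long ticks, long clk_tck)
{
    return (double)ticks / (double)clk_tck;
}

/* Same, using the running system's USER_HZ.
 * Assumption: fall back to 100 ticks/s if sysconf() cannot report it. */
double stat_ticks_to_seconds_here(unsigned long long ticks)
{
    long clk_tck = sysconf(_SC_CLK_TCK);

    return stat_ticks_to_seconds(ticks, clk_tck > 0 ? clk_tck : 100);
}
```

Assuming USER_HZ = 100, the 926 user ticks in the sample cpu line above correspond to 9.26 seconds of user time, even on a kernel built with CONFIG_HZ=1000.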

HZ is the number of timer interrupts per second, so it can also be estimated by watching how the timer interrupt count (LOC) in /proc/interrupts changes. With HZ = 1000, one tick is 1/1000 second, i.e. 1 ms.
Every 5.0s: cat /proc/interrupts |grep LOC
Tue May 15 15:54:22 2012
LOC: 1621246 308599 28013 16995 37126 95699 1159285 2399641 552961 63923 58053 20580 17037 49626 1004223 48133 Local timer interrupts
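The estimate is a simple rate calculation over two readings of one CPU's LOC column. A sketch with a hypothetical helper name; it assumes the CPU stays busy during the interval, because on a kernel with tickless idle (CONFIG_NO_HZ) an idle CPU takes fewer timer interrupts than HZ per second.

```c
/* Estimate the timer-interrupt rate of one CPU from two readings of its
 * LOC counter in /proc/interrupts taken interval_s seconds apart.
 * Returns 0.0 for a non-positive interval or a counter that went backwards. */
double estimate_tick_rate(unsigned long long loc_before,
                          unsigned long long loc_after,
                          double interval_s)
{
    if (interval_s <= 0.0 || loc_after < loc_before)
        return 0.0;
    return (double)(loc_after - loc_before) / interval_s;
}
```

For example, a busy CPU whose LOC count rises by 5000 between two watch refreshes 5 seconds apart gives an estimate of 1000, consistent with HZ = 1000.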

3.2 Updating CPU Usage Statistics

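 CPU usage statistics are updated in the timer interrupt handler, i.e. once per tick.

include/linux/kernel_stat.h declares the interfaces used to update CPU usage statistics; for example, account_user_time() updates the user-mode CPU time.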

/*
 * Lock/unlock the current runqueue - to extract task statistics:
 */
extern unsigned long long task_delta_exec(struct task_struct *);

extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
extern void account_system_time(struct task_struct *, int, cputime_t,
				cputime_t);
extern void account_steal_time(cputime_t);
extern void account_idle_time(cputime_t);

extern void account_process_tick(struct task_struct *, int user);
extern void account_steal_ticks(unsigned long ticks);
extern void account_idle_ticks(unsigned long ticks);

The kernel keeps a per-CPU variable, kstat, of type struct kernel_stat, dedicated to recording CPU usage. It is declared in include/linux/kernel_stat.h:
DECLARE_PER_CPU(struct kernel_stat, kstat);

#define kstat_cpu(cpu) per_cpu(kstat, cpu)

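On every timer interrupt (tick), the members of kernel_stat are updated. The contents of /proc/stat, by contrast, are generated only when a program reads the file; the kernel does not update /proc/stat proactively.
The CPU information in /proc/stat is computed from the members of kernel_stat.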
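The mechanism can be modeled in a few lines of user-space C. This is a toy model, not kernel code: toy_kstat_cpu, toy_account_tick() and toy_read_total() are hypothetical stand-ins for the per-CPU kstat counters, the tick-time accounting calls, and show_stat(). It shows the two properties described above: each tick charges one whole unit to whatever state the CPU is in at that instant, and totals are computed only when they are read.

```c
#include <string.h>

/* Toy model of tick-based accounting; all names are hypothetical. */
enum cpu_state { ST_USER, ST_SYSTEM, ST_IDLE, ST_NSTATES };

#define MAX_CPUS 4

struct toy_kstat {
    unsigned long long ticks[ST_NSTATES];
};

/* One counter set per CPU, like the per-CPU kstat variable. */
static struct toy_kstat toy_kstat_cpu[MAX_CPUS];

/* Called once per timer tick on `cpu`: the whole tick is charged to the
 * state the CPU happens to be in at that instant. */
void toy_account_tick(int cpu, enum cpu_state state)
{
    if (cpu < 0 || cpu >= MAX_CPUS)
        return;
    toy_kstat_cpu[cpu].ticks[state]++;
}

/* Like show_stat(): the aggregate is computed only when someone reads it. */
void toy_read_total(unsigned long long out[ST_NSTATES])
{
    memset(out, 0, ST_NSTATES * sizeof out[0]);
    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        for (int s = 0; s < ST_NSTATES; s++)
            out[s] += toy_kstat_cpu[cpu].ticks[s];
}
```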

3.3 Accuracy of CPU Usage

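 From the analysis above, we can draw the following conclusions:

(1) Linux CPU usage is computed from the data in /proc/stat;
(2) the CPU time behind /proc/stat is accounted in ticks, i.e. with a precision of 1/HZ second;
(3) the kernel updates CPU usage information once per tick;
(4) CPU usage therefore has a precision of 1/HZ second.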
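Conclusion (4) limits how useful very short refresh intervals can be. The helper below (hypothetical name percent_step) is a back-of-the-envelope sketch: it computes the smallest non-zero change a single CPU's percentage can show when it is derived from tick counters over an interval, namely one tick out of interval × tick-rate ticks.

```c
/* Smallest non-zero step (in percent) of one CPU's usage figure when it is
 * computed over interval_s seconds from counters that advance ticks_per_s
 * times per second. Returns 0.0 if no tick fits in the interval. */
double percent_step(double interval_s, double ticks_per_s)
{
    double ticks = interval_s * ticks_per_s;

    return ticks > 0.0 ? 100.0 / ticks : 0.0;
}
```

With top's default 3-second interval at 1000 ticks per second the step is about 0.033%, but top -d 0.01 leaves only 10 ticks per sample, so a single CPU's usage can only move in steps of 10%; with 100 ticks per second it would be all or nothing.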

4 Is Linux CPU Usage Accurate?

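Occasionally you may run into problems like these: under a steady computational load, a process's CPU usage fluctuates, or the CPU usage of certain processes is clearly wrong. In other words, when the system switches between processes very frequently, the Linux CPU usage accounting may be inaccurate.
So is Linux CPU usage accounting accurate after all? If it can be inaccurate, under what circumstances does that happen?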

4.1 When Linux CPU Usage Is Inaccurate

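As analyzed above, the Linux kernel updates CPU usage on every timer interrupt, i.e. once every 1/HZ second. At a timer interrupt it sees only the process that is running at that moment. Take the following figure as an example, where red arrows mark timer interrupts (Timer Interrupt).
At the first interrupt, process A is running. A runs only briefly, and then process B runs. At the second interrupt, process C is running. Before the third interrupt arrives, process A is scheduled again; when the third interrupt fires, process C is running.
Under the kernel's accounting method, the kernel never sees process B running between the 1st and 2nd interrupts, so B's CPU usage is missed. Likewise, between the 2nd and 3rd interrupts, the second run of process A is missed. This makes the kernel's CPU usage statistics inaccurate.
The cause of the inaccuracy is that several scheduling events occur within a single timer interrupt period, while timer interrupts have a resolution of only 1/HZ second.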
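The effect can be reproduced with a small simulation. The sketch below is a hypothetical model of the scenario: HZ = 1000, so the three timer interrupts fall at 0, 1000 and 2000 µs; pids 0, 1 and 2 stand for A, B and C; and the segment times in the test are invented for illustration. It charges one whole tick to whichever process is running at each tick instant and compares that with the exact run time.

```c
#include <stddef.h>

/* Process `pid` runs on the CPU during [start_us, end_us). */
struct run_segment {
    int pid;
    long start_us;
    long end_us;
};

/* Return the pid running at instant t_us, or -1 if the CPU is idle. */
static int running_at(const struct run_segment *segs, size_t nsegs, long t_us)
{
    for (size_t i = 0; i < nsegs; i++)
        if (segs[i].start_us <= t_us && t_us < segs[i].end_us)
            return segs[i].pid;
    return -1;
}

/* Tick sampling: at each instant k * tick_us, charge one tick to the
 * process that is running; everything between ticks is invisible. */
void sample_ticks(const struct run_segment *segs, size_t nsegs, long tick_us,
                  int nticks, int *ticks_per_pid, int npids)
{
    for (int p = 0; p < npids; p++)
        ticks_per_pid[p] = 0;
    for (int k = 0; k < nticks; k++) {
        int pid = running_at(segs, nsegs, (long)k * tick_us);
        if (pid >= 0 && pid < npids)
            ticks_per_pid[pid]++;
    }
}

/* Exact accounting for comparison: microseconds each pid actually ran. */
void exact_runtime(const struct run_segment *segs, size_t nsegs,
                   long *us_per_pid, int npids)
{
    for (int p = 0; p < npids; p++)
        us_per_pid[p] = 0;
    for (size_t i = 0; i < nsegs; i++)
        if (segs[i].pid >= 0 && segs[i].pid < npids)
            us_per_pid[segs[i].pid] += segs[i].end_us - segs[i].start_us;
}
```

With the timeline used in the test (A, B, C, A, C), B actually runs for 700 µs but is charged no ticks at all, while C is charged 2 ticks for 1850 µs of work.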

4.2 Is CPU Usage in top Accurate?

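CPU usage becomes inaccurate only when several scheduling events occur within one timer interrupt period. Whether the CPU usage shown by top is accurate therefore depends on how often processes are scheduled.
With HZ = 250, a tick is 4 ms; with HZ = 1000, a tick is 1 ms. With HZ = 250, CPU usage is accurate as long as the interval between scheduling events is longer than 4 ms; with HZ = 1000, as long as it is longer than 1 ms.
In short: when processes are scheduled rarely, CPU usage is accurate; when the scheduling interval is shorter than the timer interrupt period, it may be inaccurate.
So when does scheduling happen, and how can we observe how often it happens?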
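The rule of thumb above can be turned into arithmetic. The sketch below uses hypothetical helper names; it assumes the system-wide switch rate (for example vmstat's cs column) is spread evenly over the CPUs, and since it compares only an average interval with the tick length, bursts of switches can still be missed when the check passes.

```c
/* Length of one timer tick in microseconds for a given CONFIG_HZ. */
double tick_period_us(unsigned int hz)
{
    return 1e6 / (double)hz;
}

/* Average time between context switches on one CPU, in microseconds,
 * given a system-wide switch rate spread over ncpus CPUs.
 * Returns 0.0 for a non-positive rate or zero CPUs. */
double mean_switch_interval_us(double cs_per_s, unsigned int ncpus)
{
    if (cs_per_s <= 0.0 || ncpus == 0)
        return 0.0;
    return 1e6 * (double)ncpus / cs_per_s;
}

/* Heuristic from the text: tick sampling is likely accurate when a CPU
 * switches, on average, less often than once per tick. */
int tick_sampling_likely_ok(double cs_per_s, unsigned int ncpus, unsigned int hz)
{
    if (cs_per_s <= 0.0)
        return 1;   /* no switches at all: nothing can be missed */
    return mean_switch_interval_us(cs_per_s, ncpus) > tick_period_us(hz);
}
```

For example, 2000 switches per second on a 4-CPU machine means one switch every 2 ms per CPU: longer than a 1 ms tick at HZ = 1000, but shorter than a 4 ms tick at HZ = 250.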

4.2.1 When Scheduling Occurs

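• When a process changes state: process termination or sleep
A process that calls functions such as sleep() or exit() changes state, and these functions invoke the scheduler directly;
• When the current process's time slice is used up (current->counter = 0)
Since the time slice is updated by the timer interrupt, this case is detected on a tick;
• Device drivers
When a device driver performs a long, repetitive task, it calls the scheduler directly: in each iteration of its loop the driver checks need_resched and, if necessary, calls schedule() to give up the CPU voluntarily.
• When a process returns to user mode from an interrupt, exception or system call
Whether returning from an interrupt, an exception or a system call, the path ends in ret_from_sys_call(), which checks the scheduling flag and calls the scheduler if necessary. Why call the scheduler when returning from a system call? For efficiency: returning from a system call means leaving kernel mode for user mode, and that transition takes time, so the system finishes all the work that belongs in kernel mode before returning to user mode.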
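The first case, a sleeping process giving up the CPU, can be observed from user space with getrusage(), whose ru_nvcsw field counts voluntary context switches. A minimal sketch, assuming a Linux/POSIX system; the function name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <time.h>

/* Sleep n times for about 100 microseconds each and return how many
 * voluntary context switches were charged to this process meanwhile,
 * or -1 if getrusage() fails. Each sleep blocks the caller, so the
 * scheduler has to switch to another task. */
long voluntary_switches_from_sleeps(int n)
{
    struct rusage before, after;
    struct timespec ts = { 0, 100000 };   /* 100 us */

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        nanosleep(&ts, NULL);
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;
    return after.ru_nvcsw - before.ru_nvcsw;
}
```

The companion field ru_nivcsw counts involuntary switches instead, such as a process being preempted when its time slice runs out (the second case above).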

4.2.2 Observing How Often Scheduling Occurs

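The vmstat command shows how often the system switches between processes: the value in the cs column is the number of context switches per second. HZ can be determined from the kernel configuration file; if /proc/config.gz exists, decompress it and inspect it (for example with zcat /proc/config.gz | grep CONFIG_HZ).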

You can also watch the number of context switches (nr_switches) in /proc/sched_debug:
[root@ssd proc]# watch -d -n 1 'cat /proc/sched_debug |grep nr_switches'
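The ctxt line printed by show_stat() gives a system-wide context-switch count that is easy to sample programmatically. A sketch with hypothetical helper names: parse_ctxt() extracts the value from /proc/stat-formatted text, read_ctxt() reads the live file token by token (the intr line can be very long, so it avoids fixed-size line buffers), and switches_per_second() turns two samples into a rate comparable with vmstat's cs column.

```c
#include <stdio.h>
#include <string.h>

/* Extract the "ctxt" value (context switches since boot) from a
 * /proc/stat-formatted buffer. Returns 0 on success, -1 if absent. */
int parse_ctxt(const char *buf, unsigned long long *ctxt)
{
    for (const char *line = buf; line && *line; ) {
        if (sscanf(line, "ctxt %llu", ctxt) == 1)
            return 0;
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* Read the ctxt counter from the live /proc/stat (Linux only). */
int read_ctxt(unsigned long long *ctxt)
{
    char tok[64];
    int rc = -1;
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return -1;
    while (fscanf(f, "%63s", tok) == 1) {
        if (strcmp(tok, "ctxt") == 0) {
            rc = fscanf(f, "%llu", ctxt) == 1 ? 0 : -1;
            break;
        }
    }
    fclose(f);
    return rc;
}

/* Convert two ctxt samples taken `seconds` apart into switches per second. */
double switches_per_second(unsigned long long before,
                           unsigned long long after, double seconds)
{
    if (seconds <= 0.0 || after < before)
        return 0.0;
    return (double)(after - before) / seconds;
}
```

Sampling read_ctxt() twice, one second apart, and dividing the rate by the number of CPUs gives a per-CPU switch rate that can be compared with HZ.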

 Is process scheduling on our systems really that frequent? In most cases, the Linux CPU usage accounting mechanism is accurate.
