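In my previous post I mentioned a problem with perf_event_open: the counter values I read back were inaccurate.
My analysis of the cause: a performance counter is a piece of hardware inside a single core, so it can only count a program while that program runs on that core. On a multi-core processor the scheduler may move the program I want to measure to other cores, and because this assignment is dynamic, the result differs on every run. This may be one of the reasons.
By coincidence, I stumbled onto a more correct approach. It is described in the official manual, perf_event_open(2):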
    The pid and cpu arguments allow specifying which process and CPU to
    monitor:

    pid == 0 and cpu == -1
        This measures the calling process/thread on any CPU.

    pid == 0 and cpu >= 0
        This measures the calling process/thread only when running on
        the specified CPU.

    pid > 0 and cpu == -1
        This measures the specified process/thread on any CPU.

    pid > 0 and cpu >= 0
        This measures the specified process/thread only when running
        on the specified CPU.

    pid == -1 and cpu >= 0
        This measures all processes/threads on the specified CPU.
        This requires CAP_SYS_ADMIN capability or a
        /proc/sys/kernel/perf_event_paranoid value of less than 1.

    pid == -1 and cpu == -1
        This setting is invalid and will return an error.

    When pid is greater than zero, permission to perform this system call
    is governed by a ptrace access mode PTRACE_MODE_READ_REALCREDS check;
    see ptrace(2).
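I used the pid == 0 and cpu >= 0 form, opening one such counter for each CPU so that they count in parallel. With this setup I get fairly accurate values.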
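
The per-CPU setup above could look roughly like the following C sketch. It is only an illustration under several assumptions, not the exact code I ran: the event (PERF_COUNT_HW_INSTRUCTIONS), the exclude_kernel setting, and the helper names open_per_cpu_counters, ioctl_all and read_total are all hypothetical choices made for this example. Adding the per-CPU counts together is likewise one plausible way to combine them.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                     int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open one counter per configured CPU for the
 * calling thread (pid == 0, cpu == i). CPUs on which the open fails
 * (offline CPU, unsupported event, no permission) are skipped.
 * Returns the number of descriptors stored in fds. */
int open_per_cpu_counters(int *fds, int max_fds)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* assumed event */
    attr.disabled = 1;        /* start stopped, enable explicitly later */
    attr.exclude_kernel = 1;  /* user-space only */
    attr.exclude_hv = 1;

    long ncpus = sysconf(_SC_NPROCESSORS_CONF);
    int n = 0;
    for (int cpu = 0; cpu < ncpus && n < max_fds; cpu++) {
        int fd = (int)perf_event_open(&attr, 0, cpu, -1, 0);
        if (fd >= 0)
            fds[n++] = fd;
    }
    return n;
}

/* Hypothetical helper: apply one request (PERF_EVENT_IOC_RESET,
 * PERF_EVENT_IOC_ENABLE or PERF_EVENT_IOC_DISABLE) to every counter. */
void ioctl_all(const int *fds, int n, unsigned long request)
{
    for (int i = 0; i < n; i++)
        ioctl(fds[i], request, 0);
}

/* Hypothetical helper: read every per-CPU counter and add the values.
 * With the default read_format, each read returns a single u64 count. */
uint64_t read_total(const int *fds, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        uint64_t count = 0;
        if (read(fds[i], &count, sizeof(count)) == (ssize_t)sizeof(count))
            total += count;
    }
    return total;
}
```

To use it, call open_per_cpu_counters once, then ioctl_all with PERF_EVENT_IOC_RESET and PERF_EVENT_IOC_ENABLE just before the code to be measured, ioctl_all with PERF_EVENT_IOC_DISABLE right after it, and finally read_total. Close every descriptor when done.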