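This post walks through the perf stat counters in detail, cross-checking each one against the kernel source and the Intel SDM. The goal is to be authoritative and to reduce misunderstandings.
Hardware events are ultimately counted by the CPU PMU.
They belong to the PERF_TYPE_HARDWARE category of the perf_event_open() interface.
Option | Encoding (event/umask) | Intel SDM description | Notes |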
---|---|---|---|
cycles | 0x3c, 0x00 | Counts core clock cycles whenever the logical processor is in C0 state (not halted). The frequency of this event varies with state transitions in the core. | |
instructions | 0xc0, 0x00 | Counts when the last uop of an instruction retires. | |
cache-references | 0x2e, 0x4f | Accesses to the LLC, in which the data is present (hit) or not present (miss). | |
cache-misses | 0x2e, 0x41 | Accesses to the LLC in which the data is not present (miss). | |
branches | 0xc4, 0x00 | Counts when the last uop of a branch instruction retires. | |
branch-misses | 0xc5, 0x00 | Counts when the last uop of a branch instruction retires which corrected misprediction of the branch prediction hardware at execution time. | |
bus-cycles | 0x3c, 0x01 | Counts at a fixed frequency whenever the logical processor is in C0 state (not halted). Current implementations count at core crystal clock, TSC, or bus clock frequency. | |
stalled-cycles-frontend | Microarchitecture-dependent | Increments each cycle the # of Uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | |
stalled-cycles-backend | Microarchitecture-dependent | Counts total number of uops to be executed per-thread each cycle. Set Cmask = 1, Inv = 1 to count stall cycles. | |
ref-cycles | 0x00, 0x03 | This event counts the number of reference core cpu cycles. Reference means that the event increments at a constant rate which is not subject to core CPU frequency adjustments. The event may not count when the processor is in halted (low power) state. As such, it may not be equivalent to wall clock time. However, when the processor is not in halted state, the event keeps a constant correlation with wall clock time. | |
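
To make the encoding column concrete, here is a minimal sketch of how an event/umask pair can be packed into a raw PMU config value and spelled as a perf raw event (`rNNNN`). It assumes the usual Intel core PMU field layout (event in bits 0-7, umask in bits 8-15, any in bit 21, inv in bit 23, cmask in bits 24-31); the helper names `raw_config` and `perf_raw_name` are hypothetical and exist only for illustration.

```python
# Bit positions of the raw config fields for the Intel core PMU
# (assumed layout: event 0-7, umask 8-15, any 21, inv 23, cmask 24-31).
UMASK_SHIFT = 8
ANY_BIT = 21
INV_BIT = 23
CMASK_SHIFT = 24

# (event, umask) pairs copied from the table above.
INTEL_HW_EVENTS = {
    "cycles": (0x3C, 0x00),
    "instructions": (0xC0, 0x00),
    "cache-references": (0x2E, 0x4F),
    "cache-misses": (0x2E, 0x41),
    "branches": (0xC4, 0x00),
    "branch-misses": (0xC5, 0x00),
    "bus-cycles": (0x3C, 0x01),
}


def raw_config(event, umask, cmask=0, inv=False, any_thread=False):
    """Pack event-select fields into a raw PMU config value."""
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    config = event | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    if inv:
        config |= 1 << INV_BIT
    if any_thread:
        config |= 1 << ANY_BIT
    return config


def perf_raw_name(option):
    """Return the rNNNN raw-event spelling for one option in the table."""
    event, umask = INTEL_HW_EVENTS[option]
    return f"r{raw_config(event, umask):x}"


if __name__ == "__main__":
    for option in INTEL_HW_EVENTS:
        print(f"{option:18} {perf_raw_name(option)}")
```

On a CPU that uses these encodings, `perf stat -e r412e` would be expected to count the same thing as `cache-misses`. The `cmask` and `inv` arguments correspond to the "Set Cmask = 1, Inv = 1" notes in the stalled-cycles rows; whether the `any` bit is available depends on the CPU generation.
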
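Hardware cache events are likewise counted by the CPU PMU.
They belong to the PERF_TYPE_HW_CACHE category of the perf_event_open() interface, where the cache_type, cache_op, and cache_result attributes are combined to select different counters.
Because the hardware encodings behind cache_type, cache_op, and cache_result depend on the microarchitecture, the Linux kernel keeps the architecture-specific values in the hw_cache_event_ids and hw_cache_extra_regs arrays, and its per-microarchitecture assignments are worth consulting for reference. After reading them against the Intel SDM, my impression is that the perf_event_open man page cannot be trusted completely: what each counter measures differs from one CPU microarchitecture to another.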
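
As a rough illustration of how the three attributes are combined, the sketch below builds the generic `config` value for a PERF_TYPE_HW_CACHE event following the formula documented for perf_event_open(): cache id in the low byte, op id shifted left by 8, result id shifted left by 16. The numeric ids mirror the `perf_hw_cache_id`, `perf_hw_cache_op_id`, and `perf_hw_cache_op_result_id` enums; the dictionaries and the `hw_cache_config` / `decode_hw_cache_config` helpers are hypothetical names used only here.

```python
PERF_TYPE_HW_CACHE = 3  # value of the perf_type_id enum member

# Generic ids from the perf UAPI enums.
CACHE_ID = {"L1D": 0, "L1I": 1, "LL": 2, "DTLB": 3, "ITLB": 4, "BPU": 5, "NODE": 6}
CACHE_OP = {"READ": 0, "WRITE": 1, "PREFETCH": 2}
CACHE_RESULT = {"ACCESS": 0, "MISS": 1}

# perf stat option names mapped to (cache, op, result) triples.
OPTIONS = {
    "L1-dcache-loads": ("L1D", "READ", "ACCESS"),
    "L1-dcache-load-misses": ("L1D", "READ", "MISS"),
    "L1-dcache-stores": ("L1D", "WRITE", "ACCESS"),
    "L1-icache-load-misses": ("L1I", "READ", "MISS"),
    "LLC-loads": ("LL", "READ", "ACCESS"),
    "LLC-load-misses": ("LL", "READ", "MISS"),
}


def hw_cache_config(cache, op, result):
    """Build the generic config: id | (op << 8) | (result << 16)."""
    return CACHE_ID[cache] | (CACHE_OP[op] << 8) | (CACHE_RESULT[result] << 16)


def decode_hw_cache_config(config):
    """Split a generic config back into its (cache, op, result) ids."""
    return config & 0xFF, (config >> 8) & 0xFF, (config >> 16) & 0xFF


if __name__ == "__main__":
    for option, triple in OPTIONS.items():
        print(f"{option:24} type={PERF_TYPE_HW_CACHE} config={hw_cache_config(*triple):#x}")
```

The decoded ids are what the kernel would use as indices into its per-microarchitecture table, which is why the same generic config can end up programming different hardware events on different CPUs.
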
Skylake microarchitecture:
Option | Encoding (event/umask) | Intel SDM event name | Intel SDM description | Notes |
---|---|---|---|---|
L1-dcache-loads | 0xd0,0x81 | MEM_INST_RETIRED.ALL_LOADS | All retired load instructions. | |
L1-dcache-load-misses | 0x51,0x01 | L1D.REPLACEMENT | Counts the number of lines brought into the L1 data cache. | |
L1-dcache-stores | 0xd0,0x82 | MEM_INST_RETIRED.ALL_STORES | All retired store instructions. | |
L1-icache-load-misses | 0x83,0x02 | ICACHE_64B.IFTAG_MISS | Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. | |
LLC-loads | 0xb7,0x01 | | | |
LLC-load-misses | 0xb7,0x01 | | | |