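perf stat output explained

In this post I go through the counters reported by perf stat in detail, checking each one against the kernel source and the Intel SDM. The aim is to be as authoritative as possible and to reduce misunderstandings.

Hardware events are ultimately counted by the CPU PMU.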

These events belong to the PERF_TYPE_HARDWARE class of the perf_event_open() interface.

| Event | Encoding (event, umask) | Intel SDM description | Notes |
|---|---|---|---|
| cycles | 0x3c, 0x00 | Counts core clock cycles whenever the logical processor is in C0 state (not halted). The frequency of this event varies with state transitions in the core. | |
| instructions | 0xc0, 0x00 | Counts when the last uop of an instruction retires. | |
| cache-references | 0x2e, 0x4f | Accesses to the LLC, in which the data is present (hit) or not present (miss). | |
| cache-misses | 0x2e, 0x41 | Accesses to the LLC in which the data is not present (miss). | |
| branches | 0xc4, 0x00 | Counts when the last uop of a branch instruction retires. | |
| branch-misses | 0xc5, 0x00 | Counts when the last uop of a branch instruction retires which corrected misprediction of the branch prediction hardware at execution time. | |
| bus-cycles | 0x3c, 0x01 | Counts at a fixed frequency whenever the logical processor is in C0 state (not halted). Current implementations count at core crystal clock, TSC, or bus clock frequency. | |
| stalled-cycles-frontend | microarchitecture-dependent | Increments each cycle the # of Uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | |
| stalled-cycles-backend | microarchitecture-dependent | Counts total number of uops to be executed per-thread each cycle. Set Cmask = 1, INV = 1 to count stall cycles. | |
| ref-cycles | 0x00, 0x03 | This event counts the number of reference core CPU cycles. Reference means that the event increments at a constant rate which is not subject to core CPU frequency adjustments. The event may not count when the processor is in halted (low power) state. As such, it may not be equivalent to wall clock time. However, when the processor is not in halted state, the event keeps a constant correlation with wall clock time. | |
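
To make the encoding column more concrete, here is a minimal sketch of how an (event, umask) pair maps onto perf's raw event syntax. On x86 the event select code sits in bits 0-7 and the unit mask in bits 8-15 of the event-select value, so the pair can be packed into a hex number and passed as `rNNNN` to `perf stat -e`. The helper names `raw_config` and `raw_event_name` are made up for this illustration, the other fields (cmask, inv, any) are ignored, and ref-cycles is left out because the kernel treats its encoding as a pseudo-encoding for a fixed counter.

```python
# Illustrative sketch only: raw_config and raw_event_name are hypothetical
# helpers, not part of perf or the kernel.

def raw_config(event: int, umask: int) -> int:
    """Pack an event select code (bits 0-7) and a unit mask (bits 8-15)."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return (umask << 8) | event


def raw_event_name(event: int, umask: int) -> str:
    """Format the packed value in perf's raw event syntax (r + hex digits)."""
    return f"r{raw_config(event, umask):04x}"


# (event, umask) pairs copied from the table above.
EVENTS = {
    "cycles": (0x3C, 0x00),
    "instructions": (0xC0, 0x00),
    "cache-references": (0x2E, 0x4F),
    "cache-misses": (0x2E, 0x41),
    "branches": (0xC4, 0x00),
    "branch-misses": (0xC5, 0x00),
    "bus-cycles": (0x3C, 0x01),
}

if __name__ == "__main__":
    for name, (event, umask) in EVENTS.items():
        print(f"{name:18s} perf stat -e {raw_event_name(event, umask)}")
```

On an Intel CPU that implements these architectural events, `perf stat -e r412e -- sleep 1` would be expected to count the same thing as `perf stat -e cache-misses -- sleep 1`, which is one way to cross-check a row of the table.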

Hardware cache events are also ultimately counted by the CPU PMU.

These events belong to the PERF_TYPE_HW_CACHE class of the perf_event_open() interface. The cache_type, cache_op and cache_result attributes are combined to select the individual counters.

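Because the hardware events behind each cache_type, cache_op and cache_result combination depend on the microarchitecture, the Linux kernel stores the values for each architecture in the hw_cache_event_ids and hw_cache_extra_regs arrays. How the kernel fills these in for each CPU microarchitecture can be checked in the kernel source. After reading them against the Intel SDM, my impression is that the perf_event_open man page cannot be trusted completely: what each counter actually measures differs from one CPU microarchitecture to another.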
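
As a rough sketch of how the three attributes are combined, the perf_event_open man page gives the generic PERF_TYPE_HW_CACHE config as `(perf_hw_cache_id) | (perf_hw_cache_op_id << 8) | (perf_hw_cache_op_result_id << 16)`. The Python below only computes that number and does not open a counter. The enum values mirror those in include/uapi/linux/perf_event.h, while the class names and the `hw_cache_config` helper are invented for this example.

```python
# Illustrative sketch only: the class names and hw_cache_config are
# hypothetical; the numeric values follow the perf_event.h uapi enums.
from enum import IntEnum


class CacheId(IntEnum):       # perf_hw_cache_id
    L1D = 0
    L1I = 1
    LL = 2
    DTLB = 3
    ITLB = 4
    BPU = 5
    NODE = 6


class CacheOp(IntEnum):       # perf_hw_cache_op_id
    READ = 0
    WRITE = 1
    PREFETCH = 2


class CacheResult(IntEnum):   # perf_hw_cache_op_result_id
    ACCESS = 0
    MISS = 1


PERF_TYPE_HW_CACHE = 3        # value of perf_event_attr.type for this class


def hw_cache_config(cache: CacheId, op: CacheOp, result: CacheResult) -> int:
    """Compose perf_event_attr.config for a PERF_TYPE_HW_CACHE event."""
    return int(cache) | (int(op) << 8) | (int(result) << 16)


# perf stat names and the (cache, op, result) triple each one stands for.
NAMED = {
    "L1-dcache-loads": (CacheId.L1D, CacheOp.READ, CacheResult.ACCESS),
    "L1-dcache-load-misses": (CacheId.L1D, CacheOp.READ, CacheResult.MISS),
    "L1-dcache-stores": (CacheId.L1D, CacheOp.WRITE, CacheResult.ACCESS),
    "L1-icache-load-misses": (CacheId.L1I, CacheOp.READ, CacheResult.MISS),
    "LLC-loads": (CacheId.LL, CacheOp.READ, CacheResult.ACCESS),
    "LLC-load-misses": (CacheId.LL, CacheOp.READ, CacheResult.MISS),
}

if __name__ == "__main__":
    for name, triple in NAMED.items():
        print(f"{name:24s} type={PERF_TYPE_HW_CACHE} "
              f"config={hw_cache_config(*triple):#x}")
```

This generic config is the same on every CPU. The microarchitecture-specific part is the lookup the kernel then performs in hw_cache_event_ids to turn it into an actual event/umask pair.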

Skylake microarchitecture:

| Event | Encoding (event, umask) | Intel SDM event name | Intel SDM description | Notes |
|---|---|---|---|---|
| L1-dcache-loads | 0xd0, 0x81 | MEM_INST_RETIRED.ALL_LOADS | All retired load instructions. | |
| L1-dcache-load-misses | 0x51, 0x01 | L1D.REPLACEMENT | Counts the number of lines brought into the L1 data cache. | |
| L1-dcache-stores | 0xd0, 0x82 | MEM_INST_RETIRED.ALL_STORES | All retired store instructions. | |
| L1-icache-load-misses | 0x83, 0x02 | ICACHE_64B.IFTAG_MISS | Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. | |
| LLC-loads | 0xb7, 0x01 | | | |
| LLC-load-misses | 0xb7, 0x01 | | | |