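On x86, the Time Stamp Counter (TSC) register is a familiar tool: reading it lets you measure code execution time down to individual CPU cycles.
ARM (ARMv8/AArch64), however, has no register that directly corresponds to the x86 TSC, and no direct equivalent of the rdtsc instruction.
To measure code execution time at CPU-cycle granularity on ARMv8, you need to read a register that plays a similar role. ARMv8 provides the Performance Monitors (PMU) family of registers, and among them PMCCNTR_EL0 is the counterpart of the x86 TSC. This article describes how to read this TSC-like counter on ARM under Linux.
Reading PMCCNTR_EL0 tells you how many cycles the current CPU has run since the counter was enabled or last reset. Reading the cycle count on ARM differs from x86 in two ways: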
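1. On x86, user-space code can read the TSC freely by default. On ARM, user space cannot read the counter by default; access has to be enabled from kernel mode first.
The switch is the PMUSERENR_EL0 register, while counting itself is turned on through PMCR_EL0 and PMCNTENSET_EL0. PMUSERENR_EL0 governs whether user space can read and write the PMU registers as a whole, not just PMCCNTR_EL0.
Enabling it from kernel mode can be done in a standalone kernel module, or by adding the PMU-enabling code to any kernel path that is guaranteed to run. On Linux, a small kernel module is enough to enable user-space access to the PMU.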
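A minimal sketch of such an enabling module follows, assuming an AArch64 kernel build. It is not a definitive implementation: the macro and function names (PMUSERENR_EN, pmcr_enable_value, pmu_enable_on_cpu and so on) are illustrative and do not come from any kernel header. The register values are computed by small pure helpers so that they can also be compiled and checked in user space; the kernel-only part is guarded by preprocessor checks.

```c
#ifdef __KERNEL__
#include <linux/init.h>
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/types.h>
#else
#include <stdint.h>
#endif

/* Bit positions as defined by the ARMv8 architecture; names are illustrative. */
#define PMUSERENR_EN  (1ULL << 0)   /* allow EL0 access to the PMU registers */
#define PMUSERENR_CR  (1ULL << 2)   /* allow EL0 reads of PMCCNTR_EL0 */
#define PMCR_E        (1ULL << 0)   /* enable the counters */
#define PMCR_C        (1ULL << 2)   /* reset the cycle counter to zero */
#define PMCR_LC       (1ULL << 6)   /* record overflow from bit 63 */
#define PMCNTENSET_C  (1ULL << 31)  /* enable the cycle counter itself */

/* Value written to PMUSERENR_EL0 to open the PMU to user space. */
static inline uint64_t pmuserenr_enable_value(void)
{
    return PMUSERENR_EN | PMUSERENR_CR;
}

/* New PMCR_EL0 value: keep the existing bits, enable and reset the counter. */
static inline uint64_t pmcr_enable_value(uint64_t old_pmcr)
{
    return old_pmcr | PMCR_E | PMCR_C | PMCR_LC;
}

#if defined(__KERNEL__) && defined(__aarch64__)
/* Runs on one CPU: start the cycle counter and open it to EL0. */
static void pmu_enable_on_cpu(void *unused)
{
    uint64_t pmcr;

    __asm__ __volatile__("mrs %0, pmcr_el0" : "=r"(pmcr));
    __asm__ __volatile__("msr pmcr_el0, %0" : : "r"(pmcr_enable_value(pmcr)));
    __asm__ __volatile__("msr pmcntenset_el0, %0" : : "r"((uint64_t)PMCNTENSET_C));
    __asm__ __volatile__("msr pmuserenr_el0, %0" : : "r"(pmuserenr_enable_value()));
    __asm__ __volatile__("isb");
}

/* Runs on one CPU: close user-space access again. */
static void pmu_disable_on_cpu(void *unused)
{
    __asm__ __volatile__("msr pmuserenr_el0, %0" : : "r"((uint64_t)0));
    __asm__ __volatile__("isb");
}

static int __init pmu_user_init(void)
{
    /* PMU registers are per-CPU, so the setup must run on every CPU. */
    on_each_cpu(pmu_enable_on_cpu, NULL, 1);
    return 0;
}

static void __exit pmu_user_exit(void)
{
    on_each_cpu(pmu_disable_on_cpu, NULL, 1);
}

module_init(pmu_user_init);
module_exit(pmu_user_exit);
MODULE_LICENSE("GPL");
#endif
```

Because on_each_cpu only reaches CPUs that are online when the module loads, a CPU brought online later would not be covered by this sketch.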
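2. On x86, the TSC starts counting as soon as the CPU powers on, and user code can only read it. On ARM, the cycle counter starts counting only after PMCCNTR_EL0 has been enabled, and PMCCNTR_EL0 can be reset to zero, so it works like a stopwatch.
Once access is enabled, user-space code can read the ARMv8 PMU registers directly.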
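A minimal user-space sketch of the read path is given here, assuming the kernel has already enabled EL0 access as described above. The helper names (read_pmccntr, cycles_elapsed, measure_cycles) are hypothetical. The register read is only compiled on AArch64; if access has not been enabled, executing it traps and the process is typically killed with an illegal-instruction signal.

```c
#include <stdint.h>

/* Elapsed cycles between two samples. Unsigned subtraction stays correct
 * even if the 64-bit counter wrapped once between the two reads. */
static inline uint64_t cycles_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}

#if defined(__aarch64__)
/* Read the cycle counter PMCCNTR_EL0 from user space. */
static inline uint64_t read_pmccntr(void)
{
    uint64_t val;

    /* The isb makes earlier instructions complete before the counter is sampled. */
    __asm__ __volatile__("isb\n\tmrs %0, pmccntr_el0" : "=r"(val) : : "memory");
    return val;
}

/* Count the CPU cycles spent in fn(), including the call overhead. */
static inline uint64_t measure_cycles(void (*fn)(void))
{
    uint64_t start = read_pmccntr();

    fn();
    return cycles_elapsed(start, read_pmccntr());
}
#endif
```

The counter is per-core, so pinning the measuring thread to one CPU keeps the two samples comparable.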
The following table shows the PMCR_EL0 bit assignments for a System register access.
Table 11-4 PMCR_EL0 bit assignments

Bits | Name | Function
---|---|---
[31:24] | IMP | Implementer code. This is a read-only field.
[23:16] | IDCODE | Identification code. This is a read-only field.
[15:11] | N | Number of event counters. In Non-secure modes other than Hyp mode, this field reads the value of HDCR.HPMN. See 4.5.12 Hyp Debug Control Register. In Secure state and Hyp mode, this field returns the number of event counters implemented. This is a read-only field.
[10:7] | – | Reserved, RES0.
[6] | LC | Long cycle count enable. Selects which PMCCNTR_EL0 bit generates an overflow recorded in PMOVSR[31]. 0: overflow on an increment that changes PMCCNTR_EL0[31] from 1 to 0. 1: overflow on an increment that changes PMCCNTR_EL0[63] from 1 to 0.
[5] | DP | Disable cycle counter, PMCCNTR_EL0, when event counting is prohibited. 0: the cycle counter keeps counting when event counting is prohibited. 1: the cycle counter is disabled when event counting is prohibited. This bit is read/write.
[4] | X | Export enable. This bit permits events to be exported to another debug device, such as a trace macrocell, over an event bus. 0: export of events is disabled. 1: export of events is enabled. This bit is read/write and does not affect the generation of Performance Monitors interrupts, which can be implemented as a signal exported from the processor to an interrupt controller.
[3] | D | Clock divider. 0: when enabled, PMCCNTR_EL0 counts every clock cycle. 1: when enabled, PMCCNTR_EL0 counts once every 64 clock cycles. This bit is read/write.
[2] | C | Clock counter reset. 0: no action. 1: reset PMCCNTR_EL0 to 0. Note: resetting PMCCNTR does not clear the PMCCNTR_EL0 overflow bit to 0. See the ARM® Architecture Reference Manual ARMv8 for more information. This bit is write-only, and always RAZ.
[1] | P | Event counter reset. 0: no action. 1: reset all event counters, except PMCCNTR_EL0, to zero. In Non-secure modes other than Hyp mode, a write of 1 to this bit does not reset event counters that the HDCR.HPMN field reserves for Hyp mode use. See 4.5.12 Hyp Debug Control Register. In Secure state and Hyp mode, a write of 1 to this bit resets all the event counters.
[0] | E | Enable bit. 0: all counters, including PMCCNTR_EL0, are disabled. 1: all counters are enabled. This bit does not disable or enable counting by event counters reserved for Hyp mode by HDCR.HPMN. It also does not suppress the generation of performance monitor overflow interrupt requests by those counters. This bit is read/write.