Ref EE-141 Note
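emuclk and emuclk2 are clock registers that can be read from assembly instructions. Their main use is to measure how long a piece of code takes to run, so that the code can then be optimized.
emuclk keeps incrementing as instructions execute, and its counting is not disturbed by cache misses, stalls, or similar delays. Whenever emuclk wraps around to 0, emuclk2 is automatically incremented by one.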
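The wrap relationship between the two registers can be modeled in portable C. This is only a sketch under stated assumptions: the helper names `combine_cycle_count` and `elapsed_cycles64` are hypothetical, the two registers are represented as plain 32-bit values rather than read from hardware, and emuclk2 is treated as the upper word simply because it increments when emuclk wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: join two 32-bit snapshots into one 64-bit count.
   hi models a reading of emuclk2, lo models a reading of emuclk. */
static uint64_t combine_cycle_count(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Hypothetical helper: cycles elapsed between two combined snapshots.
   The assert is a sanity check for this model only: the stop snapshot
   must not be earlier than the start snapshot. */
static uint64_t elapsed_cycles64(uint64_t start, uint64_t stop)
{
    assert(stop >= start);
    return stop - start;
}
```

For example, a start snapshot of (emuclk2 = 0, emuclk = 0xFFFFFFF0) and a stop snapshot of (emuclk2 = 1, emuclk = 0x10) are 0x20 cycles apart in this model, even though emuclk alone appears to have gone backwards.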
In practice they are used mainly through the following two macros:
#define CYCLE_COUNT_START( cntr ) asm("r0 = emuclk; %0 = r0;": \
"=k" (cntr):"d" (cntr): \
"r0")
#define CYCLE_COUNT_STOP( cntr ) asm("r0 = emuclk; r1 = %1; r2 = 4; \
r0 = r0 - r2; r0 = r0 - r1; %0 = r0;" : \
"=k" (cntr) : \
"d" (cntr) : "r0", "r1", "r2")
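The arithmetic performed by CYCLE_COUNT_STOP can be written out in plain C to make it easier to follow. This is a host-side model, not a replacement for the macro: the function name `cycle_count_stop_model` and the constant name `CYCLE_COUNT_OVERHEAD` are hypothetical, and the reading of the constant 4 as compensation for the cost of the measurement itself is an assumption, since the note does not say what it stands for.

```c
#include <assert.h>
#include <stdint.h>

/* The value 4 is taken from the macro (r2 = 4). Treating it as the
   overhead of the measurement itself is an assumption. */
#define CYCLE_COUNT_OVERHEAD 4u

/* Model of the macro's arithmetic: result = now - 4 - start.
   now   models the emuclk value read by CYCLE_COUNT_STOP,
   start models the value stored earlier by CYCLE_COUNT_START. */
static uint32_t cycle_count_stop_model(uint32_t now, uint32_t start)
{
    /* Unsigned subtraction is modulo 2^32, so a single wrap of the
       32-bit counter between the two readings still gives the right
       difference. */
    uint32_t raw = (uint32_t)(now - start);

    /* Sanity check in this model only; the macro performs no such check. */
    assert(raw >= CYCLE_COUNT_OVERHEAD);
    return raw - CYCLE_COUNT_OVERHEAD;
}
```

With a start reading of 100 and a stop reading of 1000, the model reports 896 cycles.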
Below is a demo:
#include <stdio.h>
/* Cycle Count Example Code */
/* Infamous cycle count macros */
#define CYCLE_COUNT_START( cntr ) asm("r0 = emuclk; %0 = r0;": \
"=k" (cntr):"d" (cntr): \
"r0")
#define CYCLE_COUNT_STOP( cntr ) asm("r0 = emuclk; r1 = %1; r2 = 4; \
r0 = r0 - r2; r0 = r0 - r1; %0 = r0;" : \
"=k" (cntr) : \
"d" (cntr) : "r0", "r1", "r2")
// test vectors
float dm Vector_A[256];
float pm Vector_B[256];
float dm Vector_C[256];
int cnt0; // does not have to be global
int main(void) {
    int i;
    // read contents of EMUCLK and store in cnt0
    CYCLE_COUNT_START(cnt0);
    // perform loop
    for (i = 0; i < 256; i++) {
        Vector_C[i] = Vector_A[i] * Vector_B[i];
    }
    // calculate total number of cycles and store result in cnt0
    CYCLE_COUNT_STOP(cnt0);
    // print the results of the benchmark
    printf("The cycle count for vector multiplication execution was %d cycles.\n", cnt0);
    printf("The cycle count for each vector element was %f.\n", ((float)cnt0 * (1.0 / 256.0)));
    return 0;
}
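For readers without the target hardware, the same start/stop pattern can be tried on an ordinary host. The sketch below is an analogue only: it swaps the emuclk register for the standard C `clock()` function, which measures processor time in implementation-defined ticks rather than cycles, and the names `time_vector_multiply` and `per_element` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define VECTOR_LENGTH 256

/* Test vectors, mirroring the demo (without the dm/pm qualifiers). */
static float vec_a[VECTOR_LENGTH];
static float vec_b[VECTOR_LENGTH];
static float vec_c[VECTOR_LENGTH];

/* Multiply the vectors element by element and return the elapsed
   clock() ticks. These are ticks of processor time, not cycles. */
static clock_t time_vector_multiply(void)
{
    clock_t start = clock();              /* plays the role of CYCLE_COUNT_START */
    for (size_t i = 0; i < VECTOR_LENGTH; i++) {
        vec_c[i] = vec_a[i] * vec_b[i];
    }
    return clock() - start;               /* plays the role of CYCLE_COUNT_STOP */
}

/* Average cost per element, as in the demo's second printf. */
static double per_element(double total, size_t count)
{
    assert(count > 0);
    return total / (double)count;
}
```

A 256-element loop will usually finish in fewer ticks than `clock()` can resolve, so on a host the loop would normally be repeated many times before dividing; the register-based macros do not have this limitation.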