Xtensa simulation environment tuning analysis (ISS profile)

Linux gprof tool

https://www.ibm.com/developerworks/cn/linux/l-gnuprof.html

Tensilica

Profiling with the Xtensa ISS has several advantages over hardware profiling:

  • You do not need to compile the Xtensa program with special options (e.g., ‘-hwpg’)
    before profiling it.
  • There is no instrumentation code added to the Xtensa program, so the profile results
    are not distorted by any extra code.
  • The Xtensa ISS can easily record the execution of every instruction, so there is no need
    to rely on statistical approximations like PC-sampling.
  • Instead of counting execution cycles, the Xtensa ISS can optionally record profile data
    for other events, such as cache misses. You can then use xt-gprof or Xplorer to view
    a profile of these other events.

benchmark

  • pipeline interlock

    Consider the following instructions, where the AND reads the register that the preceding load writes:
    LD adr -> r10
    AND r10,r3 -> r11
    The data read from the address adr is not available from the data cache until after the Memory Access stage of the LD instruction. By this time, the AND instruction is already through the ALU. Resolving this would require the data from memory to be passed backwards in time to the input of the ALU, which is not possible. The solution is to delay the AND instruction by one cycle. The data hazard is detected in the decode stage, and the fetch and decode stages are stalled: they are prevented from flopping their inputs and so stay in the same state for a cycle. The execute, access, and write-back stages downstream see an extra no-operation instruction (NOP) inserted between the LD and AND instructions.
    This NOP is termed a pipeline bubble since it floats in the pipeline, like an air bubble, occupying resources but not producing useful results. The hardware to detect a data hazard and stall the pipeline until the hazard is cleared is called a pipeline interlock.
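
    The interlock rule described above can be sketched as a small counting model. This is only an illustration under stated assumptions: the `Instr` record, the `OpKind` enum, and `count_load_use_stalls` are hypothetical names, and the model assumes exactly one bubble per load-use pair, as in the text. Real Xtensa load latencies depend on the processor configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction record for a toy in-order pipeline model. */
typedef enum { OP_LOAD, OP_ALU } OpKind;

typedef struct {
    OpKind kind;
    int dst;   /* destination register number */
    int src1;  /* first source register, or -1 if unused */
    int src2;  /* second source register, or -1 if unused */
} Instr;

/* Count the bubbles a load-use interlock would insert: one stall cycle
 * whenever an instruction reads the register written by the load that
 * immediately precedes it (assumed one-cycle penalty). */
static int count_load_use_stalls(const Instr *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    int stalls = 0;
    for (size_t i = 1; i < n; i++) {
        const Instr *prev = &prog[i - 1];
        const Instr *cur = &prog[i];
        if (prev->kind == OP_LOAD &&
            (cur->src1 == prev->dst || cur->src2 == prev->dst)) {
            stalls++;
        }
    }
    return stalls;
}
```

    For the LD/AND pair above, the model reports one stall. Placing an unrelated instruction between the two removes the stall, which is the kind of scheduling a compiler performs to hide load latency.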

  • branch delay

performance tuning

  • data alignment for vectorization
void sum(int *a, int *b, int *c, int n)
{
#pragma aligned (a, 8)
#pragma aligned (b, 8)
#pragma aligned (c, 8)
    int i;
    for (i=0; i<n; i++) {
        a[i] = b[i] + c[i];
    }
}
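
    The alignment pragma is a promise to the compiler, so the caller has to make sure the pointers really are 8-byte aligned. A minimal sketch of a debug-time check follows; `is_aligned` and `sum_checked` are hypothetical helpers, not part of the Xtensa tools, and the loop stands in for a call to `sum` so that the sketch compiles on any C compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `align` bytes; align must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical wrapper: verify the 8-byte alignment that the pragmas
 * promise before running the loop. */
static void sum_checked(int *a, const int *b, const int *c, int n)
{
    assert(is_aligned(a, 8) && is_aligned(b, 8) && is_aligned(c, 8));
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

    Arrays declared with C11 `_Alignas(8)` satisfy the check. A pointer offset by one `int` from such an array does not, assuming a 4-byte `int`.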
  • Controlling Vectorization Through Pragmas
    The `#pragma concurrent` directive tells the compiler that each iteration of the loop is independent of all other iterations. This pragma will often make a loop vectorizable.
void copy (int *a, int *b, int n)
{
    int i;
#pragma concurrent
    for (i = 0; i < n; i++)
        a[i] = b[i];
}
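
    The pragma is only safe when the independence claim is true. For the copy loop, one sufficient (and conservative) condition is that the two arrays do not overlap. The sketch below is an assumption-laden illustration: `ranges_disjoint` is a hypothetical helper, and comparing addresses of unrelated objects through `uintptr_t` is implementation-defined, although it behaves as expected on flat-address targets.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if [a, a+n) and [b, b+n) share no element. Disjoint ranges
 * guarantee that no iteration of a[i] = b[i] reads a value written by
 * another iteration. The check is conservative: a == b is also safe
 * for this loop but is reported as overlapping. */
static int ranges_disjoint(const int *a, const int *b, int n)
{
    assert(n >= 0);
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)n * sizeof(int);
    return pa + bytes <= pb || pb + bytes <= pa;
}
```

    For example, calling `copy(buf + 1, buf, n)` makes iteration i write the element that iteration i+1 reads, so the iterations are not independent and the pragma would be incorrect for that call.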