Efficient C Programming for ARM: my digest

Reading notes. They confirmed once again how little I really know!

 

Basic C Data Types

    Use signed/unsigned int for local variables; unsigned int is faster for division
    For local variables held in registers, don’t use a char or short type unless 8-bit or
    16-bit modular arithmetic is necessary. Use the signed or unsigned int types instead.
    Unsigned types are faster when you use divisions.
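
    A rough sketch of the division point (my own example, the function names are made up). C requires signed division to round toward zero, so the signed version usually needs extra fix-up instructions for negative sums, while the unsigned version can reduce to a single shift.

```c
/* Hypothetical helpers: same arithmetic, different signedness. */

/* Unsigned: a compiler can typically turn "/ 2u" into one logical shift right. */
unsigned int average_u(unsigned int a, unsigned int b)
{
    return (a + b) / 2u;
}

/* Signed: the result must round toward zero, so negative sums usually cost
   extra instructions on top of the shift. */
int average_s(int a, int b)
{
    return (a + b) / 2;
}
```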



    For arrays held in memory, use the smallest type that fits; this saves storage
    For array entries and global variables held in main memory, use the type with the
    smallest size possible to hold the required data. This saves memory footprint.

    On ARMv4 and later, walk through arrays with a moving pointer such as *p
    The ARMv4 architecture is efficient at loading and storing all data widths provided you
    traverse arrays by incrementing the array pointer.

    For short arrays, avoid the a[i] form (array base + offset)
    Avoid using offsets from the base of the array with short type arrays,
    as LDRH does not support this.



    Use explicit casts when reading array entries or global variables into local variables, or
    writing local variables out to array entries. The casts make it clear that for fast operation
    you are taking a narrow width type stored in memory and expanding it to a wider type
    in the registers. Switch on implicit narrowing cast warnings in the compiler to detect
    implicit casts.


    Avoid implicit or explicit narrowing casts in expressions because they usually cost extra
    cycles. Casts on loads or stores are usually free because the load or store instruction
    performs the cast for you.
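
    The pointer-traversal and explicit-cast advice above could be combined as in this minimal sketch. The function name and the choice of a short array are my own assumptions for illustration.

```c
/* Sum n 16-bit samples held in memory (n may be zero). */
int sum_samples(const short *data, unsigned int n)
{
    int sum = 0;                 /* wide local, intended to live in a register */

    for (; n != 0; n--) {
        /* Explicit widening cast on the load; the pointer itself walks the
           array, so no base + offset addressing is needed. */
        sum += (int)*data++;
    }
    return sum;
}
```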



    Avoid char and short as function arguments; use int instead
    Avoid char and short types for function arguments or return values. Instead use the
    int type even if the range of the parameter is smaller. This prevents the compiler
    performing unnecessary casts.
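
    To make this concrete, here is a small before/after pair. The names add_v1 and add_v2 are only illustrative; the point is that the narrow version obliges the compiler to bring the result back into 16-bit range.

```c
/* v1: narrow parameter and return types. The result has to be reduced to
   the range of short, which typically costs extra instructions on ARM. */
short add_v1(short a, short b)
{
    return (short)(a + (b >> 1));
}

/* v2: same logic with int parameters and return value; no narrowing step. */
int add_v2(int a, int b)
{
    return a + (b >> 1);
}
```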



C Looping Structures

    Use loops that count down to zero: for (i = 100; i != 0; i--) {}
    Use loops that count down to zero. Then the compiler does not need to allocate
    a register to hold the termination value, and the comparison with zero is free.

    Use unsigned loop counters by default and the continuation condition i!=0 rather than
    i>0. This will ensure that the loop overhead is only two instructions.


    Use do-while instead of for
    Use do-while loops rather than for loops when you know the loop will iterate at least
    once. This saves the compiler checking to see if the loop count is zero.
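
    Putting the count-down and do-while advice together, a loop might look like the sketch below. It assumes the caller guarantees n >= 1, and the function name is my own.

```c
/* Sum n words; the do-while form assumes n >= 1. */
int checksum(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;
    } while (--n != 0);   /* unsigned counter, counts down, compared with zero */

    return sum;
}
```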


    Unroll the loop body, but do not overdo it
    Unroll important loops to reduce the loop overhead. Do not overunroll. If the loop
    overhead is small as a proportion of the total, then unrolling will increase code size and
    hurt the performance of the cache.



    Try to arrange that the number of elements in an array is a multiple of four or eight. You
    can then unroll loops easily by two, four, or eight times without worrying about the
    leftover array elements.
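
    A sketch of unrolling by four, under the assumption that n is a nonzero multiple of four (so there are no leftover elements to handle). The function name is hypothetical.

```c
/* Sum n words, assuming n is a nonzero multiple of four. */
int checksum_unrolled(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
        n -= 4;               /* loop overhead is paid once per four elements */
    } while (n != 0);

    return sum;
}
```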



Register Allocation

    Keep the number of local variables to 12 or fewer
    Try to limit the number of local variables in the internal loop of functions to 12. The
    compiler should be able to allocate these to ARM registers.


    Guide the compiler by using the most important variables inside the innermost loop
    You can guide the compiler as to which variables are important by ensuring these
    variables are used within the innermost loop.


Function Calls

    Limit functions to four arguments or fewer; wrap related variables in a struct
    Try to restrict functions to four arguments. This will make them more efficient to
    call. Use structures to group related arguments and pass structure pointers instead of
    multiple arguments.
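
    As an illustration, a circular queue that would otherwise need separate start, end, and current-pointer arguments can pass one structure pointer. The Queue type and queue_bytes function below are hypothetical names for this sketch.

```c
/* Hypothetical queue descriptor grouping the related arguments. */
typedef struct {
    char *q_start;   /* first byte of the buffer */
    char *q_end;     /* one past the last byte of the buffer */
    char *q_ptr;     /* current insertion point */
} Queue;

/* Three arguments instead of five: the queue state travels as one pointer. */
void queue_bytes(Queue *queue, const char *data, unsigned int n)
{
    char *q_ptr = queue->q_ptr;   /* cache the fields in locals */
    char *q_end = queue->q_end;

    for (; n != 0; n--) {
        *q_ptr++ = *data++;
        if (q_ptr == q_end) {
            q_ptr = queue->q_start;   /* wrap around */
        }
    }
    queue->q_ptr = q_ptr;   /* write the updated state back once */
}
```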


    Define small functions in the same file, and make sure they are defined before they are used
    Define small functions in the same source file and before the functions that call them.
    The compiler can then optimize the function call or inline the small function.


    Use the __inline keyword
    Critical functions can be inlined using the __inline keyword.


Pointer Aliasing

    Hold expressions that access memory in local variables
    Do not rely on the compiler to eliminate common subexpressions involving memory
    accesses. Instead create new local variables to hold the expression. This ensures the
    expression is evaluated only once.

    Avoid taking the address of local variables
    Avoid taking the address of local variables. The variable may be inefficient to access
    from then on.
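
    A small sketch of the first point, with made-up names. If *step were used directly in both statements, the compiler would have to reload it after the first store, because timer1 might point at the same memory as step. Note that copying into a local also fixes the value of the step when the pointers do alias.

```c
/* Add the same step to two timers, reading *step from memory only once. */
void timers_add(int *timer1, int *timer2, const int *step)
{
    int s = *step;   /* single load, held in a local variable */

    *timer1 += s;
    *timer2 += s;
}
```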


Structure Arrangement

    A haphazardly ordered struct grows in size; put small members first and large ones last
    Lay structures out in order of increasing element size. Start the structure with the
    smallest elements and finish with the largest.
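
    For example, the two hypothetical structures below hold the same members. With a 4-byte int and natural alignment I would expect the first to take 12 bytes and the second 8, though the exact sizes depend on the compiler and ABI.

```c
/* Mixed ordering: padding is typically inserted after a and after c. */
struct mixed {
    char  a;
    int   b;
    char  c;
    short d;
};

/* Increasing element size: the small members pack together. */
struct sorted {
    char  a;
    char  c;
    short d;
    int   b;
};
```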

    Avoid very large structures; build them up from smaller ones
    Avoid very large structures. Instead use a hierarchy of smaller structures.


    For portability, manually add padding (that would appear implicitly) into API
    structures so that the layout of the structure does not depend on the compiler.


    The size of an enum is compiler dependent
    Beware of using enum types in API structures. The size of an enum type is compiler
    dependent.


Bit-fields

    Avoid bit-fields where possible; use #define and masks
    Avoid using bit-fields. Instead use #define or enum to define mask values.

    Operate on them with AND, OR, XOR and masks
    Test, toggle, and set bit-fields using integer logical AND, OR, and exclusive OR
    operations with the mask values. These operations compile efficiently, and you can test,
    toggle, or set multiple fields at the same time.
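
    A minimal sketch of the mask approach. The STAGE_* names and helper functions are invented for illustration; each helper can act on several flags at once by OR-ing masks together.

```c
/* Hypothetical flags, defined as masks instead of bit-fields. */
#define STAGE_A (1ul << 0)
#define STAGE_B (1ul << 1)
#define STAGE_C (1ul << 2)

unsigned long stages_set(unsigned long s, unsigned long mask)    { return s | mask; }
unsigned long stages_clear(unsigned long s, unsigned long mask)  { return s & ~mask; }
unsigned long stages_toggle(unsigned long s, unsigned long mask) { return s ^ mask; }

/* Nonzero if any flag in mask is set in s. */
int stages_any(unsigned long s, unsigned long mask)
{
    return (s & mask) != 0;
}
```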


Unaligned Data and Endianness

    Avoid using unaligned data if you can.

    Use the type char * for data that can be at any byte alignment. Access the data by
    reading bytes and combining with logical operations. Then the code won’t depend on
    alignment or ARM endianness configuration.
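
    A sketch of the byte-by-byte approach for a 32-bit little-endian value. The function name is mine, and I use unsigned char explicitly so that the code also behaves the same on platforms where plain char is signed.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from any byte address. The result does
   not depend on the pointer's alignment or the processor's endianness. */
uint32_t read_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```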


    For faster access, write different access functions for different architectures
    For fast access to unaligned structures, write different variants according to pointer
    alignment and processor endianness.


Division

    Avoid divisions as much as possible. Do not use them for circular buffer handling.


    If you can’t avoid a division, then try to take advantage of the fact that divide routines
    often generate the quotient n/d and modulus n%d together.


    To repeatedly divide by the same denominator d, calculate s = (2^k − 1)/d in advance.
    You can replace the divide of a k-bit unsigned integer by d with a 2k-bit multiply by s.
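
    My own sketch of this for k = 32, using a 64-bit multiply. The function names are hypothetical. The multiply gives an estimate that is either the true quotient or one too small, so a single compare corrects it.

```c
#include <stdint.h>

/* Precompute s = (2^32 - 1) / d once per denominator (d must be nonzero). */
uint32_t recip_init(uint32_t d)
{
    return 0xFFFFFFFFu / d;
}

/* Divide n by d using the precomputed s. */
uint32_t recip_div(uint32_t n, uint32_t d, uint32_t s)
{
    uint32_t q = (uint32_t)(((uint64_t)n * s) >> 32);   /* quotient estimate */
    uint32_t r = n - q * d;                              /* remainder for that estimate */

    if (r >= d) {
        q++;   /* estimate was one too small */
    }
    return q;
}
```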


    To divide unsigned n < 2^N by an unsigned constant d, you can find a 32-bit unsigned s
    and shift k such that n/d is either (ns) >> (N + k) or (ns + s) >> (N + k). The choice
    depends only on d. There is a similar result for signed divisions.
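
    A well-known instance of the first form is division by 10, with N = 32, k = 3 and s = 0xCCCCCCCD. The function name below is mine; this is just a sketch of the pattern, not the book's derivation.

```c
#include <stdint.h>

/* n / 10 for any 32-bit unsigned n, computed as (n * s) >> (N + k)
   with s = 0xCCCCCCCD, N = 32, k = 3. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```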


Inline Functions and Inline Assembly

    Use inline functions to declare new operations or primitives not supported by the
    C compiler.

    Use inline assembly to access ARM instructions not supported by the C compiler.
    Examples are coprocessor instructions or ARMv5E extensions.


Portability Issues

    The char type. On the ARM, char is unsigned rather than signed as for many other
    processors. A common problem concerns loops that use a char loop counter i and
    the continuation condition i ≥ 0; these become infinite loops. In this situation, armcc
    produces a warning of unsigned comparison with zero. You should either use a compiler
    option to make char signed or change loop counters to type int.
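
    A sketch of the fix with an int counter (the function is a made-up example). If i were declared as char, the condition i >= 0 would always be true wherever char is unsigned, and the loop would never end.

```c
/* Count the iterations of a loop running i = last, last - 1, ..., 0. */
int count_down_iterations(int last)
{
    int count = 0;
    int i;                      /* int, not char: i >= 0 can become false */

    for (i = last; i >= 0; i--) {
        count++;
    }
    return count;
}
```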



    The int type. Some older architectures use a 16-bit int, which may cause problems
    when moving to ARM’s 32-bit int type although this is rare nowadays. Note that
    expressions are promoted to an int type before evaluation. Therefore if i = -0x1000,
    the expression i == 0xF000 is true on a 16-bit machine but false on a 32-bit machine.


    Unaligned data pointers. Some processors support the loading of short and int typed
    values from unaligned addresses. A C program may manipulate pointers directly so
    that they become unaligned, for example, by casting a char * to an int *. ARM
    architectures up to ARMv5TE do not support unaligned pointers. To detect them,
    run the program on an ARM with an alignment checking trap. For example, you can
    configure the ARM720T to data abort on an unaligned access.


    Endian assumptions. C code may make assumptions about the endianness of a memory
    system, for example, by casting a char * to an int *. If you configure the ARM for
    the same endianness the code is expecting, then there is no issue. Otherwise, you must
    remove endian-dependent code sequences and replace them by endian-independent
    ones. See Section 5.9 of the book for more details.


    Function prototyping. The armcc compiler passes arguments narrow, that is, reduced
    to the range of the argument type. If functions are not prototyped correctly, then the
    function may return the wrong answer. Other compilers that pass arguments wide may
    give the correct answer even if the function prototype is incorrect. Always use ANSI
    prototypes.


    Use of bit-fields. The layout of bits within a bit-field is implementation and endian
    dependent. If C code assumes that bits are laid out in a certain order, then the code is
    not portable.


    Use of enumerations. Although enum is portable, different compilers allocate different
    numbers of bytes to an enum. The gcc compiler will always allocate four bytes to an enum
    type. The armcc compiler will only allocate one byte if the enum takes only eight-bit
    values. Therefore you can’t cross-link code and libraries between different compilers if
    you use enums in an API structure.


    Inline assembly. Using inline assembly in C code reduces portability between
    architectures. You should separate any inline assembly into small inlined functions
    that can easily be replaced. It is also useful to supply reference, plain C implementations
    of these functions that can be used on other architectures, where this is possible.
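
    As an example of such a reference implementation, here is a plain C saturating add, the kind of primitive that the ARMv5E QADD instruction provides. The function name is hypothetical; on ARM the body could be swapped for inline assembly while callers stay unchanged.

```c
#include <stdint.h>

/* Portable reference: add two 32-bit values, clamping on overflow
   instead of wrapping. */
int32_t sat_add(int32_t a, int32_t b)
{
    if (b > 0 && a > INT32_MAX - b) {
        return INT32_MAX;   /* would overflow upward */
    }
    if (b < 0 && a < INT32_MIN - b) {
        return INT32_MIN;   /* would overflow downward */
    }
    return a + b;
}
```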


    The volatile keyword. Use the volatile keyword on the type definitions of ARM
    memory-mapped peripheral locations. This keyword prevents the compiler from
    optimizing away the memory access. It also ensures that the compiler generates a data access
    of the correct type. For example, if you define a memory location as a volatile short
    type, then the compiler will access it using 16-bit load and store instructions LDRSH
    and STRH.
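
    A minimal sketch of polling a 16-bit peripheral register. The STATUS_READY bit and the function name are invented; a real register address and layout would come from the device's documentation. I use an unsigned 16-bit type here, so the access would be an unsigned halfword load rather than LDRSH.

```c
#include <stdint.h>

/* Hypothetical "ready" bit in a 16-bit status register. */
#define STATUS_READY 0x0001u

/* Spin until the ready bit is set. Because the pointer target is volatile,
   the register is re-read on every iteration instead of being cached. */
void wait_ready(volatile uint16_t *status)
{
    while ((*status & STATUS_READY) == 0u) {
        /* busy-wait */
    }
}
```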


Summary

    Use the signed and unsigned int types for local variables, function arguments, and
    return values. This avoids casts and uses the ARM’s native 32-bit data processing
    instructions efficiently.

    The most efficient form of loop is a do-while loop that counts down to zero.

    Unroll important loops to reduce the loop overhead.

    Do not rely on the compiler to optimize away repeated memory accesses. Pointer
    aliasing often prevents this.

    Try to limit functions to four arguments. Functions are faster to call if their arguments
    are held in registers.

    Lay structures out in increasing order of element size, especially when compiling for
    Thumb.

    Don’t use bit-fields. Use masks and logical operations instead.

    Avoid divisions. Use multiplications by reciprocals instead.

    Avoid unaligned data. Use the char * pointer type if the data could be unaligned.

    Use the inline assembler in the C compiler to access instructions or optimizations that
    the C compiler does not support.
