ARM NEON指令集优化理论与实践
一.简介
NEON就是一种基于SIMD思想的ARM技术,相比于ARMv6或之前的架构,NEON结合了64-bit和128-bit的SIMD指令集,提供128-bit宽的向量运算(vector operations)。NEON技术从ARMv7开始被采用,目前可以在ARM Cortex-A和Cortex-R系列处理器中采用。NEON在Cortex-A7、Cortex-A12、Cortex-A15处理器中被设置为默认选项,但是在其余的ARMv7 Cortex-A系列中是可选项。NEON与VFP共享了同样的寄存器,但它具有自己独立的执行流水线。
二. NEON寄存器
三. NEON指令集
所有的支持NEON指令都有一个助记符V,下面以32位指令为例,说明指令的一般格式:
V{}{}{}{.}{}, src1, src2Q: The instruction uses saturating arithmetic, so that the result is saturated within the range of the specified data type, such as VQABS, VQSHL etc.
H: The instruction will halve the result. It does this by shifting right by one place (effectively a divide by two with truncation), such as VHADD, VHSUB.
D: The instruction doubles the result, such as VQDMULL, VQDMLAL, VQDMLSL and VQ{R}DMULH.
R: The instruction will perform rounding on the result, equivalent to adding 0.5 to the result before truncating, such as VRHADD, VRSHR.
- the operation (for example, ADD, SUB, MUL).
- Shape,即前文中的Long (L), Wide (W), Narrow (N).
- Condition, used with IT instruction.
<.dt> - Data type, such as s8, u8, f32 etc.
- Destination.
- Source operand 1.
- Source operand 2.
注: {} 表示可选的参数。
比如:
VADD.I16 D0, D1, D2 @ 16位加法
VMLAL.S16 Q2,