NEON Floating-Point Operations: ARM NEON Instruction Set Optimization in Theory and Practice

Latest recommended article published 2024-08-27 20:53:09

weixin_39620065

Tags: NEON floating-point operations

Article link: https://blog.csdn.net/weixin_39620065/article/details/111839894

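This article introduces the ARM NEON instruction set, a SIMD-based technology for accelerating floating-point computation and improving the performance of ARM processors. It explains the NEON registers and the instruction format in detail, and uses examples to show how to optimize floating-point code, including reducing data dependencies, reducing branches, and other techniques that improve program efficiency.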

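ARM NEON Instruction Set Optimization: Theory and Practice

I. Introduction

NEON is an ARM technology based on the SIMD (single instruction, multiple data) model. Compared with ARMv6 and earlier architectures, NEON combines 64-bit and 128-bit SIMD instruction sets and provides 128-bit-wide vector operations. NEON was introduced with ARMv7 and is available in ARM Cortex-A and Cortex-R series processors. It is included by default in the Cortex-A7, Cortex-A12, and Cortex-A15 processors, but is optional in the other ARMv7 Cortex-A series processors. NEON shares the same register file with VFP, but has its own independent execution pipeline.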
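To make the idea of a 128-bit vector operation concrete, the following is a minimal sketch in C, not code from the original article. It assumes a compiler that provides `arm_neon.h` when targeting NEON; the helper name `add_f32` is hypothetical. On a NEON target, each loop iteration adds four 32-bit floats with a single vector instruction; on any other target the scalar loop produces the same result.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One 128-bit Q register holds four 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* 4 additions at once  */
    }
#endif
    /* Scalar tail on NEON targets; the whole loop on other targets. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Calling `add_f32(d, a, b, 6)` on six-element arrays would process the first four elements in the vector loop (when NEON is available) and the remaining two in the scalar tail.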

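II. NEON Registers

III. NEON Instruction Set

All NEON instructions have mnemonics that begin with the letter V. Taking the 32-bit instructions as an example, the general format of an instruction is: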

V{<mod>}<op>{<shape>}{<cond>}{.<dt>} {<dest>}, src1, src2

<mod> - One of the following modifiers:

Q: The instruction uses saturating arithmetic, so that the result is saturated within the range of the specified data type, such as VQABS, VQSHL etc.

H: The instruction will halve the result. It does this by shifting right by one place (effectively a divide by two with truncation), such as VHADD, VHSUB.

D: The instruction doubles the result, such as VQDMULL, VQDMLAL, VQDMLSL and VQ{R}DMULH.

R: The instruction will perform rounding on the result, equivalent to adding 0.5 to the result before truncating, such as VRHADD, VRSHR.

<op> - The operation (for example, ADD, SUB, MUL).

<shape> - Shape: Long (L), Wide (W), or Narrow (N).

<cond> - Condition, used with the IT instruction.

<.dt> - Data type, such as s8, u8, f32 etc.

<dest> - Destination.

<src1> - Source operand 1.

<src2> - Source operand 2.

Note: {} denotes an optional parameter.
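The effect of the Q, H, D, and R modifiers on a single lane can be illustrated with a scalar model in portable C. This is only a sketch of the per-lane arithmetic, not NEON code and not taken from the original article; the function names are hypothetical, and a real NEON instruction applies the same operation to every lane of the vector at once.

```c
#include <stdint.h>

/* Q: saturating add, modeling one lane of VQADD.S8. */
int8_t q_add_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;   /* clamp instead of wrapping */
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* H: halving add, modeling one lane of VHADD.U8: (a + b) >> 1. */
uint8_t h_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* R combined with H: rounding halving add, modeling one lane of
 * VRHADD.U8. Adding 1 before the shift is the "add 0.5, then
 * truncate" rule expressed in integer arithmetic. */
uint8_t rh_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Q combined with D: saturating doubling multiply long, modeling one
 * lane of VQDMULL.S16. The doubled product only overflows 32 bits
 * when both inputs are INT16_MIN, so only the upper bound is checked. */
int32_t qd_mull_s16(int16_t a, int16_t b)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    if (p > INT32_MAX) return INT32_MAX;
    return (int32_t)p;
}
```

For instance, `h_add_u8(3, 4)` truncates 3.5 down to 3, while `rh_add_u8(3, 4)` rounds it up to 4, and `q_add_s8(100, 100)` clamps to 127 where a plain 8-bit add would wrap around.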

For example:

VADD.I16 D0, D1, D2 @ 16-bit addition
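As a rough C equivalent of the instruction above, the sketch below treats each 64-bit D register as four 16-bit lanes and adds them lane by lane. It is an illustration under assumptions, not code from the original article: the function name `vadd_i16_d` is hypothetical, and the intrinsic path assumes a compiler that provides `arm_neon.h`. On a 32-bit NEON target, `vadd_u16` typically compiles to a `VADD.I16` on D registers; elsewhere, the scalar loop models the same wraparound behavior.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper modeling VADD.I16 D0, D1, D2:
 * d0[lane] = d1[lane] + d2[lane] for four 16-bit lanes.
 * The .I16 type means plain integer addition, so each lane
 * wraps modulo 2^16 regardless of signedness. */
void vadd_i16_d(uint16_t d0[4], const uint16_t d1[4], const uint16_t d2[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Load two 64-bit vectors, add all four lanes, store the result. */
    vst1_u16(d0, vadd_u16(vld1_u16(d1), vld1_u16(d2)));
#else
    /* Scalar model for non-NEON targets. */
    for (int lane = 0; lane < 4; ++lane)
        d0[lane] = (uint16_t)(d1[lane] + d2[lane]);
#endif
}
```

With inputs `{1, 2, 3, 65535}` and `{10, 20, 30, 1}`, the last lane overflows 16 bits and wraps to 0, because this instruction carries no Q (saturating) modifier.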