Introduction to NEON

wxb_blog

Article link: https://blog.csdn.net/wxb1553725576/article/details/81638716

Arm NEON technology is an advanced SIMD (Single Instruction, Multiple Data) architecture extension for the Arm Cortex-A series and Cortex-R52 processors.
NEON technology was introduced in the Armv7-A and Armv7-R profiles. It is now also an extension to the Armv8-A and Armv8-R profiles.
NEON technology is intended to improve the multimedia user experience by accelerating audio and video encoding/decoding, user interfaces, 2D/3D graphics, and gaming. NEON can also accelerate signal processing algorithms and functions, speeding up applications such as audio and video processing, voice and facial recognition, computer vision, and deep learning.
[Figure: SIMD architecture]

NEON technology is a packed SIMD architecture. NEON registers are treated as vectors of elements of the same data type, and the technology supports multiple data types. The following table lists the data types supported by each architecture version.

| Types | Armv7-A/R | Armv8-A/R | Armv8-A |
|---|---|---|---|
| Execution state | - | AArch32 | AArch64 |
| Floating point | 32-bit | 16/32-bit | 16/32/64-bit |
| Integer | 8/16/32-bit | 8/16/32/64-bit | 8/16/32/64-bit |

NEON instructions perform the same operation in all lanes of the vectors. The number of operations performed depends on the data type. NEON instructions allow up to:
- 16x8-bit, 8x16-bit, 4x32-bit, 2x64-bit integer operations
- 8x16-bit (*), 4x32-bit, 2x64-bit (**) floating-point operations
An implementation of NEON technology can also support issuing multiple instructions in parallel.
(*) Only in Armv8.2-A
(**) Only in Armv8-A/R
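
The lane counts listed above follow from dividing the register width by the element width. Here is a minimal sketch of that arithmetic in C, assuming the 128-bit quadword (Q) register width; the helper name `neon_lanes_per_q_register` is hypothetical and is not part of any Arm API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: width in bits of one NEON quadword (Q) register. */
enum { NEON_Q_REGISTER_BITS = 128 };

/* Hypothetical helper: how many elements of the given width fit in one
 * Q register, i.e. how many lanes a single instruction operates on. */
static size_t neon_lanes_per_q_register(size_t element_bits)
{
    /* Only the element widths named in the text are accepted here. */
    assert(element_bits == 8 || element_bits == 16 ||
           element_bits == 32 || element_bits == 64);
    return NEON_Q_REGISTER_BITS / element_bits;
}
```

For example, `neon_lanes_per_q_register(8)` gives 16 and `neon_lanes_per_q_register(64)` gives 2, matching the 16x8-bit and 2x64-bit cases.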

NEON can be used in several ways: through NEON-enabled libraries, the compiler's auto-vectorization feature, NEON intrinsics, and, finally, NEON assembly code. Detailed information on NEON programming can be found in the NEON Programmer's Guide, version 1.0.

Arm compilers support auto-vectorization, in which the compiler exploits NEON functionality automatically.
This feature is supported by:

NEON intrinsics are function calls that the compiler replaces with an appropriate NEON instruction or sequence of NEON instructions. Intrinsics provide almost as much control as writing assembly language but leave register allocation to the compiler, so that developers can focus on the algorithms. The compiler can also perform instruction scheduling to remove pipeline stalls for the specified target processor. This leads to more maintainable source code than assembly language. NEON intrinsics are supported by Arm Compilers, GCC, and LLVM.
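
As an illustration of the intrinsics approach, here is a minimal sketch that adds two float arrays four lanes at a time using `vld1q_f32`, `vaddq_f32`, and `vst1q_f32` from `arm_neon.h`. The function name `add_f32` and the `HAVE_NEON` macro are hypothetical. The scalar fallback is an assumption added so the code also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* assumption: fall back to plain C elsewhere */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
static void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if HAVE_NEON
    /* Process four 32-bit floats per iteration (4x32-bit lanes). */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* lane-wise add, store */
    }
#endif
    /* Scalar tail; this is the whole loop when NEON is unavailable. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Note that the code names no registers. The compiler chooses them and schedules the loads, the add, and the store, which is the division of labor described above.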

For very high performance, hand-coded NEON assembly is the best approach for experienced programmers. Both the GNU assembler (gas) and the Arm Compiler toolchain assembler (armasm) support assembly of NEON instructions.