- This blog post introduces the basics of NEON and gives a simple, working example.
NEON
- Arm NEON technology is an advanced SIMD (Single Instruction Multiple Data) architecture extension for the Arm Cortex-A series and Cortex-R52 processors.
- NEON technology was introduced to the Armv7-A and Armv7-R profiles. It is also now an extension to the Armv8-A and Armv8-R profiles.
- NEON technology is intended to improve the multimedia user experience by accelerating audio and video encoding/decoding, user interfaces, 2D/3D graphics, and gaming. NEON can also accelerate signal processing algorithms and functions to speed up applications such as audio and video processing, voice and facial recognition, computer vision, and deep learning. (Figure: SIMD architecture)
Overview
NEON technology is a packed SIMD architecture. NEON registers are treated as vectors of elements of the same data type, and multiple data types are supported. The following table lists the data types supported by each architecture version.

| Types | Armv7-A/R | Armv8-A/R | Armv8-A |
|-------|-----------|-----------|---------|
| Arch | - | AArch32 | AArch64 |
| float | 32-bit | 16/32-bit | 16/32/64-bit |
| int | 8/16/32-bit | 8/16/32/64-bit | 8/16/32/64-bit |

NEON instructions perform the same operation in all lanes of the vectors. The number of operations performed depends on the data type. NEON instructions allow up to:
- 16x8-bit, 8x16-bit, 4x32-bit, 2x64-bit integer operations
- 8x16-bit (*), 4x32-bit, 2x64-bit (**) floating-point operations
- Implementations of NEON technology can also issue multiple instructions in parallel.
- (*) Only in Armv8.2-A
- (**) Only in Armv8-A/R
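The lane counts listed above follow from dividing a 128-bit register by the element width. A minimal sketch of that arithmetic is below; the helper name `lanes_per_q_register` and the constant `kQRegisterBits` are made up for illustration and are not part of any NEON API.

```cpp
#include <cstddef>
#include <cstdint>

// Width in bits of one full-size NEON vector register (16 x 8-bit = 128 bits).
constexpr std::size_t kQRegisterBits = 128;

// Hypothetical helper: number of lanes of element type T that fit in
// one 128-bit register.
template <typename T>
constexpr std::size_t lanes_per_q_register() {
    return kQRegisterBits / (sizeof(T) * 8);
}

// These mirror the integer and 32-bit float lane counts listed above.
static_assert(lanes_per_q_register<std::uint8_t>() == 16, "16 x 8-bit");
static_assert(lanes_per_q_register<std::uint16_t>() == 8, "8 x 16-bit");
static_assert(lanes_per_q_register<std::uint32_t>() == 4, "4 x 32-bit");
static_assert(lanes_per_q_register<std::uint64_t>() == 2, "2 x 64-bit");
static_assert(lanes_per_q_register<float>() == 4, "4 x 32-bit float");
```

The checks run at compile time, so the file builds only if the lane counts match.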
How to use NEON?
- NEON can be used in multiple ways: NEON-enabled libraries, the compiler's auto-vectorization feature, NEON intrinsics, and NEON assembly code. Detailed information on NEON programming can be found in the NEON Programmer's Guide, Version 1.0.
Libraries
Autovectorization
- Arm compilers support auto-vectorization: the compiler detects loops that can be vectorized and generates NEON instructions for them automatically.
- This feature is supported by:
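As a rough illustration of the kind of loop an auto-vectorizer can handle, here is a small sketch. The function name `scale_and_add` is hypothetical, and `__restrict` is a compiler extension (accepted by GCC and Clang), not standard C++. Whether the loop is actually vectorized depends on the compiler, its version, the optimization flags, and the target. With GCC, `-O3` turns on `-ftree-vectorize`.

```cpp
#include <cassert>
#include <cstddef>

// Computes out[i] = a[i] * scale + b[i].
// Every iteration is independent of the others, and __restrict tells the
// compiler that the three arrays do not overlap. Both properties make it
// easier for the compiler to map the loop onto SIMD lanes.
void scale_and_add(const float* __restrict a,
                   const float* __restrict b,
                   float* __restrict out,
                   std::size_t n,
                   float scale) {
    // Debug-only precondition check; compiled out when NDEBUG is defined.
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * scale + b[i];
    }
}
```

The source contains no NEON-specific code. If the compiler vectorizes the loop, the generated assembly contains the NEON instructions.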
Compiler Intrinsics
- NEON intrinsics are function calls that the compiler replaces with an appropriate NEON instruction or sequence of NEON instructions. Intrinsics provide almost as much control as writing assembly language but leave register allocation to the compiler, so that developers can focus on the algorithms. The compiler can also perform instruction scheduling to remove pipeline stalls for the specified target processor. This leads to more maintainable source code than assembly language. NEON intrinsics are supported by Arm Compilers, GCC, and LLVM.
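To make this concrete, here is a small sketch of a 16-lane saturating add on 8-bit unsigned integers. It uses the intrinsics `vld1q_u8`, `vqaddq_u8`, and `vst1q_u8` from `arm_neon.h`. The function name `saturating_add_u8` is hypothetical, and the scalar branch is an assumption added so the file also builds on targets without NEON.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Saturating add: out[i] = min(255, a[i] + b[i]).
// For simplicity this sketch requires n to be a multiple of 16.
void saturating_add_u8(const std::uint8_t* a, const std::uint8_t* b,
                       std::uint8_t* out, std::size_t n) {
    assert(n % 16 == 0);
#if defined(__ARM_NEON)
    for (std::size_t i = 0; i < n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);       // load 16 x 8-bit lanes
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vqaddq_u8(va, vb));  // add with saturation, store
    }
#else
    // Scalar fallback with the same result, for non-NEON targets.
    for (std::size_t i = 0; i < n; ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
#endif
}
```

The code never names a register. The compiler chooses registers for `va` and `vb`, as the paragraph above describes.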
Assembly code
- For very high performance, hand-coded NEON assembly is the best approach for experienced programmers. Both the GNU assembler (gas) and the Arm Compiler toolchain assembler (armasm) support assembly of NEON instructions.
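A standalone assembly file is beyond the scope of this post, but the sketch below shows the same idea with GNU-style inline assembly inside C++. It assumes an AArch64 target and a GCC-compatible compiler. The function name `add4_f32` is hypothetical, and the scalar branch is added only so the file builds elsewhere. On 32-bit Arm the mnemonics and register names differ.

```cpp
#include <cassert>

// Adds four floats: out[i] = a[i] + b[i] for i in 0..3.
void add4_f32(const float* a, const float* b, float* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(__aarch64__) && defined(__GNUC__)
    asm volatile(
        "ld1  {v0.4s}, [%[a]]\n\t"         // load 4 floats from a into v0
        "ld1  {v1.4s}, [%[b]]\n\t"         // load 4 floats from b into v1
        "fadd v0.4s, v0.4s, v1.4s\n\t"     // lane-wise add
        "st1  {v0.4s}, [%[out]]\n\t"       // store 4 floats to out
        :
        : [a] "r"(a), [b] "r"(b), [out] "r"(out)
        : "v0", "v1", "memory");           // registers and memory we touch
#else
    // Scalar fallback for targets where the assembly above does not apply.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Unlike the intrinsics version, the programmer picks the registers here (`v0`, `v1`) and must list them as clobbered.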
Example
- The example is a vector addition written with NEON intrinsics, i.e. the Compiler Intrinsics approach described above. The code is in neon_vecadd.cpp, and the compile command is g++ neon_vecadd.cpp -mfpu=neon
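The original neon_vecadd.cpp is not reproduced here, so what follows is only a sketch of what such a vector addition might look like, not the author's code. The function name `vec_add` is hypothetical. The intrinsics `vld1q_f32`, `vaddq_f32`, and `vst1q_f32` come from `arm_neon.h`, and the `__ARM_NEON` guard is an assumption added so the file also compiles without NEON.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Element-wise addition: out[i] = a[i] + b[i] for i in [0, n).
void vec_add(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Main loop: process four 32-bit floats per iteration.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Tail loop: handles the remaining 0..3 elements, or every element
    // when NEON is not available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

A `main` that fills two arrays and calls `vec_add` would complete the program. The `-mfpu=neon` flag applies to 32-bit Arm builds, where some toolchains may also need a matching `-mfloat-abi` setting. On AArch64, NEON is enabled by default and GCC does not accept `-mfpu`.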