Introduction to NEON

  • This blog post introduces the basics of NEON and gives a simple, working example.

NEON

  • Arm NEON technology is an advanced SIMD (Single Instruction, Multiple Data) architecture extension for the Arm Cortex-A series and Cortex-R52 processors.
  • NEON technology was introduced to the Armv7-A and Armv7-R profiles. It is also now an extension to the Armv8-A and Armv8-R profiles.
  • NEON technology is intended to improve the multimedia user experience by accelerating audio and video encoding/decoding, user interfaces, 2D/3D graphics, and gaming. NEON can also accelerate signal processing algorithms and functions to speed up applications such as audio and video processing, voice and facial recognition, computer vision, and deep learning. The SIMD architecture is shown in the figure below:

[Figure: SIMD architecture]


Overview

  • NEON is a packed SIMD architecture. NEON registers are treated as vectors of elements of the same data type, and several data types are supported. The following table lists the data types supported by each architecture version.

    Type            Armv7-A/R      Armv8-A/R (AArch32)   Armv8-A (AArch64)
    Floating point  32-bit         16/32-bit             16/32/64-bit
    Integer         8/16/32-bit    8/16/32/64-bit        8/16/32/64-bit
  • NEON instructions perform the same operation in all lanes of the vectors. The number of operations performed depends on the data types. NEON instructions allow up to:

    • 16x8-bit, 8x16-bit, 4x32-bit, 2x64-bit integer operations
    • 8x16-bit (only in Armv8.2-A), 4x32-bit, 2x64-bit (only in Armv8-A/R) floating-point operations
  • NEON implementations can also support issuing multiple instructions in parallel.
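
  • The lane counts listed above follow from dividing the register width by the element width. The sketch below works this out at compile time, assuming a 128-bit NEON quad-word register; the names kNeonRegisterBits and lanes_per_q_register are made up for illustration and are not part of any NEON API.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: one NEON quad-word (Q) register is 128 bits wide.
constexpr std::size_t kNeonRegisterBits = 128;

// Number of lanes of element type T that fit in one 128-bit register
// (assumes 8-bit bytes).
template <typename T>
constexpr std::size_t lanes_per_q_register() {
    return kNeonRegisterBits / (sizeof(T) * 8);
}

// Compile-time checks matching the lane counts in the text.
static_assert(lanes_per_q_register<std::uint8_t>() == 16, "16 x 8-bit");
static_assert(lanes_per_q_register<std::uint16_t>() == 8, "8 x 16-bit");
static_assert(lanes_per_q_register<float>() == 4, "4 x 32-bit");
static_assert(lanes_per_q_register<double>() == 2, "2 x 64-bit");
```

    One instruction on 8-bit data therefore performs 16 operations, while the same instruction on 64-bit data performs only 2.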

How to use NEON?

  • NEON can be used in several ways: NEON-enabled libraries, the compiler's auto-vectorization feature, NEON intrinsics, and NEON assembly code. Detailed information on NEON programming can be found in the NEON Programmer's Guide, version 1.0.

Libraries

Autovectorization
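
  • As a rough illustration of what auto-vectorization works with, the sketch below is a plain scalar loop with no NEON-specific code. An optimizing compiler may turn it into NEON instructions when built with optimization enabled (for example g++ -O3), but whether it does depends on the compiler, its version, and the flags. The function name add_arrays is made up for illustration, and __restrict is a compiler extension, not standard C++.

```cpp
#include <cstddef>

// Element-wise out[i] = a[i] + b[i], written as ordinary scalar code.
// __restrict (a GCC/Clang/MSVC extension) promises that the arrays do not
// overlap, which makes it easier for the compiler to vectorize the loop.
void add_arrays(const float* __restrict a, const float* __restrict b,
                float* __restrict out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

    The source stays portable; the trade-off is that the developer has less control over which instructions are generated.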

Compiler Intrinsics

  • NEON intrinsics are function calls that the compiler replaces with an appropriate NEON instruction or sequence of NEON instructions. Intrinsics provide almost as much control as writing assembly language but leave register allocation to the compiler, so that developers can focus on the algorithms. The compiler can also schedule instructions to remove pipeline stalls for the specified target processor. The result is more maintainable source code than assembly language. NEON intrinsics are supported by Arm Compiler, GCC, and LLVM.
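
  • A minimal sketch of what intrinsics look like in practice: a saturating add over 16 unsigned bytes, using vld1q_u8, vqaddq_u8, and vst1q_u8 from arm_neon.h. The function name saturating_add_16 is made up for illustration. The scalar branch is an assumption added here so the snippet also builds on machines without NEON.

```cpp
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Adds 16 unsigned bytes lane by lane, clamping each sum at 255.
// a, b, and out must each point to at least 16 bytes.
void saturating_add_16(const std::uint8_t* a, const std::uint8_t* b,
                       std::uint8_t* out) {
#if defined(__ARM_NEON)
    uint8x16_t va = vld1q_u8(a);          // load 16 lanes from a
    uint8x16_t vb = vld1q_u8(b);          // load 16 lanes from b
    uint8x16_t vsum = vqaddq_u8(va, vb);  // saturating add in all 16 lanes
    vst1q_u8(out, vsum);                  // store 16 lanes to out
#else
    // Scalar fallback with the same result, for targets without NEON.
    for (int i = 0; i < 16; ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
#endif
}
```

    No registers are named in the source: va, vb, and vsum are ordinary variables, and the compiler decides which NEON registers hold them.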

Assembly code

  • When very high performance is required, hand-written NEON assembly can be the best approach for experienced programmers. Both the GNU assembler (gas) and the Arm Compiler toolchain assembler (armasm) can assemble NEON instructions.

Example

  • The example is a vector addition written with NEON intrinsics, that is, the Compiler Intrinsics described above. The code is in neon_vecadd.cpp, and the compile command is g++ neon_vecadd.cpp -mfpu=neon
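
  • The contents of neon_vecadd.cpp are not reproduced in this post. The sketch below is one possible shape for such a vector addition, not the author's actual code: it handles four floats per iteration with vld1q_f32, vaddq_f32, and vst1q_f32, then finishes any leftover elements with a scalar loop. The function name vec_add is made up for illustration, the scalar-only path on non-NEON targets is an added assumption, and a main function that calls it is left out for brevity.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Computes out[i] = a[i] + b[i] for i in [0, n).
void vec_add(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Main loop: four 32-bit floats per 128-bit register.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Tail loop: remaining elements (or all of them without NEON).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

    The -mfpu=neon flag in the compile command applies to 32-bit Arm targets; on AArch64 toolchains NEON is enabled by default and the flag is not used.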