Introduction to NEON

  • This blog post introduces the basics of NEON and provides a simple, usable example.

NEON

  • Arm NEON technology is an advanced SIMD (Single Instruction, Multiple Data) architecture extension for the Arm Cortex-A series and Cortex-R52 processors.
  • NEON technology was introduced to the Armv7-A and Armv7-R profiles. It is also now an extension to the Armv8-A and Armv8-R profiles.
  • NEON technology is intended to improve the multimedia user experience by accelerating audio and video encoding/decoding, user interfaces, 2D/3D graphics, and gaming. NEON can also accelerate signal processing algorithms and functions to speed up applications such as audio and video processing, voice and facial recognition, computer vision, and deep learning. The figure below illustrates the SIMD architecture:

[Figure: SIMD architecture]


Overview

  • The NEON technology is a packed SIMD architecture. NEON registers are considered as vectors of elements of the same data type. Multiple data types are supported by the technology. The following table describes data types as supported by the architecture version.

    Types | Armv7-A/R   | Armv8-A/R      | Armv8-A
    Arch  | N/A         | AArch32        | AArch64
    float | 32-bit      | 16/32-bit      | 16/32/64-bit
    int   | 8/16/32-bit | 8/16/32/64-bit | 8/16/32/64-bit
  • NEON instructions perform the same operation in all lanes of the vectors. The number of operations performed depends on the data type. NEON instructions allow up to:

    • 16x8-bit, 8x16-bit, 4x32-bit, 2x64-bit integer operations
    • 8x16-bit (only in Armv8.2-A), 4x32-bit, 2x64-bit (only in Armv8-A/R) floating-point operations
  • NEON implementations can also support issuing multiple instructions in parallel.
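
The lane counts listed above all follow from dividing a 128-bit vector register by the element width. The following is a minimal sketch of that arithmetic; the names `kNeonRegisterBits` and `lanes_per_register` are hypothetical and only serve the illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Width in bits of one full-size NEON vector register (assumed: 128-bit Q register).
constexpr std::size_t kNeonRegisterBits = 128;

// Number of elements (lanes) of type T that fit in one 128-bit register.
template <typename T>
constexpr std::size_t lanes_per_register() {
    return kNeonRegisterBits / (sizeof(T) * 8);
}

// These match the "up to" figures quoted in the list above.
static_assert(lanes_per_register<std::uint8_t>() == 16, "16 x 8-bit");
static_assert(lanes_per_register<std::uint16_t>() == 8, "8 x 16-bit");
static_assert(lanes_per_register<float>() == 4, "4 x 32-bit");
static_assert(lanes_per_register<double>() == 2, "2 x 64-bit");
```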

How to use NEON?

  • NEON can be used in several ways: NEON-enabled libraries, the compiler's auto-vectorization feature, NEON intrinsics, and NEON assembly code. Detailed information on NEON programming can be found in the NEON Programmer's Guide, Version 1.0.

Libraries

Autovectorization

Compiler Intrinsics

  • NEON intrinsics are function calls that the compiler replaces with an appropriate NEON instruction or sequence of NEON instructions. Intrinsics provide almost as much control as writing assembly language but leave register allocation to the compiler, so that developers can focus on the algorithms. The compiler can also perform instruction scheduling to remove pipeline stalls for the specified target processor. This leads to more maintainable source code than using assembly language. NEON intrinsics are supported by the Arm Compilers, GCC, and LLVM.
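
To make the idea of intrinsics concrete, here is a small sketch that adds two byte buffers with saturation, 16 lanes at a time. The function name `saturating_add_u8` is hypothetical; `vld1q_u8`, `vqaddq_u8`, and `vst1q_u8` are intrinsics from `<arm_neon.h>`. The scalar loop is an assumption added so the code also builds on targets without NEON, and it handles the leftover elements on NEON targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Saturating add of two byte buffers: out[i] = min(a[i] + b[i], 255).
void saturating_add_u8(const std::uint8_t* a, const std::uint8_t* b,
                       std::uint8_t* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Each iteration processes 16 lanes of 8 bits; the compiler picks the registers.
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);        // load 16 bytes from a
        uint8x16_t vb = vld1q_u8(b + i);        // load 16 bytes from b
        vst1q_u8(out + i, vqaddq_u8(va, vb));   // saturating add, then store
    }
#endif
    // Scalar tail on NEON targets, full fallback elsewhere.
    for (; i < n; ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

Note that no register names appear anywhere in the source: the intrinsic calls only say which operation to perform, which is the trade-off against hand-written assembly described above.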

Assembly code

  • For very high performance, hand-coded NEON assembly is the best approach for experienced programmers. Both the GNU assembler (gas) and the Arm Compiler toolchain assembler (armasm) support assembling NEON instructions.

Example

  • The example is a vector addition that uses NEON intrinsics, that is, the Compiler Intrinsics approach described above. The code is in neon_vecadd.cpp; compile it with g++ neon_vecadd.cpp -mfpu=neon
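
The original neon_vecadd.cpp is not reproduced in this post, so the following is only a sketch of what such a vector addition might look like, not the author's file. The function name `vec_add` is hypothetical; `vld1q_f32`, `vaddq_f32`, and `vst1q_f32` are intrinsics from `<arm_neon.h>`. The scalar loop is an added assumption so that the code handles lengths that are not a multiple of 4 and still builds on non-NEON targets.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Element-wise addition: out[i] = a[i] + b[i] for i in [0, n).
void vec_add(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Four 32-bit floats per 128-bit register.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);      // load 4 floats from b
        vst1q_f32(out + i, vaddq_f32(va, vb));  // add lane by lane, then store
    }
#endif
    // Remaining elements (or all elements when NEON is unavailable).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `vec_add(a, b, out, n)` from a `main` function in the same file and building with the command given above should exercise the NEON path on a 32-bit Arm target; on AArch64 toolchains Advanced SIMD is normally enabled by default, so the `-mfpu` flag is not needed there.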
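NEON is a SIMD (Single Instruction, Multiple Data) technology in the Arm architecture: a single instruction operates on multiple data elements in parallel, which improves data-processing throughput. NEON registers can also be used to speed up memory copies.

Two cited sources provide function implementations for NEON-based copying. The first gives memcpy_neon_64, which uses NEON registers to copy 64 bytes per iteration and is suited to copies aligned to 64 bytes. Its pseudocode is as follows:

```assembly
@ void *memcpy_neon_64(void *dest, const void *src, size_t size)
@ On entry: r0 = dest, r1 = src, r2 = size
@ (assumed to be a non-zero multiple of 64).
memcpy_neon_64:
    mov       r3, r0            @ save dest for the return value
0:
    pld       [r1, #256]        @ prefetch upcoming source data
    subs      r2, r2, #64       @ decrement the remaining byte count
    vldmia.64 r1!, {d0-d7}      @ load 64 bytes from the source into d0-d7
    vstmia.64 r0!, {d0-d7}      @ store d0-d7 to the destination
    bgt       0b                @ loop while bytes remain
    mov       r0, r3            @ return the original dest pointer
    mov       pc, lr            @ return to the caller
```

The second source provides another implementation, memcpy_1, which copies only one byte at a time and works for both aligned and unaligned copies. Its pseudocode is as follows:

```c
#include <stddef.h>

/* Byte-by-byte copy; works for any alignment. */
void *memcpy_1(void *dest, const void *src, size_t count)
{
    char *tmp = dest;
    const char *s = src;

    while (count--)
        *tmp++ = *s++;
    return dest;
}
```

To learn more about using NEON instructions, see the ARM NEON Intrinsics page in the official GCC documentation [3].

In summary, a NEON memcpy uses NEON registers to accelerate copying by moving multiple bytes per instruction. Choose the NEON copy function that fits your requirements.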
