Floating-Point Arithmetic on ARM

我把葡萄酿成酒

Category: Performance Optimization

Article link: https://blog.csdn.net/ffmpeg4976/article/details/52571560

General

Three possible ways to handle floating-point numbers in an embedded system (reposted from StackOverflow)

1. Use floating-point instructions if your CPU has an FPU. (fastest)
2. Have your compiler translate floating-point arithmetic into integer arithmetic. (slower)
3. Use floating-point instructions on a CPU with no FPU. The CPU raises an exception (Reserved Instruction, Unimplemented Instruction or similar), and if the OS kernel includes a floating-point emulator, it emulates those instructions inside the exception handler. (slowest)
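
To make option 2 concrete, here is a minimal sketch of what "floating-point arithmetic translated into integer arithmetic" can look like. The function name `soft_mul` is hypothetical and the routine is deliberately simplified: it assumes both inputs and the result are normal, finite IEEE 754 single-precision values, and it truncates instead of rounding to nearest. Real toolchains call tuned library routines for this (on ARM EABI targets, helpers such as `__aeabi_fmul`), which also handle zeros, subnormals, infinities, NaNs and rounding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply two floats using integer operations only.
 * Simplifications (assumptions of this sketch): inputs and result are
 * normal, finite numbers; the product is truncated, not rounded. */
float soft_mul(float x, float y)
{
    uint32_t a, b;
    memcpy(&a, &x, sizeof a);           /* reinterpret the float bits */
    memcpy(&b, &y, sizeof b);

    uint32_t sign = (a ^ b) & 0x80000000u;

    /* Add the biased exponents and remove one bias (127). */
    int32_t exp = (int32_t)((a >> 23) & 0xFFu)
                + (int32_t)((b >> 23) & 0xFFu) - 127;

    /* Restore the implicit leading 1 of each 24-bit significand. */
    uint64_t ma = (a & 0x007FFFFFu) | 0x00800000u;
    uint64_t mb = (b & 0x007FFFFFu) | 0x00800000u;

    /* 24-bit x 24-bit product lies in [2^46, 2^48). */
    uint64_t prod = ma * mb;

    if (prod & (1ULL << 47)) {          /* product >= 2.0: renormalize */
        prod >>= 24;
        exp += 1;
    } else {
        prod >>= 23;
    }

    uint32_t r = sign | ((uint32_t)exp << 23)
               | ((uint32_t)prod & 0x007FFFFFu);

    float out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Even this stripped-down version needs a dozen or so integer operations for a single multiply, which gives a feel for why a hardware FPU is so much faster.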

ARM

ARM Floating Point architecture (VFP) provides hardware support for floating point operations in half-, single- and double-precision floating point arithmetic. It is fully IEEE 754 compliant with full software library support.

The floating point capabilities of the ARM VFP offer increased performance for floating point arithmetic used in automotive powertrain and body control applications, imaging applications such as scaling, transforms and font generation in printing, 3D transforms, FFT and filtering in graphics. The next generation of consumer products such as Internet appliances, set-top boxes, and home gateways, can directly benefit from the ARM VFP.

VFP Applications

Automotive control applications
    Powertrain
    ABS, Traction control & active suspension
3D Graphics
    Digital consumer products
    Set-top boxes, games consoles
Imaging
    Laser printers, still digital cameras, digital video cameras
Industrial control systems
    Motion controls

Many real-time control applications in the industrial and automotive fields benefit from the dynamic range and precision of floating-point offered by the ARM VFP. Automotive powertrain, anti-lock braking, traction control, and active suspension systems are all mission-critical applications where precision and predictability are essential requirements.

VFP architecture versions

Before the ARMv7 architecture, VFP stood for Vector Floating-point Architecture, used for vector operations.

Provision of hardware floating point is essential for many applications, and can be used as part of a System on Chip (SoC) design flow using high-level design tools (e.g. MATLAB, MATRIXx and LabVIEW) to directly model the system and derive the application code. Using hardware floating point combined with the NEON™ multimedia processing capability, performance of imaging applications such as scaling, 2D and 3D transforms, font generation, and digital filters can be increased.

There have been three main versions of VFP to date:

VFPv1 is obsolete. Details are available on request from ARM.
VFPv2 is an optional extension to the ARM instruction set in the ARMv5TE, ARMv5TEJ and ARMv6 architectures.
VFPv3 is an optional extension to the ARM, Thumb® and ThumbEE instruction sets in the ARMv7-A and ARMv7-R profiles. VFPv3 can be implemented with either thirty-two or sixteen doubleword registers. The terms VFPv3-D32 and VFPv3-D16 distinguish between these two implementation options. VFPv3 can also be extended with the half-precision extension, which provides conversion functions in both directions between half-precision floating-point and single-precision floating-point.
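
To illustrate what a half-precision to single-precision conversion involves, here is a portable software sketch of one direction. The function name `half_to_float` is hypothetical, and the code assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits); it is not how the hardware is implemented, only a model of the result the conversion produces.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen an IEEE 754 binary16 bit pattern to a float.
 * Every half-precision value is exactly representable in single precision,
 * so no rounding is needed in this direction. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x03FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, keep the payload. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: normalize it into a normal float. */
        exp = 113u;                     /* 127 - 15 + 1 */
        while ((man & 0x0400u) == 0) {
            man <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((man & 0x03FFu) << 13);
    }

    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

For example, the bit pattern `0x3C00` is 1.0 in half precision, and `0x7BFF` is 65504, the largest finite half-precision value.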

NEON
The ARM® NEON™ general-purpose SIMD engine efficiently processes current and future multimedia formats, enhancing the user experience.

NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis by at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.

Cleanly architected NEON technology works seamlessly with its own independent pipeline and register file.

NEON technology is a 128-bit SIMD (Single Instruction, Multiple Data) architecture extension for the ARM Cortex™-A series processors, designed to provide flexible and powerful acceleration for consumer multimedia applications, delivering a significantly enhanced user experience. It has 32 registers, 64 bits wide (with a dual view as 16 registers, 128 bits wide).

NEON instructions perform “Packed SIMD” processing:

Registers are considered as vectors of elements of the same data type
Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single precision floating point
Instructions perform the same operation in all lanes
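
As a small illustration of packed SIMD, the sketch below adds two float arrays four lanes at a time using the NEON intrinsics `vld1q_f32`, `vaddq_f32` and `vst1q_f32` from `<arm_neon.h>`. The function name `add_f32` is hypothetical. The NEON path is only compiled when the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`); otherwise the plain scalar loop handles everything, so the same source also builds on non-ARM hosts.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One 128-bit register holds four 32-bit float lanes; a single
     * vaddq_f32 performs the same addition in all four lanes. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail (or the whole array when NEON is unavailable). */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The scalar tail loop is a common pattern: it picks up the last `n % 4` elements that do not fill a whole vector.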

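Compiling code to run with soft float and hard float (FPU)
-mhard-float selects hardware floating point and -msoft-float selects software emulation; the speed difference between the two is large. (Older GCC documentation for ARM lists these as equivalent to -mfloat-abi=hard and -mfloat-abi=soft.)
If you are using ndk-build, you need to set the ABI to hard float in your Application.mk: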

APP_ABI := armeabi-v7a-hard

This will set the appropriate flags for you. If you are not using ndk-build, you will want to add the following flags:
CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
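
One way to check which floating-point ABI a translation unit was actually compiled for is to inspect the compiler's predefined macros. The sketch below is based on my understanding of GCC and Clang behaviour on 32-bit ARM: `__ARM_PCS_VFP` is defined for the hard-float calling convention, and `__SOFTFP__` is defined when floating point is done entirely in software. The function name `float_abi_name` is hypothetical, and the "softfp" branch is an inference (ARM target, neither macro defined), so treat the result as a hint rather than a guarantee.

```c
/* Hypothetical helper: report the float ABI this file was compiled with,
 * judged from predefined macros (assumed GCC/Clang behaviour). */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    /* Float arguments are passed in VFP registers. */
    return "hard";
#elif defined(__SOFTFP__)
    /* No FPU instructions; arithmetic goes through library calls. */
    return "soft";
#elif defined(__arm__)
    /* FPU instructions may be used, but with the soft-float calling
     * convention (inferred, not directly signalled by a macro here). */
    return "softfp";
#else
    return "not a 32-bit ARM target";
#endif
}
```

Printing this string at startup is a quick sanity check that the build flags above took effect for the file in question.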
