ARM Linux FPU: multimedia processing, using ARM NEON/FPU to improve performance

Latest recommended article published on 2024-03-21 15:37:50

狠茬子嘻嘻


Tags: arm linux fpu

Some software needs a large amount of floating-point computation; audio processing is one example.

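If the CPU has no FPU, these operations have to be implemented in software. For example, a double-precision multiplication may be replaced by a call to __aeabi_dmul: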

The ARM floating-point environment is an implementation of the IEEE 754-1985 standard for binary floating-point arithmetic.

An ARM system might have:

- a VFP coprocessor
- no floating-point hardware.

If you compile for a system with a hardware VFP coprocessor, the ARM compiler makes use of it. If you compile for a system without a coprocessor, the compiler implements the computations in software. For example, the compiler option --fpu=vfp selects a hardware VFP coprocessor and the option --fpu=softvfp specifies that arithmetic operations are to be performed in software, without the use of any coprocessor instructions.

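For details, see the documented entry for this helper:

Function: __aeabi_dmul | Argument types: 2 double | Result type: double | Operation: Return x times y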

Doing the arithmetic in software like this is slow.

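If the hardware has an FPU, the computation can run on the FPU directly.

For example, the double multiplication above is then performed directly by a vmul.f64 instruction, which is much faster.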
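To see the difference for yourself, a minimal sketch such as the following can be compiled both ways and the generated assembly compared. The function name mul_double and the file name mul.c are hypothetical, and the GCC flags in the comment are the GCC counterparts of the --fpu options quoted above, not something the original article used.

```c
/*
 * Hypothetical test function: one double-precision multiplication.
 *
 * Assumed build commands (GCC, file name mul.c is an assumption):
 *   arm-none-linux-gnueabi-gcc -O2 -S -mfloat-abi=soft mul.c
 *       -> the multiplication is expected to become a call to __aeabi_dmul
 *   arm-none-linux-gnueabi-gcc -O2 -S -mfloat-abi=softfp -mfpu=vfp mul.c
 *       -> the multiplication is expected to become a vmul.f64 instruction
 */
double mul_double(double x, double y)
{
    return x * y;
}
```

Inspecting the resulting mul.s file is usually enough to confirm which path the compiler chose for a given set of flags.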

I wrote a short piece of code to test this:

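/* volatile stops the compiler from optimizing the multiplication away */
volatile double para_1 = 10.10;
volatile double para_2 = 10.10;
volatile double result;
int index;

/* 0x1000000 = 16,777,216 iterations */
for (index = 0; index < 0x1000000; index++)
{
    result = para_1 * para_2;
}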

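For the same 0x1000000 multiplications (about 16.8 million), the loop takes roughly 1700 ms without the FPU and only about 350 ms with the FPU.

The gap is even larger for division: about 6,700 ms without the FPU versus only 515 ms with it.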
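The measurement above could be reproduced with something like the following sketch. The original article does not show how the time was taken, so the use of clock() and the function names time_double_mul, time_double_div, and ticks_to_ms are assumptions made for illustration. Note that clock() reports CPU time, not wall-clock time.

```c
#include <time.h>

/* Convert a clock() tick difference to milliseconds of CPU time. */
static double ticks_to_ms(clock_t start, clock_t end)
{
    return (double)(end - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}

/* Hypothetical helper: time `iterations` double multiplications. */
double time_double_mul(unsigned long iterations)
{
    volatile double para_1 = 10.10;
    volatile double para_2 = 10.10;
    volatile double result = 0.0;   /* volatile keeps the loop body alive */
    unsigned long index;
    clock_t start = clock();

    for (index = 0; index < iterations; index++) {
        result = para_1 * para_2;
    }
    return ticks_to_ms(start, clock());
}

/* Hypothetical helper: time `iterations` double divisions. */
double time_double_div(unsigned long iterations)
{
    volatile double para_1 = 10.10;
    volatile double para_2 = 10.10;
    volatile double result = 0.0;
    unsigned long index;
    clock_t start = clock();

    for (index = 0; index < iterations; index++) {
        result = para_1 / para_2;
    }
    return ticks_to_ms(start, clock());
}
```

Calling time_double_mul(0x1000000UL) and time_double_div(0x1000000UL) from main and printing the results, once in a soft-float build and once in a VFP build, should give a comparison of the same kind; the absolute numbers will depend on the CPU, clock speed, and compiler options.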

If the CPU has an FPU, make use of it wherever possible; it can improve performance substantially.

ARM also publishes further optimization advice for specific CPUs.

For the Cortex-A9, for example, see:

Cortex-A9 Floating-Point Unit, Revision: r4p1, Technical Reference Manual

==> 1.3 Writing optimal FP code

In addition, on a Cortex-A class ARM core, NEON is also worth considering for optimization. For details, see: Introducing NEON Development Article

It extends the SIMD concept by defining groups of instructions operating on vectors stored in 64-bit D, doubleword, registers and 128-bit Q, quadword, vector registers.

NEON support is already integrated into GCC and can be used directly.

NEON intrinsics with GCC

To use NEON intrinsics in GCC, you must specify -mfpu=neon on the compiler command line:

arm-none-linux-gnueabi-gcc -mfpu=neon intrinsic.c

Depending on your toolchain, you might also have to add -mfloat-abi=softfp to indicate to the compiler that NEON variables must be passed in general purpose registers.
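As an illustration of what such intrinsic code can look like, here is a minimal sketch that multiplies two float arrays four elements at a time using 128-bit Q registers. The function name mul_f32 is hypothetical, and the scalar fallback for builds without NEON is an addition made here so the same file compiles on any target; it is not part of the quoted ARM material.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] for i in [0, n). */
void mul_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: process four floats per iteration in a Q register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vmulq_f32(va, vb));  /* multiply and store   */
    }
#endif

    /* Scalar path: remaining elements, or everything when NEON is absent. */
    for (; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}
```

When built with -mfpu=neon as described above, the compiler defines the NEON feature macro and the vector loop is used; without it, only the scalar loop is compiled.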

A complete list of supported intrinsics can be found at

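===> Also note that neither NEON nor VFP is guaranteed to be present, although ARM's recommendation is that a core with VFP also has NEON (most Cortex-A cores implement NEON, but it is an optional feature on some of them, such as the Cortex-A9).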
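Since the units are optional, a program may want to check for them at run time. On ARM Linux the kernel lists CPU features such as vfp and neon on the "Features" line of /proc/cpuinfo; the sketch below reads that line. The function names line_has_feature and cpu_has_feature are hypothetical, and the approach is Linux-specific.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `feature` appears as a whole whitespace-separated token in `line`. */
int line_has_feature(const char *line, const char *feature)
{
    size_t len = strlen(feature);
    const char *p = line;

    if (len == 0) {
        return 0;
    }
    while ((p = strstr(p, feature)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        char after = p[len];
        int ends = after == '\0' || after == ' ' || after == '\t' || after == '\n';

        if (starts && ends) {
            return 1;
        }
        p += 1; /* keep searching after a partial match such as "vfp" in "vfpv3" */
    }
    return 0;
}

/* Return 1 if found, 0 if not found, -1 if /proc/cpuinfo cannot be opened. */
int cpu_has_feature(const char *feature)
{
    char line[1024];
    int found = 0;
    FILE *fp = fopen("/proc/cpuinfo", "r");

    if (fp == NULL) {
        return -1;
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "Features", 8) == 0 && line_has_feature(line, feature)) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

A caller could use cpu_has_feature("neon") to decide between a NEON code path and a plain C path, treating a return value of -1 as "unknown".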
