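Fixing the problem by correcting the vmlaq_lane_f32 macro definition in arm_neon.h.
When building libopus with gcc on an aarch64 platform, compilation fails with the following errors: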
In file included from celt/arm/celt_neon_intr.c:37:0:
celt/arm/celt_neon_intr.c: In function ‘xcorr_kernel_neon_float’:
celt/arm/celt_neon_intr.c:137:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YY[0], vget_low_f32(XX[0]), 0);
^
celt/arm/celt_neon_intr.c:139:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[0], vget_low_f32(XX[0]), 1);
^
celt/arm/celt_neon_intr.c:141:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[1], vget_high_f32(XX[0]), 0);
^
celt/arm/celt_neon_intr.c:143:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[2], vget_high_f32(XX[0]), 1);
^
celt/arm/celt_neon_intr.c:145:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YY[1], vget_low_f32(XX[1]), 0);
^
celt/arm/celt_neon_intr.c:147:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[0], vget_low_f32(XX[1]), 1);
^
celt/arm/celt_neon_intr.c:149:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[1], vget_high_f32(XX[1]), 0);
^
celt/arm/celt_neon_intr.c:151:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[2], vget_high_f32(XX[1]), 1);
^
celt/arm/celt_neon_intr.c:170:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YY[0], vget_low_f32(XX[0]), 0);
^
celt/arm/celt_neon_intr.c:172:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[0], vget_low_f32(XX[0]), 1);
^
celt/arm/celt_neon_intr.c:174:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[1], vget_high_f32(XX[0]), 0);
^
celt/arm/celt_neon_intr.c:176:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YEXT[2], vget_high_f32(XX[0]), 1);
^
celt/arm/celt_neon_intr.c:184:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
^
celt/arm/celt_neon_intr.c:189:11: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
^
make[2]: *** [celt/arm/celt_neon_intr.lo] Error 1
The same error is reported over and over:
celt/arm/celt_neon_intr.c: In function ‘xcorr_kernel_neon_float’:
celt/arm/celt_neon_intr.c:137:14: error: incompatible types when initializing type ‘float32x4_t’ using type ‘float32x2_t’
SUMM = vmlaq_lane_f32(SUMM, YY[0], vget_low_f32(XX[0]), 0);
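This is a bug in GCC itself, caused by an incorrect definition of the vmlaq_lane_f32 macro in the arm_neon.h header. The GCC version on this system is:
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
In the header that ships with this version, /usr/lib/gcc/aarch64-redhat-linux/4.8.2/include/arm_neon.h, the macro copies its third argument c into a temporary declared as float32x4_t, even though c is a two-lane float32x2_t (for example the result of vget_low_f32). That initialization is exactly what the compiler rejects. The fix is therefore to correct the vmlaq_lane_f32 definition in this file. Because the existing definition is already broken (any correct call that passes a float32x2_t fails to compile), correcting it cannot break anything on the system that currently builds. My corrected definition of vmlaq_lane_f32 is: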
#define vmlaq_lane_f32(a, b, c, d)                                      \
  __extension__                                                         \
    ({                                                                  \
       float32x2_t c_ = (c);                                            \
       float32x4_t b_ = (b);                                            \
       float32x4_t a_ = (a);                                            \
       float32x4_t result;                                              \
       float32x4_t t1;                                                  \
       __asm__ ("fmul %1.4s, %3.4s, %4.s[%5]; fadd %0.4s, %0.4s, %1.4s" \
                : "=w"(result), "=w"(t1)                                \
                : "0"(a_), "w"(b_), "w"(c_), "i"(d)                     \
                : /* No clobbers */);                                   \
       result;                                                          \
     })
The only change is the line float32x4_t c_ = (c);, which becomes float32x2_t c_ = (c);.
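In GCC 4.8.5, vmlaq_lane_f32 is only a macro defined in the header, not a symbol in any compiled library file. Editing the header therefore only affects code compiled afterwards and leaves existing binaries untouched, so it is safe to make this change.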
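To show what the corrected macro computes and why the temporary's type matters, below is a portable mock with the same statement-expression shape. The struct types f32x4 and f32x2 and the name MOCK_VMLAQ_LANE_F32 are hypothetical stand-ins (not from arm_neon.h), and the fmul/fadd inline assembly is replaced by plain C, so this sketch compiles with GCC or Clang on any architecture:

```c
/* Hypothetical portable stand-ins for float32x4_t / float32x2_t. */
typedef struct { float v[4]; } f32x4;
typedef struct { float v[2]; } f32x2;

/* Mock with the same statement-expression shape as the fixed macro.
 * The asm "fmul t1, b, c[d]; fadd result, a, t1" is written in C. */
#define MOCK_VMLAQ_LANE_F32(a, b, c, d)                                 \
  __extension__                                                         \
    ({                                                                  \
       f32x2 c_ = (c);   /* two lanes, matching float32x2_t */          \
       f32x4 b_ = (b);                                                  \
       f32x4 a_ = (a);                                                  \
       f32x4 result;                                                    \
       for (int i_ = 0; i_ < 4; i_++) {                                 \
         float t1 = b_.v[i_] * c_.v[(d)];   /* fmul by lane d */        \
         result.v[i_] = a_.v[i_] + t1;      /* fadd into accumulator */ \
       }                                                                \
       result;                                                          \
     })
```

With the real header on aarch64, the libopus line SUMM = vmlaq_lane_f32(SUMM, YY[0], vget_low_f32(XX[0]), 0); does the same thing: each lane i of SUMM gains YY[0][i] * XX[0][0]. Since c has only two lanes, d should be 0 or 1, and in the real macro it must also be a compile-time constant because of the "i" constraint.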
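There is one more error: the definition of vmlaq_lane_s32 is also wrong, and here too the temporary c_ must have the two-lane type int32x2_t. The corrected version is: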
#define vmlaq_lane_s32(a, b, c, d)                                      \
  __extension__                                                         \
    ({                                                                  \
       int32x2_t c_ = (c);                                              \
       int32x4_t b_ = (b);                                              \
       int32x4_t a_ = (a);                                              \
       int32x4_t result;                                                \
       __asm__ ("mla %0.4s, %2.4s, %3.s[%4]"                            \
                : "=w"(result)                                          \
                : "0"(a_), "w"(b_), "w"(c_), "i"(d)                     \
                : /* No clobbers */);                                   \
       result;                                                          \
     })
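The integer variant performs the same multiply-accumulate with a single mla instruction. The scalar model below is a sketch of its semantics, not code from arm_neon.h: the types i32x4/i32x2 and the function model_vmlaq_lane_s32 are hypothetical, and it assumes, as the hardware mla does, that the product and sum keep only the low 32 bits:

```c
#include <stdint.h>

/* Hypothetical portable stand-ins for int32x4_t / int32x2_t. */
typedef struct { int32_t v[4]; } i32x4;
typedef struct { int32_t v[2]; } i32x2;

/* Scalar model of "mla %0.4s, %2.4s, %3.s[%4]":
 * result[i] = a[i] + b[i] * c[lane], wrapping modulo 2^32.
 * Precondition: lane is 0 or 1 (c has only two lanes).
 * The math is done in uint32_t to avoid signed-overflow UB; converting
 * back to int32_t relies on GCC/Clang's modulo conversion. */
static i32x4 model_vmlaq_lane_s32(i32x4 a, i32x4 b, i32x2 c, int lane)
{
  i32x4 r;
  uint32_t s = (uint32_t)c.v[lane];
  for (int i = 0; i < 4; i++) {
    uint32_t prod = (uint32_t)b.v[i] * s;          /* low 32 bits */
    r.v[i] = (int32_t)((uint32_t)a.v[i] + prod);   /* accumulate */
  }
  return r;
}
```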
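Of course, if you have time to spare, or are uneasy about editing a system header, you can instead build and upgrade to GCC 7.3, which compiles this code without errors: in GCC 7.3, vmlaq_lane_f32 has already been reimplemented as an inline function.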