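Either you use casts and -flax-vector-conversions, or you use a union type to represent the vector registers and work explicitly on that union type. GCC explicitly supports this form of type punning.
For example, you could declare an msa128 type,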
#include <msa.h>  /* provides the v16i8 ... v2f64 vector types; compile with -mmsa */
typedef union __attribute__ ((aligned (16))) {
v2u64 u64;
v2i64 i64;
v2f64 f64;
v4u32 u32;
v4i32 i32;
v4f32 f32;
v8u16 u16;
v8i16 i16;
v16u8 u8;
v16i8 i8;
} msa128;
and then have your code work explicitly on the msa128 type. Your example program could be written as
uint32_t a[4] = { 64, 128, 256, 512 };
uint32_t b[4] = { 1024, 2048, 4096, 8192 };
uint32_t c[4];
msa128 va, vb, vc;
va.i32 = __builtin_msa_ld_w(a, 0);
vb.i32 = __builtin_msa_ld_w(b, 0);
vc.u32 = __builtin_msa_adds_u_w(va.u32, vb.u32);
__builtin_msa_st_w(vc.i32, c, 0);
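To compare the two routes (union versus explicit casts) on a machine without MSA, here is a minimal sketch that swaps in GCC's generic vector extension for the MSA types. The names vu32, vi32, vec128, add_u32_union and add_u32_cast are made up for illustration, memcpy stands in for the load/store built-ins, and the addition is a plain wrapping add rather than the saturating __builtin_msa_adds_u_w.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for v4u32 / v4i32, built with GCC's generic
   vector extension so the sketch compiles on any GCC target. */
typedef uint32_t vu32 __attribute__ ((vector_size (16)));
typedef int32_t  vi32 __attribute__ ((vector_size (16)));

/* Hypothetical analogue of msa128 with only two views. */
typedef union {
    vu32 u32;
    vi32 i32;
} vec128;

/* Union route: write through one member, read through another. */
static inline void add_u32_union(const uint32_t *a, const uint32_t *b,
                                 uint32_t *c)
{
    vec128 va, vb, vc;
    memcpy(&va.i32, a, sizeof va.i32);  /* "load" produces the signed view */
    memcpy(&vb.i32, b, sizeof vb.i32);
    vc.u32 = va.u32 + vb.u32;           /* arithmetic on the unsigned view */
    memcpy(c, &vc.i32, sizeof vc.i32);  /* "store" consumes the signed view */
}

/* Cast route: reinterpret the bits with explicit same-size vector casts. */
static inline void add_u32_cast(const uint32_t *a, const uint32_t *b,
                                uint32_t *c)
{
    vi32 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vu32 sum = (vu32)va + (vu32)vb;     /* cast changes the type, not the bits */
    vi32 out = (vi32)sum;
    memcpy(c, &out, sizeof out);
}
```

Both functions produce the same bytes; the union version simply moves the type bookkeeping into member names instead of casts.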
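Obviously, remembering the exact type you need to use gets quite annoying, so some static inline helper functions would definitely come in handy: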
/* imm is encoded as an immediate by the built-ins, so it must be a
   compile-time constant at every call site. */
static inline msa128 msa128_load64(const void *from, const int imm)
{ return (msa128){ .i64 = __builtin_msa_ld_d(from, imm) }; }
static inline msa128 msa128_load32(const void *from, const int imm)
{ return (msa128){ .i32 = __builtin_msa_ld_w(from, imm) }; }
static inline msa128 msa128_load16(const void *from, const int imm)
{ return (msa128){ .i16 = __builtin_msa_ld_h(from, imm) }; }
static inline msa128 msa128_load8(const void *from, const int imm)
{ return (msa128){ .i8 = __builtin_msa_ld_b(from, imm) }; }
static inline void msa128_store64(const msa128 val, void *to, const int imm)
{ __builtin_msa_st_d(val.i64, to, imm); }
static inline void msa128_store32(const msa128 val, void *to, const int imm)
{ __builtin_msa_st_w(val.i32, to, imm); }
static inline void msa128_store16(const msa128 val, void *to, const int imm)
{ __builtin_msa_st_h(val.i16, to, imm); }
static inline void msa128_store8(const msa128 val, void *to, const int imm)
{ __builtin_msa_st_b(val.i8, to, imm); }
For example, the binary AND, OR, NOR, and XOR operations are
/* The *_v built-ins take v16u8 operands, so pass the u8 view of each union. */
static inline msa128 msa128_and(const msa128 a, const msa128 b)
{ return (msa128){ .u8 = __builtin_msa_and_v(a.u8, b.u8) }; }
static inline msa128 msa128_or(const msa128 a, const msa128 b)
{ return (msa128){ .u8 = __builtin_msa_or_v(a.u8, b.u8) }; }
static inline msa128 msa128_nor(const msa128 a, const msa128 b)
{ return (msa128){ .u8 = __builtin_msa_nor_v(a.u8, b.u8) }; }
static inline msa128 msa128_xor(const msa128 a, const msa128 b)
{ return (msa128){ .u8 = __builtin_msa_xor_v(a.u8, b.u8) }; }
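The wrappers above work because bitwise operations do not care about lane width: operating on the byte view gives the same bits as operating on any wider view. A portable sketch of that idea, assuming GCC's generic vector extension in place of the MSA built-ins (vu8, vu32, vec128, vec128_and, vec128_xor and low_byte_of_lane are hypothetical names, not part of any GCC header):

```c
#include <stdint.h>

/* Hypothetical stand-ins for v16u8 / v4u32 using GCC vector extensions. */
typedef uint8_t  vu8  __attribute__ ((vector_size (16)));
typedef uint32_t vu32 __attribute__ ((vector_size (16)));

/* Hypothetical analogue of msa128 with a byte view and a 32-bit view. */
typedef union {
    vu8  u8;
    vu32 u32;
} vec128;

/* Bitwise helpers always go through the byte view, whatever the caller
   considers the "real" lane width to be. */
static inline vec128 vec128_and(const vec128 a, const vec128 b)
{ return (vec128){ .u8 = a.u8 & b.u8 }; }

static inline vec128 vec128_xor(const vec128 a, const vec128 b)
{ return (vec128){ .u8 = a.u8 ^ b.u8 }; }

/* Usage: keep only the low byte of each 32-bit lane and return lane i. */
static inline uint32_t low_byte_of_lane(int i)
{
    const vec128 x    = { .u32 = { 0x11223344u, 0xAABBCCDDu,
                                   0x000000FFu, 0xFFFFFF00u } };
    const vec128 mask = { .u32 = { 0xFFu, 0xFFu, 0xFFu, 0xFFu } };
    const vec128 r    = vec128_and(x, mask);
    return r.u32[i];
}
```

The caller builds and reads the vectors through the u32 view, while the helper only ever touches u8; the union makes that switch invisible at the call site.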
It probably wouldn't hurt to create some macros for writing vectors in array form:
#define MSA128_U64(...) ((msa128){ .u64 = { __VA_ARGS__ }})
#define MSA128_I64(...) ((msa128){ .i64 = { __VA_ARGS__ }})
#define MSA128_F64(...) ((msa128){ .f64 = { __VA_ARGS__ }})
#define MSA128_U32(...) ((msa128){ .u32 = { __VA_ARGS__ }})
#define MSA128_I32(...) ((msa128){ .i32 = { __VA_ARGS__ }})
#define MSA128_F32(...) ((msa128){ .f32 = { __VA_ARGS__ }})
#define MSA128_U16(...) ((msa128){ .u16 = { __VA_ARGS__ }})
#define MSA128_I16(...) ((msa128){ .i16 = { __VA_ARGS__ }})
#define MSA128_U8(...) ((msa128){ .u8 = { __VA_ARGS__ }})
#define MSA128_I8(...) ((msa128){ .i8 = { __VA_ARGS__ }})
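As a rough illustration of how such macros read at a call site, here is a portable sketch using GCC's generic vector extension instead of the MSA types. VEC128_U32, VEC128_U16, vec128, sum_u32_lanes and demo_sum are hypothetical names chosen for this example only.

```c
#include <stdint.h>

/* Hypothetical stand-ins for v4u32 / v8u16 using GCC vector extensions. */
typedef uint32_t vu32 __attribute__ ((vector_size (16)));
typedef uint16_t vu16 __attribute__ ((vector_size (16)));

/* Hypothetical analogue of msa128 with two views. */
typedef union {
    vu32 u32;
    vu16 u16;
} vec128;

/* Same shape as the MSA128_* macros: a compound literal whose designated
   initializer selects the view that the listed lanes belong to. */
#define VEC128_U32(...) ((vec128){ .u32 = { __VA_ARGS__ }})
#define VEC128_U16(...) ((vec128){ .u16 = { __VA_ARGS__ }})

/* Add up the four 32-bit lanes of a vector. */
static inline uint32_t sum_u32_lanes(const vec128 v)
{
    return v.u32[0] + v.u32[1] + v.u32[2] + v.u32[3];
}

/* Usage: a vector constant written inline, passed straight to a helper. */
static inline uint32_t demo_sum(void)
{
    return sum_u32_lanes(VEC128_U32(64, 128, 256, 512));
}
```

Because each macro expands to a complete expression of the union type, the result can be passed directly to any helper that takes the union by value.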
The reason I suggest this GCC-specific approach is that the built-ins are GCC-specific anyway. Apart from the union type, it is very close to what GCC does in
.