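Using the AVX-512 Instruction Set
Preface
This article covers problems encountered when compiling code that uses AVX-512 instructions.
I. Requirements for compiling AVX-512 code
1. The code must be built with a compiler that supports the AVX-512 intrinsics; this article uses g++.
2. The CPU must support AVX-512 (on Linux, run lscpu and look for avx512 entries such as avx512f in the Flags line).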
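The lscpu check can also be done from code. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo contains a "flags" line (the same information lscpu reports). The helper names has_flag and cpu_has_avx512f are hypothetical and do not come from the original article.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole space-separated token in `flags`.
// Whole-token matching avoids treating "avx512fp16" as "avx512f".
bool has_flag(const std::string& flags, const std::string& name)
{
    std::istringstream in(flags);
    std::string token;
    while (in >> token) {
        if (token == name) return true;
    }
    return false;
}

// Assumption: Linux, where /proc/cpuinfo has lines such as
// "flags : fpu vme ... avx512f avx512bw ...".
// Returns false if the file cannot be read or avx512f is not listed.
bool cpu_has_avx512f()
{
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 5, "flags") == 0) {
            std::string::size_type colon = line.find(':');
            if (colon != std::string::npos)
                return has_flag(line.substr(colon + 1), "avx512f");
        }
    }
    return false;
}
```

Calling cpu_has_avx512f() at program start and falling back to a non-AVX-512 code path when it returns false is one way to avoid an illegal-instruction crash on older CPUs.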
II. Compiling an AVX-512 dynamic library
1. Source code to compile with g++
The code is as follows (example):
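#include <stdint.h>     // int16_t
#include <string.h>     // memset
#include <immintrin.h>  // AVX-512 intrinsics, _mm_malloc, _mm_free
int main()
{
    // Allocate a 64-byte-aligned buffer; aligned 512-bit loads require it.
    int16_t *pLS = (int16_t*)_mm_malloc(61440*4, 64);
    if (pLS == NULL) return 1;
    memset(pLS, 0, 61440*4);                 // initialize the buffer before reading it
    __m512i xmmLS = _mm512_load_epi32(pLS);  // aligned load of 16 x 32-bit integers
    __m512i mm = _mm512_set1_epi32(11);      // broadcast 11 to all 16 lanes
    mm = _mm512_add_epi32(mm, mm);           // lane-wise add: every lane becomes 22
    (void)xmmLS; (void)mm;                   // silence unused-variable warnings
    _mm_free(pLS);                           // memory from _mm_malloc is released with _mm_free
    return 0;
}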
2. Compile command
The command is as follows (example):
g++ b.c -o b.out -mavx512bw -mavx512vl -mavx512f -mavx512cd -mavx512dq -msse
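Each -mavx512* option in the command above makes GCC define a matching predefined macro (for example, -mavx512f defines __AVX512F__). As a hedged sketch, the helper below reports which of the five extensions were enabled when the file was compiled, which can help confirm that the flags took effect. The function name enabled_avx512_features is hypothetical and not part of the original article.

```cpp
#include <string>

// Builds a space-separated list of the AVX-512 extensions enabled at
// compile time. Each macro is defined by the compiler only when the
// corresponding -mavx512* option (or an -march value that implies it)
// is in effect for this translation unit.
std::string enabled_avx512_features()
{
    std::string out;
#ifdef __AVX512F__
    out += "avx512f ";
#endif
#ifdef __AVX512BW__
    out += "avx512bw ";
#endif
#ifdef __AVX512VL__
    out += "avx512vl ";
#endif
#ifdef __AVX512CD__
    out += "avx512cd ";
#endif
#ifdef __AVX512DQ__
    out += "avx512dq ";
#endif
    return out;  // empty string when none of the options were given
}
```

This only reflects what the compiler was asked to generate; whether the CPU can actually run the resulting instructions is a separate, run-time question.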
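III. AVX-512 code requires 64-byte address alignment
1. Look at the Intel type definitions
The intrinsics headers define the 512-bit vector types as follows: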
typedef float __m512 __attribute__((__vector_size__(64), __aligned__(64)));
typedef double __m512d __attribute__((__vector_size__(64), __aligned__(64)));
typedef long long __m512i __attribute__((__vector_size__(64), __aligned__(64)));
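As shown above, the AVX-512 vector types are 64-byte aligned. If an address passed to an aligned load or store (such as _mm512_load_epi32) is not 64-byte aligned, the program crashes with a segmentation fault. For data that cannot be aligned, the unaligned variants (such as _mm512_loadu_si512) accept any address.
2. 64-byte alignment
Request 64-byte alignment when allocating memory: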
pLS=(int16_t*)_mm_malloc(61440*4,64);
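To catch alignment mistakes before they surface as a segmentation fault, a pointer can be checked before it is handed to an aligned load. The sketch below is illustrative only: is_aligned and alloc_aligned64 are hypothetical helper names, and posix_memalign (POSIX) is used as a portable stand-in for the _mm_malloc call above. Memory obtained this way is released with free, whereas memory from _mm_malloc must be released with _mm_free.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign (POSIX), free

// True if address `p` is a multiple of `alignment` bytes.
// `alignment` must be a power of two (64 for 512-bit vectors).
bool is_aligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Allocates `bytes` bytes aligned to 64 bytes; returns nullptr on failure.
// The caller releases the memory with free().
void* alloc_aligned64(std::size_t bytes)
{
    void* p = nullptr;
    if (posix_memalign(&p, 64, bytes) != 0) return nullptr;
    return p;
}
```

A typical use would be to allocate with alloc_aligned64(61440*4) and assert is_aligned(ptr, 64) in debug builds before any aligned 512-bit load. Note that a pointer offset by a few bytes into an aligned buffer is generally no longer 64-byte aligned.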
IV. Instruction set reference
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#expand=388,0,3345,697,5024,697,3801,4953,3511,2228,3511,5773,4685,5773,3511,3500,3511,1428,2464,5319,5703,109,3500,703,6151,6149,4961,3500,5703,5351,3705,1481,633,1481,3177,5694,5445,430,1428,5351,637,633,1481,3177,5694,3177,5474,5703,697,3582,5694,3420,3420,3396,3420,5598&techs=AVX2