Simple Operations with the AVX Instruction Set
Adding two arrays of double with the AVX instruction set
My blog
https://amicoyuan.github.io/
AVX intrinsics used
1.
__m256d _mm256_loadu_pd (double const * mem_addr)
Description
Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
Operation
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
2.
__m256d _mm256_add_pd (__m256d a, __m256d b)
Description
Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
Operation
FOR j := 0 to 3
    i := j*64
    dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:256] := 0
3.
void _mm256_storeu_pd (double * mem_addr, __m256d a)
Description
Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
Operation
MEM[mem_addr+255:mem_addr] := a[255:0]
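The three intrinsics above combine into a single 4-element addition. The following is a minimal sketch rather than code from the original programs: the helper name `add4_pd` is hypothetical, and the `__attribute__((target("avx")))` annotation is a GCC/Clang extension assumed here so that the function compiles even when `-mavx` is not passed. With `-mavx` on the command line the attribute can be dropped.

```c
#include <immintrin.h>

/* Hypothetical helper: c[0..3] = a[0..3] + b[0..3] using one AVX add.
   The target attribute is a GCC/Clang extension (an assumption about the
   toolchain); compiling with -mavx makes it unnecessary. */
__attribute__((target("avx")))
void add4_pd(const double *a, const double *b, double *c)
{
    __m256d va = _mm256_loadu_pd(a);     /* load 4 doubles, no alignment required */
    __m256d vb = _mm256_loadu_pd(b);     /* load 4 doubles, no alignment required */
    __m256d vc = _mm256_add_pd(va, vb);  /* lane-wise: vc[j] = va[j] + vb[j], j = 0..3 */
    _mm256_storeu_pd(c, vc);             /* write the 4 sums back to memory */
}
```

Each pointer must refer to at least 4 readable (or, for `c`, writable) doubles, because every load and store touches a full 256 bits.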
Without AVX vectorization
Source code
#include <stdio.h>

int main()
{
    double a[9] = {1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,2.1};
    double b[9] = {2.1,3.2,6.4,8.6,3.7,9.9,5.1,4.2,6.6};
    double c[9] = {0};

    /* Scalar version: add one pair of elements per iteration. */
    for (int i = 0; i < 9; i++)
    {
        c[i] = a[i] + b[i];
    }

    printf("this is c.\n");
    for (int i = 0; i < 9; i++)
    {
        printf("%lf\n", c[i]);
    }
    return 0;
}
Program output
this is c.
3.200000
5.400000
9.700000
13.000000
9.200000
16.500000
12.800000
13.000000
8.700000
With AVX vectorization
Source code
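/* Build with AVX enabled, e.g. the -mavx flag for GCC or Clang. */
#include <stdio.h>
#include <immintrin.h>

int main()
{
    double a[9] = {1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,2.1};
    double b[9] = {2.1,3.2,6.4,8.6,3.7,9.9,5.1,4.2,6.6};
    double c[9] = {0};
    __m256d v0;
    __m256d v1;
    __m256d v2;
    int i = 0;

    /* Vector part: process 4 doubles per iteration while a full group remains. */
    for (; i + 4 <= 9; i += 4)
    {
        v0 = _mm256_loadu_pd(a + i);
        v1 = _mm256_loadu_pd(b + i);
        v2 = _mm256_add_pd(v0, v1);
        _mm256_storeu_pd(c + i, v2);
    }

    /* Scalar tail: handle the elements that do not fill a group of 4. */
    for (; i < 9; i++)
    {
        c[i] = a[i] + b[i];
    }

    printf("this is c with AVX.\n");
    for (int i = 0; i < 9; i++)
    {
        printf("%lf\n", c[i]);
    }
    return 0;
}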
Program output
this is c with AVX.
3.200000
5.400000
9.700000
13.000000
9.200000
16.500000
12.800000
13.000000
8.700000
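The same pattern can be sketched for an array of arbitrary length. This is an illustration under stated assumptions, not part of the original program: the function name `add_arrays_avx` and its parameters are hypothetical, and the `__attribute__((target("avx")))` annotation is a GCC/Clang extension that can be dropped when compiling with `-mavx`. The loop bound is written as `i + 4 <= n` so that the vector loop never reads past the end of the arrays and the unsigned `size_t` arithmetic cannot underflow when `n` is smaller than 4.

```c
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: c[i] = a[i] + b[i] for every i in [0, n). */
__attribute__((target("avx")))
void add_arrays_avx(const double *a, const double *b, double *c, size_t n)
{
    size_t i = 0;

    /* Vector part: runs while a full group of 4 elements remains. */
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(c + i, _mm256_add_pd(va, vb));
    }

    /* Scalar tail: the remaining n % 4 elements. */
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

With n = 9 the vector loop runs for i = 0 and i = 4, and the scalar loop handles index 8.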
Related links
Intel® Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/