SIMD Overview
Single Instruction Multiple Data
Header files for the corresponding instruction sets on Intel processors
// AVX immintrin.h
// MMX mmintrin.h
// SSE xmmintrin.h
// SSE2 emmintrin.h
// SSE3 pmmintrin.h
// suffix
// si256 – signed 256-bit integer
// si128 – signed 128-bit integer
// epi8, epi32, epi64 – a vector of signed 8-bit integers (32 in a __m256i and 16 in a __m128i), signed 32-bit integers, or signed 64-bit integers
// epu8 – a vector of unsigned 8-bit integers (used when an operation behaves differently for signed and unsigned numbers, such as conversion to a larger integer or multiplication)
// epu16, epu32 – a vector of unsigned 16-bit integers or unsigned 32-bit integers, e.g. 8 unsigned 32-bit integers in a __m256i (used when the operation differs from the signed version)
// ps – "packed single": 8 single-precision floats in a __m256 (4 in a __m128)
// pd – "packed double": 4 doubles in a __m256d (2 in a __m128d)
// ss – one float (only 32 bits of a 256-bit or 128-bit value are used)
// sd – one double (only 64 bits of a 256-bit or 128-bit value are used)
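To make the suffixes concrete, here is a minimal sketch that uses only 128-bit SSE/SSE2 intrinsics, so it should build on x86-64 without extra flags. The wrapper function names (`add_epi32`, `add_ps`, `add_ss`, `sat_add_epi8_lane0`, `sat_add_epu8_lane0`) are made up for illustration; the `_mm_*` calls are the real intrinsics. The saturating add is chosen as one case where the signed (`epi8`) and unsigned (`epu8`) variants give different results.

```c
#include <emmintrin.h>  /* SSE2; this header also pulls in the SSE (xmmintrin.h) intrinsics */
#include <stdint.h>

/* epi32: four signed 32-bit lanes are added independently. */
void add_epi32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* ps: all four single-precision lanes are added. */
void add_ps(const float a[4], const float b[4], float out[4]) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* ss: only the lowest lane is added; the upper three lanes are copied from a. */
void add_ss(const float a[4], const float b[4], float out[4]) {
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* epi8: saturating add treats each byte as signed (clamps to [-128, 127]). */
uint8_t sat_add_epi8_lane0(uint8_t x, uint8_t y) {
    __m128i r = _mm_adds_epi8(_mm_set1_epi8((char)x), _mm_set1_epi8((char)y));
    return (uint8_t)(_mm_cvtsi128_si32(r) & 0xFF);  /* extract byte lane 0 */
}

/* epu8: saturating add treats each byte as unsigned (clamps to [0, 255]). */
uint8_t sat_add_epu8_lane0(uint8_t x, uint8_t y) {
    __m128i r = _mm_adds_epu8(_mm_set1_epi8((char)x), _mm_set1_epi8((char)y));
    return (uint8_t)(_mm_cvtsi128_si32(r) & 0xFF);  /* extract byte lane 0 */
}
```

For example, adding 100 + 100 per byte saturates at 127 with `_mm_adds_epi8` but yields 200 with `_mm_adds_epu8`, because the same bit pattern is interpreted differently. The 256-bit variants follow the same naming pattern with the `_mm256_` prefix and require AVX/AVX2 support.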
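In the examples below, whether the input data must be aligned depends on the load instruction used. The requirements of each load instruction can be looked up here:
Intel intrinsics reference (link for looking up what each intrinsic does)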
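As a rough illustration of the alignment point, the sketch below contrasts `_mm_loadu_si128`, which accepts any address, with `_mm_load_si128`, which requires a 16-byte-aligned address. The helper names (`hsum4`, `sum4_unaligned`, `sum4_aligned`) are hypothetical, and the `assert` is only a debugging aid added here to make the requirement visible.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: add up the four 32-bit lanes of v. */
static int32_t hsum4(__m128i v) {
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);  /* unaligned store is always safe */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Works for any address: _mm_loadu_si128 has no alignment requirement. */
int32_t sum4_unaligned(const int32_t *p) {
    return hsum4(_mm_loadu_si128((const __m128i *)p));
}

/* Requires p to be 16-byte aligned: _mm_load_si128 may fault otherwise. */
int32_t sum4_aligned(const int32_t *p) {
    assert(((uintptr_t)p & 15) == 0);  /* catch misaligned input in debug builds */
    return hsum4(_mm_load_si128((const __m128i *)p));
}
```

With a buffer declared as `_Alignas(16) int32_t buf[8]`, `sum4_aligned(buf)` is valid, while `buf + 1` is only 4-byte aligned and should go through `sum4_unaligned`.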
Example: adding four 32-bit integers in parallel – the input does not need to be aligned, but assigning the __m128i takes relatively long
#include <emmintrin.h>
#include <immintrin.h>
#include <stdint.h>  // uint32_t
#include <string.h>  // memcpy
void print128_num_epi32(__m128i var)
{
    uint32_t val[4];
    memcpy(val, &var, sizeof(val));