`_mm_shuffle_epi32` (SSE2, compiles to the PSHUFD instruction) looks like a very powerful function:
#include <emmintrin.h>  /* SSE2 */
__m128i src = _mm_set_epi32(0x3333, 0x2222, 0x1111, 0x0000);  /* arguments are elements 3..0 */
__m128i dest = _mm_shuffle_epi32(src, _MM_SHUFFLE(1, 0, 3, 2));
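Result (elements listed from 3 down to 0, the same order as the _mm_set_epi32 arguments):
dest = {0x00001111, 0x00000000, 0x00003333, 0x00002222}
That is, dest[0] = src[2], dest[1] = src[3], dest[2] = src[0], dest[3] = src[1]: the two 64-bit halves are swapped.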
Use: many common data-shuffling operations, such as broadcast, swap and reverse, can be done with no more than three instructions.
"No more than 3 instructions, using PSHUFLW/PSHUFHW/PSHUFD, are required to
implement many common data shuffling operations. Broadcast, Swap, and Reverse"
References:
1. http://msdn.microsoft.com/en-us/library/4d3eabky(vs.71).aspx
2. http://www.intel.com/design/processor/manuals/248966.pdf