Assembly: An Introduction to the Intel SSSE3 Instructions

Latest recommended article published 2024-08-26 23:52:00

程序猴--小川

Article tags: Assembly

Article link: https://blog.csdn.net/m0_73378754/article/details/138693346

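Introduction: SSSE3 (Supplemental Streaming SIMD Extensions 3) is a supplement to SSE3 (Streaming SIMD Extensions 3) intended to speed up multimedia and vectorized computation. It adds 16 integer instructions, each available in a 64-bit form operating on MMX registers and a 128-bit form operating on XMM registers. The new operations are horizontal addition and subtraction, packed absolute value, multiply-add of unsigned by signed bytes, multiply-high with rounding and scaling, byte shuffling, packed sign manipulation, and byte-wise right alignment. Like other SIMD extensions, they improve performance mainly when large amounts of data can be processed in parallel.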

Horizontal Add/Subtract
PHADDW
PHADDW mm1, mm2/m64: Add 16-bit integers horizontally, pack to mm1.
PHADDW xmm1, xmm2/m128: Add 16-bit integers horizontally, pack to xmm1.
Adds each pair of adjacent 16-bit words, first within the destination operand and then within the source operand (a register or a memory location), and packs the sums into the destination: sums from the destination fill its low half and sums from the source fill its high half. The additions wrap around on overflow (there is no saturation), so the result bits are the same whether the words are read as signed or unsigned. All inputs are the original operand values, read before any result is written; this matters when both operands are the same register, as in PHADDW xmm0, xmm0.

PHADDW (with 64-bit operands):

    mm1[15:0] = mm1[31:16] + mm1[15:0];
    mm1[31:16] = mm1[63:48] + mm1[47:32];
    mm1[47:32] = mm2/m64[31:16] + mm2/m64[15:0];
    mm1[63:48] = mm2/m64[63:48] + mm2/m64[47:32];

PHADDW (with 128-bit operands):

    xmm1[15:0] = xmm1[31:16] + xmm1[15:0];
    xmm1[31:16] = xmm1[63:48] + xmm1[47:32];
    xmm1[47:32] = xmm1[95:80] + xmm1[79:64];
    xmm1[63:48] = xmm1[127:112] + xmm1[111:96];
    xmm1[79:64] = xmm2/m128[31:16] + xmm2/m128[15:0];
    xmm1[95:80] = xmm2/m128[63:48] + xmm2/m128[47:32];
    xmm1[111:96] = xmm2/m128[95:80] + xmm2/m128[79:64];
    xmm1[127:112] = xmm2/m128[127:112] + xmm2/m128[111:96];

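A quick way to check the lane layout is to model the instruction in software. The Python sketch below is an illustrative emulation under stated assumptions, not Intel's reference code: an XMM register is modeled as a list of eight signed 16-bit words with index 0 holding bits [15:0], and the helper names `wrap16` and `phaddw` are hypothetical.

```python
def wrap16(value):
    """Truncate to 16 bits and reinterpret as signed (two's-complement wraparound)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def phaddw(dest, src):
    """Model of PHADDW xmm1, xmm2/m128 on lists of eight signed words.

    Sums of adjacent word pairs from dest fill words 0-3 of the result,
    sums from src fill words 4-7. Overflow wraps around.
    """
    assert len(dest) == len(src) == 8
    words = dest + src
    return [wrap16(words[i] + words[i + 1]) for i in range(0, 16, 2)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 32767, 1, -1, -2]
print(phaddw(a, b))  # [3, 7, 11, 15, 30, 70, -32768, -3]
```

The seventh result shows the wraparound: 32767 + 1 becomes -32768 instead of being clamped.
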
PHADDSW
PHADDSW mm1, mm2/m64: Add 16-bit signed integers horizontally, pack saturated integers to mm1.
PHADDSW xmm1, xmm2/m128: Add 16-bit signed integers horizontally, pack saturated integers to xmm1.
Pairs adjacent words of the destination and then of the source (register or memory) in the same way as PHADDW, but treats the words as signed and saturates each sum to the range -32768 to 32767 instead of wrapping around.

PHADDSW (with 64-bit operands):

    mm1[15:0] = SaturateToSignedWord(mm1[31:16] + mm1[15:0]);
    mm1[31:16] = SaturateToSignedWord(mm1[63:48] + mm1[47:32]);
    mm1[47:32] = SaturateToSignedWord(mm2/m64[31:16] + mm2/m64[15:0]);
    mm1[63:48] = SaturateToSignedWord(mm2/m64[63:48] + mm2/m64[47:32]);

PHADDSW (with 128-bit operands):

    xmm1[15:0] = SaturateToSignedWord(xmm1[31:16] + xmm1[15:0]);
    xmm1[31:16] = SaturateToSignedWord(xmm1[63:48] + xmm1[47:32]);
    xmm1[47:32] = SaturateToSignedWord(xmm1[95:80] + xmm1[79:64]);
    xmm1[63:48] = SaturateToSignedWord(xmm1[127:112] + xmm1[111:96]);
    xmm1[79:64] = SaturateToSignedWord(xmm2/m128[31:16] + xmm2/m128[15:0]);
    xmm1[95:80] = SaturateToSignedWord(xmm2/m128[63:48] + xmm2/m128[47:32]);
    xmm1[111:96] = SaturateToSignedWord(xmm2/m128[95:80] + xmm2/m128[79:64]);
    xmm1[127:112] = SaturateToSignedWord(xmm2/m128[127:112] + xmm2/m128[111:96]);

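To contrast saturation with wraparound, the sketch below models the 64-bit PHADDSW and PHADDW forms side by side. It assumes each MMX register is a list of four signed words (index 0 = bits [15:0]); `saturate16`, `wrap16` and `horizontal_add` are hypothetical helpers written only for this illustration.

```python
def saturate16(value):
    """Clamp to the signed 16-bit range, like SaturateToSignedWord."""
    return max(-32768, min(32767, value))


def wrap16(value):
    """Two's-complement wraparound to 16 bits (PHADDW behaviour)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def horizontal_add(dest, src, finish):
    """Pairwise-add adjacent words of dest, then of src, applying `finish` to each sum."""
    words = dest + src
    return [finish(words[i] + words[i + 1]) for i in range(0, len(words), 2)]


def phaddsw(dest, src):
    return horizontal_add(dest, src, saturate16)


def phaddw(dest, src):
    return horizontal_add(dest, src, wrap16)


# 64-bit (MMX) form: four words per operand.
x = [30000, 30000, -30000, -30000]
y = [1, 2, 100, -100]
print(phaddsw(x, y))  # [32767, -32768, 3, 0]
print(phaddw(x, y))   # [-5536, 5536, 3, 0]
```

Sums that fit in the signed 16-bit range come out the same under both instructions; only the out-of-range sums differ.
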
PHADDD
PHADDD mm1, mm2/m64: Add 32-bit integers horizontally, pack to mm1.
PHADDD xmm1, xmm2/m128: Add 32-bit integers horizontally, pack to xmm1.
The 32-bit counterpart of PHADDW: adjacent doublewords of the destination and then of the source (register or memory) are added and the sums are packed into the destination. MM registers are 64 bits wide and XMM registers 128 bits wide. The additions wrap around on overflow; SSSE3 has no saturating 32-bit variant.

PHADDD (with 64-bit operands):

    mm1[31:0] = mm1[63:32] + mm1[31:0];
    mm1[63:32] = mm2/m64[63:32] + mm2/m64[31:0];

PHADDD (with 128-bit operands):

    xmm1[31:0] = xmm1[63:32] + xmm1[31:0];
    xmm1[63:32] = xmm1[127:96] + xmm1[95:64];
    xmm1[95:64] = xmm2/m128[63:32] + xmm2/m128[31:0];
    xmm1[127:96] = xmm2/m128[127:96] + xmm2/m128[95:64];

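Because every input is read before any result is written, PHADDD can be applied to a register with itself. The Python sketch below assumes registers are lists of signed dwords (index 0 = bits [31:0]) and uses the hypothetical names `wrap32`, `phaddd` and `horizontal_sum`; it shows the MMX form and one way to reduce the four dwords of an XMM register to a single sum. On many processors a shuffle-plus-add sequence may be faster than two PHADDDs, so treat this as an illustration of the semantics rather than a tuning recommendation.

```python
def wrap32(value):
    """Two's-complement wraparound to a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def phaddd(dest, src):
    """Model of PHADDD: adjacent dword pairs of dest, then of src, summed with wraparound.

    Works for the MMX form (2 dwords per operand) and the XMM form (4 dwords).
    """
    dwords = dest + src
    return [wrap32(dwords[i] + dwords[i + 1]) for i in range(0, len(dwords), 2)]


def horizontal_sum(v):
    """Sum the four dwords of an XMM value by applying PHADDD to it with itself twice."""
    v = phaddd(v, v)   # [v0+v1, v2+v3, v0+v1, v2+v3]
    v = phaddd(v, v)   # every lane now holds v0+v1+v2+v3
    return v[0]


print(phaddd([1, 2], [3, 4]))            # MMX form: [3, 7]
print(horizontal_sum([10, 20, 30, 40]))  # 100
```
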
PHSUBW
PHSUBW mm1, mm2/m64: Subtract 16-bit signed integers horizontally, pack to mm1.
PHSUBW xmm1, xmm2/m128: Subtract 16-bit signed integers horizontally, pack to xmm1.
For each pair of adjacent 16-bit words, first in the destination and then in the source (register or memory), subtracts the higher-order word from the lower-order word and packs the differences into the destination. The differences wrap around on overflow.

PHSUBW (with 64-bit operands):

    mm1[15:0] = mm1[15:0] - mm1[31:16];
    mm1[31:16] = mm1[47:32] - mm1[63:48];
    mm1[47:32] = mm2/m64[15:0] - mm2/m64[31:16];
    mm1[63:48] = mm2/m64[47:32] - mm2/m64[63:48];

PHSUBW (with 128-bit operands):

    xmm1[15:0] = xmm1[15:0] - xmm1[31:16];
    xmm1[31:16] = xmm1[47:32] - xmm1[63:48];
    xmm1[47:32] = xmm1[79:64] - xmm1[95:80];
    xmm1[63:48] = xmm1[111:96] - xmm1[127:112];
    xmm1[79:64] = xmm2/m128[15:0] - xmm2/m128[31:16];
    xmm1[95:80] = xmm2/m128[47:32] - xmm2/m128[63:48];
    xmm1[111:96] = xmm2/m128[79:64] - xmm2/m128[95:80];
    xmm1[127:112] = xmm2/m128[111:96] - xmm2/m128[127:112];

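The operand order of the subtraction is easy to get backwards. In this Python sketch (an illustrative model that assumes registers are lists of signed words with index 0 = bits [15:0]; `wrap16` and `phsubw` are hypothetical names), each result is the lower-order word minus the higher-order word, with wraparound.

```python
def wrap16(value):
    """Two's-complement wraparound to a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def phsubw(dest, src):
    """Model of PHSUBW: for each adjacent word pair, lower word minus upper word.

    Differences from dest come first, then those from src. Overflow wraps.
    """
    words = dest + src
    return [wrap16(words[i] - words[i + 1]) for i in range(0, len(words), 2)]


# 64-bit (MMX) form. Word 0 is bits [15:0], so each result is word[2k] - word[2k+1].
print(phsubw([10, 3, 5, 8], [-32768, 1, 0, 0]))  # [7, -3, 32767, 0]
```
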
PHSUBSW
PHSUBSW mm1, mm2/m64: Subtract 16-bit signed integer horizontally, pack saturated integers to mm1.
PHSUBSW xmm1, xmm2/m128: Subtract 16-bit signed integer horizontally, pack saturated integers to xmm1.
Subtracts the higher-order word from the lower-order word of each adjacent pair, first in the destination and then in the source (register or memory), exactly as PHSUBW does, but saturates each signed difference to the range -32768 to 32767.

PHSUBSW (with 64-bit operands):

    mm1[15:0] = SaturateToSignedWord(mm1[15:0] - mm1[31:16]);
    mm1[31:16] = SaturateToSignedWord(mm1[47:32] - mm1[63:48]);
    mm1[47:32] = SaturateToSignedWord(mm2/m64[15:0] - mm2/m64[31:16]);
    mm1[63:48] = SaturateToSignedWord(mm2/m64[47:32] - mm2/m64[63:48]);

PHSUBSW (with 128-bit operands):

    xmm1[15:0] = SaturateToSignedWord(xmm1[15:0] - xmm1[31:16]);
    xmm1[31:16] = SaturateToSignedWord(xmm1[47:32] - xmm1[63:48]);
    xmm1[47:32] = SaturateToSignedWord(xmm1[79:64] - xmm1[95:80]);
    xmm1[63:48] = SaturateToSignedWord(xmm1[111:96] - xmm1[127:112]);
    xmm1[79:64] = SaturateToSignedWord(xmm2/m128[15:0] - xmm2/m128[31:16]);
    xmm1[95:80] = SaturateToSignedWord(xmm2/m128[47:32] - xmm2/m128[63:48]);
    xmm1[111:96] = SaturateToSignedWord(xmm2/m128[79:64] - xmm2/m128[95:80]);
    xmm1[127:112] = SaturateToSignedWord(xmm2/m128[111:96] - xmm2/m128[127:112]);

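A minimal Python model of PHSUBSW follows, assuming registers are lists of signed words (index 0 = bits [15:0]); `saturate16` and `phsubsw` are hypothetical names. For example, -32768 - 1 would wrap to 32767 under PHSUBW, while PHSUBSW clamps it to -32768.

```python
def saturate16(value):
    """Clamp to [-32768, 32767], like SaturateToSignedWord."""
    return max(-32768, min(32767, value))


def phsubsw(dest, src):
    """Model of PHSUBSW: lower word minus upper word for each adjacent pair,
    dest pairs first, then src pairs, with signed saturation."""
    words = dest + src
    return [saturate16(words[i] - words[i + 1]) for i in range(0, len(words), 2)]


# 64-bit (MMX) form.
print(phsubsw([10, 3, 5, 8], [-32768, 1, 0, 0]))  # [7, -3, -32768, 0]
print(phsubsw([32767, -1, 0, 0], [0, 0, 0, 0]))   # [32767, 0, 0, 0]
```
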
PHSUBD
PHSUBD mm1, mm2/m64: Subtract 32-bit signed integers horizontally, pack to mm1.
PHSUBD xmm1, xmm2/m128: Subtract 32-bit signed integers horizontally, pack to xmm1.
The 32-bit counterpart of PHSUBW: for each pair of adjacent doublewords, first in the destination and then in the source (register or memory), subtracts the higher-order doubleword from the lower-order one and packs the differences into the destination. The differences wrap around on overflow.

PHSUBD (with 64-bit operands):

    mm1[31:0] = mm1[31:0] - mm1[63:32];
    mm1[63:32] = mm2/m64[31:0] - mm2/m64[63:32];

PHSUBD (with 128-bit operands):

    xmm1[31:0] = xmm1[31:0] - xmm1[63:32];
    xmm1[63:32] = xmm1[95:64] - xmm1[127:96];
    xmm1[95:64] = xmm2/m128[31:0] - xmm2/m128[63:32];
    xmm1[127:96] = xmm2/m128[95:64] - xmm2/m128[127:96];

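The sketch below models the 128-bit PHSUBD form in Python, assuming each register is a list of four signed dwords with index 0 holding bits [31:0]; `wrap32` and `phsubd` are hypothetical names. The third result shows a 32-bit wraparound.

```python
def wrap32(value):
    """Two's-complement wraparound to a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def phsubd(dest, src):
    """Model of PHSUBD: for each adjacent dword pair, lower minus upper, with wraparound."""
    dwords = dest + src
    return [wrap32(dwords[i] - dwords[i + 1]) for i in range(0, len(dwords), 2)]


# 128-bit form: four dwords per operand, four results.
print(phsubd([100, 1, 2, 50], [-2**31, 1, 7, 7]))  # [99, -48, 2147483647, 0]
```
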
Packed Absolute Values
PABSB
PABSB mm1, mm2/m64: Compute the absolute value of bytes in mm2/m64 and store UNSIGNED result in mm1.
PABSB xmm1, xmm2/m128: Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1.
Takes the absolute value of each signed byte in the source (register or memory) and stores it in the corresponding byte of the destination. Because the result is unsigned, -128 (0x80) becomes 128, which is still encoded as 0x80.

PABSB (with 64-bit operands):

    Unsigned DEST[7:0] := ABS(SRC[7:0])
    Repeat operation for 2nd through 7th bytes
    Unsigned DEST[63:56] := ABS(SRC[63:56])

PABSB (with 128-bit operands):

    Unsigned DEST[7:0] := ABS(SRC[7:0])
    Repeat operation for 2nd through 15th bytes
    Unsigned DEST[127:120] := ABS(SRC[127:120])

PABSW
PABSW mm1, mm2/m64: Compute the absolute value of 16-bit integers in mm2/m64 and store UNSIGNED result in mm1.
PABSW xmm1, xmm2/m128: Compute the absolute value of 16-bit integers in xmm2/m128 and store UNSIGNED result in xmm1.
Takes the absolute value of each signed 16-bit word in the source (register or memory) and stores it, as an unsigned value, in the corresponding word of the destination.

PABSW (with 64-bit operands):

    Unsigned DEST[15:0] := ABS(SRC[15:0])
    Repeat operation for 2nd through 3rd 16-bit words
    Unsigned DEST[63:48] := ABS(SRC[63:48])

PABSW (with 128-bit operands):

    Unsigned DEST[15:0] := ABS(SRC[15:0])
    Repeat operation for 2nd through 7th 16-bit words
    Unsigned DEST[127:112] := ABS(SRC[127:112])

PABSD
PABSD mm1, mm2/m64: Compute the absolute value of 32-bit integers in mm2/m64 and store UNSIGNED result in mm1.
PABSD xmm1, xmm2/m128: Compute the absolute value of 32-bit integers in xmm2/m128 and store UNSIGNED result in xmm1.
Takes the absolute value of each signed 32-bit doubleword in the source (register or memory) and stores it, as an unsigned value, in the corresponding doubleword of the destination.

PABSD (with 64-bit operands):

    Unsigned DEST[31:0] := ABS(SRC[31:0])
    Unsigned DEST[63:32] := ABS(SRC[63:32])

PABSD (with 128-bit operands):

    Unsigned DEST[31:0] := ABS(SRC[31:0])
    Repeat operation for 2nd through 3rd 32-bit double words
    Unsigned DEST[127:96] := ABS(SRC[127:96])

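The three packed absolute-value instructions differ only in lane width, so one Python sketch can model them all. It assumes the source is a list of signed lane values; `pabs` and `as_signed` are hypothetical names, and `bits` (8, 16 or 32) selects PABSB, PABSW or PABSD. The example illustrates why the results are described as UNSIGNED.

```python
def pabs(values, bits):
    """Model of PABSB/PABSW/PABSD (bits = 8, 16 or 32).

    Inputs are signed lane values; outputs are the absolute values read as
    UNSIGNED integers, so the most negative input maps to 2**(bits-1)
    instead of overflowing.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    assert all(lo <= v <= hi for v in values), "lane value out of signed range"
    return [abs(v) for v in values]


def as_signed(value, bits):
    """Reinterpret an unsigned lane value as signed two's-complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


print(pabs([-128, -1, 0, 127], 8))   # [128, 1, 0, 127]
print(as_signed(128, 8))             # -128: the same bits read back as a signed byte
```

If later code treats the result lanes as signed, an input of -128 still reads back as -128, which is worth checking before feeding PABSB output into signed arithmetic.
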
Multiply and Add Packed Signed and Unsigned Bytes
PMADDUBSW
PMADDUBSW mm1, mm2/m64: Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to mm1.
PMADDUBSW xmm1, xmm2/m128: Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to xmm1.
Multiplies each UNSIGNED byte of the destination (first operand) by the corresponding SIGNED byte of the source (second operand, register or memory), producing signed 16-bit intermediate products. Each adjacent pair of products is then added, and the sum is saturated to a signed word and stored in the destination. With 64-bit operands, eight byte pairs produce four words; with 128-bit operands, sixteen byte pairs produce eight words.

PMADDUBSW (with 64-bit operands):

    DEST[15:0] = SaturateToSignedWord(SRC[15:8]*DEST[15:8] + SRC[7:0]*DEST[7:0]);
    DEST[31:16] = SaturateToSignedWord(SRC[31:24]*DEST[31:24] + SRC[23:16]*DEST[23:16]);
    DEST[47:32] = SaturateToSignedWord(SRC[47:40]*DEST[47:40] + SRC[39:32]*DEST[39:32]);
    DEST[63:48] = SaturateToSignedWord(SRC[63:56]*DEST[63:56] + SRC[55:48]*DEST[55:48]);

PMADDUBSW (with 128-bit operands):

    DEST[15:0] = SaturateToSignedWord(SRC[15:8]*DEST[15:8] + SRC[7:0]*DEST[7:0]);
    // Repeat operation for 2nd through 7th word
    DEST[127:112] = SaturateToSignedWord(SRC[127:120]*DEST[127:120] + SRC[119:112]*DEST[119:112]);

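A typical use is multiplying unsigned 8-bit pixel values by small signed coefficients. The Python sketch below models the 64-bit form under these assumptions: operands are lists of raw byte values 0-255 with index 0 = bits [7:0], and the helper names `to_u8`, `to_s8`, `saturate16` and `pmaddubsw` are hypothetical. The operand order matters: the first operand's bytes are read as unsigned, the second operand's as signed.

```python
def to_u8(b):
    """Read a raw byte as unsigned (0..255)."""
    return b & 0xFF


def to_s8(b):
    """Read a raw byte as signed (-128..127)."""
    b &= 0xFF
    return b - 0x100 if b & 0x80 else b


def saturate16(value):
    """Clamp to the signed 16-bit range, like SaturateToSignedWord."""
    return max(-32768, min(32767, value))


def pmaddubsw(dest_bytes, src_bytes):
    """Model of PMADDUBSW: dest bytes UNSIGNED, src bytes SIGNED.

    Products of adjacent byte pairs are added and saturated to a signed word.
    """
    assert len(dest_bytes) == len(src_bytes) and len(dest_bytes) % 2 == 0
    words = []
    for i in range(0, len(dest_bytes), 2):
        total = (to_u8(dest_bytes[i]) * to_s8(src_bytes[i])
                 + to_u8(dest_bytes[i + 1]) * to_s8(src_bytes[i + 1]))
        words.append(saturate16(total))
    return words


# 64-bit (MMX) form: 8 bytes in, 4 signed words out.
pixels = [255, 255, 10, 20, 200, 100, 1, 2]
weights = [127, 127, 0xFF, 2, 0x80, 0x80, 3, 4]   # 0xFF = -1, 0x80 = -128 as signed bytes
print(pmaddubsw(pixels, weights))  # [32767, 30, -32768, 11]
```

The first and third results show the signed-word saturation.
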
Packed Multiply High with Round and Scale
PMULHRSW
PMULHRSW mm1, mm2/m64: Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to mm1.
PMULHRSW xmm1, xmm2/m128: Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1.
Multiplies each signed 16-bit word of the destination by the corresponding signed word of the source (register or memory) to get a 32-bit intermediate product, shifts the product right by 14 bits, adds 1 to round, and stores bits [16:1] of the result in the corresponding destination word. The net effect is (a*b + 0x4000) >> 15 truncated to 16 bits, i.e. a rounded fixed-point multiply of Q15 values. The only overflowing case is -32768 * -32768, which returns -32768 (0x8000) rather than +32768.

PMULHRSW (with 64-bit operands):

    temp0[31:0] = INT32((DEST[15:0] * SRC[15:0]) >> 14) + 1;
    temp1[31:0] = INT32((DEST[31:16] * SRC[31:16]) >> 14) + 1;
    temp2[31:0] = INT32((DEST[47:32] * SRC[47:32]) >> 14) + 1;
    temp3[31:0] = INT32((DEST[63:48] * SRC[63:48]) >> 14) + 1;
    DEST[15:0] = temp0[16:1];
    DEST[31:16] = temp1[16:1];
    DEST[47:32] = temp2[16:1];
    DEST[63:48] = temp3[16:1];

PMULHRSW (with 128-bit operands):

    temp0[31:0] = INT32((DEST[15:0] * SRC[15:0]) >> 14) + 1;
    temp1[31:0] = INT32((DEST[31:16] * SRC[31:16]) >> 14) + 1;
    temp2[31:0] = INT32((DEST[47:32] * SRC[47:32]) >> 14) + 1;
    temp3[31:0] = INT32((DEST[63:48] * SRC[63:48]) >> 14) + 1;
    temp4[31:0] = INT32((DEST[79:64] * SRC[79:64]) >> 14) + 1;
    temp5[31:0] = INT32((DEST[95:80] * SRC[95:80]) >> 14) + 1;
    temp6[31:0] = INT32((DEST[111:96] * SRC[111:96]) >> 14) + 1;
    temp7[31:0] = INT32((DEST[127:112] * SRC[127:112]) >> 14) + 1;
    DEST[15:0] = temp0[16:1];
    DEST[31:16] = temp1[16:1];
    DEST[47:32] = temp2[16:1];
    DEST[63:48] = temp3[16:1];
    DEST[79:64] = temp4[16:1];
    DEST[95:80] = temp5[16:1];
    DEST[111:96] = temp6[16:1];
    DEST[127:112] = temp7[16:1];

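The rounding steps are easier to follow with Q15 numbers, where the word value v stands for v / 32768. The Python sketch below follows the pseudocode one lane at a time; registers are assumed to be lists of signed words, and `pmulhrsw_lane`, `pmulhrsw` and `q15` are hypothetical names. Python integers are unbounded, so the product is exact, like the INT32 product in the pseudocode.

```python
def pmulhrsw_lane(a, b):
    """One lane of PMULHRSW: temp = ((a * b) >> 14) + 1, result = temp[16:1]."""
    temp = ((a * b) >> 14) + 1          # Python >> on negative ints is an arithmetic shift
    result = (temp >> 1) & 0xFFFF       # keep bits [16:1]
    return result - 0x10000 if result & 0x8000 else result


def pmulhrsw(dest, src):
    """Apply the lane operation to each pair of corresponding words."""
    return [pmulhrsw_lane(a, b) for a, b in zip(dest, src)]


def q15(x):
    """Convert a float in [-1, 1) to Q15 fixed point."""
    return int(round(x * 32768))


print(pmulhrsw([q15(0.5), q15(-0.25), 100, -32768],
               [q15(0.5), q15(0.5), 16384, -32768]))
# [8192, -4096, 50, -32768]
```

0.5 * 0.5 gives 8192 (0.25 in Q15) and -0.25 * 0.5 gives -4096 (-0.125); the last lane is the -32768 * -32768 overflow case.
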
Packed Shuffle Bytes
PSHUFB
PSHUFB mm1, mm2/m64: Shuffle bytes in mm1 according to contents of mm2/m64.
PSHUFB xmm1, xmm2/m128: Shuffle bytes in xmm1 according to contents of xmm2/m128.
Rearranges the bytes of the destination byte by byte (not bit by bit) under the control of the source (register or memory). For each result byte, if bit 7 of the corresponding control byte is set, the result byte is zeroed; otherwise the low 3 bits (64-bit form) or low 4 bits (128-bit form) of the control byte select which byte of the original destination is copied there.

PSHUFB (with 64-bit operands):

    TEMP := DEST
    for i = 0 to 7 {
        if (SRC[(i*8)+7] = 1) then
            DEST[(i*8)+7..(i*8)+0] := 0;
        else
            index[2..0] := SRC[(i*8)+2..(i*8)+0];
            DEST[(i*8)+7..(i*8)+0] := TEMP[(index*8+7)..(index*8+0)];
        endif;
    }

PSHUFB (with 128-bit operands):

    TEMP := DEST
    for i = 0 to 15 {
        if (SRC[(i*8)+7] = 1) then
            DEST[(i*8)+7..(i*8)+0] := 0;
        else
            index[3..0] := SRC[(i*8)+3..(i*8)+0];
            DEST[(i*8)+7..(i*8)+0] := TEMP[(index*8+7)..(index*8+0)];
        endif;
    }

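The control-byte rules can be exercised with a small Python model. It assumes both operands are lists of byte values 0-255 with index 0 = bits [7:0]; `pshufb` is a hypothetical name, and the operand length (8 or 16) selects the MMX or XMM form.

```python
def pshufb(dest, control):
    """Model of PSHUFB on lists of byte values (0-255).

    len(dest) is 8 (MMX form) or 16 (XMM form). For each control byte:
    bit 7 set -> result byte is 0; otherwise its low 3 (MMX) or 4 (XMM)
    bits select a byte of the ORIGINAL dest.
    """
    n = len(dest)
    assert n in (8, 16) and len(control) == n
    index_mask = n - 1                  # 0b111 for MMX, 0b1111 for XMM
    temp = list(dest)                   # TEMP := DEST
    return [0 if c & 0x80 else temp[c & index_mask] for c in control]


data = list(range(0x10, 0x20))          # bytes 0x10..0x1F
reverse = list(range(15, -1, -1))       # control 15, 14, ..., 0 reverses the bytes
print([hex(b) for b in pshufb(data, reverse)][:4])   # ['0x1f', '0x1e', '0x1d', '0x1c']
print(pshufb(data, [0x80] * 8 + [0] * 8)[:10])       # [0, 0, 0, 0, 0, 0, 0, 0, 16, 16]
```

Byte reversal and broadcasting one byte to every lane are both just particular control vectors.
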
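Packed Sign

The PSIGN pseudocode below relies on the following helper functions, written here as runnable Python. Lane values are signed integers. `negate` is not defined in the original pseudocode; it is filled in here as two's-complement negation that wraps within the lane width, and the `bits` parameter exists only for that purpose.

    def negate(value, bits):
        # Two's-complement negation wrapped to `bits` bits: the most negative
        # value (e.g. -128 for a byte) negates to itself, as in hardware.
        wrapped = (-value) & ((1 << bits) - 1)
        return wrapped - (1 << bits) if wrapped >> (bits - 1) else wrapped

    def lane_sign(control, input_val, bits):
        if control < 0:
            return negate(input_val, bits)  # negative control: negate
        elif control == 0:
            return 0                        # zero control: zero the lane
        return input_val                    # positive control: keep unchanged

    def byte_sign(control, input_val):
        return lane_sign(control, input_val, 8)

    def word_sign(control, input_val):
        return lane_sign(control, input_val, 16)

    def dword_sign(control, input_val):
        return lane_sign(control, input_val, 32)
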
PSIGNB
PSIGNB mm1, mm2/m64: Negate/zero/preserve packed byte integers in mm1 depending on the corresponding sign in mm2/m64.
PSIGNB xmm1, xmm2/m128: Negate/zero/preserve packed byte integers in xmm1 depending on the corresponding sign in xmm2/m128.
For each signed byte of the destination, looks at the corresponding signed byte of the source (register or memory): if the source byte is negative, the destination byte is negated; if it is zero, the destination byte is set to zero; if it is positive, the destination byte is left unchanged. Because the negation wraps around, negating -128 (0x80) leaves it as -128.

    PSIGNB srcdest, src // MMX 64-bit Operands
    VL = 64
    KL := VL/8
    FOR i in 0...KL-1:
        srcdest.byte[i] := byte_sign(src.byte[i], srcdest.byte[i])

    PSIGNB srcdest, src // SSE 128-bit Operands
    VL = 128
    KL := VL/8
    FOR i in 0...KL-1:
        srcdest.byte[i] := byte_sign(src.byte[i], srcdest.byte[i])

PSIGNW
PSIGNW mm1, mm2/m64: Negate/zero/preserve packed word integers in mm1 depending on the corresponding sign in mm2/m64.
PSIGNW xmm1, xmm2/m128: Negate/zero/preserve packed word integers in xmm1 depending on the corresponding sign in xmm2/m128.
The 16-bit version of PSIGNB: each signed word of the destination is negated, set to zero, or left unchanged according to whether the corresponding signed word of the source (register or memory) is negative, zero, or positive.

    PSIGNW srcdest, src // MMX 64-bit Operands
    VL = 64
    KL := VL/16
    FOR i in 0...KL-1:
        srcdest.word[i] := word_sign(src.word[i], srcdest.word[i])

    PSIGNW srcdest, src // SSE 128-bit Operands
    VL = 128
    KL := VL/16
    FOR i in 0...KL-1:
        srcdest.word[i] := word_sign(src.word[i], srcdest.word[i])

PSIGND
PSIGND mm1, mm2/m64: Negate/zero/preserve packed doubleword integers in mm1 depending on the corresponding sign in mm2/m64.
PSIGND xmm1, xmm2/m128: Negate/zero/preserve packed doubleword integers in xmm1 depending on the corresponding sign in xmm2/m128.
The 32-bit version of PSIGNB: each signed doubleword of the destination is negated, set to zero, or left unchanged according to whether the corresponding signed doubleword of the source (register or memory) is negative, zero, or positive.

    PSIGND srcdest, src // MMX 64-bit Operands
    VL = 64
    KL := VL/32
    FOR i in 0...KL-1:
        srcdest.dword[i] := dword_sign(src.dword[i], srcdest.dword[i])

    PSIGND srcdest, src // SSE 128-bit Operands
    VL = 128
    KL := VL/32
    FOR i in 0...KL-1:
        srcdest.dword[i] := dword_sign(src.dword[i], srcdest.dword[i])

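The sketch below models all three PSIGN variants in Python for a whole register. It assumes each register is a list of signed lane values (index 0 = lowest-order lane); `wrap` and `psign` are hypothetical names, and `bits` (8, 16 or 32) selects PSIGNB, PSIGNW or PSIGND. Lane by lane, the result behaves like x * sign(y), except that the most negative value stays unchanged when negated.

```python
def wrap(value, bits):
    """Two's-complement wraparound to a signed integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def psign(srcdest, src, bits):
    """Model of PSIGNB/PSIGNW/PSIGND (bits = 8, 16 or 32) on signed lane values.

    Each srcdest lane is negated where the src lane is negative, zeroed where it
    is zero, and kept where it is positive. Negation wraps around.
    """
    out = []
    for control, value in zip(src, srcdest):
        if control < 0:
            out.append(wrap(-value, bits))
        elif control == 0:
            out.append(0)
        else:
            out.append(value)
    return out


print(psign([5, 5, 5, -128], [-1, 0, 42, -7], 8))  # [-5, 0, 5, -128]
```
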
Packed Align Right
PALIGNR
PALIGNR mm1, mm2/m64, imm8: Concatenate destination and source operands, extract byte-aligned result shifted to the right by constant value in imm8 into mm1.
PALIGNR xmm1, xmm2/m128, imm8: Concatenate destination and source operands, extract byte-aligned result shifted to the right by constant value in imm8 into xmm1.
Concatenates the destination operand (high half) and the source operand (low half, register or memory) into a 128-bit intermediate value (64-bit form) or a 256-bit one (128-bit form), shifts it right by imm8 BYTES (not bits), and stores the low 64 or 128 bits in the destination. Bytes shifted in from above the intermediate value are zero, so imm8 of 16 or more (64-bit form) or 32 or more (128-bit form) yields all zeros.

PALIGNR (with 64-bit operands):

    temp1[127:0] = CONCATENATE(DEST, SRC) >> (imm8*8)
    DEST[63:0] = temp1[63:0]

PALIGNR (with 128-bit operands):

    temp1[255:0] := ((DEST[127:0] << 128) OR SRC[127:0]) >> (imm8*8);
    DEST[127:0] := temp1[127:0]
    DEST[MAXVL-1:128] (Unmodified)

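One way PALIGNR is used is to extract an unaligned window of bytes that straddles two registers. The Python sketch below models it on byte lists, assuming index 0 is the least significant byte; `palignr` is a hypothetical name, and the operand length (8 or 16) selects the MMX or XMM form. In byte-list terms the concatenation is simply `src + dest`.

```python
def palignr(dest, src, imm8):
    """Model of PALIGNR on lists of byte values, byte 0 = least significant.

    The value (DEST << width) | SRC is shifted right by imm8 BYTES and the low
    len(dest) bytes are kept.
    """
    n = len(dest)
    assert n in (8, 16) and len(src) == n and 0 <= imm8 <= 255
    concat = src + dest                                  # SRC is the low half, DEST the high half
    shifted = concat[imm8:] + [0] * min(imm8, 2 * n)     # bytes shifted in from the top are 0
    return shifted[:n]


low = list(range(0, 16))      # plays the role of xmm2 (SRC)
high = list(range(16, 32))    # plays the role of xmm1 (DEST)
print(palignr(high, low, 4) == list(range(4, 20)))   # True: window starts 4 bytes into SRC
print(palignr(high, low, 20))  # [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 0, 0, 0]
```
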