Intel SSE / MMX2 / KNI documentation

转自 http://intel80386.com/simd/mmx2-doc.html



Please note, this is a work-in-progress (ie BETA).

Timings are of approximate throughput cycles using average from TSC, the
latency and ranges are indicated where known.


 
 
ADDPS Add Parallel Scalars Opcode Cycles Instruction 0F 58 2 (3) ADDPS xmm reg,xmm reg/mem128 ADDPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] + op2[0] op1[1] = op1[1] + op2[1] op1[2] = op1[2] + op2[2] op1[3] = op1[3] + op2[3]
ADDSS Add Single Scalar Opcode Cycles Instruction F3 0F 58 1 (3) ADDSS xmm reg,xmm reg/mem32 ADDPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] + op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
ANDNPS And Not Parallel Scalars (bitwise) Opcode Cycles Instruction 0F 55 2 ANDNPS xmm reg,xmm reg/mem128 ANDNPS op1, op2 op1 contains 1 128-bit value op2 contains 1 128-bit value op1 = !op1 & op2
ANDPS And Parallel Scalars (bitwise) Opcode Cycles Instruction 0F 54 2 ANDPS xmm reg,xmm reg/mem128 ANDPS op1, op2 op1 contains 1 128-bit value op2 contains 1 128-bit value op1 = op1 & op2
CMPEQPS Compare Equal Parallel Scalars Opcode Cycles Instruction 0F C2 .. 00 2 (3) CMPEQPS xmm reg,xmm reg/mem128 CMPEQPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] == op2[0] op1[1] = op1[1] == op2[1] op1[2] = op1[2] == op2[2] op1[3] = op1[3] == op2[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPEQSS Compare Equal Single Scalar Opcode Cycles Instruction F3 0F C2 .. 00 1 (3) CMPEQSS xmm reg,xmm reg/mem32 CMPEQSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] == op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPLEPS Compare Less than or Equal Parallel Scalars Opcode Cycles Instruction 0F C2 .. 02 2 (3) CMPLEPS xmm reg,xmm reg/mem128 CMPLEPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] <= op2[0] op1[1] = op1[1] <= op2[1] op1[2] = op1[2] <= op2[2] op1[3] = op1[3] <= op2[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPLESS Compare Less than or Equal Single Scalar Opcode Cycles Instruction F3 0F C2 .. 02 1 (3) CMPLESS xmm reg,xmm reg/mem32 CMPLESS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] <= op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPLTPS Compare Less Than Parallel Scalars Opcode Cycles Instruction 0F C2 .. 01 2 (3) CMPLTPS xmm reg,xmm reg/mem128 CMPLTPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] < op2[0] op1[1] = op1[1] < op2[1] op1[2] = op1[2] < op2[2] op1[3] = op1[3] < op2[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPLTSS Compare Less Than Single Scalar Opcode Cycles Instruction F3 0F C2 .. 01 1 (3) CMPLTSS xmm reg,xmm reg/mem32 CMPLTSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] < op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPNEQPS Compare Not Equal Parallel Scalars Opcode Cycles Instruction 0F C2 .. 04 2 (3) CMPNEQPS xmm reg,xmm reg/mem128 CMPNEQPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] != op2[0] op1[1] = op1[1] != op2[1] op1[2] = op1[2] != op2[2] op1[3] = op1[3] != op2[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPNEQSS Compare Not Equal Single Scalar Opcode Cycles Instruction F3 0F C2 .. 04 1 (3) CMPNEQSS xmm reg,xmm reg/mem32 CMPNEQSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] != op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPNLEPS Compare Not Less than or Equal Parallel Scalars Opcode Cycles Instruction 0F C2 .. 06 2 (3) CMPNLEPS xmm reg,xmm reg/mem128 CMPNLEPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] > op2[0] op1[1] = op1[1] > op2[1] op1[2] = op1[2] > op2[2] op1[3] = op1[3] > op2[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPNLESS Compare Not Less than or Equal Single Scalar Opcode Cycles Instruction F3 0F C2 .. 06 1 (3) CMPNLESS xmm reg,xmm reg/mem32 CMPNLESS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] > op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPNLTPS Compare Not Less Than Parallel Scalars Opcode Cycles Instruction 0F C2 .. 05 2 (3) CMPNLTPS xmm reg,xmm reg/mem128 CMPNLTPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] >= op2[0] op1[1] = op1[1] >= op2[1] op1[2] = op1[2] >= op2[2] op1[3] = op1[3] >= op2[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPNLTSS Compare Not Less Than Single Scalar Opcode Cycles Instruction F3 0F C2 .. 01 1 (3) CMPNLTSS xmm reg,xmm reg/mem32 CMPNLTSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] >= op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPORDPS Compare Ordered Parallel Scalars Opcode Cycles Instruction 0F C2 .. 07 2 (3) CMPORDPS xmm reg,xmm reg/mem128 CMPORDPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = (op1[0] != NaN) && (op2[0] != NaN) op1[1] = (op1[1] != NaN) && (op2[1] != NaN) op1[2] = (op1[2] != NaN) && (op2[2] != NaN) op1[3] = (op1[3] != NaN) && (op2[3] != NaN) TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPORDSS Compare Ordered Single Scalar Opcode Cycles Instruction F3 0F C2 .. 07 1 (3) CMPORDSS xmm reg,xmm reg/mem32 CMPORDSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = (op1[0] != NaN) && (op2 != NaN) op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPUNORDPS Compare Unordered Parallel Scalars Opcode Cycles Instruction 0F C2 .. 03 2 (3) CMPUNORDPS xmm reg,xmm reg/mem128 CMPUNORDPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = (op1[0] == NaN) || (op2[0] == NaN) op1[1] = (op1[1] == NaN) || (op2[1] == NaN) op1[2] = (op1[2] == NaN) || (op2[2] == NaN) op1[3] = (op1[3] == NaN) || (op2[3] == NaN) TRUE = 0xFFFFFFFF FALSE = 0x00000000
CMPUNORDSS Compare Unordered Single Scalar Opcode Cycles Instruction F3 0F C2 .. 03 1 (3) CMPUNORDSS xmm reg,xmm reg/mem32 CMPUNORDSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = (op1[0] == NaN) || (op2 == NaN) op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] TRUE = 0xFFFFFFFF FALSE = 0x00000000
COMISS Compare Integer Single Scalar Opcode Cycles Instruction 0F 2F COMISS xmm reg,xmm reg/mem32 COMISS op1, op2
CVTPI2PS Convert Parallel Integer to Parallel Scalars Opcode Cycles Instruction 0F 2A CVTPI2PS xmm reg,mm reg/mem64 CVTPI2PS op1, op2 op1 contains 2 single precision 32-bit floating point values op2 contains 2 32-bit integer values op1[0] = (float)op2[0] op1[1] = (float)op2[1] op1[2] = op1[2] op1[3] = op1[3]
CVTPS2PI Convert Parallel Scalars to Parallel Integers Opcode Cycles Instruction 0F 2D CVTPS2PI mm reg,xmm reg/mem128 CVTPS2PI op1, op2 op1 contains 2 32-bit integer values op2 contains 2 single precision 32-bit floating point values op1[0] = (long)op2[0] op1[1] = (long)op2[1]
CVTSI2SS Convert Parallel Integers to Parallel Scalars Opcode Cycles Instruction F3 0F 2A CVTSI2SS xmm reg,reg32/mem32 CVTSI2SS op1, op2 op1 contains 1 single precision 32-bit floating point value op2 contains 1 32-bit integer value op1[0] = (float)op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
CVTSS2SI Convert Single Scalar to Single Integer Opcode Cycles Instruction F3 0F 2D CVTSS2SI reg32,xmm reg/mem128 CVTPS2PI op1, op2 op1 contains 1 32-bit integer value op2 contains 1 single precision 32-bit floating point value op1 = (long)op2[0]
CVTTPS2PI Convert Parallel Scalars to Parallel Integers Opcode Cycles Instruction 0F 2C CVTTPS2PI mm reg,xmm reg/mem128 CVTTPS2PI op1, op2 op1 contains 2 32-bit integer values op2 contains 2 single precision 32-bit floating point values op1[0] = (long)op2[0] op1[1] = (long)op2[1]
CVTTSS2SI Convert Single Scalar to Single Integer Opcode Cycles Instruction F3 0F 2C CVTTSS2SI reg32,xmm reg/mem128 CVTTSS2SI op1, op2 op1 contains 1 32-bit integer value op2 contains 1 single precision 32-bit floating point value op1 = (long)op2[0]
DIVPS Divide Parallel Scalars Opcode Cycles Instruction 0F 5E 15-115 DIVPS xmm reg,xmm reg/mem128 DIVPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] / op2[0] op1[1] = op1[1] / op2[1] op1[2] = op1[2] / op2[2] op1[3] = op1[3] / op2[3]
DIVSS Divide Single Scalar Opcode Cycles Instruction F3 0F 5E 7-98 DIVSS xmm reg,xmm reg/mem32 DIVSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] / op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
FXRSTOR Floating Point Extended Restore Opcode Cycles Instruction 0F AE xx001xxx FXRSTOR mem FXRSTOR op1 op1 contains a 512 byte register context, paragraph aligned
FXSAVE Floating Point Extended Save Opcode Cycles Instruction 0F AE xx000xxx FXSAVE mem FXSAVE op1 op1 contains a 512 byte register context, paragraph aligned
LDMXCSR Load Multimedia Extended Control Status Register Opcode Cycles Instruction 0F AE xx010xxx LDMXCSR mem32 LDMXCSR op1 op1 contains 1 32-bit register MXCSR = op1
MASKMOVQ Opcode Cycles Instruction 0F F7 MASKMOVQ mm reg,mm reg MASKMOVQ op1, op2
MAXPS Maximum Parallel Scalars Opcode Cycles Instruction 0F 5F 2 (3) MAXPS xmm reg,xmm reg/mem128 MAXPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = max(op1[0], op2[0]) op1[1] = max(op1[1], op2[1]) op1[2] = max(op1[2], op2[2]) op1[3] = max(op1[3], op2[3])
MAXSS Maximum Single Scalar Opcode Cycles Instruction F3 0F 5F 1 (3) MAXSS xmm reg,xmm reg/mem32 MAXSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = max(op1[0], op2) op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
MINPS Minimum Parallel Scalars Opcode Cycles Instruction 0F 5D 2 (3) MINPS xmm reg,xmm reg/mem128 MINPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = min(op1[0], op2[0]) op1[1] = min(op1[1], op2[1]) op1[2] = min(op1[2], op2[2]) op1[3] = min(op1[3], op2[3])
MINSS Minimum Single Scalar Opcode Cycles Instruction F3 0F 5D 1 (3) MINSS xmm reg,xmm reg/mem32 MINSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = min(op1[0], op2) op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
MOVAPS Aligned Move Parallel Scalars Opcode Cycles Instruction 0F 28 MOVAPS xmm reg,xmm reg/mem128 0F 29 MOVAPS mem128,xmm reg MOVAPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op2[0] op1[1] = op2[1] op1[2] = op2[2] op1[3] = op2[3] * Addresses must be paragraph aligned
MOVHPS Move High Pair Parallel Scalars Opcode Cycles Instruction 0F 16 MOVHPS xmm reg,mem64 0F 17 MOVHPS mem64,xmm reg MOVHPS op1, op2 op1 contains 2 single precision 32-bit floating point values op2 contains 2 single precision 32-bit floating point values op1[2] = op2[0] (xmm reg,mem64) op1[3] = op2[1] op1[0] = op2[2] (mem64,xmm reg) op1[1] = op2[3]
MOVHLPS Move High to Low Pair Parallel Scalars Opcode Cycles Instruction 0F 12 1 MOVHLPS xmm reg,xmm reg MOVHLPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op2[2] op1[1] = op2[3] op1[2] = op1[2] op1[3] = op1[3]
MOVLPS Move Low Pair Parallel Scalars Opcode Cycles Instruction 0F 12 MOVLPS xmm reg,mem64 0F 13 MOVLPS mem64,xmm reg MOVLPS op1, op2 op1 contains 2 single precision 32-bit floating point values op2 contains 2 single precision 32-bit floating point values op1[0] = op2[0] op1[1] = op2[1]
MOVLHPS Move Low to High Pair Parallel Scalars Opcode Cycles Instruction 0F 16 1 MOVLHPS xmm reg,xmm reg MOVLHPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] op1[1] = op1[1] op1[2] = op2[0] op1[3] = op2[1]
MOVMSKPS Opcode Cycles Instruction 0F 50 MOVMSKPS reg32,xmm reg MOVMSKPS op1, op2
MOVNTPS Uncached Move Parallel Scalars Opcode Cycles Instruction 0F 2B MOVNTPS mem128,xmm reg MOVNTPS op1, op2 op1 contains 1 128-bit value op2 contains 1 128-bit value op1 = op2
MOVNTQ Uncached Move Quad Word Opcode Cycles Instruction 0F E7 MOVNTQ mem64,mm reg MOVNTQ op1, op2 op1 contains 1 64-bit value op2 contains 1 64-bit value op1 = op2
MOVSS Move Single Scalar Opcode Cycles Instruction F3 0F 10 MOVSS xmm reg,xmm reg/mem32 F3 0F 11 MOVSS mem32,xmm reg MOVSS op1, op2 op1 contains 1 single precision 32-bit floating point value op2 contains 1 single precision 32-bit floating point value op1[0] = op2[0]
MOVUPS Unaligned Move Parallel Scalars Opcode Cycles Instruction 0F 10 MOVUPS xmm reg,xmm reg/mem128 0F 11 MOVUPS mem128,xmm reg MOVUPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op2[0] op1[1] = op2[1] op1[2] = op2[2] op1[3] = op2[3]
MULPS Multiply Parallel Scalars Opcode Cycles Instruction 0F 59 2 (4) MULPS xmm reg,xmm reg/mem128 MULPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] * op2[0] op1[1] = op1[1] * op2[1] op1[2] = op1[2] * op2[2] op1[3] = op1[3] * op2[3]
MULSS Multiply Single Scalar Opcode Cycles Instruction F3 0F 59 1 (4) MULSS xmm reg,xmm reg/mem32 MULSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] * op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
ORPS Or Parallel Scalars (bitwise) Opcode Cycles Instruction 0F 56 2 ORPS xmm reg,xmm reg/mem128 ORPS op1, op2 op1 contains 1 128-bit value op2 contains 1 128-bit value op1 = op1 | op2
PAVGB Parallel Integer Average Byte Opcode Cycles Instruction 0F E0 PAVGB mm reg,mm reg/mem64 PAVGB op1, op2 op1 contains 8 8-bit integer values op2 contains 8 8-bit integer values op1[0] = (op1[0] + op2[0]) / 2 op1[1] = (op1[1] + op2[1]) / 2 op1[2] = (op1[2] + op2[2]) / 2 op1[3] = (op1[3] + op2[3]) / 2 op1[4] = (op1[4] + op2[4]) / 2 op1[5] = (op1[5] + op2[5]) / 2 op1[6] = (op1[6] + op2[6]) / 2 op1[7] = (op1[7] + op2[7]) / 2
PAVGW Parallel Integer Average Word Opcode Cycles Instruction 0F E3 PAVGW mm reg,mm reg/mem64 PAVGW op1, op2 op1 contains 4 16-bit integer values op2 contains 4 16-bit integer values op1[0] = (op1[0] + op2[0]) / 2 op1[1] = (op1[1] + op2[1]) / 2 op1[2] = (op1[2] + op2[2]) / 2 op1[3] = (op1[3] + op2[3]) / 2
PEXTRW Opcode Cycles Instruction 0F C5 PEXTRW reg32,mm reg,imm8 PEXTRW op1, op2, op3
PINSRW Opcode Cycles Instruction 0F C4 PINSRW mm reg,reg32/mem32,imm8 PINSRW op1, op2, op3
PMAXSW Parallel Integer Maximum Signed Word Opcode Cycles Instruction 0F EE PMAXSW mm reg,mm reg/mem64 PMAXSW op1, op2 op1 contains 4 16-bit signed integer values op2 contains 4 16-bit signed integer values op1[0] = max(op1[0], op2[0]) op1[1] = max(op1[1], op2[1]) op1[2] = max(op1[2], op2[2]) op1[3] = max(op1[3], op2[3])
PMAXUB Parallel Integer Maximum Unsigned Byte Opcode Cycles Instruction 0F DE PMAXUB mm reg,mm reg/mem64 PMAXUB op1, op2 op1 contains 8 8-bit unsigned integer values op2 contains 8 8-bit unsigned integer values op1[0] = max(op1[0], op2[0]) op1[1] = max(op1[1], op2[1]) op1[2] = max(op1[2], op2[2]) op1[3] = max(op1[3], op2[3]) op1[4] = max(op1[4], op2[4]) op1[5] = max(op1[5], op2[5]) op1[6] = max(op1[6], op2[6]) op1[7] = max(op1[7], op2[7])
PMINSW Parallel Integer Minimum Signed Word Opcode Cycles Instruction 0F EA PMINSW mm reg,mm reg/mem64 PMINSW op1, op2 op1 contains 4 16-bit signed integer values op2 contains 4 16-bit signed integer values op1[0] = min(op1[0], op2[0]) op1[1] = min(op1[1], op2[1]) op1[2] = min(op1[2], op2[2]) op1[3] = min(op1[3], op2[3])
PMINUB Parallel Integer Minimum Unsigned Byte Opcode Cycles Instruction 0F DA PMINUB mm reg,mm reg/mem64 PMINUB op1, op2 op1 contains 8 8-bit unsigned integer values op2 contains 8 8-bit unsigned integer values op1[0] = min(op1[0], op2[0]) op1[1] = min(op1[1], op2[1]) op1[2] = min(op1[2], op2[2]) op1[3] = min(op1[3], op2[3]) op1[4] = min(op1[4], op2[4]) op1[5] = min(op1[5], op2[5]) op1[6] = min(op1[6], op2[6]) op1[7] = min(op1[7], op2[7])
PMOVMSKB Opcode Cycles Instruction 0F D7 PMOVMSKB reg32,mm reg PMOVMSKB op1, op2
PMULHUW Multiply unsigned word store high Opcode Cycles Instruction 0F E4 PMULHUW mm reg,mm reg/mem64 PMULHUW op1, op2 op1 contains 4 16-bit unsigned integer values op2 contains 4 16-bit unsigned integer values op1[0] = (op1[0] * op2[0]) >> 16 op1[1] = (op1[1] * op2[1]) >> 16 op1[2] = (op1[2] * op2[2]) >> 16 op1[3] = (op1[3] * op2[3]) >> 16
PREFETCHNTA Prefetch Non-caching Aligned ? Opcode Cycles Instruction 0F 18 xx000xxx PREFETCHNTA mem8 PREFETCHNTA op1
PREFETCHT0 Prefetch Task 0 ? Opcode Cycles Instruction 0F 18 xx001xxx PREFETCHT0 mem8 PREFETCHT0 op1
PREFETCHT1 Prefetch Task 1 ? Opcode Cycles Instruction 0F 18 xx010xxx PREFETCHT1 mem8 PREFETCHT1 op1
PREFETCHT2 Prefetch Task 2 ? Opcode Cycles Instruction 0F 18 xx011xxx PREFETCHT2 mem8 PREFETCHT2 op1
PSADBW Opcode Cycles Instruction 0F F6 PSADBW mm reg,mm reg/mem64 PSADBW op1, op2
PSHUFW Shuffle Parallel Words Opcode Cycles Instruction 0F 70 1 (1) PSHUFW mm reg,mm reg/mem64,imm8 PSHUFW op1, op3, op3 op1 contains 4 16-bit integer values op2 contains 4 16-bit integer values op3 contains a bit map dd:cc:bb:aa (MSB to LSB) op1[0] = op2[aa] op1[1] = op2[bb] op1[2] = op2[cc] op1[3] = op2[dd]
RCPPS Reciprocal Parallel Scalars Opcode Cycles Instruction 0F 53 2 (2) RCPPS xmm reg,xmm reg/mem128 RCPPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = 1 / op2[0] op1[1] = 1 / op2[1] op1[2] = 1 / op2[2] op1[3] = 1 / op2[3] * The results have 12-bit accuracy
RCPSS Reciprocal Single Scalar Opcode Cycles Instruction F3 0F 53 1 (1) RCPSS xmm reg,xmm reg/mem32 RCPSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = 1 / op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] * The results have 12-bit accuracy
RSQRTPS Reciprocal Square Root Parallel Scalars Opcode Cycles Instruction 0F 52 2 (2) RSQRTPS xmm reg,xmm reg/mem128 RSQRTPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = 1 / sqrt(op2[0]) op1[1] = 1 / sqrt(op2[1]) op1[2] = 1 / sqrt(op2[2]) op1[3] = 1 / sqrt(op2[3]) * The results have 12-bit accuracy
RSQRTSS Reciprocal Square Root Single Scalar Opcode Cycles Instruction F3 0F 52 1 (1) RSQRTSS xmm reg,xmm reg/mem32 RSQRTSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = 1 / sqrt(op2) op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3] * The results have 12-bit accuracy
SFENCE Stream Fence Opcode Cycles Instruction 0F AE FF SFENCE SFENCE Provides a demarkation in write combining buffers to force current states to be committed. In other words writes to the same location cannot combine to one if there is a fence placed between them. * Presumed function from AGP definition of fencing
SHUFPS Shuffle Parallel Scalars Opcode Cycles Instruction 0F C6 3 SHUFPS xmm reg, xmm reg/mem128, imm8 SHUFPS op1, op3, op3 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op3 contains a bit map dd:cc:bb:aa (MSB to LSB) op1[0] = op1[aa] op1[1] = op1[bb] op1[2] = op2[cc] op1[3] = op2[dd]
SQRTPS Square Root Parallel Scalars Opcode Cycles Instruction 0F 51 16-134 SQRTPS xmm reg,xmm reg/mem128 SQRTPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = sqrt(op2[0]) op1[1] = sqrt(op2[1]) op1[2] = sqrt(op2[2]) op1[3] = sqrt(op2[3])
SQRTSS Square Root Single Scalar Opcode Cycles Instruction F3 0F 51 8-105 SQRTSS xmm reg,xmm reg/mem32 SQRTSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = sqrt(op2) op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
STMXCSR Store Multimedia Extended Control Status Register Opcode Cycles Instruction 0F AE xx011xxx STMXCSR mem32 STMXCSR op1 op1 contains 1 32-bit register op1 = MXCSR
SUBPS Subtract Parallel Scalars Opcode Cycles Instruction 0F 5C 2 (3) SUBPS xmm reg,xmm reg/mem128 SUBPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] - op2[0] op1[1] = op1[1] - op2[1] op1[2] = op1[2] - op2[2] op1[3] = op1[3] - op2[3]
SUBSS Subtract Single Scalar Opcode Cycles Instruction F3 0F 5C 1 (3) SUBSS xmm reg,xmm reg/mem32 SUBSS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 1 single precision 32-bit floating point value op1[0] = op1[0] - op2 op1[1] = op1[1] op1[2] = op1[2] op1[3] = op1[3]
UCOMISS Opcode Cycles Instruction 0F 2E UCOMISS xmm reg,xmm reg/mem32 UCOMISS op1, op2
UNPCKHPS Unpack High Parallel Scalars Opcode Cycles Instruction 0F 15 2 UNPCKHPS xmm reg,xmm reg/mem128 UNPCKHPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[2] op1[1] = op2[2] op1[2] = op1[3] op1[3] = op2[3]
UNPCKLPS Unpack Low Parallel Scalars Opcode Cycles Instruction 0F 14 2 UNPCKLPS xmm reg,xmm reg/mem128 UNPCKLPS op1, op2 op1 contains 4 single precision 32-bit floating point values op2 contains 4 single precision 32-bit floating point values op1[0] = op1[0] op1[1] = op2[0] op1[2] = op1[1] op1[3] = op2[1]
XORPS Exclusive-Or Parallel Scalars (bitwise) Opcode Cycles Instruction 0F 57 2 XORPS xmm reg,xmm reg/mem128 XORPS op1, op2 op1 contains 1 128-bit value op2 contains 1 128-bit value op1 = op1 ^ op2


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。
经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值