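SSE2 (Streaming SIMD Extensions 2) floating-point instructions use the 128-bit XMM registers and operate on double-precision (64-bit) floating-point values. A few of them work on single-precision (32-bit) values instead. SSE2 was introduced with the Pentium 4 and Xeon processors.
These instructions closely resemble the SSE floating-point instructions; the main difference is the size of the data they operate on.
Before using these instructions in your code, you must check that the processor supports them. Set EAX=1, execute the CPUID instruction, and then test bit 26 of EDX; if it is 1, SSE2 is supported.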
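The same check can be sketched in C. This is only an illustration, assuming a GCC or Clang toolchain on x86 that provides `<cpuid.h>`; the function name `has_sse2` is hypothetical and does not come from the original article.

```c
#include <cpuid.h>   /* GCC/Clang helper header (an assumption about the toolchain) */

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
int has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Same bit that the assembly tests with TEST EDX,4000000h. */
    return (int)((edx >> 26) & 1u);
}
```

On any x86-64 processor this should return 1, because SSE2 is part of the baseline x86-64 instruction set.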
The test programs in this article all use the following data declarations:
DOUBLEFP1 DQ 1.1
DQ 3.3
DOUBLEFP2 DQ 20.66
DQ 40.66
DOUBLEFPN DQ -5.1
DQ +6.3
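Because this data is not guaranteed to be aligned on a 16-byte boundary, transfers from memory to an XMM register must use MOVUPD (move unaligned packed double-precision), which does not care about alignment. If you specify 16-byte alignment in the data declaration, you can use the faster MOVAPD (move aligned packed double-precision) instead. For register-to-register transfers, either MOVUPD or MOVAPD can be used.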
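To make the alignment rule concrete, here is a minimal C sketch using the SSE2 intrinsics from `<emmintrin.h>`, where `_mm_loadu_pd` corresponds to MOVUPD and `_mm_load_pd` to MOVAPD. The names `aligned_pair`, `is_16_byte_aligned`, `sum_unaligned` and `sum_aligned` are hypothetical, and C11 `_Alignas` is assumed to be available.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* A 16-byte aligned pair, the C counterpart of an aligned data declaration. */
_Alignas(16) double aligned_pair[2] = { 1.1, 3.3 };

/* Returns 1 if the address is a multiple of 16, i.e. usable with MOVAPD. */
int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Sum of both lanes after a load that tolerates any alignment (MOVUPD). */
double sum_unaligned(const double *p)
{
    double out[2];
    _mm_storeu_pd(out, _mm_loadu_pd(p));
    return out[0] + out[1];
}

/* Same sum, but the load requires 16-byte alignment (MOVAPD) and may
   fault if the pointer is not aligned. */
double sum_aligned(const double *p)
{
    double out[2];
    _mm_storeu_pd(out, _mm_load_pd(p));
    return out[0] + out[1];
}
```

Both functions give the same result for `aligned_pair`; only `sum_unaligned` is safe to call on a pointer whose alignment is unknown.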
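The instructions covered here fall into two groups. The first group processes two 64-bit floating-point values at once; their names contain "PD", for "packed double-precision". The second group processes a single 64-bit floating-point value; their names contain "SD", for "scalar double-precision". These work only on the low part of the XMM register, that is, bits 0-63.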
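The difference between the packed and scalar forms can be illustrated with a short C sketch, assuming the `<emmintrin.h>` intrinsics `_mm_add_pd` (ADDPD) and `_mm_add_sd` (ADDSD). The function names are hypothetical.

```c
#include <emmintrin.h>

/* ADDPD: both lanes are added, out = { a[0]+b[0], a[1]+b[1] }. */
void add_packed(const double a[2], const double b[2], double out[2])
{
    __m128d r = _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}

/* ADDSD: only the low lane is added; the high lane of the first
   operand passes through unchanged, out = { a[0]+b[0], a[1] }. */
void add_scalar(const double a[2], const double b[2], double out[2])
{
    __m128d r = _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}
```

With a = {1.0, 2.0} and b = {10.0, 20.0}, `add_packed` yields {11.0, 22.0} while `add_scalar` yields {11.0, 2.0}.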
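You can set suitable breakpoints in the test programs below and single-step through them to watch the XMM registers change as the program runs.
SSE2 instructions
SSE2 data transfer instructions
This test program demonstrates moving data between registers. MOVUPD, MOVAPD (the aligned version), MOVSD, MOVLPD and MOVHPD can also be used to move data to and from memory. MOVMSKPD can be used after a compare instruction to get the result into EAX for analysis.
As an experiment, we can also try the SSE2 integer instruction MOVDQU and the SSE floating-point instruction MOVUPS for the same job; the latter looks a lot like MOVUPD. Both appear simply to copy the bits into the XMM register. However, Intel warns against using instructions in this unintended way, to avoid unforeseen performance problems.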
XMMSSE2_FPDATA:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L20 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L20:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM7,XMM0 ;copying to XMM7
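MOVSD XMM2,[DOUBLEFP2] ;move fp value to XMM2 low only (a load from memory zeroes the high half)
MOVLPD XMM3,[DOUBLEFP2] ;also loads the low half, but leaves the high half of XMM3 unchanged
MOVHPD XMM4,[DOUBLEFP2] ;loads the same qword into the high half of XMM4, low half unchanged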
MOVUPD XMM0,[DOUBLEFPN] ;move two new values, one is negative
MOVMSKPD EAX,XMM0 ;get both sign bits in XMM0 into eax
;************ and as an experiment, see if this does the same as MOVUPD ..
MOVDQU XMM1,[DOUBLEFPN] ;use integer instruction to transfer the bits
;************ as this too (one byte smaller) ..
MOVUPS XMM2,[DOUBLEFPN] ;use SSE instruction to transfer the bits
RET
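The behaviour of MOVMSKPD and of the three 64-bit loads can be checked outside the debugger with a small C sketch. It assumes the `<emmintrin.h>` intrinsics `_mm_movemask_pd`, `_mm_load_sd`, `_mm_loadl_pd` and `_mm_loadh_pd`, which correspond to MOVMSKPD, MOVSD, MOVLPD and MOVHPD; all function names are hypothetical.

```c
#include <emmintrin.h>

/* MOVMSKPD: bit 0 = sign of the low double, bit 1 = sign of the high double. */
int sign_mask(double low, double high)
{
    /* _mm_set_pd takes the high element first. */
    return _mm_movemask_pd(_mm_set_pd(high, low));
}

/* MOVSD from memory: low lane loaded, high lane cleared to 0.0. */
void load_movsd(const double *src, double out[2])
{
    _mm_storeu_pd(out, _mm_load_sd(src));
}

/* MOVLPD: low lane replaced, high lane of the old contents kept. */
void load_movlpd(const double old[2], const double *src, double out[2])
{
    _mm_storeu_pd(out, _mm_loadl_pd(_mm_loadu_pd(old), src));
}

/* MOVHPD: high lane replaced, low lane of the old contents kept. */
void load_movhpd(const double old[2], const double *src, double out[2])
{
    _mm_storeu_pd(out, _mm_loadh_pd(_mm_loadu_pd(old), src));
}
```

For the DOUBLEFPN values (-5.1 in the low lane, +6.3 in the high lane), `sign_mask(-5.1, 6.3)` returns 1, which is the value MOVMSKPD leaves in EAX in the listing above.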
SSE2 arithmetic instructions
XMMSSE2_FPARITH:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L22 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L22:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
ADDPD XMM0,XMM1 ;add both fp values result in XMM0
MOVAPD XMM0,XMM2 ;restore value in XMM0
SUBPD XMM0,XMM1 ;subtract both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
ADDSD XMM0,XMM1 ;add low fp value result in XMM0
SUBSD XMM0,XMM1 ;subtract low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MULPD XMM0,XMM1 ;multiply both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MULSD XMM0,XMM1 ;multiply low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
DIVPD XMM0,XMM1 ;divide both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
DIVSD XMM0,XMM1 ;divide low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
SQRTPD XMM0,XMM1 ;get square roots of both fp values in XMM1 result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
SQRTSD XMM0,XMM1 ;get square root of low fp value in XMM1 result in XMM0 (low)
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MAXPD XMM0,XMM1 ;get numerically greater fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MAXSD XMM0,XMM1 ;get numerically greater of low fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MINPD XMM0,XMM1 ;get numerically smaller fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MINSD XMM0,XMM1 ;get numerically smaller of low fp values result in XMM0
RET
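Unlike the two-operand arithmetic instructions, the square-root instructions read their input from the source operand only. A minimal C sketch of this, together with a per-lane maximum, assuming the `<emmintrin.h>` intrinsics `_mm_sqrt_pd`, `_mm_sqrt_sd` and `_mm_max_pd` (function names are hypothetical):

```c
#include <emmintrin.h>

/* SQRTPD dest,src: both results come from src. */
void sqrt_packed(const double src[2], double out[2])
{
    _mm_storeu_pd(out, _mm_sqrt_pd(_mm_loadu_pd(src)));
}

/* SQRTSD dest,src: low lane = sqrt(src low), high lane = dest high (kept). */
void sqrt_scalar(const double dest[2], const double src[2], double out[2])
{
    _mm_storeu_pd(out, _mm_sqrt_sd(_mm_loadu_pd(dest), _mm_loadu_pd(src)));
}

/* MAXPD: the larger value is chosen independently in each lane. */
void max_packed(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_max_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

With src = {4.0, 9.0} and dest = {100.0, 200.0}, `sqrt_packed` gives {2.0, 3.0} and `sqrt_scalar` gives {2.0, 200.0}.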
SSE2 logical instructions
XMMSSE2_FPLOGIC:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L24 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L24:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
ANDPD XMM0,XMM1 ;perform AND on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
ANDNPD XMM0,XMM1 ;perform (NOT XMM0) AND XMM1 on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
ORPD XMM0,XMM1 ;perform OR on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
XORPD XMM0,XMM1 ;perform XOR on both fp values result in XMM0
RET
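The logical instructions operate on the raw bits, so ANDing two arbitrary floating-point values rarely gives a meaningful number. A common use, which is not taken from the original article, is manipulating the sign bit with a mask. A minimal C sketch of that idea, assuming the `<emmintrin.h>` intrinsics `_mm_andnot_pd` (ANDNPD) and `_mm_xor_pd` (XORPD), with hypothetical function names:

```c
#include <emmintrin.h>

/* Absolute value of both lanes: ANDNPD computes (NOT mask) AND v,
   which clears bit 63 (the sign bit) of each lane. */
void abs_packed(const double v[2], double out[2])
{
    const __m128d sign = _mm_set1_pd(-0.0);   /* only the sign bits are set */
    _mm_storeu_pd(out, _mm_andnot_pd(sign, _mm_loadu_pd(v)));
}

/* Negation of both lanes: XORPD with the sign mask flips each sign bit. */
void negate_packed(const double v[2], double out[2])
{
    const __m128d sign = _mm_set1_pd(-0.0);
    _mm_storeu_pd(out, _mm_xor_pd(_mm_loadu_pd(v), sign));
}
```

Applied to the DOUBLEFPN values {-5.1, 6.3}, `abs_packed` gives {5.1, 6.3} and `negate_packed` gives {5.1, -6.3}.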
SSE2 compare instructions
XMMSSE2_FPCOMP:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L26 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L26:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
;********************* compare instructions working on both fp values
CMPPD XMM0,XMM1,0 ;=CMPEQPD see whether equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,1 ;=CMPLTPD see whether less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,2 ;=CMPLEPD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,3 ;=CMPUNORDPD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,4 ;=CMPNEQPD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,5 ;=CMPNLTPD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,6 ;=CMPNLEPD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,7 ;=CMPORDPD see whether ordered, result in XMM0
;********************* compare instructions working on low value only
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,0 ;=CMPEQSD see whether equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,1 ;=CMPLTSD see whether less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,2 ;=CMPLESD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,3 ;=CMPUNORDSD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,4 ;=CMPNEQSD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,5 ;=CMPNLTSD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,6 ;=CMPNLESD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,7 ;=CMPORDSD see whether ordered, result in XMM0
;********************* compare and give result in eflags
MOVAPD XMM0,XMM2 ;restore original value to XMM0
COMISD XMM0,XMM1 ;look at lowest only result in eflags
UCOMISD XMM0,XMM1 ;(unordered compare)
MOVUPD XMM1,[DOUBLEFPN] ;move one -ve and one +ve value into XMM1
COMISD XMM0,XMM1 ;look at lowest only - result in eflags
UCOMISD XMM0,XMM1 ;(unordered compare)
RET
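CMPPD writes an all-ones or all-zeros mask into each 64-bit lane, which MOVMSKPD can then reduce to two bits, while COMISD compares only the low lanes and reports through the flags. The C sketch below illustrates both, assuming the `<emmintrin.h>` intrinsics `_mm_cmplt_pd`, `_mm_cmpunord_pd`, `_mm_movemask_pd` and `_mm_comilt_sd`; the function names are hypothetical.

```c
#include <emmintrin.h>

/* CMPLTPD followed by MOVMSKPD: bit n is set when a[n] < b[n]. */
int less_than_mask(const double a[2], const double b[2])
{
    __m128d m = _mm_cmplt_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    return _mm_movemask_pd(m);
}

/* CMPUNORDPD followed by MOVMSKPD: bit n is set when a[n] or b[n] is NaN. */
int unordered_mask(const double a[2], const double b[2])
{
    __m128d m = _mm_cmpunord_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    return _mm_movemask_pd(m);
}

/* COMISD-style comparison: looks at the low lanes only, returns 1 if a[0] < b[0]. */
int low_less_than(const double a[2], const double b[2])
{
    return _mm_comilt_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));
}
```

With the article's data, DOUBLEFP1 = {1.1, 3.3} and DOUBLEFP2 = {20.66, 40.66}, `less_than_mask` returns 3 because both lanes of the first operand are smaller.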
SSE2 shuffle and unpack instructions
XMMSSE2_SHUFF:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L28 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L28:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
SHUFPD XMM0,XMM1,3h ;shuffle pack into destination
SHUFPD XMM0,XMM0,1h ;swap the values in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
UNPCKHPD XMM0,XMM1 ;unpack (high) and put into destination
MOVAPD XMM0,XMM2 ;restore original value to XMM0
UNPCKLPD XMM0,XMM0 ;unpack (low) and put into destination
RET
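The lane selection performed by these instructions is easier to follow with concrete values. In SHUFPD, bit 0 of the immediate picks the low result lane from the destination (0 = low, 1 = high) and bit 1 picks the high result lane from the source. A minimal C sketch, assuming the `<emmintrin.h>` intrinsics `_mm_shuffle_pd`, `_mm_unpackhi_pd` and `_mm_unpacklo_pd`, with hypothetical function names:

```c
#include <emmintrin.h>

/* SHUFPD a,b,3: result = { a high, b high }. */
void shuffle_3(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_shuffle_pd(_mm_loadu_pd(a), _mm_loadu_pd(b), 3));
}

/* SHUFPD a,a,1: result = { a high, a low }, i.e. the two lanes swapped. */
void swap_lanes(const double a[2], double out[2])
{
    __m128d v = _mm_loadu_pd(a);
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, 1));
}

/* UNPCKHPD a,b: result = { a high, b high }. */
void unpack_high(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_unpackhi_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* UNPCKLPD a,b: result = { a low, b low }. */
void unpack_low(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_unpacklo_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

With DOUBLEFP1 = {1.1, 3.3} and DOUBLEFP2 = {20.66, 40.66}, `shuffle_3` and `unpack_high` both give {3.3, 40.66}, and `unpack_low` with the same register as both operands, as in the listing, gives {1.1, 1.1}.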
SSE2 conversion instructions
XMMSSE2_CONV:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L30 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L30:
;***** display XMM registers in both SSE and SSE2 modes ..
;***** conversion between single and double-precision fp values ..
CVTPS2PD XMM0,[SINGLEFP1] ;put the two low single-precision fp values into XMM0 as double-precision
CVTPD2PS XMM6,XMM0 ;convert double precision to single precision in XMM6
CVTSS2SD XMM1,[SINGLEFP1] ;as CVTPS2PD but working with only one value
CVTSD2SS XMM7,XMM1 ;as CVTPD2PS but working with only one value
;***** conversion between integers and double-precision fp values ..
;***** open the MMX integer pane for these tests ..
CVTPD2PI MM0,XMM0 ;convert fp values in XMM0 to integers in MM0
CVTTPD2PI MM1,XMM0 ;same as above with truncation
CVTPI2PD XMM0,[DINTEGER] ;convert 23 and 24 to double-precision fp values
;***** open the XMM integer display and switch to dword display
CVTPD2DQ XMM7,XMM0 ;and convert 23 and 24 to dword integers into XMM7 (low)
CVTTPD2DQ XMM7,XMM0 ;same as above with truncation
CVTDQ2PD XMM3,XMM7 ;and back into fp values in XMM3
CVTSD2SI EAX,XMM0 ;take low fp value and convert as integer in EAX
CVTTSD2SI EDX,XMM0 ;same as above with truncation
CVTSI2SD XMM4,EAX ;and back again into XMM4 (low)
;***** conversion between single-precision and integers ..
;***** watch these in XMM integer display switched to dword display
CVTPS2DQ XMM0,[SINGLEFP1] ;move 4 single-precision fp values to dwords as integers
CVTTPS2DQ XMM1,[SINGLEFP1] ;same as above with truncation
;***** and watch this in the SSE fp pane ..
CVTDQ2PS XMM6,XMM0 ;and convert back to 4 single-precision fp values
CVTDQ2PS XMM7,XMM1 ;ditto
RET
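The listing pairs each conversion with a "truncation" variant (the extra T in the mnemonic). The difference shows up only when the value has a fractional part: the plain form rounds according to the MXCSR rounding mode, which is round-to-nearest-even by default, while the T form always truncates toward zero. A minimal C sketch, assuming the `<emmintrin.h>` intrinsics `_mm_cvtsd_si32` (CVTSD2SI), `_mm_cvttsd_si32` (CVTTSD2SI) and `_mm_cvtsi32_sd` (CVTSI2SD), and assuming the rounding mode has not been changed; the function names are hypothetical.

```c
#include <emmintrin.h>

/* CVTSD2SI: rounds using the current MXCSR mode (nearest-even by default). */
int to_int_rounded(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* CVTTSD2SI: always truncates toward zero. */
int to_int_truncated(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* CVTSI2SD: converts a 32-bit integer to a double in the low lane. */
double from_int(int n)
{
    return _mm_cvtsd_f64(_mm_cvtsi32_sd(_mm_setzero_pd(), n));
}
```

For example, 2.7 converts to 3 with rounding and to 2 with truncation, and 2.5 rounds to 2 because ties go to the even integer.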
Reference: "XMM SSE2 floating point instructions", http://www.godevtool.com/TestbugHelp/XMMfpins2.htm