XMM SSE2 floating-point instructions

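SSE2 (Streaming SIMD Extensions 2) floating-point instructions use the 128-bit XMM registers and operate on double-precision (64-bit) floating-point values. A few instructions also work with single-precision (32-bit) values. SSE2 was introduced with the Pentium 4 and Xeon processors.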

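These instructions are very similar to the SSE floating-point instructions. The main difference is the data size: SSE works on four 32-bit single-precision values per register, SSE2 on two 64-bit double-precision values.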

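Before using these instructions in your code, you must check that the processor supports them. Set EAX=1, execute the CPUID instruction, then test bit 26 of EDX. If it is 1, the SSE2 instructions are supported.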
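
The same check can be sketched in C. This assumes GCC or Clang on x86, where `__get_cpuid` comes from the compiler-supplied `<cpuid.h>`. The helper names `edx_reports_sse2` and `cpu_has_sse2` are hypothetical. The mask 4000000h used by the assembly tests below is simply 1 << 26.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SSE2_EDX_BIT 26u /* CPUID leaf 1: EDX bit 26 reports SSE2 */

/* Hypothetical helper: test bit 26 of the EDX value returned by CPUID leaf 1,
   which is what TEST EDX,4000000h does in the assembly listings. */
static bool edx_reports_sse2(uint32_t edx) {
    return (edx & (UINT32_C(1) << SSE2_EDX_BIT)) != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Query the running CPU (GCC/Clang on x86 only, an assumption of this sketch). */
static bool cpu_has_sse2(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false; /* CPUID leaf 1 not available */
    return edx_reports_sse2(edx);
}
#endif
```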

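All the test programs in this article use the following data declarations. The conversion test also uses SINGLEFP1 (four single-precision values) and DINTEGER (the dwords 23 and 24), which are not declared here.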

DOUBLEFP1 DQ 1.1
          DQ 3.3
DOUBLEFP2 DQ 20.66
          DQ 40.66
DOUBLEFPN DQ -5.1
          DQ +6.3

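This data is not guaranteed to be 16-byte aligned, so moves from memory into the XMM registers must use MOVUPD (move two unaligned packed double-precision values), which does not care about alignment. If you specify 16-byte alignment when declaring the data, you can use the potentially faster MOVAPD (move two aligned packed double-precision values) instead. MOVAPD raises a general-protection exception if its memory operand is not 16-byte aligned. For transfers between two registers, either MOVUPD or MOVAPD can be used.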
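
Here is a minimal C sketch of the alignment rule, assuming C11 `_Alignas`. `movapd_ok` is a hypothetical name for a check of MOVAPD's 16-byte requirement, and `doublefp1` mirrors DOUBLEFP1 declared with that alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MOVAPD faults unless its memory operand is 16-byte aligned;
   MOVUPD accepts any address. This tests which of the two an address permits. */
static bool movapd_ok(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* C11 counterpart of declaring the test data with 16-byte alignment. */
_Alignas(16) static const double doublefp1[2] = {1.1, 3.3};
```

A pointer 8 bytes into the array fails the check, so only MOVUPD could load two values starting there.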

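The instructions covered here fall into two types. The first type processes two 64-bit floating-point values at once. These instructions have "PD" in their names, for "packed double-precision". The second type processes a single 64-bit floating-point value. These instructions have "SD" in their names, for "scalar double-precision", and they work only on the low part of the XMM register, that is, its low 64 bits (bits 0-63).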
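
The packed/scalar difference is easier to see in a plain-C model (not intrinsics) that treats one XMM register as two doubles. `xmm_pd`, `addpd` and `addsd` are illustrative names only. The scalar version changes bits 0-63 and passes the destination's high value through unchanged, which is how the register-form SD arithmetic instructions behave.

```c
#include <assert.h>

/* One XMM register viewed as two doubles: lo = bits 0-63, hi = bits 64-127. */
typedef struct { double lo, hi; } xmm_pd;

/* ADDPD dst,src: both lanes are added. */
static xmm_pd addpd(xmm_pd dst, xmm_pd src) {
    dst.lo += src.lo;
    dst.hi += src.hi;
    return dst;
}

/* ADDSD dst,src: only the low lane is added; dst's high lane is kept. */
static xmm_pd addsd(xmm_pd dst, xmm_pd src) {
    dst.lo += src.lo;
    return dst;
}
```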

You can set suitable breakpoints in the test programs below and single-step through them to watch the XMM registers change as the program runs.

SSE2 instructions

SSE2 data transfer instructions

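This test program demonstrates moving data into and between registers. MOVUPD, MOVAPD (the aligned version), MOVSD, MOVLPD and MOVHPD can also be used to move data to and from memory. MOVMSKPD copies the sign bits of the two values in an XMM register into the low two bits of EAX. It is typically used after a compare instruction, so that the comparison result can be analysed in EAX.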
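
MOVMSKPD can be modelled in C by reading each value's sign bit (bit 63) from its raw bit pattern. The function names are hypothetical. This sketches the bit layout only and does not emit the instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sign bit (bit 63) of a double, read through its raw bits. */
static unsigned sign_bit(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (unsigned)(bits >> 63);
}

/* MOVMSKPD r32,xmm: sign of the low value -> bit 0,
   sign of the high value -> bit 1, all other bits cleared. */
static uint32_t movmskpd(double lo, double hi) {
    return sign_bit(lo) | (sign_bit(hi) << 1);
}
```

For DOUBLEFPN (-5.1 low, +6.3 high), the model returns 1, which is what MOVMSKPD EAX,XMM0 should leave in EAX in the test program. A compare instruction fills each lane with all ones or all zeros, so the same sign-bit extraction turns a CMPPD result into a 2-bit mask.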

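As an experiment, the test program also tries the same transfer with the SSE2 integer instruction MOVDQU and the SSE floating-point instruction MOVUPS, which looks very much like MOVUPD. Both appear simply to copy the bits into the XMM register. However, Intel cautions against using instructions in this mixed way, because doing so can cause performance penalties (on some processors, extra latency when data passes between the integer and floating-point execution units).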

XMMSSE2_FPDATA:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L20                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L20:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1]      ;move two double precision fp values into XMM0
MOVAPD XMM7,XMM0             ;copying to XMM7
MOVSD  XMM2,[DOUBLEFP2]      ;move one fp value into XMM2 low, high half zeroed
MOVLPD XMM3,[DOUBLEFP2]      ;looks the same, but the high half of XMM3 is kept
MOVHPD XMM4,[DOUBLEFP2]      ;moves the value into the high half, low half kept
MOVUPD XMM0,[DOUBLEFPN]      ;move two new values, one is negative
MOVMSKPD EAX,XMM0            ;get both sign bits in XMM0 into eax
;************ and as an experiment, see if this does the same as MOVUPD ..
MOVDQU XMM1,[DOUBLEFPN]      ;use integer instruction to transfer the bits
;************ and this too (one byte shorter, no 66h prefix) ..
MOVUPS XMM2,[DOUBLEFPN]      ;use SSE instruction to transfer the bits
RET
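
The comments on MOVSD, MOVLPD and MOVHPD are easier to follow with a small C model of the three 64-bit loads. The register layout and function names are illustrative assumptions.

```c
#include <assert.h>

/* One XMM register viewed as two doubles: lo = bits 0-63, hi = bits 64-127. */
typedef struct { double lo, hi; } xmm_pd;

/* MOVSD xmm,m64: load the low half and zero the high half. */
static xmm_pd movsd_load(xmm_pd dst, const double *m) {
    (void)dst;                 /* old contents are fully overwritten */
    xmm_pd r = { m[0], 0.0 };  /* all-zero bits in the high half */
    return r;
}

/* MOVLPD xmm,m64: load the low half, keep the old high half. */
static xmm_pd movlpd_load(xmm_pd dst, const double *m) {
    dst.lo = m[0];
    return dst;
}

/* MOVHPD xmm,m64: load the high half, keep the old low half. */
static xmm_pd movhpd_load(xmm_pd dst, const double *m) {
    dst.hi = m[0];
    return dst;
}
```

In a register that already holds zero, MOVSD and MOVLPD leave identical contents, which is presumably why they look the same in the debugger. The difference only shows when the high half already holds data.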

 

SSE2 arithmetic instructions

XMMSSE2_FPARITH:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L22                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L22:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
ADDPD  XMM0,XMM1        ;add both fp values result in XMM0
MOVAPD XMM0,XMM2        ;restore value in XMM0
SUBPD  XMM0,XMM1        ;subtract both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
ADDSD  XMM0,XMM1        ;add low fp value result in XMM0
MOVAPD XMM0,XMM2        ;restore value in XMM0
SUBSD  XMM0,XMM1        ;subtract low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MULPD  XMM0,XMM1        ;multiply both fp values result in XMM0 
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MULSD  XMM0,XMM1        ;multiply low fp value result in XMM0
;*******                
MOVAPD XMM0,XMM2        ;restore value in XMM0
DIVPD  XMM0,XMM1        ;divide both fp values result in XMM0 
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
DIVSD  XMM0,XMM1        ;divide low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
SQRTPD XMM0,XMM1        ;square roots of both fp values in XMM1, result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
SQRTSD XMM0,XMM1        ;square root of low fp value in XMM1, result in XMM0 low
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MAXPD XMM0,XMM1         ;get numerically greater fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MAXSD XMM0,XMM1         ;get numerically greater of low fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MINPD XMM0,XMM1         ;get numerically smaller fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MINSD XMM0,XMM1         ;get numerically smaller of low fp values result in XMM0
RET
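
The comments describe MAXPD/MINPD as picking the numerically greater or smaller value, but two corner cases are worth knowing. In each lane, MAX returns dst if dst > src and otherwise src (MIN uses <). So if either value is a NaN, or both are zeros of either sign, the second (source) operand is returned. The C model below (hypothetical names) follows that rule. It differs from C's `fmax`/`fmin`, which return the non-NaN operand whichever side it is on.

```c
#include <assert.h>
#include <math.h>

/* MAXSD / one lane of MAXPD: dst = (dst > src) ? dst : src.
   The comparison is false for a NaN or for two zeros, so src is returned. */
static double max_lane(double dst, double src) {
    return dst > src ? dst : src;
}

/* MINSD / one lane of MINPD: dst = (dst < src) ? dst : src. */
static double min_lane(double dst, double src) {
    return dst < src ? dst : src;
}
```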

 

SSE2 logical instructions

XMMSSE2_FPLOGIC:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L24                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L24:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
ANDPD  XMM0,XMM1        ;perform AND on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
ANDNPD XMM0,XMM1        ;perform (NOT XMM0) AND XMM1 on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
ORPD   XMM0,XMM1        ;perform OR on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
XORPD  XMM0,XMM1        ;perform XOR on both fp values result in XMM0
RET
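
The logical instructions work on raw bits, so ANDing 1.1 with 20.66, as in the test above, does not produce a meaningful number. A common use, not shown in the original test, is manipulating the sign bit with a mask. The C sketch below models one 64-bit lane of each instruction on bit patterns. It uses -0.0 as the sign-bit mask, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double d)  { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double from_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* One 64-bit lane of each logical instruction (a = dst, b = src). */
static double andpd_lane(double a, double b)  { return from_bits(bits_of(a) & bits_of(b)); }
static double andnpd_lane(double a, double b) { return from_bits(~bits_of(a) & bits_of(b)); } /* (NOT dst) AND src */
static double orpd_lane(double a, double b)   { return from_bits(bits_of(a) | bits_of(b)); }
static double xorpd_lane(double a, double b)  { return from_bits(bits_of(a) ^ bits_of(b)); }

/* -0.0 has only bit 63 set, so it works as a sign mask. */
static double abs_via_andnpd(double x)    { return andnpd_lane(-0.0, x); } /* clear sign */
static double negate_via_xorpd(double x)  { return xorpd_lane(-0.0, x); }  /* flip sign */
static double negative_via_orpd(double x) { return orpd_lane(-0.0, x); }   /* set sign */
```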

 

SSE2 comparison instructions

XMMSSE2_FPCOMP:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L26                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L26:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
;********************* compare instructions working on both fp values
CMPPD XMM0,XMM1,0       ;=CMPEQPD see whether equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,1       ;=CMPLTPD see whether less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,2       ;=CMPLEPD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,3       ;=CMPUNORDPD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,4       ;=CMPNEQPD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,5       ;=CMPNLTPD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,6       ;=CMPNLEPD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,7       ;=CMPORDPD see whether ordered, result in XMM0
;********************* compare instructions working on low value only
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,0       ;=CMPEQSD see whether equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,1       ;=CMPLTSD see whether less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,2       ;=CMPLESD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,3       ;=CMPUNORDSD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,4       ;=CMPNEQSD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,5       ;=CMPNLTSD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,6       ;=CMPNLESD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,7       ;=CMPORDSD see whether ordered, result in XMM0
;********************* compare and give result in eflags
MOVAPD XMM0,XMM2        ;restore original value to XMM0
COMISD XMM0,XMM1        ;look at lowest only result in eflags
UCOMISD XMM0,XMM1       ;(unordered compare)
MOVUPD XMM1,[DOUBLEFPN] ;move one -ve and one +ve value into XMM1
COMISD XMM0,XMM1        ;look at lowest only - result in eflags
UCOMISD XMM0,XMM1       ;(unordered compare)
RET
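
The immediate values 0-7 select the predicates named in the comments. A C model of one lane (a = destination, b = source, names hypothetical) shows how NaN is handled: the EQ/LT/LE forms are false for unordered operands and the NEQ/NLT/NLE forms are true. It also models how COMISD and UCOMISD report the low-value comparison in ZF, PF and CF. Both set the flags the same way. The only difference is that COMISD signals an invalid-operation exception for a QNaN operand as well, while UCOMISD does so only for an SNaN. This sketch does not model that exception.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Predicate selected by the CMPPD/CMPSD immediate, for one lane.
   C comparisons involving NaN are false, matching the ordered forms. */
static int cmp_pred(double a, double b, int imm) {
    int unord = isnan(a) || isnan(b);
    switch (imm & 7) {
    case 0:  return a == b;      /* EQ:    false if unordered */
    case 1:  return a < b;       /* LT */
    case 2:  return a <= b;      /* LE */
    case 3:  return unord;       /* UNORD */
    case 4:  return !(a == b);   /* NEQ:   true if unordered */
    case 5:  return !(a < b);    /* NLT */
    case 6:  return !(a <= b);   /* NLE */
    default: return !unord;      /* ORD */
    }
}

/* Each result lane is all ones when the predicate holds, otherwise all zeros. */
static uint64_t cmp_lane_mask(double a, double b, int imm) {
    return cmp_pred(a, b, imm) ? UINT64_MAX : 0;
}

/* COMISD/UCOMISD a,b on the low values (OF, SF, AF are cleared):
   greater ZF=0 PF=0 CF=0, less CF=1, equal ZF=1, unordered ZF=PF=CF=1. */
typedef struct { int zf, pf, cf; } flags3;

static flags3 comisd_flags(double a, double b) {
    flags3 f = {0, 0, 0};
    if (isnan(a) || isnan(b)) { f.zf = f.pf = f.cf = 1; }
    else if (a < b)           { f.cf = 1; }
    else if (a == b)          { f.zf = 1; }
    return f;
}
```

Applied to the test data, the first COMISD (1.1 against 20.66) should set only CF. After DOUBLEFPN is loaded (1.1 against -5.1), ZF, PF and CF should all be clear.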

 

SSE2 shuffle and unpack instructions

XMMSSE2_SHUFF:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L28                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L28:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
SHUFPD XMM0,XMM1,3h     ;shuffle: low = XMM0 high, high = XMM1 high
SHUFPD XMM0,XMM0,1h     ;swap the values in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
UNPCKHPD XMM0,XMM1      ;unpack high: low = XMM0 high, high = XMM1 high
MOVAPD XMM0,XMM2        ;restore original value to XMM0
UNPCKLPD XMM0,XMM0      ;unpack low: copy XMM0 low into both halves
RET
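
The selector bits of SHUFPD and the interleaving done by UNPCKHPD/UNPCKLPD can be modelled in C as follows. The struct and function names are illustrative assumptions, and low means bits 0-63.

```c
#include <assert.h>

/* One XMM register viewed as two doubles: lo = bits 0-63, hi = bits 64-127. */
typedef struct { double lo, hi; } xmm_pd;

/* SHUFPD dst,src,imm8: bit 0 picks which dst value goes to the result's low
   half, bit 1 picks which src value goes to the result's high half. */
static xmm_pd shufpd(xmm_pd dst, xmm_pd src, unsigned imm) {
    xmm_pd r;
    r.lo = (imm & 1u) ? dst.hi : dst.lo;
    r.hi = (imm & 2u) ? src.hi : src.lo;
    return r;
}

/* UNPCKHPD dst,src: interleave the high halves -> {dst.hi, src.hi}. */
static xmm_pd unpckhpd(xmm_pd dst, xmm_pd src) {
    xmm_pd r = { dst.hi, src.hi };
    return r;
}

/* UNPCKLPD dst,src: interleave the low halves -> {dst.lo, src.lo}. */
static xmm_pd unpcklpd(xmm_pd dst, xmm_pd src) {
    xmm_pd r = { dst.lo, src.lo };
    return r;
}
```

With DOUBLEFP1 in XMM0 and DOUBLEFP2 in XMM1, SHUFPD XMM0,XMM1,3h and UNPCKHPD XMM0,XMM1 both give 3.3 low and 40.66 high. UNPCKLPD XMM0,XMM0 duplicates 1.1 into both halves.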

 

SSE2 conversion instructions

XMMSSE2_CONV:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L30                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L30:
;***** display XMM registers in both SSE and SSE2 modes ..
;***** conversion between single and double-precision fp values ..
CVTPS2PD XMM0,[SINGLEFP1]  ;put single-precision fp values into XMM0 as double-precision
CVTPD2PS XMM6,XMM0         ;convert double precision to single precision in XMM6
CVTSS2SD XMM1,[SINGLEFP1]  ;as CVTPS2PD but working with only one value
CVTSD2SS XMM7,XMM1         ;as CVTPD2PS but working with only one value
;***** conversion between integers and double-precision fp values ..
;***** open the MMX integer pane for these tests ..
CVTPD2PI MM0,XMM0          ;convert fp values in XMM0 to integers in MM0
CVTTPD2PI MM1,XMM0         ;same as above with truncation
CVTPI2PD XMM0,[DINTEGER]   ;convert 23 and 24 to double-precision fp values
;***** open the XMM integer display and switch to dword display
CVTPD2DQ XMM7,XMM0         ;and convert 23 and 24 to dword integers into XMM7 (low)
CVTTPD2DQ XMM7,XMM0        ;same as above with truncation
CVTDQ2PD XMM3,XMM7         ;and back into fp values in XMM3
CVTSD2SI EAX,XMM0          ;take low fp value and convert as integer in EAX
CVTTSD2SI EDX,XMM0         ;same as above with truncation
CVTSI2SD XMM4,EAX          ;and back again into XMM4 (low)
;***** conversion between single-precision and integers ..
;***** watch these in XMM integer display switched to dword display
CVTPS2DQ XMM0,[SINGLEFP1]  ;move 4 single-precision fp values to dwords as integers
CVTTPS2DQ XMM1,[SINGLEFP1] ;same as above with truncation
;***** and watch this in the SSE fp pane ..
CVTDQ2PS XMM6,XMM0         ;and convert back to 4 single-precision fp values
CVTDQ2PS XMM7,XMM1         ;ditto
RET
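
The extra T in CVTTPD2PI, CVTTPD2DQ, CVTTSD2SI and CVTTPS2DQ means truncation toward zero. The forms without it round according to the rounding-control field of MXCSR, which defaults to round to nearest (ties to even). If a value is a NaN or does not fit in 32 bits, the result is the "integer indefinite" value 80000000h. Below is a C model of the scalar pair, assuming the default rounding mode. The function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* CVTTSD2SI r32,xmm: truncate toward zero; NaN or out of range -> 80000000h. */
static int32_t cvttsd2si(double x) {
    if (!(x > -2147483649.0 && x < 2147483648.0)) /* also catches NaN */
        return INT32_MIN;
    return (int32_t)x;                             /* C conversion truncates */
}

/* CVTSD2SI r32,xmm with the default MXCSR mode: round to nearest, ties to even. */
static int32_t cvtsd2si(double x) {
    if (!(x >= -2147483648.5 && x < 2147483647.5)) /* also catches NaN */
        return INT32_MIN;
    int64_t t = (int64_t)x;        /* integer part, truncated toward zero */
    double frac = x - (double)t;   /* exact fractional part, same sign as x */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                    /* round up, or tie to even */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;                    /* round down, or tie to even */
    return (int32_t)t;
}
```

With the default setting, 2.5 converts to 2 and 3.5 to 4. -1.75 becomes -2 with CVTSD2SI but -1 with CVTTSD2SI.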

Reference: XMM SSE2 floating point instructions, http://www.godevtool.com/TestbugHelp/XMMfpins2.htm
