SSE2 — OpCode List
(under construction -- this list might be incomplete?!)
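Additionally, in AMD64's 64bit mode some of the functionality changes: 16 XMM registers (xmm0 to xmm15) are available instead of 8, and instructions that read or write a GPR (such as cvtsd2si, cvtsi2sd and movnti) can also operate on 64bit integers.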
Arithmetic:
addpd
- Adds 2 64bit doubles.
addsd
- Adds bottom 64bit doubles.
subpd
- Subtracts 2 64bit doubles.
subsd
- Subtracts bottom 64bit doubles.
mulpd
- Multiplies 2 64bit doubles.
mulsd
- Multiplies bottom 64bit doubles.
divpd
- Divides 2 64bit doubles.
divsd
- Divides bottom 64bit doubles.
maxpd
- Gets largest of 2 64bit doubles for 2 sets.
maxsd
- Gets largest of 2 64bit doubles to bottom set.
minpd
- Gets smallest of 2 64bit doubles for 2 sets.
minsd
- Gets smallest of 2 64bit values for bottom set.
paddb
- Adds 16 8bit integers.
paddw
- Adds 8 16bit integers.
paddd
- Adds 4 32bit integers.
paddq
- Adds 2 64bit integers.
paddsb
- Adds 16 8bit integers with saturation.
paddsw
- Adds 8 16bit integers using saturation.
paddusb
- Adds 16 8bit unsigned integers using saturation.
paddusw
- Adds 8 16bit unsigned integers using saturation.
psubb
- Subtracts 16 8bit integers.
psubw
- Subtracts 8 16bit integers.
psubd
- Subtracts 4 32bit integers.
psubq
- Subtracts 2 64bit integers.
psubsb
- Subtracts 16 8bit integers using saturation.
psubsw
- Subtracts 8 16bit integers using saturation.
psubusb
- Subtracts 16 8bit unsigned integers using saturation.
psubusw
- Subtracts 8 16bit unsigned integers using saturation.
pmaddwd
- Multiplies 8 pairs of signed 16bit integers into 32bit results and adds adjacent results, giving 4 32bit sums.
pmulhw
- Multiplies 16bit integers and returns the high 16bits of the result.
pmullw
- Multiplies 16bit integers and returns the low 16bits of the result.
pmuludq
- Multiplies 2 pairs of unsigned 32bit integers (the low 32bits of each 64bit half) and stores 2 64bit results.
rcpps
- Approximates the reciprocal of 4 32bit singles.
rcpss
- Approximates the reciprocal of bottom 32bit single.
sqrtpd
- Returns square root of 2 64bit doubles.
sqrtsd
- Returns square root of bottom 64bit double.
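The difference between the plain and the saturating adds above is easiest to see on values that overflow. The following is a minimal sketch using the C intrinsics for paddb, paddusb and paddsb, assuming an x86 compiler that provides emmintrin.h. The helper names are hypothetical and exist only for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helpers: each one broadcasts its inputs to all 16 byte
   lanes, performs one packed add, and returns the result from lane 0. */

/* paddb: plain add, results wrap around modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    __m128i r = _mm_add_epi8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b));
    return (uint8_t)_mm_cvtsi128_si32(r);
}

/* paddusb: unsigned saturation, results are clamped to 0..255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    __m128i r = _mm_adds_epu8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b));
    return (uint8_t)_mm_cvtsi128_si32(r);
}

/* paddsb: signed saturation, results are clamped to -128..127. */
int8_t add_sat_i8(int8_t a, int8_t b) {
    __m128i r = _mm_adds_epi8(_mm_set1_epi8(a), _mm_set1_epi8(b));
    return (int8_t)_mm_cvtsi128_si32(r);
}
```

For example, 200 + 100 gives 44 with paddb but 255 with paddusb, and 100 + 100 gives 127 with paddsb.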
Logic:
andnpd
- Logically inverts the first operand and ANDs it with the second, on 2 64bit doubles.
andnps
- Logically inverts the first operand and ANDs it with the second, on 4 32bit singles.
andpd
- Logically ANDs 2 64bit doubles.
pand
- Logically ANDs 2 128bit registers.
pandn
- Logically Inverts the first 128bit operand and ANDs with the second.
por
- Logically ORs 2 128bit registers.
pslldq
- Logically left shifts 1 128bit value by a number of bytes.
psllq
- Logically left shifts 2 64bit values.
pslld
- Logically left shifts 4 32bit values.
psllw
- Logically left shifts 8 16bit values.
psrad
- Arithmetically right shifts 4 32bit values.
psraw
- Arithmetically right shifts 8 16bit values.
psrldq
- Logically right shifts 1 128bit value by a number of bytes.
psrlq
- Logically right shifts 2 64bit values.
psrld
- Logically right shifts 4 32bit values.
psrlw
- Logically right shifts 8 16bit values.
pxor
- Logically XORs 2 128bit registers.
orpd
- Logically ORs 2 64bit doubles.
xorpd
- Logically XORs 2 64bit doubles.
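The logical operations on doubles work on the raw bit pattern, which makes them useful for sign manipulation. As a rough sketch, assuming emmintrin.h is available, andnpd with a sign-bit mask yields the absolute value and xorpd with the same mask negates. The function names are made up for this example.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* andnpd computes (~first) & second. With only the sign bit set in the
   first operand, the sign bit of x is cleared, which gives |x|. */
double abs_andnpd(double x) {
    __m128d sign = _mm_set_sd(-0.0); /* bottom double: sign bit only */
    return _mm_cvtsd_f64(_mm_andnot_pd(sign, _mm_set_sd(x)));
}

/* xorpd with the sign-bit mask flips the sign bit, which negates x. */
double neg_xorpd(double x) {
    __m128d sign = _mm_set_sd(-0.0);
    return _mm_cvtsd_f64(_mm_xor_pd(sign, _mm_set_sd(x)));
}
```

The order of operands matters for andnpd: it is the first operand that gets inverted.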
Compare:
cmppd
- Compares 2 pairs of 64bit doubles.
cmpsd
- Compares bottom 64bit doubles.
comisd
- Compares bottom 64bit doubles and stores result in EFLAGS.
ucomisd
- Compares bottom 64bit doubles and stores result in EFLAGS. (QNaN operands don't raise the invalid-operation exception with ucomisd, unlike comisd.)
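pcmpxxb
- Compares 16 8bit integers. Here xx is only eq (equal) or gt (signed greater than).
pcmpxxw
- Compares 8 16bit integers. Here xx is only eq or gt.
pcmpxxd
- Compares 4 32bit integers. Here xx is only eq or gt.
Compare Codes (for cmppd and cmpsd, selected by an immediate operand or written as the xx in cmpxxpd and cmpxxsd):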
eq
- Equal to.
lt
- Less than.
le
- Less than or equal to.
neq
- Not equal.
nlt
- Not less than.
nle
- Not less than or equal to.
ord
- Ordered.
unord
- Unordered.
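The compare codes above do not set flags. Each lane of the result becomes all ones when the condition holds and all zeros otherwise. The sketch below, assuming emmintrin.h, applies the lt, nlt and unord codes to 2 doubles and reads the result back with movmskpd (an SSE2 instruction not covered in this list) as a 2bit integer. It also shows why nlt is not the same as "greater than or equal" once a NaN is involved. Function names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <math.h>      /* NAN, used by callers */

/* Each helper returns a 2bit mask: bit 0 is the bottom lane, bit 1 the
   top lane. A bit is set when the compare was true for that lane. */

/* cmpltpd: a < b (false if either value is a NaN). */
int lt_mask(double a_lo, double a_hi, double b_lo, double b_hi) {
    __m128d a = _mm_set_pd(a_hi, a_lo);
    __m128d b = _mm_set_pd(b_hi, b_lo);
    return _mm_movemask_pd(_mm_cmplt_pd(a, b));
}

/* cmpnltpd: !(a < b) (true if either value is a NaN). */
int nlt_mask(double a_lo, double a_hi, double b_lo, double b_hi) {
    __m128d a = _mm_set_pd(a_hi, a_lo);
    __m128d b = _mm_set_pd(b_hi, b_lo);
    return _mm_movemask_pd(_mm_cmpnlt_pd(a, b));
}

/* cmpunordpd: true if either value is a NaN. */
int unord_mask(double a_lo, double a_hi, double b_lo, double b_hi) {
    __m128d a = _mm_set_pd(a_hi, a_lo);
    __m128d b = _mm_set_pd(b_hi, b_lo);
    return _mm_movemask_pd(_mm_cmpunord_pd(a, b));
}
```

With a = {NaN, 1.0} and b = {0.0, 2.0} (bottom first), lt is true only in the top lane, while nlt and unord are true only in the bottom lane.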
Conversion:
cvtdq2pd
- Converts 2 32bit integers into 2 64bit doubles.
cvtdq2ps
- Converts 4 32bit integers into 4 32bit singles.
cvtpd2pi
- Converts 2 64bit doubles into 2 32bit integers in an MMX register.
cvtpd2dq
- Converts 2 64bit doubles into 2 32bit integers in the bottom of an XMM register.
cvtpd2ps
- Converts 2 64bit doubles into 2 32bit singles in the bottom of an XMM register.
cvtpi2pd
- Converts 2 32bit integers from an MMX register into 2 64bit doubles in an XMM register.
cvtps2dq
- Converts 4 32bit singles into 4 32bit integers.
cvtps2pd
- Converts 2 32bit singles into 2 64bit doubles.
cvtsd2si
- Converts 1 64bit double to a 32bit integer in a GPR.
cvtsd2ss
- Converts bottom 64bit double to a bottom 32bit single. Tops are unchanged.
cvtsi2sd
- Converts a 32bit integer to the bottom 64bit double.
cvtsi2ss
- Converts a 32bit integer to the bottom 32bit single.
cvtss2sd
- Converts bottom 32bit single to bottom 64bit double.
cvtss2si
- Converts bottom 32bit single to a 32bit integer in a GPR.
cvttpd2pi
- Converts 2 64bit doubles to 2 32bit integers using truncation into an MMX register.
cvttpd2dq
- Converts 2 64bit doubles to 2 32bit integers using truncation.
cvttps2dq
- Converts 4 32bit singles to 4 32bit integers using truncation.
cvttps2pi
- Converts 2 32bit singles to 2 32bit integers using truncation into an MMX register.
cvttsd2si
- Converts a 64bit double to a 32bit integer using truncation into a GPR.
cvttss2si
- Converts a 32bit single to a 32bit integer using truncation into a GPR.
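The cvt and cvtt variants differ only in rounding. The cvt forms round according to the current MXCSR rounding mode, while the cvtt forms always truncate toward zero. Here is a small sketch with cvtsd2si and cvttsd2si, assuming emmintrin.h and assuming the rounding mode is still at its default (round to nearest, ties to even). The function names are invented for this example.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* cvtsd2si: rounds using the MXCSR rounding mode. This sketch assumes
   the default mode, round to nearest with ties going to the even value. */
int round_cvtsd2si(double x) {
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvttsd2si: always truncates toward zero, whatever the MXCSR mode. */
int trunc_cvttsd2si(double x) {
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

Under the default mode 2.5 converts to 2 and 3.5 converts to 4 with cvtsd2si, while cvttsd2si gives 2 and 3.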
Load/Store:
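(Non-temporal stores are a hint that the data will not be reused soon. They typically go through write-combining buffers instead of allocating cache lines, which is what "minimizing cache pollution" refers to.)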
movq
- Moves a 64bit value, clearing the top 64bits of an XMM register.
movsd
- Moves a 64bit double, leaving tops unchanged if move is between two XMM registers.
movapd
- Moves 2 aligned 64bit doubles.
movupd
- Moves 2 unaligned 64bit doubles.
movhpd
- Moves top 64bit value to or from an XMM register.
movlpd
- Moves bottom 64bit value to or from an XMM register.
movdq2q
- Moves bottom 64bit value into an MMX register.
movq2dq
- Moves an MMX register value to the bottom of an XMM register. Top is cleared to zero.
movntpd
- Moves a 128bit value to memory without using the cache. NT is "Non Temporal."
movntdq
- Moves a 128bit value to memory without using the cache.
movnti
- Moves a 32bit value from a GPR to memory without using the cache.
maskmovdqu
- Stores up to 16 bytes of an XMM register to memory, writing only the bytes whose sign bit is set in the corresponding byte of another XMM register.
pmovmskb
- Generates a 16bit Mask from the sign bits of each byte in an XMM register.
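A common use of pmovmskb is to turn the result of a byte compare into an ordinary integer that scalar code can test. The sketch below, assuming emmintrin.h, combines an unaligned 128bit load (movdqu, which is not covered in this list), pcmpeqb and pmovmskb to find which of 16 bytes equal a given value. The function name is hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Returns a 16bit mask in which bit i is set when p[i] == c.
   p must point to at least 16 readable bytes; no alignment is needed. */
int match_mask16(const unsigned char *p, unsigned char c) {
    __m128i v  = _mm_loadu_si128((const __m128i *)p);       /* movdqu   */
    __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8((char)c)); /* pcmpeqb  */
    return _mm_movemask_epi8(eq);                           /* pmovmskb */
}
```

For the 16 bytes "abcabcabcabcabca" and the byte 'a', the matches are at positions 0, 3, 6, 9, 12 and 15, so the mask is 0x9249.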
Shuffling:
pshufd
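- Shuffles 4 32bit values. An 8bit immediate holds four 2bit fields, each selecting the source value for one destination position.
pshufhw
- Shuffles the top 4 16bit values in the same way, using an 8bit immediate. The bottom 64bits are copied unchanged.
pshuflw
- Shuffles the bottom 4 16bit values in the same way, using an 8bit immediate. The top 64bits are copied unchanged.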
unpckhpd
- Unpacks and interleaves top 64bit doubles from 2 128bit sources into 1.
unpcklpd
- Unpacks and interleaves bottom 64bit doubles from 2 128 bit sources into 1.
punpckhbw
- Unpacks and interleaves top 8 8bit integers from 2 128bit sources into 1.
punpckhwd
- Unpacks and interleaves top 4 16bit integers from 2 128bit sources into 1.
punpckhdq
- Unpacks and interleaves top 2 32bit integers from 2 128bit sources into 1.
punpckhqdq
- Unpacks and interleaves top 64bit integers from 2 128bit sources into 1.
punpcklbw
- Unpacks and interleaves bottom 8 8bit integers from 2 128bit sources into 1.
punpcklwd
- Unpacks and interleaves bottom 4 16bit integers from 2 128bit sources into 1.
punpckldq
- Unpacks and interleaves bottom 2 32bit integers from 2 128bit sources into 1.
punpcklqdq
- Unpacks and interleaves bottom 64bit integers from 2 128bit sources into 1.
packssdw
- Packs 32bit integers to 16bit integers using saturation.
packsswb
- Packs 16bit integers to 8bit integers using saturation.
packuswb
- Packs 16bit integers to 8bit unsigned integers using saturation.
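Two of the entries above are easier to follow with concrete values. The sketch below, assuming emmintrin.h, uses pshufd with the immediate 0x1B (the 2bit fields 3, 2, 1, 0 read from the low bits upward) to reverse 4 32bit values, and packuswb to show unsigned saturation when narrowing. The helper names and the lane parameter are made up for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* pshufd with immediate 0x1B: destination position 0 takes source
   value 3, position 1 takes 2, position 2 takes 1, position 3 takes 0.
   Returns the value at position `lane` (0..3) of the reversed vector. */
int32_t pshufd_reverse_get(int32_t e0, int32_t e1, int32_t e2, int32_t e3,
                           int lane) {
    __m128i v = _mm_set_epi32(e3, e2, e1, e0); /* last argument is value 0 */
    __m128i r = _mm_shuffle_epi32(v, 0x1B);
    int32_t out[4];
    _mm_storeu_si128((__m128i *)out, r);
    return out[lane];
}

/* packuswb: narrows signed 16bit values to unsigned 8bit values,
   clamping anything below 0 to 0 and anything above 255 to 255. */
uint8_t pack_u8(int16_t v) {
    __m128i w = _mm_set1_epi16(v);
    return (uint8_t)_mm_cvtsi128_si32(_mm_packus_epi16(w, w));
}
```

Reversing {10, 20, 30, 40} puts 40 in position 0, and packing 300 or -5 gives 255 or 0.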
Cache Control:
clflush
- Flushes a Cache Line from all levels of cache.
lfence
- Guarantees that all memory loads issued before the lfence instruction are completed before any loads after the lfence instruction.
mfence
- Guarantees that all memory reads and writes issued before the mfence instruction are completed before any reads or writes after the mfence instruction.
pause
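- Hints to the processor that the code is in a spin-wait loop. The delay it introduces is short and processor dependent, not a set amount of time.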
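The pause instruction is normally placed inside a loop that keeps retrying until another thread releases something. The following is a minimal sketch of a bounded spin lock, assuming a C11 compiler with stdatomic.h and the _mm_pause intrinsic from emmintrin.h. The spinlock_t type, the demo_lock variable and the function names are all hypothetical, and a real lock would need more care (fairness, backoff, falling back to the OS).

```c
#include <emmintrin.h>  /* _mm_pause */
#include <stdatomic.h>  /* C11 atomics */

/* Hypothetical spin lock built on a single atomic flag. */
typedef struct { atomic_flag locked; } spinlock_t;

/* Example lock instance, initially unlocked. */
spinlock_t demo_lock = { ATOMIC_FLAG_INIT };

/* Makes at most max_attempts attempts to take the lock.
   Returns 1 if the lock was taken, 0 if it gave up. */
int spin_try_acquire(spinlock_t *l, int max_attempts) {
    for (int i = 0; i < max_attempts; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire))
            return 1;  /* flag was clear, so the lock is now ours */
        _mm_pause();   /* pause: tell the CPU this is a spin-wait loop */
    }
    return 0;
}

/* Releases the lock so that another attempt can succeed. */
void spin_release(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The first call on an unlocked lock succeeds at once. A second call while the lock is still held spins with pause between attempts and then gives up.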