MMX Extensions Overview
With the release of the Athlon in 1999, AMD extended the standard MMX instruction set with several new instructions. These provide enhanced conversion and selection instructions, as well as advanced cache management instructions. Many of these extensions later found their way into SSE.
During the development of SSE, it was sometimes called MMX2. This is incorrect. To further complicate matters, in 2005 Intel announced another SIMD technology named MMX2 for their XScale processor line, which should actually be called WMMX2.
MMX Extensions — Cache Management
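These instructions work like the prefetch and prefetchw instructions, but they also give the programmer control over which cache level the data is loaded into. This allows a program to fetch data only into the outer caches, or all the way into the cache closest to the processor core. All of them are hints: the processor is free to ignore them, and the exact placement depends on the implementation.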
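prefetchnta
fetches data with a non-temporal hint: the data is brought close to the processor while minimizing pollution of the cache hierarchy.
prefetcht0
fetches data to all cache levels.
prefetcht1
fetches data to the L2 cache and the levels beyond it, but not to L1.
prefetcht2
fetches data to the L3 cache and the levels beyond it, or to an implementation-specific level on processors without an L3 cache.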
We are also provided with an instruction that controls write ordering, which is useful for multi-processor operation.
sfence
makes all previous writes globally visible. Any write issued after it must wait for the earlier writes to complete before executing.
MMX Extensions — Data Movement
These extensions also provide a number of data-moving instructions. These include some configurable data-shuffling instructions, masked moves, and word insertion/extraction.
movntq
is a non-temporal write. This means that it bypasses the cache, leaving the cache contents unchanged. This instruction can only be used to write an MMX register to memory.
maskmovq
is a conditional move. Like movntq, this instruction bypasses the cache. It uses edi to point to a destination address, and writes each byte of an MMX register to the corresponding byte at that address only if the top bit of the matching byte in another MMX register is set. Bytes whose mask bit is clear are left untouched in memory.
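The byte selection is easier to see in code than in words. The following is a minimal Python sketch that models only the selection logic, not the cache bypass. The function name maskmovq_model, the bytearray standing in for memory, and the sample values are all assumptions made for illustration; index 0 is the lowest-addressed (least significant) byte.

```python
def maskmovq_model(memory, edi, src, mask):
    """Hypothetical model of maskmovq: copy src[i] to memory[edi + i]
    only where bit 7 of mask[i] is set; other bytes stay unchanged."""
    for i in range(8):
        if mask[i] & 0x80:
            memory[edi + i] = src[i]
    return memory

memory = bytearray(8)  # eight zero bytes, "address" 0 (illustrative)
src = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
mask = bytes([0x80, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0x01, 0x80])
result = maskmovq_model(memory, 0, src, mask)
```

Only bytes 0, 2, 5 and 7 are written here, because only those mask bytes have their top bit set. The values 0x7F and 0x01 do not select a byte even though they are non-zero.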
pmovmskb
moves the top bit of each byte of an MMX register into the bottom 8 bits of a regular 32-bit register. The remaining bits of the 32-bit register are set to zero.
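As a rough illustration of pmovmskb, the Python sketch below gathers the top bit of each byte into an integer. The name pmovmskb_model and the list-of-bytes representation (index 0 being the least significant byte) are assumptions for this example only.

```python
def pmovmskb_model(reg_bytes):
    """Hypothetical model of pmovmskb: bit 7 of byte i becomes bit i
    of the result; all higher result bits are zero."""
    result = 0
    for i, b in enumerate(reg_bytes):
        result |= ((b >> 7) & 1) << i
    return result

mask = pmovmskb_model([0x80, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0x01, 0x80])
```

With this input, bytes 0, 2, 5 and 7 have their top bit set, so the result is 0xA5.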
pextrw
extracts a selected word from an MMX register into the bottom half of a 32-bit register. Selection is performed by an 8-bit value, of which the bottom 2 bits are used to select which of the 4 words in an MMX register to extract. The top 16 bits of the 32-bit register are set to zero.
pinsrw
is just like pextrw, except it moves data from the bottom of a 32-bit register or 16-bit memory location into a selected word of an MMX register. None of the other words in the destination MMX register are changed.
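The word selection used by pextrw and pinsrw can be sketched in Python as follows. Here an MMX register is modeled as a 64-bit integer with word 0 in the lowest 16 bits. The function names and sample values are assumptions for illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def pextrw_model(mm, imm8):
    """Hypothetical model of pextrw: return word (imm8 & 3) of mm,
    zero-extended (the upper 16 bits of the result are zero)."""
    shift = (imm8 & 3) * 16
    return (mm >> shift) & 0xFFFF

def pinsrw_model(mm, r32, imm8):
    """Hypothetical model of pinsrw: replace word (imm8 & 3) of mm with
    the low 16 bits of r32, leaving the other three words unchanged."""
    shift = (imm8 & 3) * 16
    cleared = mm & ~(0xFFFF << shift) & MASK64
    return cleared | ((r32 & 0xFFFF) << shift)

mm0 = 0x4444333322221111
extracted = pextrw_model(mm0, 2)              # selects word 2
inserted = pinsrw_model(mm0, 0xDEADBEEF, 1)   # overwrites word 1
```

Only the bottom 2 bits of the selector matter, so a selector of 6 picks the same word as a selector of 2.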
pshufw
is a very flexible instruction that lets a programmer fill an MMX register with the words of another MMX register or 64-bit memory location, rearranged in 1 of 256 possible ways. It takes 3 parameters: the destination MMX register, the source, and an 8-bit permutation value. The permutation byte is read 2 bits at a time, starting from the lowest bits; each 2-bit field selects which of the 4 source words is copied into the corresponding destination word.
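Because pshufw is hard to picture, here is a small Python sketch of the selection rule. Registers are modeled as 64-bit integers with word 0 in the lowest 16 bits. The name pshufw_model and the sample permutation values are chosen for illustration only.

```python
def pshufw_model(src, imm8):
    """Hypothetical model of pshufw: destination word i is the source
    word selected by bits (2*i + 1):(2*i) of imm8."""
    result = 0
    for i in range(4):
        sel = (imm8 >> (2 * i)) & 3
        word = (src >> (16 * sel)) & 0xFFFF
        result |= word << (16 * i)
    return result

src = 0x4444333322221111
reversed_words = pshufw_model(src, 0x1B)  # fields 00 01 10 11: reverse
broadcast_low = pshufw_model(src, 0x00)   # every field selects word 0
identity = pshufw_model(src, 0xE4)        # fields 11 10 01 00: no change
```

Reading the permutation byte in binary, two bits at a time from the right, gives the source word for destination words 0, 1, 2 and 3 in that order.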
MMX Extensions — Integer Operations
AMD's extensions provide several new integer operations as well.
pavgb
stores the rounded-up averages of an MMX register and another register or memory location. This instruction is identical to the 3DNow! pavgusb instruction. It operates on bytes, and treats them as unsigned.
pavgw
is just like pavgb, except it operates on 16-bit words instead of 8-bit bytes. These are also treated as unsigned values.
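"Rounded-up average" here means that 1 is added to the sum before it is halved. A minimal Python sketch follows, with registers modeled as lists of unsigned elements (8 bytes for pavgb, 4 words for pavgw). The function name pavg_model is an assumption for this example.

```python
def pavg_model(a, b):
    """Hypothetical model of pavgb/pavgw: per-element (x + y + 1) >> 1.
    Python integers do not overflow, which mirrors the extra bit of
    precision the hardware keeps for the intermediate sum."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

# pavgb: eight unsigned bytes per operand
byte_avgs = pavg_model([0, 1, 2, 255, 255, 10, 7, 100],
                       [0, 2, 2, 255, 254, 20, 8, 101])

# pavgw: four unsigned words per operand
word_avgs = pavg_model([65535, 1, 0, 3],
                       [65535, 2, 0, 4])
```

Note that the average of 1 and 2 comes out as 2, not 1, and that averaging 255 with 255 still gives 255 rather than wrapping around.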
pmaxsw
loads a register with the maximum value between that register and another register or memory location. It operates on 16-bit words, and treats them as signed values.
pmaxub
is similar to pmaxsw, except it operates on 8-bit bytes instead of words, and treats the values as unsigned.
pminsw
loads a register with the minimum value between that register and another register or memory location. Like pmaxsw, it operates on 16-bit signed words.
pminub
is just like pminsw, except it operates on 8-bit unsigned bytes instead of 16-bit signed words.
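The difference between the signed word forms and the unsigned byte forms matters whenever the top bit of an element is set. The Python sketch below models all four instructions on lists of raw element values. The function names and the helper to_signed16 are assumptions made for illustration.

```python
def to_signed16(w):
    """Reinterpret a raw 16-bit value as a signed word."""
    return w - 0x10000 if w & 0x8000 else w

def pmaxsw_model(a, b):
    """Hypothetical model of pmaxsw: per-word signed maximum."""
    return [max(x, y, key=to_signed16) for x, y in zip(a, b)]

def pminsw_model(a, b):
    """Hypothetical model of pminsw: per-word signed minimum."""
    return [min(x, y, key=to_signed16) for x, y in zip(a, b)]

def pmaxub_model(a, b):
    """Hypothetical model of pmaxub: per-byte unsigned maximum."""
    return [max(x, y) for x, y in zip(a, b)]

def pminub_model(a, b):
    """Hypothetical model of pminub: per-byte unsigned minimum."""
    return [min(x, y) for x, y in zip(a, b)]

words_a = [0x0001, 0xFFFF, 0x8000, 0x7FFF]
words_b = [0x0002, 0x0000, 0x7FFF, 0x7FFE]
max_words = pmaxsw_model(words_a, words_b)
min_words = pminsw_model(words_a, words_b)

bytes_a = [0x01, 0xFF, 0x80, 0x7F, 0x00, 0x00, 0x10, 0x20]
bytes_b = [0x02, 0x00, 0x7F, 0x7E, 0x00, 0xFF, 0x20, 0x10]
max_bytes = pmaxub_model(bytes_a, bytes_b)
min_bytes = pminub_model(bytes_a, bytes_b)
```

In the word example, 0xFFFF is treated as -1 and therefore loses to 0x0000 in pmaxsw, whereas in the byte example 0xFF is treated as 255 and wins in pmaxub.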
pmulhuw
multiplies 4 16-bit unsigned words in a register with 4 more in another register or memory location, and stores the top 16 bits of each 32-bit result.
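A short Python sketch of the "top 16 bits" behavior, with each register modeled as a list of 4 unsigned words. The name pmulhuw_model and the sample operands are assumptions for illustration.

```python
def pmulhuw_model(a, b):
    """Hypothetical model of pmulhuw: multiply unsigned 16-bit words
    pairwise and keep bits 31..16 of each 32-bit product."""
    return [((x * y) >> 16) & 0xFFFF for x, y in zip(a, b)]

high_words = pmulhuw_model([0xFFFF, 0x8000, 0x0100, 0x0002],
                           [0xFFFF, 0x0002, 0x0100, 0x0003])
```

Small products such as 2 * 3 fit entirely in the low 16 bits, so their high word is zero.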
psadbw
calculates the sum of absolute differences. What this means is that it calculates the byte difference between a register and another register or memory location, and sums the absolute value of all 8 differences. The result goes into the bottom 16 bits of the first register, and the top 48 bits are set to zero.
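The sum of absolute differences can be sketched in a few lines of Python, with each register modeled as a list of 8 unsigned bytes. The name psadbw_model and the sample data are assumptions for this example.

```python
def psadbw_model(a, b):
    """Hypothetical model of psadbw: sum of |a[i] - b[i]| over 8
    unsigned bytes. The largest possible result is 8 * 255 = 2040,
    which always fits in the low 16 bits of the destination."""
    return sum(abs(x - y) for x, y in zip(a, b))

sad = psadbw_model([10, 20, 30, 40, 50, 60, 70, 80],
                   [12, 18, 30, 45, 40, 60, 0, 255])
```

The individual differences here are 2, 2, 0, 5, 10, 0, 70 and 175. This kind of measure is commonly used to compare blocks of pixels, for example in video motion estimation.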
Trademark Information
MMX is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Athlon is a registered trademark of Advanced Micro Devices, Inc.