AMD MMX Extensions

twnming

MMX Extensions Overview

With the release of the Athlon in 1999, AMD extended the standard MMX instruction set with a number of new instructions. They add enhanced conversion and selection operations, as well as some advanced cache management instructions. The same instructions also appear in Intel's SSE, which debuted on the Pentium III in the same year.

During the development of SSE, it was sometimes called MMX2, but that name is incorrect. To complicate matters further, in 2005 Intel announced another SIMD technology named MMX2 for its XScale processor line, which should properly be called WMMX2.

MMX Extensions — Cache Management

Like the 3DNow! prefetch and prefetchw instructions, these instructions load data into the cache ahead of use, but programmers now control which cache level the data is loaded into. This allows a program to fetch data only to the outer caches, or all the way into the cache closest to the processor core.

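prefetchnta fetches data close to the CPU as non-temporal data, minimizing pollution of the cache hierarchy.

prefetcht0 fetches data to all cache levels.

prefetcht1 fetches data to the L2 cache and any levels beyond it, but not to L1.

prefetcht2 fetches data to the L3 cache and beyond on processors that have one; on others the choice is implementation-specific.

All four are hints. The cache level each one actually targets depends on the processor.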
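As a rough illustration of how a prefetch hint is used, the sketch below sums an array while asking for data a little further ahead. It assumes a GCC or Clang toolchain and uses the `__builtin_prefetch` builtin rather than the instructions directly; on x86 targets these compilers typically lower a locality of 3 to prefetcht0 and a locality of 0 to prefetchnta. The function name and the prefetch distance are hypothetical.

```c
#include <stddef.h>

/* Sum an array while hinting that data further ahead will be needed soon.
 * __builtin_prefetch(addr, rw, locality) is a GCC/Clang builtin (toolchain
 * assumption): rw = 0 means "prepare for a read", locality runs from
 * 0 (no temporal locality) to 3 (keep in all cache levels). */
long sum_with_prefetch(const int *data, size_t n)
{
    const size_t ahead = 16;  /* hypothetical prefetch distance, in elements */
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n)
            __builtin_prefetch(&data[i + ahead], 0, 3);
        total += data[i];
    }
    return total;
}
```

A prefetch never changes program results, only timing, so the function returns the same sum with or without the hint. Whether it helps at all depends on the access pattern and the processor.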

We are also provided with an instruction that controls write ordering, which is useful for multi-processor operation.

sfence makes all previous writes globally visible. Any write issued after the sfence waits until every write issued before it has completed.

MMX Extensions — Data Movement

These extensions also provide a number of data-moving instructions. These include some configurable data-shuffling instructions, masked moves, and word insertion/extraction.

movntq is a non-temporal write. This means that it bypasses the cache, leaving the cache contents unchanged. This instruction can only be used to write from an MMX register to memory.

maskmovq is a conditional move. This instruction also bypasses the cache, like movntq. It uses edi to point to a destination address, and moves bytes from one MMX register to that memory location under the control of a second MMX register that acts as a mask: a byte is written only if the top bit of the corresponding mask byte is set, and all other destination bytes are left untouched.
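The byte-selective store can be sketched as a scalar reference model in C. This is not the instruction or an intrinsic, only a plain loop that mimics its effect on eight byte lanes; the name `maskmovq_model` is hypothetical, and the cache-bypassing behavior is not modeled.

```c
#include <stdint.h>

/* Scalar model of maskmovq: for each of the 8 byte lanes, store src[i]
 * to dest[i] only when the most significant bit of mask[i] is set.
 * Lanes whose mask bit is clear keep their old destination value. */
void maskmovq_model(uint8_t dest[8], const uint8_t src[8],
                    const uint8_t mask[8])
{
    for (int i = 0; i < 8; i++) {
        if (mask[i] & 0x80)
            dest[i] = src[i];
    }
}
```

Only the top bit of each mask byte matters, so a mask byte of 0x7F writes nothing while 0x80 and 0xFF both write.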

pmovmskb moves the top bit of each byte of an MMX register into the bottom 8 bits of a regular 32-bit register. Bit 0 of the result comes from the lowest byte, and the remaining 24 bits are set to zero.
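A scalar model of this bit gathering might look like the following. The function name is hypothetical, and element 0 of the array is assumed to stand for the least significant byte of the MMX register.

```c
#include <stdint.h>

/* Scalar model of pmovmskb: collect the most significant bit of each of
 * the 8 bytes into the low 8 bits of a 32-bit result. Byte i supplies
 * bit i; the upper 24 bits of the result stay zero. */
uint32_t pmovmskb_model(const uint8_t src[8])
{
    uint32_t result = 0;

    for (int i = 0; i < 8; i++)
        result |= (uint32_t)(src[i] >> 7) << i;
    return result;
}
```

This pairs naturally with byte comparisons, whose results are all-ones or all-zeros per byte: the returned mask then has one bit per matching lane.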

pextrw extracts a selected word from an MMX register into the bottom half of a 32-bit register. Selection is performed by an 8-bit value, of which the bottom 2 bits are used to select which of the 4 words in an MMX register to extract. The top 16 bits of the 32-bit register are set to zero.

pinsrw is the reverse of pextrw: it moves data from the bottom 16 bits of a 32-bit register, or from a 16-bit memory location, into a selected word of an MMX register. None of the other words in the destination MMX register are changed.
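The word selection used by pextrw and pinsrw can be sketched with two small scalar models. The names are hypothetical, and element 0 of each array is assumed to be the lowest word of the MMX register.

```c
#include <stdint.h>

/* Scalar model of pextrw: the low 2 bits of imm8 pick one of the 4 words.
 * The word is zero-extended, so the top 16 bits of the result are zero. */
uint32_t pextrw_model(const uint16_t src[4], uint8_t imm8)
{
    return src[imm8 & 3];
}

/* Scalar model of pinsrw: write the low 16 bits of value into the word
 * selected by the low 2 bits of imm8. The other three words are kept. */
void pinsrw_model(uint16_t dest[4], uint32_t value, uint8_t imm8)
{
    dest[imm8 & 3] = (uint16_t)value;
}
```

Because only the bottom 2 bits of the selector are used, a selector of 6 picks the same word as a selector of 2.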

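pshufw is a completely crazy instruction that allows a programmer to shuffle the four words of a source into a destination MMX register in 1 of 256 possible ways. It takes 3 parameters: the destination MMX register, the source (an MMX register or memory location), and an 8-bit permutation value. The permutation byte is read 2 bits at a time: bits 0-1 select which source word goes into destination word 0, bits 2-3 select the source for word 1, and so on. The same source word may be selected more than once.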
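A scalar model makes the permutation byte easier to follow. The sketch below is a reference loop, not the instruction itself; the function name is hypothetical, and element 0 of each array is assumed to be the lowest word.

```c
#include <stdint.h>

/* Scalar model of pshufw: destination word i is taken from the source
 * word named by bits (2*i+1 .. 2*i) of imm8. A temporary copy is used
 * so that dest and src may be the same array. */
void pshufw_model(uint16_t dest[4], const uint16_t src[4], uint8_t imm8)
{
    uint16_t tmp[4];

    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dest[i] = tmp[i];
}
```

Three permutation values are worth remembering: 0xE4 (binary 11 10 01 00) copies the words unchanged, 0x1B (binary 00 01 10 11) reverses their order, and 0x00 broadcasts word 0 to all four positions.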

MMX Extensions — Integer Operations

AMD's extensions provide several new integer operations as well.

pavgb stores the rounded-up averages of an MMX register and another register or memory location. This instruction is identical to the 3DNow! pavgusb instruction. It operates on bytes, and treats them as unsigned.

pavgw is just like pavgb, except it operates on 16-bit words instead of 8-bit bytes. These are also treated as unsigned values.
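The "rounded-up average" amounts to computing (a + b + 1) >> 1 in each lane, using an intermediate wide enough that the sum cannot overflow. A scalar sketch, with hypothetical function names, could look like this:

```c
#include <stdint.h>

/* Scalar model of pavgb: per-byte unsigned average, rounded up.
 * The sum is formed in a wider type, so 255 + 255 + 1 does not wrap. */
void pavgb_model(uint8_t dest[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dest[i] = (uint8_t)(((unsigned)dest[i] + src[i] + 1u) >> 1);
}

/* Scalar model of pavgw: the same operation on four 16-bit words. */
void pavgw_model(uint16_t dest[4], const uint16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] = (uint16_t)(((uint32_t)dest[i] + src[i] + 1u) >> 1);
}
```

For example, the average of 1 and 2 comes out as 2 rather than 1, and the average of 255 and 255 stays 255.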

pmaxsw loads a register with the maximum value between that register and another register or memory location. It operates on 16-bit words, and treats them as signed values.

pmaxub is similar to pmaxsw, except it operates on 8-bit bytes instead of words, and treats the values as unsigned.

pminsw loads a register with the minimum value between that register and another register or memory location. Like pmaxsw, it operates on 16-bit signed words.

pminub is just like pminsw except it operates on 8-bit unsigned bytes instead of 16-bit signed words.
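A common use of the minimum and maximum instructions is clamping every lane to a range without branches. The sketch below models pmaxsw and pminsw as scalar loops and combines them into a clamp; all three function names are hypothetical, and the byte variants would follow the same pattern with uint8_t lanes.

```c
#include <stdint.h>

/* Scalar model of pmaxsw: per-word signed maximum, result in dest. */
void pmaxsw_model(int16_t dest[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        if (src[i] > dest[i])
            dest[i] = src[i];
}

/* Scalar model of pminsw: per-word signed minimum, result in dest. */
void pminsw_model(int16_t dest[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        if (src[i] < dest[i])
            dest[i] = src[i];
}

/* Clamp each word of v to [lo[i], hi[i]]: raise to the lower bound,
 * then cap at the upper bound. */
void clamp_words(int16_t v[4], const int16_t lo[4], const int16_t hi[4])
{
    pmaxsw_model(v, lo);
    pminsw_model(v, hi);
}
```

With both bounds held in registers, the clamp costs two instructions per four words regardless of the data.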

pmulhuw multiplies 4 16-bit unsigned words in a register with 4 more in another register or memory location, and stores the top 16 bits of each 32-bit result.
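In scalar terms, each lane computes a full 32-bit unsigned product and keeps its upper half. The sketch below uses a hypothetical function name; the cast to uint32_t before multiplying matters in C, because two uint16_t operands would otherwise be promoted to signed int.

```c
#include <stdint.h>

/* Scalar model of pmulhuw: multiply four pairs of unsigned 16-bit words
 * and keep the high 16 bits of each 32-bit product in dest. */
void pmulhuw_model(uint16_t dest[4], const uint16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t product = (uint32_t)dest[i] * (uint32_t)src[i];
        dest[i] = (uint16_t)(product >> 16);
    }
}
```

One way to read the result is as fixed-point scaling: multiplying by 0x8000 and keeping the high half halves the value, rounding toward zero.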

psadbw calculates the sum of absolute differences. What this means is that it calculates the byte difference between a register and another register or memory location, and sums the absolute value of all 8 differences. The result goes into the bottom 16 bits of the first register, and the top 48 bits are set to zero.
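The computation can be written out as a scalar reference loop. The function name is hypothetical; the return value stands for the whole 64-bit destination register, where only the bottom 16 bits can be nonzero because the largest possible sum is 8 × 255 = 2040.

```c
#include <stdint.h>

/* Scalar model of psadbw: sum of absolute differences of 8 unsigned
 * byte pairs. The sum fits in 16 bits, so the upper 48 bits of the
 * returned 64-bit value are always zero. */
uint64_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    uint64_t sum = 0;

    for (int i = 0; i < 8; i++) {
        int diff = (int)a[i] - (int)b[i];
        sum += (uint64_t)(diff < 0 ? -diff : diff);
    }
    return sum;
}
```

A sum of absolute differences is a cheap measure of how similar two rows of pixels are, which is why the operation is a staple of video motion estimation.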

Trademark Information

MMX is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

Athlon is a registered trademark of Advanced Micro Devices, Inc.