Assembly x64 Intro - MMX 4x4W Transpose






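; Transpose a 4x4 matrix of 16-bit words held in four MMX registers.
; In:  %1..%4 = rows 0..3, %5 = scratch register
; Out: transposed rows 0..3 in %1, %4, %5, %3 (%2 is clobbered)
%macro MMX_Trans4x4W 5
    MMX_XSwap wd, %1, %2, %5    ; interleave the words of rows 0 and 1
    MMX_XSwap wd, %3, %4, %2    ; interleave the words of rows 2 and 3
    MMX_XSwap dq, %1, %3, %4    ; combine low halves:  transposed rows 0 and 1
    MMX_XSwap dq, %5, %2, %3    ; combine high halves: transposed rows 2 and 3
%endmacro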



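; %1 = unpack granularity (wd or dq), %2 = a (in/out), %3 = b (in), %4 = tmp (out)
; Out: %4 = interleaved high halves of a and b, %2 = interleaved low halves
%macro MMX_XSwap  4
    movq        %4, %2          ; tmp = a
    punpckh%1   %4, %3          ; tmp = high halves of a and b, interleaved
    punpckl%1   %2, %3          ; a   = low halves of a and b, interleaved
%endmacro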


MMX_Trans4x4W        mm0, mm1, mm2, mm3, mm4


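Assume mm0 = a3a2a1a0, mm1 = b3b2b1b0, mm2 = c3c2c1c0, mm3 = d3d2d1d0, where each register holds four 16-bit words written from the most significant (left) to the least significant (right).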

Then MMX_Trans4x4W        mm0, mm1, mm2, mm3, mm4 produces:

mm0 = d0c0b0a0, mm3 = d1c1b1a1, mm4 = d2c2b2a2,  mm2 = d3c3b3a3
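
The macro is built entirely from the punpckl/punpckh instructions, so it helps to pin down what they do before reading the expansion. The following is a rough scalar sketch in plain C, not MMX code: the mmx_reg type and the C function names are hypothetical, chosen only to mirror the instruction mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit MMX register as four 16-bit words.
   w[0] is the least significant word, w[3] the most significant. */
typedef struct { uint16_t w[4]; } mmx_reg;

/* punpcklwd dst, src: interleave the two low words of dst and src. */
static mmx_reg punpcklwd(mmx_reg dst, mmx_reg src) {
    mmx_reg r = {{ dst.w[0], src.w[0], dst.w[1], src.w[1] }};
    return r;
}

/* punpckhwd dst, src: interleave the two high words of dst and src. */
static mmx_reg punpckhwd(mmx_reg dst, mmx_reg src) {
    mmx_reg r = {{ dst.w[2], src.w[2], dst.w[3], src.w[3] }};
    return r;
}

/* punpckldq dst, src: low dword of dst in the low half,
   low dword of src in the high half. */
static mmx_reg punpckldq(mmx_reg dst, mmx_reg src) {
    mmx_reg r = {{ dst.w[0], dst.w[1], src.w[0], src.w[1] }};
    return r;
}

/* punpckhdq dst, src: high dword of dst in the low half,
   high dword of src in the high half. */
static mmx_reg punpckhdq(mmx_reg dst, mmx_reg src) {
    mmx_reg r = {{ dst.w[2], dst.w[3], src.w[2], src.w[3] }};
    return r;
}
```

With a = a3a2a1a0 and b = b3b2b1b0 in the high-to-low notation used here, punpcklwd(a, b) models b1a1b0a0 and punpckhwd(a, b) models b3a3b2a2. In every case the destination operand supplies the lower element of each interleaved pair.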


The macro expands as follows:

        MMX_XSwap wd, mm0, mm1, mm4 =>
                movq        mm4, mm0    => mm4 = mm0 = a3a2a1a0
                punpckhwd   mm4, mm1    => mm4 = b3a3b2a2
                punpcklwd   mm0, mm1    => mm0 = b1a1b0a0

        MMX_XSwap wd, mm2, mm3, mm1 =>
                movq        mm1, mm2    => mm1 = mm2 = c3c2c1c0
                punpckhwd   mm1, mm3    => mm1 = d3c3d2c2
                punpcklwd   mm2, mm3    => mm2 = d1c1d0c0

        MMX_XSwap dq, mm0, mm2, mm3 =>
                movq        mm3, mm0    => mm3 = mm0 = b1a1b0a0
                punpckhdq   mm3, mm2    => mm3 = d1c1b1a1
                punpckldq   mm0, mm2    => mm0 = d0c0b0a0

        MMX_XSwap dq, mm4, mm1, mm2 =>
                movq        mm2, mm4    => mm2 = mm4 = b3a3b2a2
                punpckhdq   mm2, mm1    => mm2 = d3c3b3a3
                punpckldq   mm4, mm1    => mm4 = d2c2b2a2

=> mm0 = d0c0b0a0,  mm3 = d1c1b1a1, mm4 = d2c2b2a2, mm2 = d3c3b3a3
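
The register trace above can be replayed on any machine with a small scalar model. This is only a sketch in plain C under stated assumptions, not the MMX code itself: mmx_reg, xswap_wd, xswap_dq and trans4x4w are hypothetical names, and the array m[0..4] stands in for mm0..mm4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an MMX register: w[0] is the lowest 16-bit word. */
typedef struct { uint16_t w[4]; } mmx_reg;

/* Model of "MMX_XSwap wd, a, b, tmp". */
static void xswap_wd(mmx_reg *a, const mmx_reg *b, mmx_reg *tmp) {
    mmx_reg hi = {{ a->w[2], b->w[2], a->w[3], b->w[3] }};  /* movq + punpckhwd */
    mmx_reg lo = {{ a->w[0], b->w[0], a->w[1], b->w[1] }};  /* punpcklwd */
    *tmp = hi;
    *a = lo;
}

/* Model of "MMX_XSwap dq, a, b, tmp". */
static void xswap_dq(mmx_reg *a, const mmx_reg *b, mmx_reg *tmp) {
    mmx_reg hi = {{ a->w[2], a->w[3], b->w[2], b->w[3] }};  /* movq + punpckhdq */
    mmx_reg lo = {{ a->w[0], a->w[1], b->w[0], b->w[1] }};  /* punpckldq */
    *tmp = hi;
    *a = lo;
}

/* Model of "MMX_Trans4x4W mm0, mm1, mm2, mm3, mm4".
   On entry m[0..3] hold rows a, b, c, d and m[4] is scratch.
   On exit the transposed rows 0..3 are in m[0], m[3], m[4], m[2]. */
static void trans4x4w(mmx_reg m[5]) {
    xswap_wd(&m[0], &m[1], &m[4]);
    xswap_wd(&m[2], &m[3], &m[1]);
    xswap_dq(&m[0], &m[2], &m[3]);
    xswap_dq(&m[4], &m[1], &m[2]);
}
```

After filling m[0..3] with the four rows and calling trans4x4w(m), word r of transposed row k is expected to equal word k of input row r, with the transposed rows read from m[0], m[3], m[4], m[2] in that order.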





