4x4 matrix transpose in C: the fastest way to transpose a 4x4 byte matrix.

Let me rephrase your question: you're asking for a C- or C++-only solution that is portable. Then:

#include <stdint.h>

void transpose(uint32_t const in[4], uint32_t out[4]) {
    // Input rows     Output rows
    // A B C D        A E I M
    // E F G H        B F J N
    // I J K L        C G K O
    // M N O P        D H L P
    // Each row is one word; the leftmost byte is the most significant.

    // Diagonal bytes keep their position.
    out[0] = in[0] & 0xFF000000U;            // A . . .
    out[1] = in[1] & 0x00FF0000U;            // . F . .
    out[2] = in[2] & 0x0000FF00U;            // . . K .
    out[3] = in[3] & 0x000000FFU;            // . . . P

    // Row 0 (A B C D) supplies the first byte of every output row.
    out[1] |= (in[0] <<  8) & 0xFF000000U;   // B F . .
    out[2] |= (in[0] << 16) & 0xFF000000U;   // C . K .
    out[3] |= (in[0] << 24);                 // D . . P

    // Row 1 (E F G H) supplies the second byte.
    out[0] |= (in[1] >>  8) & 0x00FF0000U;   // A E . .
    out[2] |= (in[1] <<  8) & 0x00FF0000U;   // C G K .
    out[3] |= (in[1] << 16) & 0x00FF0000U;   // D H . P

    // Row 2 (I J K L) supplies the third byte.
    out[0] |= (in[2] >> 16) & 0x0000FF00U;   // A E I .
    out[1] |= (in[2] >>  8) & 0x0000FF00U;   // B F J .
    out[3] |= (in[2] <<  8) & 0x0000FF00U;   // D H L P

    // Row 3 (M N O P) supplies the fourth byte.
    out[0] |= (in[3] >> 24);                 // A E I M
    out[1] |= (in[3] >> 16) & 0x000000FFU;   // B F J N
    out[2] |= (in[3] >>  8) & 0x000000FFU;   // C G K O
}
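
The word-based version above only behaves as the comments show if each row is packed with its first byte in the most significant position. If the matrix lives in memory as 16 bytes, simply reinterpreting it as four words gives a different byte order on little-endian hosts. A minimal sketch of endian-independent conversion, assuming helper names `pack_row` and `unpack_row` that are made up for this illustration:

```c
#include <stdint.h>

// Hypothetical helper: pack one row of four bytes into a word,
// first byte in the most significant position (A B C D -> 0xAABBCCDD).
static uint32_t pack_row(const uint8_t row[4]) {
    return ((uint32_t)row[0] << 24) | ((uint32_t)row[1] << 16) |
           ((uint32_t)row[2] <<  8) |  (uint32_t)row[3];
}

// Hypothetical helper: the inverse of pack_row.
static void unpack_row(uint32_t word, uint8_t row[4]) {
    row[0] = (uint8_t)(word >> 24);
    row[1] = (uint8_t)(word >> 16);
    row[2] = (uint8_t)(word >>  8);
    row[3] = (uint8_t)word;
}
```

Whether this packing cost is acceptable depends on where the data comes from; if the rows already arrive as words, no conversion is needed.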

I don't see how it could be answered any other way, since then you'd be depending on a particular compiler compiling it in a particular way, etc.

Of course if those manipulations themselves can be somehow simplified, it'd help. So that's the only avenue of further pursuit here. Nothing stands out so far, but then it's been a long day for me.

So far, the cost is 12 shifts, 12 ORs, and 14 ANDs (the two 24-bit shifts need no mask). If the compiler and platform are any good, it can be done in nine 32-bit registers.
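
One possible way to cut the operation count is a two-stage block swap: first exchange single bytes inside each 2x2 block, then exchange 16-bit halves between row pairs. This is a different technique from the listing above, sketched here under the same packing assumption (leftmost byte most significant); the name `transpose_2stage` is made up for the illustration. By my count it needs 8 shifts, 8 ORs, and 12 ANDs.

```c
#include <stdint.h>

// Two-stage variant; rows are packed as in[0] = A B C D with A in the top byte.
void transpose_2stage(const uint32_t in[4], uint32_t out[4]) {
    // Stage 1: within each pair of rows, swap the off-diagonal bytes
    // of every 2x2 block.
    uint32_t t0 = (in[0] & 0xFF00FF00U) | ((in[1] >> 8) & 0x00FF00FFU);  // A E C G
    uint32_t t1 = ((in[0] << 8) & 0xFF00FF00U) | (in[1] & 0x00FF00FFU);  // B F D H
    uint32_t t2 = (in[2] & 0xFF00FF00U) | ((in[3] >> 8) & 0x00FF00FFU);  // I M K O
    uint32_t t3 = ((in[2] << 8) & 0xFF00FF00U) | (in[3] & 0x00FF00FFU);  // J N L P

    // Stage 2: swap the off-diagonal 16-bit halves between the row pairs.
    out[0] = (t0 & 0xFFFF0000U) | (t2 >> 16);          // A E I M
    out[1] = (t1 & 0xFFFF0000U) | (t3 >> 16);          // B F J N
    out[2] = (t0 << 16) | (t2 & 0x0000FFFFU);          // C G K O
    out[3] = (t1 << 16) | (t3 & 0x0000FFFFU);          // D H L P
}
```

Fewer operations does not guarantee faster code; the stages form a longer dependency chain than the sixteen independent byte moves.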

If the compiler is very sad, or the platform doesn't have a barrel shifter, then treating the matrix as bytes could help exploit the fact that the shifts and masks are just byte extractions:

void transpose(uint8_t const in[16], uint8_t out[16]) {
    // Input rows     Output rows
    // A B C D        A E I M
    // E F G H        B F J N
    // I J K L        C G K O
    // M N O P        D H L P
    out[0]  = in[0];    // A . . .
    out[1]  = in[4];    // A E . .
    out[2]  = in[8];    // A E I .
    out[3]  = in[12];   // A E I M
    out[4]  = in[1];    // B . . .
    out[5]  = in[5];    // B F . .
    out[6]  = in[9];    // B F J .
    out[7]  = in[13];   // B F J N
    out[8]  = in[2];    // C . . .
    out[9]  = in[6];    // C G . .
    out[10] = in[10];   // C G K .
    out[11] = in[14];   // C G K O
    out[12] = in[3];    // D . . .
    out[13] = in[7];    // D H . .
    out[14] = in[11];   // D H L .
    out[15] = in[15];   // D H L P
}
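
The sixteen assignments above all follow one index pattern: the byte at row r, column c moves to row c, column r. A loop form of the same thing, as a sketch (the name `transpose_loop` is made up here); a compiler may or may not unroll it into the straight-line version:

```c
#include <stdint.h>

// Loop form of the byte-wise transpose: out[c][r] = in[r][c]
// for a row-major 4x4 matrix stored in 16 bytes.
void transpose_loop(const uint8_t in[16], uint8_t out[16]) {
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            out[4 * c + r] = in[4 * r + c];
        }
    }
}
```

As with the unrolled listing, `in` and `out` must be distinct buffers.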

If you really want to shuffle it in place, then the following would do (C++, since it uses std::swap).

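#include <stdint.h>
#include <utility>   // std::swap

void transpose(uint8_t m[16]) {
    // Swap each element above the diagonal with its mirror below it.
    std::swap(m[1],  m[4]);
    std::swap(m[2],  m[8]);
    std::swap(m[3],  m[12]);
    std::swap(m[6],  m[9]);
    std::swap(m[7],  m[13]);
    std::swap(m[11], m[14]);
}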
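
For plain C, where `std::swap` is not available, the same six exchanges can be written with a small helper. This is a sketch; `swap_bytes` and `transpose_inplace` are names made up for the illustration:

```c
#include <stdint.h>

// Hypothetical helper: exchange two bytes through a temporary.
static void swap_bytes(uint8_t *a, uint8_t *b) {
    uint8_t t = *a;
    *a = *b;
    *b = t;
}

// In-place transpose of a row-major 4x4 byte matrix:
// each element above the diagonal trades places with its mirror.
void transpose_inplace(uint8_t m[16]) {
    swap_bytes(&m[1],  &m[4]);
    swap_bytes(&m[2],  &m[8]);
    swap_bytes(&m[3],  &m[12]);
    swap_bytes(&m[6],  &m[9]);
    swap_bytes(&m[7],  &m[13]);
    swap_bytes(&m[11], &m[14]);
}
```

Calling it twice restores the original matrix, which makes for a cheap sanity check.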

The byte-oriented versions may well produce worse code on modern platforms. Only a benchmark can tell.
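
A rough sketch of how one might time a candidate. The names `time_transpose` and `transpose_bytes` are made up for this illustration, and `clock()` measures CPU time at coarse resolution, so treat any numbers as indicative only and use a large iteration count.

```c
#include <stdint.h>
#include <time.h>

// Byte-oriented candidate to be timed (same index pattern as the
// byte-wise listing earlier).
static void transpose_bytes(const uint8_t in[16], uint8_t out[16]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[4 * c + r] = in[4 * r + c];
}

// Hypothetical harness: calls fn 2 * iterations times, feeding each result
// back in as the next input so that every call depends on the previous one.
// Returns elapsed CPU seconds; *checksum receives the sum of the final
// matrix bytes so that the work has an observable result.
double time_transpose(void (*fn)(const uint8_t *, uint8_t *),
                      long iterations, unsigned *checksum) {
    uint8_t a[16], b[16];
    for (int i = 0; i < 16; ++i) a[i] = (uint8_t)i;

    clock_t start = clock();
    for (long n = 0; n < iterations; ++n) {
        fn(a, b);
        fn(b, a);   // transposing twice brings a back to its initial value
    }
    clock_t stop = clock();

    unsigned sum = 0;
    for (int i = 0; i < 16; ++i) sum += a[i];
    *checksum = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Passing each variant through the same harness, with the same optimization flags, gives a first comparison; an optimizing compiler can still reshape such a loop, so inspecting the generated assembly is a sensible cross-check.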
