Extracting bits with a single multiplication

p15097962069

Tags: c, multiplication, bit-manipulation

Original link: https://oldbug.net/q/z2MR/Extracting-bits-with-a-single-multiplication

Translated from: Extracting bits with a single multiplication

I saw an interesting technique used in an answer to another question, and would like to understand it a little better.

We're given an unsigned 64-bit integer, and we are interested in the following bits:

1.......2.......3.......4.......5.......6.......7.......8.......

Specifically, we'd like to move them to the top eight positions, like so:

12345678........................................................

We don't care about the value of the bits indicated by a dot (.), and they don't have to be preserved.

The solution was to mask out the unwanted bits and multiply the result by 0x2040810204081. This, as it turns out, does the trick.
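As a minimal sketch of the mask-and-multiply step in C: the function names are made up for illustration, and the final right shift by 56 is an added convenience that returns the eight collected bits as a byte; it is not part of the trick itself.

```c
#include <assert.h>
#include <stdint.h>

/* Collect the most significant bit of each byte of x into one byte.
 * Bit "1" of the diagram (bit 63) becomes the top bit of the result,
 * bit "8" (bit 7) becomes the lowest bit. */
static uint8_t gather_top_bits(uint64_t x)
{
    const uint64_t mask  = 0x8080808080808080ULL; /* keep bits 63, 55, ..., 7 */
    const uint64_t magic = 0x0002040810204081ULL; /* multiplier from the question */

    /* After the multiply the eight bits sit in bits 63..56;
     * the shift only brings them down into a small integer. */
    return (uint8_t)(((x & mask) * magic) >> 56);
}

/* A few hand-checked cases. */
static void gather_top_bits_selfcheck(void)
{
    assert(gather_top_bits(0x8000000000000000ULL) == 0x80); /* only bit "1" */
    assert(gather_top_bits(0x0000000000000080ULL) == 0x01); /* only bit "8" */
    assert(gather_top_bits(0x7F7F7F7F7F7F7F7FULL) == 0x00); /* only don't-care bits */
}
```

Calling `gather_top_bits_selfcheck()` from a test or from `main` exercises the three cases.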

How general is this method? Can this technique be used to extract any subset of bits? If not, how does one figure out whether or not the method works for a particular set of bits?

Finally, how would one go about finding the (a?) correct multiplier to extract the given bits?

Reply #1

Reference: https://stackoom.com/question/z2MR/使用单个乘法提取位

Reply #2

Very interesting question, and clever trick.

Let's look at a simple example of manipulating a single byte, using unsigned 8-bit arithmetic for simplicity. Imagine your number is xxaxxbxx and you want ab000000.

The solution consisted of two steps: a bit masking, followed by multiplication. The bit mask is a simple AND operation that turns uninteresting bits to zeros. In the above case, your mask would be 00100100 and the result 00a00b00.

Now the hard part: turning that into ab......

A multiplication is a bunch of shift-and-add operations. The key is to allow overflow to "shift away" the bits we don't need and put the ones we want in the right place.

Multiplication by 4 (00000100) would shift everything left by 2 and get you to a00b0000. To get the b to move up we need to multiply by 1 (to keep the a in the right place) + 4 (to move the b up). This sum is 5, and combined with the earlier 4 we get a magic number of 4 x 5 = 20, or 00010100. The original was 00a00b00 after masking; the multiplication gives (written out in 16 bits, of which only the low 8 survive):

000000a00b000000
00000000a00b0000 +
----------------
000000a0ab0b0000
xxxxxxxxab......
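The 8-bit example can be tried directly in C. This is only a sketch: the function names are illustrative, and the final shift by 6 is added to return "ab" as a small number.

```c
#include <assert.h>
#include <stdint.h>

/* xxaxxbxx -> 2-bit value "ab", using 8-bit wraparound arithmetic. */
static uint8_t extract_ab(uint8_t x)
{
    uint8_t masked  = x & 0x24;                 /* 00a00b00 */
    uint8_t product = (uint8_t)(masked * 20u);  /* ab0b0000 once the overflow is dropped */
    return product >> 6;                        /* the two top bits: ab */
}

static void extract_ab_selfcheck(void)
{
    assert(extract_ab(0x24) == 3);  /* a = 1, b = 1 */
    assert(extract_ab(0x20) == 2);  /* a = 1, b = 0 */
    assert(extract_ab(0x04) == 1);  /* a = 0, b = 1 */
    assert(extract_ab(0xDB) == 0);  /* every x set, a = b = 0 */
}
```

The cast back to `uint8_t` matters: C promotes the operands to `int` before multiplying, so without it the "shifted away" high bits would still be present.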

From this approach you can extend to larger numbers and more bits.

One of the questions you asked was "can this be done with any number of bits?" I think the answer is "no", unless you allow several masking operations, or several multiplications. The problem is the issue of "collisions" - for example, the "stray b" in the problem above. Imagine we need to do this to a number like xaxxbxxcx. Following the earlier approach, you would think we need {x 2, x {1 + 4 + 16}} = x 42 (oooh - the answer to everything!). Result:

00000000a00b00c00
000000a00b00c0000
0000a00b00c000000
-----------------
0000a0ababcbc0c00
xxxxxxxxabc......
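The three-bit case is small enough to check exhaustively. The following sketch (illustrative names; the 9-bit word is held in a `uint16_t`) compares the multiply-by-42 result against a bit-by-bit reference for all 512 inputs:

```c
#include <assert.h>
#include <stdint.h>

/* xaxxbxxcx (9-bit word) -> 3-bit value "abc". */
static unsigned extract_abc(uint16_t x)
{
    unsigned masked  = x & 0x092u;     /* 0a00b00c0 */
    unsigned product = masked * 42u;   /* shifted copies at <<1, <<3 and <<5 */
    return (product >> 6) & 0x7u;      /* bits 8..6 of the 9-bit result */
}

/* Compare with a bit-by-bit extraction for every possible 9-bit input. */
static void extract_abc_selfcheck(void)
{
    for (unsigned x = 0; x < 512; x++) {
        unsigned a = (x >> 7) & 1u;
        unsigned b = (x >> 4) & 1u;
        unsigned c = (x >> 1) & 1u;
        assert(extract_abc((uint16_t)x) == ((a << 2) | (b << 1) | c));
    }
}
```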

As you can see, it still works, but "only just". The key here is that there is "enough space" between the bits we want that we can squeeze everything up. I could not add a fourth bit d right after c, because I would get instances where I get c+d, bits might carry, ...

So without formal proof, I would answer the more interesting parts of your question as follows: "No, this will not work for any number of bits. To extract N bits, you need (N-1) spaces between the bits you want to extract, or have additional mask-multiply steps."

The only exception I can think of for the "must have (N-1) zeros between bits" rule is this: if you want to extract two bits that are adjacent to each other in the original, AND you want to keep them in the same order, then you can still do it. And for the purpose of the (N-1) rule they count as two bits.

There is another insight, inspired by the answer of @Ternary below (see my comment there). For each interesting bit, you only need as many zeros to the right of it as you need space for bits that need to go there. But also, it needs as many bits to the left as it has result-bits to the left. So if a bit b ends up in position m of n, then it needs to have m-1 zeros to its left, and n-m zeros to its right. Especially when the bits are not in the same order in the original number as they will be after the re-ordering, this is an important improvement to the original criteria. This means, for example, that a 16 bit word

a...e.b...d..c..

can be shifted into

abcde...........

even though there is only one space between e and b, two between d and c, and three between the others. Whatever happened to N-1? In this case, a...e becomes "one block": both are multiplied by 1 to end up in the right place, and so "we got e for free". The gap between b and d is shared in a similar way (b needs three spaces to its right, d needs the same three to its left). So when we compute the magic number, we find a duplicate:

a: << 0  ( x 1    )
b: << 5  ( x 32   )
c: << 11 ( x 2048 )
d: << 7  ( x 128  )
e: << 0  ( x 1    )  !! duplicate

Clearly, if you wanted these numbers in a different order, you would have to space them further. We can reformulate the (N-1) rule: "It will always work if there are at least (N-1) spaces between bits; or, if the order of bits in the final result is known, then if a bit b ends up in position m of n, it needs to have m-1 zeros to its left, and n-m zeros to its right."

@Ternary pointed out that this rule doesn't quite work, as there can be a carry from bits adding "just to the right of the target area" - namely, when the bits we're looking for are all ones. Continuing the example I gave above with the five tightly packed bits in a 16 bit word: if we start with

a...e.b...d..c..

For simplicity, I will name the bit positions ABCDEFGHIJKLMNOP

The math we were going to do was

ABCDEFGHIJKLMNOP

a000e0b000d00c00
0b000d00c0000000
000d00c000000000
00c0000000000000 +
----------------
abcded(b+c)0c0d00c00

Until now, we thought anything below abcde (positions ABCDE) would not matter, but in fact, as @Ternary pointed out, if b=1, c=1, d=1 then (b+c) in position G will cause a bit to carry to position F, which means that (d+1) in position F will carry a bit into E, and our result is spoilt. Note that space to the right of the least significant bit of interest (c in this example) doesn't matter, since the multiplication will cause padding with zeros from beyond the least significant bit.
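The carry is easy to reproduce. The sketch below assumes the layout a...e.b...d..c.. in a 16-bit word and uses the four shifted copies from the sum above (shifts 0, 5, 7 and 11, i.e. a multiplier of 0x8A1); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* a...e.b...d..c.. -> 5-bit value "abcde", in 16-bit wraparound arithmetic. */
static unsigned extract_abcde(uint16_t x)
{
    const uint32_t mask  = 0x8A24u;  /* a = bit 15, e = 11, b = 9, d = 5, c = 2 */
    const uint32_t magic = (1u << 0) | (1u << 5) | (1u << 7) | (1u << 11);
    uint16_t product = (uint16_t)((x & mask) * magic);
    return product >> 11;            /* top five bits */
}

static void extract_abcde_selfcheck(void)
{
    /* Each bit on its own lands where it should. */
    assert(extract_abcde(0x8000) == 0x10);  /* a */
    assert(extract_abcde(0x0200) == 0x08);  /* b */
    assert(extract_abcde(0x0004) == 0x04);  /* c */
    assert(extract_abcde(0x0020) == 0x02);  /* d */
    assert(extract_abcde(0x0800) == 0x01);  /* e */

    /* b = c = d = 1, a = e = 0: the wanted result is 01110, but the carry
     * out of position G ripples through F into E and wrongly sets e. */
    assert(extract_abcde(0x0224) == 0x0F);
}
```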

So we need to modify our (m-1)/(n-m) rule. If there is more than one bit that has "exactly (n-m) unused bits to the right" (not counting the last bit in the pattern, "c" in the example above), then we need to strengthen the rule - and we have to do so iteratively!

We have to look not only at the number of bits that meet the (n-m) criterion, but also the ones that are at (n-m+1), etc. Let's call their number Q0 (exactly n-m to next bit), Q1 (n-m+1), up to Q(N-1) (n-1). Then we risk carry if

Q0 > 1
Q0 == 1 && Q1 >= 2
Q0 == 0 && Q1 >= 4
Q0 == 1 && Q1 > 1 && Q2 >=2
...

If you look at this, you can see that if you write a simple mathematical expression

W = N * Q0 + (N - 1) * Q1 + ... + Q(N-1)

and the result is W > 2 * N, then you need to increase the RHS criterion by one bit to (n-m+1). At this point, the operation is safe as long as W < 4; if that doesn't work, increase the criterion one more, etc.

I think that following the above will get you a long way to your answer...

Reply #3

Every 1-bit in the multiplier is used to copy one of the bits into its correct position:

1 is already in the correct position, so multiply by 0x0000000000000001.
2 must be shifted 7 bit positions to the left, so we multiply by 0x0000000000000080 (bit 7 is set).
3 must be shifted 14 bit positions to the left, so we multiply by 0x0000000000004000 (bit 14 is set).
and so on until
8 must be shifted 49 bit positions to the left, so we multiply by 0x0002000000000000 (bit 49 is set).

The multiplier is the sum of the multipliers for the individual bits.

This only works because the bits to be collected are not too close together, so that the products of bits which do not belong together in our scheme fall either beyond the 64 bits or in the lower don't-care part.

Note that the other bits in the original number must be 0. This can be achieved by masking them with an AND operation.
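The construction described above can be written down as a small helper. This is a sketch with made-up names (`build_multiplier`, `question_multiplier`); it only assembles the candidate multiplier and does not check whether stray partial products collide or carry into the wanted bits.

```c
#include <assert.h>
#include <stdint.h>

/* Build a multiplier with one 1-bit per required shift: the source bit at
 * position src[i] must land at position dst[i], with dst[i] >= src[i].
 * Bits that need the same shift share a single multiplier bit. */
static uint64_t build_multiplier(const int *src, const int *dst, int n)
{
    uint64_t multiplier = 0;
    for (int i = 0; i < n; i++) {
        int shift = dst[i] - src[i];
        assert(shift >= 0 && shift < 64);  /* a multiplication can only shift left */
        multiplier |= (uint64_t)1 << shift;
    }
    return multiplier;
}

/* The question's layout: bit "1" stays at 63, bit "2" moves 55 -> 62, and so on. */
static uint64_t question_multiplier(void)
{
    static const int src[8] = {63, 55, 47, 39, 31, 23, 15, 7};
    static const int dst[8] = {63, 62, 61, 60, 59, 58, 57, 56};
    return build_multiplier(src, dst, 8);
}
```

For the question's layout the shifts are 0, 7, 14, ..., 49, so the helper reproduces 0x2040810204081.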

Reply #4

(I'd never seen it before. This trick is great!)

I'll expand a bit on Floris's assertion that when extracting n bits you need n-1 space between any non-consecutive bits:

My initial thought (we'll see in a minute how this doesn't quite work) was that you could do better: if you want to extract n bits, you'll have a collision when extracting/shifting bit i if you have anyone (non-consecutive with bit i) in the i-1 bits preceding or n-i bits subsequent.

I'll give a few examples to illustrate:

...a..b...c... Works (nobody in the 2 bits after a, the bit before and the bit after b, and nobody is in the 2 bits before c):

  a00b000c
+ 0b000c00
+ 00c00000
= abc.....

...a.b....c... Fails because b is in the 2 bits after a (and gets pulled into someone else's spot when we shift a):

  a0b0000c
+ 0b0000c0
+ 00c00000
= abX.....

...a...b.c... Fails because b is in the 2 bits preceding c (and gets pushed into someone else's spot when we shift c):

  a000b0c0
+ 0b0c0000
+ b0c00000
= Xbc.....

...a...bc...d... Works because consecutive bits shift together:

  a000bc000d
+ 0bc000d000
+ 000d000000
= abcd000000

But we have a problem. If we use n-i instead of n-1 we could have the following scenario: what if we have a collision outside of the part that we care about, something we would mask away at the end, but whose carry bits end up interfering in the important un-masked range? (And note: the n-1 requirement makes sure this doesn't happen by making sure the i-1 bits after our un-masked range are clear when we shift the i-th bit.)

...a...b..c...d... Potential failure on carry-bits, c is in n-1 after b, but satisfies n-i criteria:

  a000b00c000d
+ 0b00c000d000
+ 00c000d00000
+ 000d00000000
= abcdX.......

So why don't we just go back to that "n-1 bits of space" requirement? Because we can do better:

...a....b..c...d.. Fails the "n-1 bits of space" test, but works for our bit-extracting trick:

  a0000b00c000d00
+ 0b00c000d000000
+ 00c000d00000000
+ 000d00000000000
= abcd...0X......

I can't come up with a good way to characterize these fields that don't have n-1 space between important bits, but still would work for our operation. However, since we know ahead of time which bits we're interested in, we can check our filter to make sure we don't experience carry-bit collisions:

Compare (-1 AND mask) * shift against the expected all-ones result, -1 << (64-n) (for 64-bit unsigned), where shift is the magic multiplier.

The magic shift/multiply to extract our bits works if and only if the two are equal in the top n bits.
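One possible way to write this check in C is sketched below. The function name is invented here, and it assumes the n extracted bits end up at the top of a 64-bit word with the low don't-care bits masked off before comparing.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply the mask itself (every wanted bit set to 1) and test whether the
 * top n bits of the product come out as all ones. */
static int passes_all_ones_test(uint64_t mask, uint64_t multiplier, int n)
{
    assert(n >= 1 && n <= 64);
    uint64_t expected = ~(uint64_t)0 << (64 - n);         /* n ones at the top */
    uint64_t product  = (~(uint64_t)0 & mask) * multiplier;
    /* Assumption: only the top n bits matter, so mask the rest away. */
    return (product & expected) == expected;
}
```

For example, `passes_all_ones_test(0x8080808080808080ULL, 0x0002040810204081ULL, 8)` accepts the constants from the question, while the failing a0b0000c example above, placed in the top byte (mask `0xA1ULL << 56`, multiplier `0x23`, n = 3), is rejected.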

Reply #5

Very interesting question indeed. I'm chiming in with my two cents, which is that, if you can manage to state problems like this in terms of first-order logic over the bitvector theory, then theorem provers are your friend, and can potentially provide you with very fast answers to your questions. Let's re-state the problem being asked as a theorem:

"There exist some 64-bit constants 'mask' and 'multiplicand' such that, for all 64-bit bitvectors x, in the expression y = (x & mask) * multiplicand, we have that y.63 == x.63, y.62 == x.55, y.61 == x.47, etc."

If this sentence is in fact a theorem, then it is true that some values of the constants 'mask' and 'multiplicand' satisfy this property. So let's phrase this in terms of something that a theorem prover can understand, namely SMT-LIB 2 input:

(set-logic BV)

(declare-const mask         (_ BitVec 64))
(declare-const multiplicand (_ BitVec 64))

(assert
  (forall ((x (_ BitVec 64)))
    (let ((y (bvmul (bvand mask x) multiplicand)))
      (and
        (= ((_ extract 63 63) x) ((_ extract 63 63) y))
        (= ((_ extract 55 55) x) ((_ extract 62 62) y))
        (= ((_ extract 47 47) x) ((_ extract 61 61) y))
        (= ((_ extract 39 39) x) ((_ extract 60 60) y))
        (= ((_ extract 31 31) x) ((_ extract 59 59) y))
        (= ((_ extract 23 23) x) ((_ extract 58 58) y))
        (= ((_ extract 15 15) x) ((_ extract 57 57) y))
        (= ((_ extract  7  7) x) ((_ extract 56 56) y))
      )
    )
  )
)

(check-sat)
(get-model)

And now let's ask the theorem prover Z3 whether this is a theorem:

z3.exe /m /smt2 ExtractBitsThroughAndWithMultiplication.smt2

The result is:

sat
(model
  (define-fun mask () (_ BitVec 64)
    #x8080808080808080)
  (define-fun multiplicand () (_ BitVec 64)
    #x0002040810204081)
)

Bingo! It reproduces the result given in the original post in 0.06 seconds.
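The model can also be double-checked without a solver. The following C sketch (function names are illustrative) tests the eight equalities of the theorem directly; it assumes the mask keeps only the eight bits of interest, as the printed model does, so that trying all 256 combinations of those bits covers every 64-bit x.

```c
#include <assert.h>
#include <stdint.h>

/* Check the eight equalities for one x: bit 63-k of y must equal
 * bit 63-8k of x, for k = 0..7. */
static int model_holds_for(uint64_t x, uint64_t mask, uint64_t multiplicand)
{
    uint64_t y = (x & mask) * multiplicand;
    for (int k = 0; k < 8; k++) {
        unsigned want = (unsigned)((x >> (63 - 8 * k)) & 1u);
        unsigned got  = (unsigned)((y >> (63 - k)) & 1u);
        if (want != got)
            return 0;
    }
    return 1;
}

/* Try all 256 settings of the eight interesting bits, each once with the
 * other 56 bits clear and once with them all set. */
static int model_holds_for_all_x(uint64_t mask, uint64_t multiplicand)
{
    for (unsigned v = 0; v < 256; v++) {
        uint64_t x = 0;
        for (int k = 0; k < 8; k++)
            if (v & (1u << k))
                x |= (uint64_t)1 << (63 - 8 * k);
        if (!model_holds_for(x, mask, multiplicand) ||
            !model_holds_for(x | ~0x8080808080808080ULL, mask, multiplicand))
            return 0;
    }
    return 1;
}

static void check_z3_model(void)
{
    assert(model_holds_for_all_x(0x8080808080808080ULL, 0x0002040810204081ULL));
}
```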

Looking at this from a more general perspective, we can view this as being an instance of a first-order program synthesis problem, which is a nascent area of research about which few papers have been published. A search for "program synthesis" filetype:pdf should get you started.

Reply #6

In addition to the already excellent answers to this very interesting question, it might be useful to know that this bitwise multiplication trick has been known in the computer chess community since 2007, where it goes under the name of Magic BitBoards.

Many computer chess engines use several 64-bit integers (called bitboards) to represent the various piece sets (1 bit per occupied square). Suppose a sliding piece (rook, bishop, queen) on a certain origin square can move to at most K squares if no blocking pieces were present. Using bitwise-and of those scattered K bits with the bitboard of occupied squares gives a specific K-bit word embedded within a 64-bit integer.

Magic multiplication can be used to map these scattered K bits to the lower K bits of a 64-bit integer. These lower K bits can then be used to index a table of pre-computed bitboards that represent the allowed squares that the piece on its origin square can actually move to (taking care of blocking pieces etc.)

A typical chess engine using this approach has 2 tables (one for rooks, one for bishops, queens using the combination of both) of 64 entries (one per origin square) that contain such pre-computed results. Both the highest rated closed source (Houdini) and open source chess engine (Stockfish) currently use this approach for its very high performance.

Finding these magic multipliers is done either using an exhaustive search (optimized with early cutoffs) or with trial and error (e.g. trying lots of random 64-bit integers). There have been no bit patterns used during move generation for which no magic constant could be found. However, bitwise carry effects are typically necessary when the to-be-mapped bits have (almost) adjacent indices.
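A minimal sketch of the trial-and-error search in C follows. It is not taken from any particular engine: the names, the xorshift generator, and the plain "every subset gets its own index" acceptance test are assumptions made for illustration. The index is read from the top K bits of the product and shifted down, and AND-ing three random values to get sparse candidates is a heuristic commonly used for this kind of search.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Small xorshift64 generator; any source of random 64-bit values would do. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t s = *state;
    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    return *state = s;
}

/* Does (subset * magic) >> (64 - k) give a distinct k-bit index for every
 * subset of the bits in mask? Supports k up to 12. */
static int is_magic(uint64_t mask, uint64_t magic, int k)
{
    unsigned char seen[1 << 12];
    assert(k >= 1 && k <= 12);
    memset(seen, 0, sizeof seen);

    uint64_t subset = 0;
    do {
        unsigned index = (unsigned)((subset * magic) >> (64 - k));
        if (seen[index])
            return 0;                     /* two subsets collide */
        seen[index] = 1;
        subset = (subset - mask) & mask;  /* step to the next subset of mask */
    } while (subset != 0);
    return 1;
}

/* Trial and error: try sparse random candidates until one works.
 * Returns 0 if none is found within max_tries. */
static uint64_t find_magic(uint64_t mask, int k, uint64_t seed, long max_tries)
{
    uint64_t state = seed ? seed : 1;     /* xorshift state must be non-zero */
    for (long t = 0; t < max_tries; t++) {
        uint64_t candidate = next_random(&state) & next_random(&state)
                           & next_random(&state);
        if (candidate != 0 && is_magic(mask, candidate, k))
            return candidate;
    }
    return 0;
}
```

With `mask = 0x8080808080808080ULL` and `k = 8`, `is_magic` accepts the multiplier 0x2040810204081 from the question; `find_magic` may return a different constant, or 0 if the try budget runs out.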

AFAIK, the very general SAT-solver approach by @Syzygy has not been used in computer chess, and neither does there appear to be any formal theory regarding existence and uniqueness of such magic constants.