Algorithmic Problem Solving: Efficiently Computing the Parity of a Stream of Numbers

Latest recommended article published on 2024-07-26 20:00:33

cumifi2519

Tags: algorithms, python, java, big data, distributed systems

Original article: https://www.freecodecamp.org/news/algorithmic-problem-solving-efficiently-computing-the-parity-of-a-stream-of-numbers-cd652af14643/

Problem Statement:

You are getting a stream of numbers (say long type numbers), compute the parity of the numbers. Hypothetically you have to serve a huge scale like 1 million numbers per minute. Design an algorithm considering such scale. Parity of a number is 1 if the total number of set bits in the binary representation of the number is odd else parity is 0.

Solution:

Approach 1 - Brute Force:

The problem statement clearly states what parity is. We can calculate the total number of set bits in the binary representation of the given number. If the total number of set bits is odd, parity is 1 else 0. So the naive way is to keep doing a bit-wise right shift on the given number & check the current least significant bit (LSB) to keep track of the result.

In the brute-force code (computeParityBruteForce in the full listing at the end of the article), we go through the bits one by one in the while loop. With the condition ((no & 1) == 1), we check whether the current LSB is 1 or 0; if it is 1, we do result ^= 1. The variable result is initialized to 0. So when we xor (^) the current value of result with 1, result becomes 1 if it is currently 0, otherwise 0.

If there are an even number of set bits, eventually the result will become 0 because xor between all 1’s will cancel out each other. If there are an odd number of 1’s, the final value of result will be 1. no >>> 1 right shifts the bits by 1.

>>> is the logical right shift operator in Java: it shifts the sign bit (the most significant bit in a signed number) along with the other bits and fills the vacated position with 0. There is another right shift operator, >>, which is called the arithmetic right shift operator [see reference 1 at the end of the page]. It preserves the sign: the sign bit is copied into the vacated position, so a negative number stays negative. This is why the loop uses >>>: with >>, a negative input would never reach 0 and the while loop would not terminate. Finally, result & 0x1 returns 1 if the number of set bits is odd, or 0 otherwise.
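To make the difference between the two shift operators concrete, here is a minimal sketch. The class name ShiftParityDemo and the method name parityBruteForce are chosen for illustration only; the loop follows the brute-force idea described above and is not the author's original snippet.

```java
public class ShiftParityDemo {

    // Brute-force parity: look at the LSB, then shift right by one.
    // The logical shift (>>>) puts a 0 into the top bit, so the value
    // eventually becomes 0 even when the input is negative.
    static int parityBruteForce(long no) {
        int result = 0;
        while (no != 0) {
            result ^= (int) (no & 1); // flips result whenever the LSB is 1
            no >>>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        long negative = -8L; // 61 one-bits followed by 000

        // Arithmetic shift copies the sign bit: the value stays negative.
        System.out.println(Long.toBinaryString(negative >> 1));
        // Logical shift inserts a 0 at the top: the value becomes positive.
        System.out.println(Long.toBinaryString(negative >>> 1));

        System.out.println(parityBruteForce(40L));      // 40 = 101000, two set bits
        System.out.println(parityBruteForce(negative)); // 61 set bits
    }
}
```

Replacing >>>= with >>= in the loop would make the call with a negative argument spin forever, because the value settles at -1 instead of 0.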

Advantages:

The solution is very easy to understand & implement.

Disadvantages:

We are processing all the bits manually, so this approach is hardly efficient at scale.

Time Complexity: O(n) where n is the total number of bits in the binary representation of the given number.

Approach 2 - Clear all the set bits one by one:

There is a bottleneck in the above solution: the while loop itself. It goes through all the bits one by one. Do we really need to do that? Our concern is the set bits, so we gain nothing by visiting unset (0) bits. If we visit only the set bits, our solution becomes a little more optimized. In bitwise computation, if we are given a number n, we can clear the rightmost set bit with the following operation:

n = n & (n-1)

Take an example: say n = 40, the binary representation in 8-bit format is: 00101000.

n           = 0010 1000
n - 1       = 0010 0111
n & (n - 1) = 0010 0000

We have successfully cleared the lowest set bit (4th bit from the right side). If we keep doing this, the number n will become 0 at some point. Based on this logic, we don’t need to scan all the bits to compute parity. Instead, the loop runs only k times, where k is the total number of set bits in the number & k <= length of the binary representation. The code is computeParityBasedOnClearingSetBit in the full listing at the end of the article.
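A minimal sketch of this loop follows. The class ClearLowestSetBitDemo and the helper iterations are hypothetical names added for illustration; the helper only exists to show that the loop body runs once per set bit.

```java
public class ClearLowestSetBitDemo {

    // Parity via n & (n - 1): every iteration clears exactly one set bit
    // and flips the running parity.
    static int parityByClearing(long no) {
        int result = 0;
        while (no != 0) {
            no &= (no - 1);
            result ^= 1;
        }
        return result;
    }

    // Same loop, but returns how many times the body ran.
    static int iterations(long no) {
        int count = 0;
        while (no != 0) {
            no &= (no - 1);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long n = 40L; // 101000
        System.out.println(Long.toBinaryString(n & (n - 1))); // lowest set bit cleared
        System.out.println("iterations: " + iterations(n));
        System.out.println("parity: " + parityByClearing(n));
    }
}
```

For n = 40 the loop body runs twice, compared with six iterations for the bit-by-bit scan of Approach 1.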

Advantages:

Simple to implement.
More efficient than brute force solution.

Disadvantages:

It’s not the most efficient solution.

Time Complexity:

O(k) where k is the total number of set bits in the number.

Approach 3 - Caching:

Look at the problem statement once more: there is definitely a concern about scale. Can our earlier solutions serve millions of requests, or is there still scope to do better?

We can probably make the solution faster if we store results in memory, that is, by caching. This saves the CPU cycles otherwise spent recomputing the same result. So if the total number of bits is 64, how much memory do we need to save all possible numbers? 64 bits give us Math.pow(2, 64) possible signed numbers (the most significant bit stores the sign). The size of a long type number is 64 bits or 8 bytes, so the total memory required is 64 * Math.pow(2, 64) bits = 2^70 bits = 2^67 bytes, or 134217728 TeraBytes (taking 1 TeraByte as 2^40 bytes). That is far too much to be worth storing. Can we do better?

We can break the 64-bit number into four groups of 16 bits, fetch the parity of each group from the cache & combine them. This works because 16 divides 64 into 4 equal parts & we are concerned only with the total number of set bits. As long as we have the parity of each individual group, we can xor the results with each other, since xor is associative & commutative. The order in which we fetch those groups & operate on them does not even matter.

If we store the parity of every 16-bit number as an integer, the total memory required is: Math.pow(2, 16) * 32 bits = 2^21 bits = 256 KB.
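The two memory figures can be checked with a few lines of arithmetic. This is only a sketch: the class CacheSizeDemo and the helper tableBits are hypothetical names, and 1 KB and 1 TB are taken as 2^10 and 2^40 bytes, which is the convention the figures above imply.

```java
import java.math.BigInteger;

public class CacheSizeDemo {

    // Bits needed to store bitsPerEntry bits for every value of a keyBits-bit key.
    static BigInteger tableBits(int keyBits, int bitsPerEntry) {
        return BigInteger.ONE.shiftLeft(keyBits).multiply(BigInteger.valueOf(bitsPerEntry));
    }

    public static void main(String[] args) {
        BigInteger bitsPerTerabyte = BigInteger.valueOf(8).shiftLeft(40); // 8 bits * 2^40 bytes
        BigInteger bitsPerKilobyte = BigInteger.valueOf(8).shiftLeft(10); // 8 bits * 2^10 bytes

        // Every 64-bit value, 64 bits per entry.
        System.out.println(tableBits(64, 64).divide(bitsPerTerabyte) + " TB");
        // Every 16-bit value, one 32-bit int per entry.
        System.out.println(tableBits(16, 32).divide(bitsPerKilobyte) + " KB");
    }
}
```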

In the caching code (computeParityWithCaching in the full listing at the end of the article), we right shift the number by i * WORD_SIZE where 0 ≤ i ≤ 3 and do a bitwise AND (&) with mask = 0xFFFF (0xFFFF = 1111111111111111), so that we extract one group of 16 bits at a time as integer variables masked1, masked2, etc. We pass these variables to the method checkAndSetInCache, which computes the parity of the group if it is not yet available in the cache. In the end, we xor the cached parities of the four groups, which gives the final parity of the given number.
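As a variation on the same idea, the sketch below fills the whole table once, up front, instead of filling it lazily on first use. This is an assumption made for illustration and not the article's method: the class ParityTableDemo, the byte-sized entries (64 KB instead of 256 KB) and the recurrence used to fill the table are all choices of this sketch.

```java
public class ParityTableDemo {

    private static final int WORD_SIZE = 16;
    private static final int MASK = 0xFFFF;

    // One parity value per 16-bit number, shared by every call.
    private static final byte[] TABLE = new byte[1 << WORD_SIZE];

    static {
        // parity(i) = parity(i >>> 1) XOR (lowest bit of i); TABLE[0] is already 0.
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = (byte) (TABLE[i >>> 1] ^ (i & 1));
        }
    }

    // Splits the 64-bit input into four 16-bit groups and xors their parities.
    static int parity(long no) {
        return TABLE[(int) ((no >>> (3 * WORD_SIZE)) & MASK)]
             ^ TABLE[(int) ((no >>> (2 * WORD_SIZE)) & MASK)]
             ^ TABLE[(int) ((no >>> WORD_SIZE) & MASK)]
             ^ TABLE[(int) (no & MASK)];
    }

    public static void main(String[] args) {
        long[] samples = {0L, 40L, 1274849L, -1L, Long.MIN_VALUE};
        for (long n : samples) {
            System.out.println(n + " -> " + parity(n));
        }
    }
}
```

The point that matters for scale is that the table lives outside the method, so every number in the stream reuses the same entries.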

Advantages:

At the cost of relatively small memory for the cache, we get better efficiency since we are reusing a group of 16-bit numbers across inputs.
This solution can scale well as we are serving millions of numbers.

Disadvantages:

If this algorithm needs to be implemented in an ultra-low memory device, the space complexity has to be well thought of in advance in order to decide whether it’s worth it to accommodate such amount of space.

Time Complexity:

O(n / WORD_SIZE) where n is the total number of bits in the binary representation. All right / left shift & bitwise &, |, ~ etc operations are word level operations which are done extremely efficiently by CPU. Hence their time complexity is supposed to be O(1).

Approach 4 - Using XOR & Shifting operations:

Let’s consider this 8-bit binary representation: 1010 0100. The parity of this number is 1. What happens when we do a right shift on this number by 4 & xor that with the number itself?

n                 = 1010 0100
n >>> 4           = 0000 1010
n ^ (n >>> 4)     = 1010 1110
n = n ^ (n >>> 4) = 1010 1110 (n is just assigned to the result)

In the rightmost 4 bits, a bit is set exactly where n and n >>> 4 differ. Now let’s concentrate on these rightmost 4 bits only, 1110, and forget about the other bits. So n is now 1010 1110 & we are concentrating on the lowest 4 bits, i.e. 1110. Let’s do a bitwise right shift on n by 2.

n                 = 1010 1110
n >>> 2           = 0010 1011
n ^ (n >>> 2)     = 1000 0101
n = n ^ (n >>> 2) = 1000 0101 (n is just assigned to the result)

Just concentrate on the rightmost 2 bits now & forget about leftmost 6 bits. Let’s right shift the number by 1:

n                 = 1000 0101
n >>> 1           = 0100 0010
n ^ (n >>> 1)     = 1100 0111
n = n ^ (n >>> 1) = 1100 0111 (n is just assigned to the result)

We don’t need to right shift anymore. We now extract the LSB, which is 1 in the above case, & return the result: result = (short) (n & 1).
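The three steps of the worked example can be replayed in code. This is a small sketch for the 8-bit case only; XorFoldTraceDemo, bits8 and fold8 are hypothetical names used for illustration.

```java
public class XorFoldTraceDemo {

    // Formats the low 8 bits of n as two groups of four, e.g. "1010 0100".
    static String bits8(int n) {
        String s = String.format("%8s", Integer.toBinaryString(n & 0xFF)).replace(' ', '0');
        return s.substring(0, 4) + " " + s.substring(4);
    }

    // Folds an 8-bit value onto itself with shifts of 4, 2 and 1.
    static int fold8(int n) {
        n ^= n >>> 4;
        n ^= n >>> 2;
        n ^= n >>> 1;
        return n;
    }

    public static void main(String[] args) {
        int n = 0b1010_0100;
        System.out.println("start       : " + bits8(n));
        n ^= n >>> 4;
        System.out.println("after >>> 4 : " + bits8(n));
        n ^= n >>> 2;
        System.out.println("after >>> 2 : " + bits8(n));
        n ^= n >>> 1;
        System.out.println("after >>> 1 : " + bits8(n));
        System.out.println("parity      : " + (n & 1));
    }
}
```

Only the last bit of the final value carries meaning; the bits to its left are leftovers from earlier steps and are ignored.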

At a glance, the solution might look a little confusing, but it works. How? We know that 0 xor 1 or 1 xor 0 is 1, otherwise 0. So when we divide the binary representation of a number into two equal halves by length & xor them, every position where the two halves differ becomes a set bit in the xor-ed number. A position where both halves hold 1 becomes 0, which removes two set bits at once and therefore leaves the parity unchanged.

Since parity is 1 exactly when the binary representation has an odd number of set bits, we can use xor to check whether the count of 1’s is odd. Hence we right shift the number by half of the total number of bits, xor the shifted number with the original, assign the xor-ed result back to the original number & from then on concentrate only on its rightmost half. So each step xors two halves & halves the range we care about. For 64-bit numbers, we start with 32-bit halves, then 16-bit halves, then 8, 4, 2, 1 respectively.

Essentially, the parity of a number equals the parity of the xor of the two equal halves of its binary representation. The crux of the algorithm is to concentrate on the rightmost 32 bits first, then 16, 8, 4, 2, 1 bits & ignore the bits to their left. The code is computeParityMostEfficient in the full listing at the end of the article.
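The halving steps can also be written as a loop over the shift distances. The sketch below does that and cross-checks the result against the set-bit count from the standard library method Long.bitCount; the class name XorFoldParityDemo and the sample values are chosen for illustration.

```java
public class XorFoldParityDemo {

    // Folds the 64-bit value onto itself: 32, 16, 8, 4, 2, then 1 bits.
    // After the last step the LSB holds the parity of the original value.
    static int parity(long no) {
        for (int shift = 32; shift >= 1; shift >>= 1) {
            no ^= no >>> shift;
        }
        return (int) (no & 1);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 40L, 1274849L, -1L, Long.MIN_VALUE};
        for (long n : samples) {
            int expected = Long.bitCount(n) & 1; // set-bit count modulo 2
            System.out.println(n + " -> " + parity(n) + " (bitCount check: " + expected + ")");
        }
    }
}
```

The loop body runs six times for a 64-bit input, which is the log n behaviour described in the time complexity for this approach.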

Advantages:

Uses no extra space; only word-level operations are needed to compute the result.

Disadvantages:

Might be a little difficult for developers to understand.

Time Complexity:

O(log n) where n is the total number of bits in the binary representation.

Following is the full working code:

import java.util.Arrays;

public class ParityOfNumber {

    // Parity of every 16-bit value, filled lazily; -1 means "not computed yet".
    // The cache is shared across calls so that later numbers reuse earlier results.
    private static final int[] CACHE = new int[1 << 16];

    static {
        Arrays.fill(CACHE, -1);
    }

    // Approach 1: inspect every bit, from the LSB upwards.
    private static short computeParityBruteForce(long no) {
        int result = 0;
        while(no != 0) {
            if((no & 1) == 1) {
                result ^= 1;
            }

            // Logical shift: fills the top bit with 0, so the loop also ends for negative input.
            no >>>= 1;
        }

        return (short) (result & 0x1);
    }

    // Approach 2: clear the lowest set bit until nothing is left.
    private static short computeParityBasedOnClearingSetBit(long no) {
        int result = 0;
        while (no != 0) {
            no = no & (no - 1);
            result ^= 1;
        }

        return (short) (result & 0x1);
    }

    // Approach 3: split into four 16-bit groups and combine their cached parities.
    private static short computeParityWithCaching(long no) {
        int WORD_SIZE = 16;
        int mask = 0xFFFF;

        int masked1 = (int) ((no >>> (3 * WORD_SIZE)) & mask);
        checkAndSetInCache(masked1, CACHE);

        int masked2 = (int) ((no >>> (2 * WORD_SIZE)) & mask);
        checkAndSetInCache(masked2, CACHE);

        int masked3 = (int) ((no >>> WORD_SIZE) & mask);
        checkAndSetInCache(masked3, CACHE);

        int masked4 = (int) (no & mask);
        checkAndSetInCache(masked4, CACHE);

        int result = (CACHE[masked1] ^ CACHE[masked2] ^ CACHE[masked3] ^ CACHE[masked4]);
        return (short) (result & 0x1);
    }

    // Computes and stores the parity of val if the cache does not hold it yet.
    private static void checkAndSetInCache(int val, int[] cache) {
        if(cache[val] < 0) {
            cache[val] = computeParityBasedOnClearingSetBit(val);
        }
    }

    // Approach 4: xor the two halves, then keep halving the range of interest.
    private static short computeParityMostEfficient(long no) {
        no ^= (no >>> 32);
        no ^= (no >>> 16);
        no ^= (no >>> 8);
        no ^= (no >>> 4);
        no ^= (no >>> 2);
        no ^= (no >>> 1);

        return (short) (no & 1);
    }

    public static void main(String[] args) {
        long no = 1274849;
        System.out.println("Binary representation of the number: " + Long.toBinaryString(no));

        System.out.println("Is Parity [computeParityBruteForce]: " + computeParityBruteForce(no));
        System.out.println("Is Parity [computeParityBasedOnClearingSetBit]: " + computeParityBasedOnClearingSetBit(no));
        System.out.println("Is Parity [computeParityMostEfficient]: " + computeParityMostEfficient(no));
        System.out.println("Is Parity [computeParityWithCaching]: " + computeParityWithCaching(no));
    }
}

Learning from this exercise:

Although it’s basic knowledge, I want to mention that word-level bitwise operations take constant time.
At scale, we can apply caching by breaking the binary representation into equal groups of a suitable word size, 16 in our case, so that the results for all possible values of a group fit in memory. Since we are supposed to handle millions of numbers, we will end up reusing 16-bit groups from the cache across numbers. The word size does not necessarily need to be 16; it depends on your requirements & experiments.
You don’t need to store the binary representation of a number in a separate array to operate on it; rather, clever use of bitwise operations can help you achieve your target.