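Today I came across a routine in the foreign chess program Beowulf that counts the 1 bits in a 64-bit integer. It uses a table, inbits, that holds the number of 1 bits for every 8-bit value, then splits the 64-bit word into 8 bytes, looks each one up, and sums the results. The code is below. In principle this is already about as fast as the approach gets, unless you build a larger lookup table.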
/* A list of the number of bits in numbers from 0-255. This is used in the
 * bit counting algorithm. Thanks to Dann Corbit for this one. */
static int inbits[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};
/* This algorithm thanks to Dann Corbit. Apparently it's faster than
* the standard one. */
int Count(const BITBOARD B) {
    return inbits[(unsigned char) B] +
           inbits[(unsigned char) (B >> 8)] +
           inbits[(unsigned char) (B >> 16)] +
           inbits[(unsigned char) (B >> 24)] +
           inbits[(unsigned char) (B >> 32)] +
           inbits[(unsigned char) (B >> 40)] +
           inbits[(unsigned char) (B >> 48)] +
           inbits[(unsigned char) (B >> 56)];
}
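The 256-entry table does not have to be typed by hand. As a rough sketch, it can be filled at startup from the observation that a value i has the same 1 bits as i >> 1 plus its own lowest bit. The names bit_table, init_bit_table, count_with_table and table_example are made up for this illustration and are not part of the chess program; unsigned long long stands in for BITBOARD on the assumption that it is an unsigned 64-bit type.

```c
#include <assert.h>

/* Hypothetical run-time table: bit_table[i] = number of 1 bits in i. */
static unsigned char bit_table[256];

/* Fill the table with the recurrence
 * bits(i) = bits(i >> 1) + (lowest bit of i). */
void init_bit_table(void)
{
    int i;
    bit_table[0] = 0;
    for (i = 1; i < 256; i++)
        bit_table[i] = (unsigned char)(bit_table[i >> 1] + (i & 1));
}

/* Same byte-by-byte summation as Count(), written as a loop for brevity.
 * init_bit_table() must have been called first. */
int count_with_table(unsigned long long b)
{
    int total = 0;
    for (; b != 0; b >>= 8)
        total += bit_table[b & 0xFF];
    return total;
}

/* A few spot checks against values that are easy to verify by eye. */
void table_example(void)
{
    init_bit_table();
    assert(count_with_table(0x0FULL) == 4);
    assert(count_with_table(0x8000000000000001ULL) == 2);
    assert(count_with_table(0xFFFFFFFFFFFFFFFFULL) == 64);
}
```

The loop version stops early when the remaining high bytes are zero, but it reintroduces the branch that the original unrolled sum avoids.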
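While wondering whether there was a more concise method, I remembered an old interview question: check in one line of code whether a number is a power of 2. I could not answer it at the time; three days later I worked out that the test is 0 == ( x&(x-1) ) for nonzero x, which was of course too late to help.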
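For reference, here is one way the one-line test could be written. The function names is_power_of_two and power_of_two_example are chosen for this sketch only; the explicit x != 0 check is there because zero would otherwise pass the test.

```c
#include <assert.h>

/* Returns 1 if x is a power of two, 0 otherwise.
 * x & (x - 1) clears the lowest 1 bit of x. A power of two has exactly
 * one 1 bit, so nothing is left after clearing it. */
int is_power_of_two(unsigned long long x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Spot checks: 8 = 1000b passes, 6 = 110b and 0 do not. */
void power_of_two_example(void)
{
    assert(is_power_of_two(8ULL));
    assert(!is_power_of_two(6ULL));
    assert(!is_power_of_two(0ULL));
}
```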
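It struck me that this test could also be used to count the 1 bits in an integer, and two blocks down the street I had it: x&(x-1) clears the lowest 1 bit, so the number of times it can be applied before x reaches 0 is the number of 1 bits. On average this is slower than the table method above. In the worst case (x = 2^64-1) it loops 64 times, and every iteration also pays for a branch. It is still more efficient than testing the bits one by one, and it has the advantage of being concise.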
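typedef unsigned long long u64;   /* assumed: u64 is an unsigned 64-bit integer */

int bitcount( u64 x )
{
    int bitcnt;
    for( bitcnt = 0 ; x ; bitcnt++ )
    {
        x = x & (x-1);   /* clear the lowest 1 bit */
    }
    return bitcnt;
}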
This is Figure 5-3, "Counting 1-bits in a sparsely populated word", in Hacker's Delight.
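To see why the loop runs once per 1 bit, it helps to watch the intermediate values. The sketch below records them; trace_clear_lowest and trace_example are hypothetical helpers written for this illustration, not something taken from the book.

```c
#include <assert.h>

/* Hypothetical helper: applies x &= x - 1 until x is 0, storing each
 * intermediate value in steps[] (at most max_steps entries are written).
 * Returns the number of iterations, which equals the number of 1 bits. */
int trace_clear_lowest(unsigned long long x,
                       unsigned long long steps[], int max_steps)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;              /* drop the lowest 1 bit */
        if (n < max_steps)
            steps[n] = x;
        n++;
    }
    return n;
}

/* Worked example: 0x2C is binary 101100, which has three 1 bits. */
void trace_example(void)
{
    unsigned long long steps[64];
    int n = trace_clear_lowest(0x2CULL, steps, 64);
    assert(n == 3);
    assert(steps[0] == 0x28ULL);   /* 101000 */
    assert(steps[1] == 0x20ULL);   /* 100000 */
    assert(steps[2] == 0x00ULL);   /* 000000 */
}
```

Each step removes exactly one 1 bit and leaves the higher bits untouched, so a sparse word finishes in very few iterations.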
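One possible improvement is to cut down on the loop's branches, which is also why Beowulf adds up its eight lookups without a loop. Here the loop body is unrolled four times.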
int bitcount2( u64 x )
{
    int bitcnt;
    for( bitcnt = 0 ; x ; )
    {
        /* count first, then clear; once x is 0 both steps do nothing */
        bitcnt += !!x; x = x & (x-1);
        bitcnt += !!x; x = x & (x-1);
        bitcnt += !!x; x = x & (x-1);
        bitcnt += !!x; x = x & (x-1);
    }
    return bitcnt;
}
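Addendum (2012-6-18)
I have been reading Hacker's Delight recently. The algorithm that opens section 5-1 is a very good way to count the 1 bits when the word size is known. For a 32-bit integer it goes as follows.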
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >>16) & 0x0000FFFF);
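This needs no lookup table and reaches the result in log2(32) = 5 steps, using 20 arithmetic and logical operations in total, not counting assignments. In some of the steps a carry cannot spill into the neighboring field and spoil the result, so their masks can be dropped or merged. The optimized version needs only 15 operations.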
int populate32(unsigned x) {
    x = x - ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0F0F0F0F;
    x = x + (x >> 8);
    x = x + (x >> 16);
    return x & 0x0000003F;
}
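The first line of the optimized version is the least obvious one. A 2-bit field with bits a and b has the value 2a + b, and subtracting a leaves a + b, the number of 1 bits in the field; the subtraction never borrows from the neighboring field because 2a + b is at least a. The sketch below compares the two forms; the function names are invented for this check, and it assumes unsigned is at least 32 bits wide.

```c
#include <assert.h>

/* Straightforward first step: add the even bits and the odd bits. */
unsigned pair_sums_add(unsigned x)
{
    return (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
}

/* Shortened first step: for a 2-bit field with value 2a + b,
 * (2a + b) - a = a + b, the number of 1 bits in that field. */
unsigned pair_sums_sub(unsigned x)
{
    return x - ((x >> 1) & 0x55555555u);
}

/* Returns 1 if both forms agree on every 16-bit input, 0 otherwise. */
int pair_sums_agree(void)
{
    unsigned x;
    for (x = 0; x <= 0xFFFFu; x++)
        if (pair_sums_add(x) != pair_sums_sub(x))
            return 0;
    return 1;
}

/* Worked example: 0xF = 11 11b, each pair holds two 1 bits,
 * so the result is 10 10b = 0xA. */
void pair_sums_example(void)
{
    assert(pair_sums_sub(0xFu) == 0xAu);
    assert(pair_sums_agree());
}
```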
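The 64-bit version can be written as follows. The argument has to be a 64-bit type, and the constants need the ULL suffix.
int populate64(unsigned long long x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    x = x + (x >> 8);
    x = x + (x >> 16);
    x = x + (x >> 32);
    return (int)(x & 0x0000007F);
}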