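The day before yesterday I ran into a performance problem with Microsoft's hash_map. Six million lookups on a <unsigned int, int> map took close to a minute. I did not think much of it at first, but similar code on Ubuntu finishes in under a second. That makes it a serious problem.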
#include <hash_map.h>
#include <stdio.h>
#include <stdlib.h>    // atoi, rand
#include <sys/time.h>
using namespace std;
typedef hash_map<unsigned int, int> rhashMap_t;

// Millisecond tick counter for Linux, named after the Windows API it stands in for.
int GetTickCount(void)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return now.tv_sec * 1000 + now.tv_usec / 1000;
}

int main(int argc, const char* argv[])
{
    if (argc < 3) {
        printf("usage: %s <key count> <lookup count>\n", argv[0]);
        return 1;
    }
    rhashMap_t hashset;
    size_t nCount = atoi(argv[1]);     // number of keys to insert
    size_t nFindTime = atoi(argv[2]);  // number of lookups to time
    size_t i = 0;
    for (; i < nCount; ++i) {
        unsigned int key = (unsigned int) rand();
        hashset[key] = 1;
    }
    printf("key generated\n");
    int cw_start_stamp;
    int cs_start_stamp, cs_total = 0;
    cw_start_stamp = GetTickCount();
    for (i = 0; i < nFindTime; ++i)
    {
        unsigned int key = (unsigned int) rand();
        // cs_start_stamp = GetTickCount();
        hashset.find(key) != hashset.end();
        // cs_total += (GetTickCount() - cs_start_stamp);
    }
    printf("Rolling is done.\n"
           "Time spent in while(): %d ms.\n",
           GetTickCount() - cw_start_stamp);
    getchar();
    return 0;
}
Results on the Linux server:
key generated
Rolling is done.
Time spent in while (): 838 ms.
Results on Windows (XP and Vista):
key generated
Rolling is done.
Time spent in while (): 53273 ms.
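Stepping through the code on the Windows machine, my first suspicion is that Microsoft's hash_map does not distribute its hash values evenly enough. The hashing code is very simple, look: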
size_type _Hashval(const key_type& _Keyval) const
{   // return hash value, masked and wrapped to current table size
    size_type _Num = this->comp(_Keyval) & _Mask;
    if (_Maxidx <= _Num)
        _Num -= (_Mask >> 1) + 1;
    return (_Num);
}
_Mask is a fixed value, 0xdeaddeef. (Dead deef: a dead cow?)
This is a real headache: if I have to run on Windows, do I really need to find some other hash map to replace Microsoft's?