std::map is implemented with a red-black tree (a self-balancing binary search tree). Therefore, a type can serve as a map key only if it is copyable and assignable, and can be compared for ordering (i.e. it overloads operator<).
Note that operator< must define a strict weak ordering: when !(x<y) && !(y<x) is true, the two keys are treated as equivalent (x == y). In other words, key equality is determined entirely by operator<, so the comparison must be written with that in mind.
std::unordered_map, by contrast, requires the key to be copyable and assignable, hashable (the container invokes the operator() of a hash functor, std::hash by default), and comparable for equality (operator==, used to resolve hash collisions).
On when to use map vs. unordered_map, see: http://kariddi.blogspot.hk/2012/07/c11-unorderedmap-vs-map.html
So, is the new unordered_map worth it? Well, for integer keys the G++ implementation showed pretty good performance. The hashing for integer numbers is lightning fast (probably it is skipped completely and the integer itself is used as the hash value, but I didn’t check). Using string keys, g++ unordered_map showed some performance problems, at least with the examples I used. The problems were mitigated by increasing the bucket count of the map, but at the cost of an increased memory footprint. Overall, for non-integer keys, the std::map implementation in g++ 4.7.1 libstdc++ seems more robust and less dependent on how the key hash values collide than std::unordered_map. std::map also comes with the added bonus of being ordered. Those who thought that std::map would have been completely replaced by std::unordered_map for all the usages that didn’t require the items to be ordered may have remained disappointed … at least for now.
Rule of thumb: for integer keys, use unordered_map; for string keys, use map.
Common hash algorithms for strings include the following (reproduced from http://www.cse.yorku.ca/~oz/hash.html):
djb2
djb2 is one of the best string hash functions known; it has excellent distribution and speed on many different sets of keys and table sizes. This algorithm (k = 33) was first reported by Dan Bernstein many years ago in comp.lang.c. Another version of this algorithm (now favored by Bernstein) uses XOR: hash(i) = hash(i - 1) * 33 ^ str[i]. The magic of the number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
The well-known in-memory database Redis uses this hash function.
sdbm
This algorithm was created for the sdbm database library (a public-domain reimplementation of ndbm). It was found to do well in scrambling bits, causing better distribution of the keys and fewer splits. It also happens to be a good general hashing function with good distribution. The actual function is hash(i) = hash(i - 1) * 65599 + str[i]; what is included below is the faster version used in gawk. [There is an even faster, Duff-device version.] The magic constant 65599 was picked out of thin air while experimenting with different constants, and turned out to be a prime. This is one of the algorithms used in Berkeley DB (see Sleepycat) and elsewhere.
static unsigned long
sdbm(unsigned char *str)
{
    unsigned long hash = 0;
    int c;

    while ((c = *str++))
        hash = c + (hash << 6) + (hash << 16) - hash; /* hash * 65599 + c */

    return hash;
}
lose lose
This hash function appeared in K&R (1st ed.), but at least the reader was warned: “This is not the best possible algorithm, but it has the merit of extreme simplicity.” This is an understatement; it is a terrible hashing algorithm, and it could have been much better without sacrificing its “extreme simplicity.” [See the second edition!] Many C programmers use this function without actually testing it, or checking something like Knuth’s Sorting and Searching, so it stuck. It is now found mixed with otherwise respectable code, e.g. cnews. Sigh.
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 0;
    int c;

    while ((c = *str++))
        hash += c; /* simply sums the character values */

    return hash;
}