hash函数 java_几种常见的hash函数

最新推荐文章于 2024-05-12 09:24:57 发布

嘉良刘

最新推荐文章于 2024-05-12 09:24:57 发布

阅读量1.9k

点赞数

文章标签： hash函数 java

本文链接：https://blog.csdn.net/weixin_42356292/article/details/115064754

版权

概览

最近在看redis源码，发现redis采用了几种不同的算法来计算Hash Code;因此打算借此整理下JDK中的实现，加深理解;

Redis

Thomas Wang's 32 bit Mix Function

关于该算法的具体内容，可以参考这篇文章,算法源码如下：

public int hash32shift(int key)

{

key = ~key + (key << 15); // key = (key << 15) - key - 1;

key = key ^ (key >>> 12);

key = key + (key << 2);

key = key ^ (key >>> 4);

key = key * 2057; // key = (key + (key << 3)) + (key << 11);

key = key ^ (key >>> 16);

return key;

}

可以看到它是计算int类型的hash code,返回结果也是int类型;另外它还提供了64bit的算法，有兴趣的可以自行查阅:

根据Thomas Wang所描述，由于上述代码可以利用CPU的native指令，在HP 9000 workstations机器上只需要11个时钟周期，速度很快；

比较费解这边的常量值是怎么确定的，无奈算法基础比较差，如果有了解的同学，欢迎答疑；

Austin Appleby's MurmurHash2

MurmurHash是一种非加密型哈希函数，由Austin Appleby在2008年发明，并且有多个变种;该算法的作者2011去了google,开发出来了新的算法CityHash；

unsigned int dictGenHashFunction(const void *key, int len) {

/* 'm' and 'r' are mixing constants generated offline.

They're not really 'magic', they just happen to work well. */

uint32_t seed = dict_hash_function_seed;

const uint32_t m = 0x5bd1e995;

const int r = 24;

/* Initialize the hash to a 'random' value */

uint32_t h = seed ^ len;

/* Mix 4 bytes at a time into the hash */

const unsigned char *data = (const unsigned char *)key;

while(len >= 4) {

uint32_t k = *(uint32_t*)data;

k *= m;

k ^= k >> r;

k *= m;

h *= m;

h ^= k;

data += 4;

len -= 4;

}

/* Handle the last few bytes of the input array */

switch(len) {

case 3: h ^= data[2] << 16;

case 2: h ^= data[1] << 8;

case 1: h ^= data[0]; h *= m;

};

/* Do a few final mixes of the hash to ensure the last few

* bytes are well-incorporated. */

h ^= h >> 13;

h *= m;

h ^= h >> 15;

return (unsigned int)h;

}

Murmur可以计算字符串的hash code,基本思想就是把key分成n组，每组4个字符，把这4个字符看成是一个uint_32，进行n次运算，得到一个h，然会在对h进行处理，得到一个相对离散的hash code;

DJB Hash

unsigned int DJBHash(char *str)

{

unsigned int hash = 5381;

while (*str){

hash = ((hash << 5) + hash) + (*str++); /* times 33 */

}

hash &= ~(1 << 31); /* strip the highest bit */

return hash;

}

Redis对其进行了部分调整，不区分大小写：

unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {

unsigned int hash = (unsigned int)dict_hash_function_seed;

while (len--)

hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */

return hash;

}

JDK

Austin Appleby's MurmurHash3

从JDK7开始，JDK引入了一种新的计算Hash code的算法，可以在如下的集合对象中使用:

HashMap

Hashtable

HashSet

LinkedHashMap

LinkedHashSet

WeakHashMap

ConcurrentHashMap

final int hash(Object k) {

int h = hashSeed;

if (0 != h && k instanceof String) {

return sun.misc.Hashing.stringHash32((String) k);

}

h ^= k.hashCode();

// This function ensures that hashCodes that differ only by

// constant multiples at each bit position have a bounded

// number of collisions (approximately 8 at default load factor).

h ^= (h >>> 20) ^ (h >>> 12);

return h ^ (h >>> 7) ^ (h >>> 4);

}

可以看到只有key为字符类型，而且hashSeed不为0时才采用新的哈希算法；

sun.misc.Hashing.stringHash32实际上调用的是String.hash32方法:

int hash32() {

int h = hash32;

//h==0表示未计算过hash code

//可以看到hash code只会计算一次

if (0 == h) {

//HASHING_SEED是根据时间戳等计算出来的随机数

h = sun.misc.Hashing.murmur3_32(HASHING_SEED, value, 0, value.length);

//确保结果非0，避免重新计算

h = (0 != h) ? h : 1;

hash32 = h;

}

return h;

}

可以看到最终调用的是MurmurHash3算法，那么什么情况下hashSeed不为0呢？

final boolean initHashSeedAsNeeded(int capacity) {

boolean currentAltHashing = hashSeed != 0;

boolean useAltHashing = sun.misc.VM.isBooted() &&

(capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);

boolean switching = currentAltHashing ^ useAltHashing;

if (switching) {

hashSeed = useAltHashing

? sun.misc.Hashing.randomHashSeed(this)

: 0;

}

return switching;

}

可以看到是否采用新算法和Holder.ALTERNATIVE_HASHING_THRESHOLD有关；实际上JDK定义了一个新的属性jdk.map.althashing.threshold(默认值－1)，只有当系统容量大于该属性值时，才会采用新的算法；因此可以将通过-Djdk.map.althashing.threshold＝0设置该属性为0，启用新的哈希函数；

String.hashCode

public int hashCode() {

int h = hash;

if (h == 0 && value.length > 0) {

char val[] = value;

for (int i = 0; i < value.length; i++) {

h = 31 * h + val[i];

}

hash = h;

}

return h;

}

Object.hashCode

之前曾经在Java对象结构中介绍了Java的MarkWord,其中记录了对象的hashCode;如果对象计算过hashCode，则会存储到对象头中；如果是第一次计算则是调用

static inline intptr_t get_next_hash(Thread * Self, oop obj) {

intptr_t value = 0 ;

if (hashCode == 0) {

// This form uses an unguarded global Park-Miller RNG,

// so it's possible for two threads to race and generate the same RNG.

// On MP system we'll have lots of RW access to a global, so the

// mechanism induces lots of coherency traffic.

value = os::random() ;

} else

if (hashCode == 1) {

// This variation has the property of being stable (idempotent)

// between STW operations. This can be useful in some of the 1-0

// synchronization schemes.

intptr_t addrBits = intptr_t(obj) >> 3 ;

value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;

} else

if (hashCode == 2) {

value = 1 ; // for sensitivity testing

} else

if (hashCode == 3) {

value = ++GVars.hcSequence ;

} else

if (hashCode == 4) {

value = intptr_t(obj) ;

} else {

// Marsaglia's xor-shift scheme with thread-specific state

// This is probably the best overall implementation -- we'll

// likely make this the default in future releases.

unsigned t = Self->_hashStateX ;

t ^= (t << 11) ;

Self->_hashStateX = Self->_hashStateY ;

Self->_hashStateY = Self->_hashStateZ ;

Self->_hashStateZ = Self->_hashStateW ;

unsigned v = Self->_hashStateW ;

v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;

Self->_hashStateW = v ;

value = v ;

}

value &= markOopDesc::hash_mask;

if (value == 0) value = 0xBAD ;

assert (value != markOopDesc::no_hash, "invariant") ;

TEVENT (hashCode: GENERATE) ;

return value;

}

可以看到，这边提供了多种不同的实现，具体采用哪种，取决于hashCode的值，在Linux机器上,hashCode默认为0:

bb64cd7593ab

hashCode.png

因此是通过os::random计算hashCode,可以参考Java对象结构；

其它

Character、Integer、Long、Double的hashCode方法返回包装的基本类型;

嘉良刘

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hash函数 java_几种常见的hash函数

概览最近在看redis源码，发现redis采用了几种不同的算法来计算Hash Code;因此打算借此整理下JDK中的实现，加深理解;RedisThomas Wang's 32 bit Mix Function关于该算法的具体内容，可以参考这篇文章,算法源码如下：public int hash32shift(int key){key = ~key + (key << 15); // ke...
复制链接

扫一扫