# Why the Number 31 Was Chosen

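    public int hashCode() {
        int h = hash;                        // cached value; 0 means "not computed yet"
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];         // multiply-and-add, one step per character
            }
            hash = h;                        // cache the result for later calls
        }
        return h;
    }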

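For a string s of length n, the loop evaluates the following polynomial:

    s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]

Unrolling the loop for a three-character string (n = 3) shows where this comes from:

    i=0 -> h = 31 * 0 + val[0]
    i=1 -> h = 31 * (31 * 0 + val[0]) + val[1]
    i=2 -> h = 31 * (31 * (31 * 0 + val[0]) + val[1]) + val[2]
           h = 31*31*31*0 + 31*31*val[0] + 31*val[1] + val[2]
           h = 31^(n-1)*val[0] + 31^(n-2)*val[1] + val[2]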
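
As a quick check of the expansion above, the sketch below evaluates the polynomial term by term and compares it with the loop form and with `String.hashCode()`. The class and method names (`PolynomialHashCheck`, `loopHash`, `polynomialHash`) are made up for illustration and do not come from the JDK.

```java
final class PolynomialHashCheck {

    // Loop form, mirroring the body of String.hashCode().
    static int loopHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Expanded form: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    // int arithmetic wraps modulo 2^32, so both forms overflow identically.
    static int polynomialHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31;
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "hello", "The quick brown fox"}) {
            System.out.println(s + ": loop=" + loopHash(s)
                    + ", polynomial=" + polynomialHash(s)
                    + ", String.hashCode()=" + s.hashCode());
        }
    }
}
```

All three values on each printed line should agree, including for strings long enough to overflow an `int`.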

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: `31 * i == (i << 5) - i`. Modern VMs do this sort of optimization automatically.
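
Both claims in this passage can be tried out directly. The sketch below checks the shift identity and shows one way an even multiplier loses information: with 32, each step shifts the running hash left by 5 bits, so a character that is followed by 7 or more other characters is shifted out of the 32-bit result entirely. The class `MultiplierDemo` and its methods are hypothetical names used only for this illustration.

```java
final class MultiplierDemo {

    // Polynomial hash with a configurable multiplier (31 in String.hashCode()).
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // True if 31 * i equals (i << 5) - i; int wrap-around affects both sides equally.
    static boolean shiftIdentityHolds(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        System.out.println("identity for 7: " + shiftIdentityHolds(7));
        System.out.println("identity for Integer.MAX_VALUE: " + shiftIdentityHolds(Integer.MAX_VALUE));

        // Two 8-character strings that differ only in the first character.
        String a = "axxxxxxx";
        String b = "bxxxxxxx";

        // Multiplier 32: the first character is multiplied by 32^7 = 2^35,
        // which is 0 modulo 2^32, so it no longer influences the hash.
        System.out.println("32: " + hash(a, 32) + " vs " + hash(b, 32));

        // Multiplier 31: 31^7 is odd, so the first character still matters.
        System.out.println("31: " + hash(a, 31) + " vs " + hash(b, 31));
    }
}
```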

As Goodrich and Tamassia point out, if you take over 50,000 English words (formed as the union of the word lists provided in two variants of Unix), using the constants 31, 33, 37, 39, and 41 will produce fewer than 7 collisions in each case. Knowing this, it should come as no surprise that many Java implementations choose one of these constants.

3. Experiments and Data Visualization

3.1 Computing the Hash Collision Rate

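    // Same algorithm as the String.hashCode() shown earlier, with a configurable multiplier.
    public static Integer hashCode(String str, Integer multiplier) {
        int hash = 0;
        for (int i = 0; i < str.length(); i++) {
            // int overflow wraps around silently, just as in String.hashCode()
            hash = multiplier * hash + str.charAt(i);
        }

        return hash;
    }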

    // Requires java.util.Comparator and java.util.List.

    /**
     * Computes the hash code collision rate and, along the way, prints the
     * maximum and minimum hash codes.
     * @param multiplier the multiplier that was used to produce the hash codes
     * @param hashs the hash codes to analyze
     */
    public static void calculateConflictRate(Integer multiplier, List<Integer> hashs) {
        Comparator<Integer> cp = (x, y) -> x > y ? 1 : (x < y ? -1 : 0);
        int maxHash = hashs.stream().max(cp).get();
        int minHash = hashs.stream().min(cp).get();

        // Count the collisions and compute the collision rate
        int uniqueHashNum = (int) hashs.stream().distinct().count();
        int conflictNum = hashs.size() - uniqueHashNum;
        double conflictRate = (conflictNum * 1.0) / hashs.size();

        System.out.println(String.format("multiplier=%4d, minHash=%d, maxHash=%d, conflictNum=%6d, conflictRate=%.4f%%",
                multiplier, minHash, maxHash, conflictNum, conflictRate * 100));
    }
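
To show how the two methods above might be driven, here is a minimal, self-contained sketch. It does not use the word list from the experiment; the input is a hypothetical stand-in (every 3-letter lowercase string), and `ConflictRateDemo`, `threeLetterWords`, and `countConflicts` are illustrative names that do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.List;

final class ConflictRateDemo {

    // Polynomial hash with a configurable multiplier (31 in String.hashCode()).
    static int hashCode(String str, int multiplier) {
        int hash = 0;
        for (int i = 0; i < str.length(); i++) {
            hash = multiplier * hash + str.charAt(i);
        }
        return hash;
    }

    // Hypothetical test data: every 3-letter string over 'a'..'z' (26^3 = 17576 strings).
    static List<String> threeLetterWords() {
        List<String> words = new ArrayList<>();
        for (char a = 'a'; a <= 'z'; a++) {
            for (char b = 'a'; b <= 'z'; b++) {
                for (char c = 'a'; c <= 'z'; c++) {
                    words.add("" + a + b + c);
                }
            }
        }
        return words;
    }

    // Collisions = number of hash values minus number of distinct hash values.
    static int countConflicts(List<String> words, int multiplier) {
        List<Integer> hashes = new ArrayList<>();
        for (String w : words) {
            hashes.add(hashCode(w, multiplier));
        }
        int unique = (int) hashes.stream().distinct().count();
        return hashes.size() - unique;
    }

    public static void main(String[] args) {
        List<String> words = threeLetterWords();
        for (int multiplier : new int[] {2, 31, 32, 37}) {
            int conflicts = countConflicts(words, multiplier);
            System.out.printf("multiplier=%4d, conflictNum=%6d, conflictRate=%.4f%%%n",
                    multiplier, conflicts, conflicts * 100.0 / words.size());
        }
    }
}
```

On this synthetic input a multiplier of 2 collides heavily, because the three characters are packed into a very narrow range of values, while 31 produces no collisions at all. Real word lists behave differently, so these figures say nothing about the experiment's own results.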

3.2 Visualizing the Hash Value Distribution

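    /**
     * Splits the whole hash space into 64 equal partitions and counts the
     * number of hash values that fall into each partition.
     * @param hashs the hash values to count
     */
    public static Map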