Hash Function Performance Test


weixin_30949361


Original link: http://www.cnblogs.com/czc0316/archive/2009/10/29/Hash_Function_Test.html


Hash function performance analysis table

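A recent project of mine involved large-scale data processing built on hash functions, so I tested the performance and collision rates of a number of hash functions, summarized the results, and recommend below several string hash functions that performed well.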

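The goal of hashing here is to compress long strings into 32-, 64- or 128-bit hash codes so that they take less storage space. The hash function is the key component of that process.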

This experiment compares several common hash functions on large data sets of more than 10 million strings each.

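In the table, every row except the last gives the collision rate of the hash function in that column; the last row gives the time.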
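The post does not say exactly how the collision rate was computed. One plausible definition, assumed here, is the number of distinct input strings whose hash value was already produced by an earlier string, divided by the number of strings. A minimal sketch of the counting step for an array of precomputed 32-bit hash values, using a hypothetical helper `count_collisions`:

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for unsigned int values */
static int cmp_uint(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts the hash values in place and returns how many
   of them repeat an earlier value. Assumes the input strings that produced
   the hashes were distinct, so every repeated value is a genuine collision. */
size_t count_collisions(unsigned int *hashes, size_t n)
{
    size_t collisions = 0;
    size_t i;

    if (n == 0)
        return 0;
    qsort(hashes, n, sizeof hashes[0], cmp_uint);
    for (i = 1; i < n; i++) {
        if (hashes[i] == hashes[i - 1])
            collisions++;
    }
    return collisions;
}
```

Under this definition the rate is `count_collisions(h, n) / (double)n`. Sorting keeps the memory cost at one `unsigned int` per string, which is about 60 MB for 15 million samples.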

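SET 1: upper- and lowercase letters and digits; lengths uniformly distributed from 3 to 12; 15 million samples.

SET 2: lowercase letters only; lengths uniformly distributed from 3 to 12; 15 million samples.

SET 3: common characters from ASCII 32-127; lengths uniformly distributed from 10 to 30; 11 million samples.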
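Data sets with these properties can be approximated by a random string generator. The sketch below is an assumption, not the author's generator: `random_string` picks a length uniformly in [min_len, max_len] and draws each character uniformly from a caller-supplied alphabet using the C library `rand()`, which is adequate for a test harness but is not a high-quality generator (the modulo also introduces a slight bias). The alphabets and helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Alphabets approximating SET 1 and SET 2 (assumed, not from the post). */
static const char SET1_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
static const char SET2_CHARS[] = "abcdefghijklmnopqrstuvwxyz";

/* SET 3 alphabet: characters 32..126. The post says 32-127; 127 (DEL) is
   not printable, so leaving it out is an assumption. out needs 96 bytes. */
void build_set3_alphabet(char out[96])
{
    int c;
    for (c = 32; c <= 126; c++)
        out[c - 32] = (char)c;
    out[126 - 32 + 1] = '\0';
}

/* Hypothetical helper: writes a random NUL-terminated string into buf.
   Length is uniform in [min_len, max_len]; buf must hold max_len + 1 bytes. */
void random_string(char *buf, int min_len, int max_len, const char *alphabet)
{
    size_t alpha_len = strlen(alphabet);
    int len = min_len + rand() % (max_len - min_len + 1);
    int i;

    for (i = 0; i < len; i++)
        buf[i] = alphabet[rand() % alpha_len];
    buf[len] = '\0';
}
```

For example, `random_string(buf, 3, 12, SET1_CHARS)` yields one SET 1-style sample. Note that random generation can produce duplicate strings, which would have to be removed before counting hash collisions.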

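Finally, timing was measured with a Release build; the results are in the last row of the table above, recorded as the average time per hash call. The timings differ only slightly across functions, so when choosing a hash function, a low collision rate should be the main criterion.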
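The post does not describe its timing method. A minimal sketch, assuming a hash function with the same `unsigned int f(const char *)` signature as the functions below and using the standard `clock()` function: the hypothetical `avg_hash_time_ns` hashes every string in an array `rounds` times and returns the average CPU time per call in nanoseconds. `demo_hash` is a copy of BKDRHash included only so the sketch compiles on its own.

```c
#include <stddef.h>
#include <time.h>

typedef unsigned int (*hash_fn)(const char *);

/* The XOR of all results is stored here so an optimizing build
   cannot discard the hash calls. */
static volatile unsigned int g_sink;

/* Hypothetical stand-in for any of the hash functions below (BKDR, seed 131). */
static unsigned int demo_hash(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = hash * 131 + (*str++);
    return hash & 0x7FFFFFFF;
}

/* Hypothetical helper: average CPU time per hash call, in nanoseconds. */
double avg_hash_time_ns(hash_fn fn, const char *const *strs, size_t n, int rounds)
{
    clock_t start, end;
    unsigned int acc = 0;
    size_t i;
    int r;

    if (n == 0 || rounds <= 0)
        return 0.0;
    start = clock();
    for (r = 0; r < rounds; r++)
        for (i = 0; i < n; i++)
            acc ^= fn(strs[i]);
    end = clock();
    g_sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / ((double)n * rounds);
}
```

`clock()` has coarse resolution on many systems, so `n * rounds` should be large enough for the run to last at least a fraction of a second.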

Test system configuration:

CPU: AMD 945 X4   MEMORY: 4 GB   SYSTEM: Windows Vista Ultimate (32-bit)

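Recommendation: the algorithms marked in red in the table performed well. Based on these statistics and on published comments, BKDR, SDBM and FNV_1 perform well for large-scale string hashing and are recommended. For data sets of a million strings or more, a 64-bit hash function is also recommended, since it effectively avoids excessively high collision rates (in my tests, 64-bit hashes kept the collision rate below 10e-6 on 10 million strings).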
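Rough birthday-bound arithmetic supports the 64-bit advice: for an ideal uniform hash, n distinct strings hashed into 2^b values give about n²/2^(b+1) colliding pairs. For n = 10^7, the 31 bits kept by the functions below (which mask off the top bit) give roughly 23,000 expected collisions, while 64 bits give about 3×10^-6. As one possible 64-bit function, here is a sketch of 64-bit FNV-1a using the published 64-bit FNV parameters (offset basis 14695981039346656037, prime 1099511628211). Unlike the post's code, it keeps all 64 bits and treats each character as an unsigned byte; it is not one of the functions measured in the table.

```c
#include <stdint.h>

/* 64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
   uint64_t arithmetic wraps modulo 2^64. */
uint64_t fnv1a_64(const char *str)
{
    uint64_t hash = 14695981039346656037ULL; /* FNV-64 offset basis */
    const uint64_t prime = 1099511628211ULL; /* FNV-64 prime */

    while (*str) {
        hash ^= (unsigned char)*str++;
        hash *= prime;
    }
    return hash;
}
```

On the string "a" this returns 0xaf63dc4c8601ec8c, the published FNV-1a 64-bit test value.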

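The complete code follows. The comments give a brief summary of each algorithm for anyone who wants to study them in detail. This is my first blog post, so please point out any shortcomings.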

Code
```c
unsigned int RSHash(const char *str)
/*
  A simple hash function from Robert Sedgewick's Algorithms in C book.
*/
{
    unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * a + (*str++);
        a *= b;
    }

    return (hash & 0x7FFFFFFF);
}
```

```c
unsigned int PJWHash(const char *str)
/*
  This hash algorithm is based on work by Peter J. Weinberger of AT&T Bell Labs.
  The book Compilers (Principles, Techniques and Tools) by Aho, Sethi and Ullman
  recommends the use of hash functions that employ the hashing methodology found
  in this particular algorithm.
*/
{
    unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    unsigned int ThreeQuarters     = (unsigned int)((BitsInUnsignedInt * 3) / 4);
    unsigned int OneEighth         = (unsigned int)(BitsInUnsignedInt / 8);
    unsigned int HighBits          = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
    unsigned int hash = 0;
    unsigned int test = 0;

    while (*str)
    {
        hash = (hash << OneEighth) + (*str++);
        if ((test = hash & HighBits) != 0)
        {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }

    return (hash & 0x7FFFFFFF);
}
```

```c
unsigned int JSHash(const char *str)
/*
  A bitwise hash function written by Justin Sobel.
*/
{
    unsigned int hash = 1315423911;

    while (*str)
    {
        hash ^= ((hash << 5) + (*str++) + (hash >> 2));
    }

    return (hash & 0x7FFFFFFF);
}
```

```c
unsigned int BKDRHash(const char *str)
/*
  From Brian Kernighan and Dennis Ritchie's book "The C Programming Language".
*/
{
    unsigned int seed = 131; // 31 131 1313 13131 131313 etc.
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * seed + (*str++);
    }

    return (hash & 0x7FFFFFFF);
}
```

```c
unsigned int FNV_1_Hash(const char *str)
/*
  Famous hash algorithm in Unix system, also used by Microsoft in their hash_map implementation for VC++ 2005
  detail can be found in: http://www.isthe.com/chongo/tech/comp/fnv/#FNV-param
*/
{
    unsigned int hash = 2166136261u; // offset_basis
    unsigned int prime = 16777619;   // FNV_PRIME_32

    while (*str != '\0')
    {
        hash *= prime;
        hash ^= *str++;
    }

    return (hash & 0x7FFFFFFF);
}
```

```c
unsigned int FNV_1a_Hash(const char *str)
/*
  Famous hash algorithm in Unix system, also used by Microsoft in their hash_map implementation for VC++ 2005
  detail can be found in: http://www.isthe.com/chongo/tech/comp/fnv/#FNV-param
*/
{
    unsigned int hash = 2166136261u; // offset_basis
    unsigned int prime = 16777619;   // FNV_PRIME_32

    while (*str != '\0')
    {
        hash ^= *str++;
        hash *= prime;
    }

    return (hash & 0x7FFFFFFF);
}
```
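FNV_1_Hash and FNV_1a_Hash differ only in the order of the multiply and XOR steps. Both also clear the top bit, so their outputs are not the published FNV values. A sketch of the unmasked 32-bit pair (the names `fnv1_32` and `fnv1a_32` are hypothetical), which can be checked against the published test values for the string "a":

```c
#include <stdint.h>

/* Unmasked 32-bit FNV-1: multiply by the prime, then XOR in the byte. */
uint32_t fnv1_32(const char *str)
{
    uint32_t hash = 2166136261u;        /* FNV-32 offset basis */
    while (*str) {
        hash *= 16777619u;              /* FNV-32 prime */
        hash ^= (unsigned char)*str++;
    }
    return hash;
}

/* Unmasked 32-bit FNV-1a: XOR in the byte first, then multiply. */
uint32_t fnv1a_32(const char *str)
{
    uint32_t hash = 2166136261u;
    while (*str) {
        hash ^= (unsigned char)*str++;
        hash *= 16777619u;
    }
    return hash;
}
```

For "a", `fnv1_32` returns 0x050c5d7e and `fnv1a_32` returns 0xe40c292c; after the post's `& 0x7FFFFFFF` mask, FNV_1a_Hash would return 0x640c292c instead.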

```c
unsigned int DJBHash(const char *str)
/*
  An algorithm produced by Professor Daniel J. Bernstein and shown first to the world on the
  usenet newsgroup comp.lang.c. It is one of the most efficient hash functions ever published.
*/
{
    unsigned int hash = 5381;

    while (*str)
    {
        hash += (hash << 5) + (*str++);
    }

    return (hash & 0x7FFFFFFF);
}
```

```c
unsigned int DJB_2_Hash(const char *s)
/*
  XOR variant of DJBHash: hash = hash * 33 ^ c.
*/
{
    unsigned int hashvalue = 5381;

    while (*s != '\0')
    {
        /* '*' binds tighter than '^', so this is (hashvalue * 33) ^ c */
        hashvalue = (hashvalue * 33) ^ (*s);
        s++;
    }

    return (hashvalue & 0x7FFFFFFF);
}
```

```c
unsigned int SDBM_Hash(const char *str)
/*
  This is the algorithm of choice which is used in the open source SDBM project.
  The hash function seems to have a good overall distribution for many different data
  sets. It seems to work well in situations where there is a high variance in the MSBs of the
  elements in a data set.
*/
{
    unsigned int hash = 0;

    while (*str)
    {
        // equivalent to: hash = 65599*hash + (*str++);
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    }

    return (hash & 0x7FFFFFFF);
}
```
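The comment in SDBM_Hash says the shift form equals `65599*hash + c`. That holds because (hash << 6) + (hash << 16) - hash = (64 + 65536 - 1) * hash = 65599 * hash, and unsigned arithmetic wraps modulo 2^32 in both forms. A small check, using the hypothetical helpers `sdbm_shift` and `sdbm_mul` (both without the final mask and assuming a 32-bit `unsigned int`):

```c
/* SDBM recurrence written with shifts, as in SDBM_Hash (no final mask). */
unsigned int sdbm_shift(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    return hash;
}

/* The same recurrence written as a multiplication by 65599. */
unsigned int sdbm_mul(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = 65599u * hash + (*str++);
    return hash;
}
```

The shift form avoids a multiply, which mattered more on older CPUs; on modern compilers the two usually compile to similar code.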

```c
unsigned int APHash(const char *str)
/*
  An algorithm produced by Arash Partow.
*/
{
    unsigned int hash = 0;

    for (int i = 0; *str; i++)
    {
        if ((i & 1) == 0)
        {
            hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
        }
        else
        {
            hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
        }
    }

    return (hash & 0x7FFFFFFF);
}
```
