Java HashMap performance: optimizing or replacing a large HashMap

I want to create a large HashMap but the put() performance is not good enough. Any ideas?

Other data structure suggestions are welcome but I need the lookup feature of a Java Map:

map.get(key)

In my case I want to create a map with 26 million entries. Using the standard Java HashMap the put rate becomes unbearably slow after 2-3 million insertions.

Also, does anyone know if using different hash code distributions for the keys could help?

My hashcode method:

byte[] a = new byte[2];
byte[] b = new byte[3];
...

public int hashCode() {
    int hash = 503;
    hash = hash * 5381 + (a[0] + a[1]);
    hash = hash * 5381 + (b[0] + b[1] + b[2]);
    return hash;
}

I am relying on the commutative property of addition to ensure that equal objects have the same hash code. The arrays hold byte values in the range 0 - 51, and each value appears at most once in either array. The objects are equal if the a arrays contain the same values (in either order), and the same goes for the b arrays. So a = {0,1} b = {45,12,33} and a = {1,0} b = {33,45,12} are equal.
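As a quick check (a sketch, not from the original post): because this hash depends only on the two element sums, the set of codes it can ever produce can be enumerated directly over the possible sum ranges, which shows how few distinct values are available for 26 million objects.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSpread {
    // Enumerate every hash code the addition-based hashCode() can produce.
    // Sums of two distinct values in 0..51 lie in 1..101; sums of three
    // distinct values lie in 3..150. The hash is determined by that pair.
    static int distinctCodes() {
        Set<Integer> codes = new HashSet<>();
        for (int sumA = 1; sumA <= 101; sumA++) {
            for (int sumB = 3; sumB <= 150; sumB++) {
                int hash = 503;
                hash = hash * 5381 + sumA;
                hash = hash * 5381 + sumB;
                codes.add(hash);
            }
        }
        return codes.size();
    }

    public static void main(String[] args) {
        // Vastly fewer distinct codes than the 26 million objects to store.
        System.out.println(distinctCodes() + " possible hash codes");
    }
}
```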

EDIT, some notes:

A few people have criticized using a hash map or other data structure to store 26 million entries. I cannot see why this would seem strange. It looks like a classic data structures and algorithms problem to me. I have 26 million items and I want to be able to quickly insert them into and look them up from a data structure: give me the data structure and algorithms.

Setting the initial capacity of the default Java HashMap to 26 million decreases the performance.
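One detail worth noting here (an aside, not from the post): the constructor argument is the table capacity before the load factor is applied, so a map presized to exactly 26 million will still resize once it passes capacity × 0.75 entries. A minimal sizing sketch, assuming the default 0.75 load factor:

```java
import java.util.HashMap;
import java.util.Map;

public class Presize {
    // Hypothetical helper: size the table so expectedEntries fit under the
    // default 0.75 load factor without ever triggering a resize.
    static <K, V> Map<K, V> presized(int expectedEntries) {
        return new HashMap<>((int) (expectedEntries / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = presized(26_000_000);
        map.put("example", 1);
        System.out.println(map.get("example"));
    }
}
```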

Some people have suggested using databases; in some other situations that is definitely the smart option. But I am really asking a data structures and algorithms question. A full database would be overkill and much slower than a good data structure solution (after all, a database is just software, but it would add communication and possibly disk overhead).

Solution

As many people pointed out, the hashCode() method was to blame. It was only generating around 20,000 codes for 26 million distinct objects. That is an average of 1,300 objects per hash bucket, which is very, very bad. However, if I turn the two arrays into a number in base 52, I am guaranteed a unique hash code for every object:

public int hashCode() {
    // assume that both a and b are sorted
    return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2)
         + powerOf52(b[1], 3) + powerOf52(b[2], 4);
}

public static int powerOf52(byte b, int power) {
    int result = b;
    for (int i = 0; i < power; i++) {
        result *= 52;
    }
    return result;
}
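Putting the pieces together, a complete key class might look like the sketch below. The class name, constructor, and equals() are assumptions; only hashCode() and powerOf52() come from the post. Sorting copies of the arrays on construction makes the order-insensitive equality line up with the hash.

```java
import java.util.Arrays;

public class Key {
    private final byte[] a;  // two distinct values in 0..51
    private final byte[] b;  // three distinct values in 0..51

    // Sort defensive copies so that equal value sets produce equal hashes.
    Key(byte[] a, byte[] b) {
        this.a = a.clone();
        this.b = b.clone();
        Arrays.sort(this.a);
        Arrays.sort(this.b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Key)) return false;
        Key k = (Key) o;
        return Arrays.equals(a, k.a) && Arrays.equals(b, k.b);
    }

    @Override
    public int hashCode() {
        // Treat the five sorted digits as a number in base 52: unique per key.
        return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2)
             + powerOf52(b[1], 3) + powerOf52(b[2], 4);
    }

    static int powerOf52(byte v, int power) {
        int result = v;
        for (int i = 0; i < power; i++) result *= 52;
        return result;
    }

    public static void main(String[] args) {
        // The example pair from the question: equal sets in different orders.
        Key k1 = new Key(new byte[]{0, 1}, new byte[]{45, 12, 33});
        Key k2 = new Key(new byte[]{1, 0}, new byte[]{33, 45, 12});
        System.out.println(k1.equals(k2) + " " + (k1.hashCode() == k2.hashCode()));
        // prints "true true"
    }
}
```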

The arrays are sorted to ensure that this method fulfills the hashCode() contract: equal objects have the same hash code.

Using the old method, the average number of puts per second over blocks of 100,000 puts (from 100,000 up to 2,000,000 insertions) was:

168350.17
109409.195
81344.91
64319.023
53780.79
45931.258
39680.29
34972.676
31354.514
28343.062
25562.371
23850.695
22299.22
20998.006
19797.799
18702.951
17702.434
16832.182
16084.52
15353.083

Using the new method gives:

337837.84
337268.12
337078.66
336983.97
313873.2
317460.3
317748.5
320000.0
309704.06
310752.03
312944.5
265780.75
275540.5
264350.44
273522.97
270910.94
279008.7
276285.5
283455.16
289603.25

Much much better. The old method tailed off very quickly while the new one keeps up a good throughput.
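For reference, the per-block throughput figures above can be collected with a harness along these lines. This is a sketch with plain integer keys; the original measurements used the custom key objects, and absolute numbers depend on hardware.

```java
import java.util.HashMap;
import java.util.Map;

public class PutRate {
    // Insert one block of sequential integer keys into the map.
    static Map<Integer, Integer> fill(Map<Integer, Integer> map,
                                      int block, int blockSize) {
        for (int i = 0; i < blockSize; i++) {
            int key = block * blockSize + i;
            map.put(key, key);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        int blockSize = 100_000;
        // Report average puts per second per block, as in the tables above.
        for (int block = 0; block < 20; block++) {
            long start = System.nanoTime();
            fill(map, block, blockSize);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("block %d: %.0f puts/sec%n", block, blockSize / seconds);
        }
    }
}
```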
