Java HashMap performance: optimizing or replacing a large HashMap

I want to create a large HashMap but the put() performance is not good enough. Any ideas?

Other data structure suggestions are welcome but I need the lookup feature of a Java Map:

map.get(key)

In my case I want to create a map with 26 million entries. Using the standard Java HashMap the put rate becomes unbearably slow after 2-3 million insertions.

Also, does anyone know if using different hash code distributions for the keys could help?

My hashcode method:

byte[] a = new byte[2];
byte[] b = new byte[3];
...

public int hashCode() {
    int hash = 503;
    hash = hash * 5381 + (a[0] + a[1]);
    hash = hash * 5381 + (b[0] + b[1] + b[2]);
    return hash;
}

I am relying on the commutative property of addition to ensure that equal objects have the same hash code. The arrays hold byte values in the range 0 - 51, and each value appears at most once in either array. The objects are equal if the a arrays contain the same values (in either order), and the same goes for the b arrays. So a = {0,1} b = {45,12,33} and a = {1,0} b = {33,45,12} are equal.
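As a quick check (a sketch, not from the original post): because this hash depends only on the two element sums, the set of codes it can ever produce can be enumerated directly over the possible sum ranges, which shows how few distinct values are available for 26 million objects.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSpread {
    // Enumerate every hash code the addition-based hashCode() can produce.
    // Sums of two distinct values in 0..51 lie in 1..101; sums of three
    // distinct values lie in 3..150. The hash is determined by that pair.
    static int distinctCodes() {
        Set<Integer> codes = new HashSet<>();
        for (int sumA = 1; sumA <= 101; sumA++) {
            for (int sumB = 3; sumB <= 150; sumB++) {
                int hash = 503;
                hash = hash * 5381 + sumA;
                hash = hash * 5381 + sumB;
                codes.add(hash);
            }
        }
        return codes.size();
    }

    public static void main(String[] args) {
        // Vastly fewer distinct codes than the 26 million objects to store.
        System.out.println(distinctCodes() + " possible hash codes");
    }
}
```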

EDIT, some notes:

A few people have criticized using a hash map or other data structure to store 26 million entries. I cannot see why this would seem strange. It looks like a classic data structures and algorithms problem to me. I have 26 million items and I want to be able to quickly insert them into and look them up from a data structure: give me the data structure and algorithms.

Setting the initial capacity of the default Java HashMap to 26 million decreases the performance.
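One detail worth noting here (an aside, not from the post): the constructor argument is the table capacity before the load factor is applied, so a map presized to exactly 26 million will still resize once it passes capacity × 0.75 entries. A minimal sizing sketch, assuming the default 0.75 load factor:

```java
import java.util.HashMap;
import java.util.Map;

public class Presize {
    // Hypothetical helper: size the table so expectedEntries fit under the
    // default 0.75 load factor without ever triggering a resize.
    static <K, V> Map<K, V> presized(int expectedEntries) {
        return new HashMap<>((int) (expectedEntries / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = presized(26_000_000);
        map.put("example", 1);
        System.out.println(map.get("example"));
    }
}
```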

Some people have suggested using databases; in some other situations that is definitely the smart option. But I am really asking a data structures and algorithms question. A full database would be overkill and much slower than a good data structure solution (after all, a database is just software, but it would add communication and possibly disk overhead).

Solution

As many people pointed out, the hashCode() method was to blame. It was only generating around 20,000 codes for 26 million distinct objects. That is an average of 1,300 objects per hash bucket, which is very, very bad. However, if I turn the two arrays into a number in base 52, I am guaranteed a unique hash code for every object:

public int hashCode() {
    // assume that both a and b are sorted
    return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2)
         + powerOf52(b[1], 3) + powerOf52(b[2], 4);
}

public static int powerOf52(byte b, int power) {
    int result = b;
    for (int i = 0; i < power; i++) {
        result *= 52;
    }
    return result;
}
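Putting the pieces together, a complete key class might look like the sketch below. The class name, constructor, and equals() are assumptions; only hashCode() and powerOf52() come from the post. Sorting copies of the arrays on construction makes the order-insensitive equality line up with the hash.

```java
import java.util.Arrays;

public class Key {
    private final byte[] a;  // two distinct values in 0..51
    private final byte[] b;  // three distinct values in 0..51

    // Sort defensive copies so that equal value sets produce equal hashes.
    Key(byte[] a, byte[] b) {
        this.a = a.clone();
        this.b = b.clone();
        Arrays.sort(this.a);
        Arrays.sort(this.b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Key)) return false;
        Key k = (Key) o;
        return Arrays.equals(a, k.a) && Arrays.equals(b, k.b);
    }

    @Override
    public int hashCode() {
        // Treat the five sorted digits as a number in base 52: unique per key.
        return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2)
             + powerOf52(b[1], 3) + powerOf52(b[2], 4);
    }

    static int powerOf52(byte v, int power) {
        int result = v;
        for (int i = 0; i < power; i++) result *= 52;
        return result;
    }

    public static void main(String[] args) {
        // The example pair from the question: equal sets in different orders.
        Key k1 = new Key(new byte[]{0, 1}, new byte[]{45, 12, 33});
        Key k2 = new Key(new byte[]{1, 0}, new byte[]{33, 45, 12});
        System.out.println(k1.equals(k2) + " " + (k1.hashCode() == k2.hashCode()));
        // prints "true true"
    }
}
```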

The arrays are sorted to ensure that this method fulfills the hashCode() contract: equal objects have the same hash code.

Using the old method, the average number of puts per second over blocks of 100,000 puts (from 100,000 up to 2,000,000 insertions) was:

168350.17
109409.195
81344.91
64319.023
53780.79
45931.258
39680.29
34972.676
31354.514
28343.062
25562.371
23850.695
22299.22
20998.006
19797.799
18702.951
17702.434
16832.182
16084.52
15353.083

Using the new method gives:

337837.84
337268.12
337078.66
336983.97
313873.2
317460.3
317748.5
320000.0
309704.06
310752.03
312944.5
265780.75
275540.5
264350.44
273522.97
270910.94
279008.7
276285.5
283455.16
289603.25

Much much better. The old method tailed off very quickly while the new one keeps up a good throughput.
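For reference, the per-block throughput figures above can be collected with a harness along these lines. This is a sketch with plain integer keys; the original measurements used the custom key objects, and absolute numbers depend on hardware.

```java
import java.util.HashMap;
import java.util.Map;

public class PutRate {
    // Insert one block of sequential integer keys into the map.
    static Map<Integer, Integer> fill(Map<Integer, Integer> map,
                                      int block, int blockSize) {
        for (int i = 0; i < blockSize; i++) {
            int key = block * blockSize + i;
            map.put(key, key);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        int blockSize = 100_000;
        // Report average puts per second per block, as in the tables above.
        for (int block = 0; block < 20; block++) {
            long start = System.nanoTime();
            fill(map, block, blockSize);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("block %d: %.0f puts/sec%n", block, blockSize / seconds);
        }
    }
}
```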
