java 如何实现计数_如何高效的实现一个计数器map

这本是多年前一个stackoverflow上的一个讨论,回答中涉及到了多种计数方法。对于一个key-value结构的map,我们在编程时会经常涉及到key是对象,而value是一个integer或long来负责计数,从而统计多个key的频率。

面对这样一个基本需求,可能有很多种实现。比如最基本的使用jdk的map直接实现——value是一个integer或者long。其基本代码型如下:

1: final Map freq = new HashMap();2: int count = freq.containsKey(word) ? freq.get(word) : 0;3: freq.put(word, count + 1);

逻辑简单,判断是否存在,是则get取值,否则为0,再put进去一个加1后的值。总共要contain判断,get,put做三次方法调用。

当然进一步我们可以把contain判断去掉,代码如下:

1: final Map freq = new HashMap();2: Integer count = freq.get(word);3: if (count == null) {4: freq.put(word, 1);5: } else {6: freq.put(word, count + 1);7: }

一般情况,我们做到这个地步,多数人对其逻辑已经满足,简单性能也能接受,试着想一下,难道不是这样吗?get加put,解决了。

当然这样的实现还不够高效,于是我们开始去尝试实现或寻找更高效的方法,看看开源的集合类库是否有需要的:

有个Trove,可以让我们参考一下:

1: final TObjectIntHashMap freq = new TObjectIntHashMap();2: freq.adjustOrPutValue(word, 1, 1);

这样做,非常优雅啊,性能如何呢?不知道,需要看源码了解细节。那再看看大名鼎鼎的guava如何呢?

1: AtomicLongMap map = AtomicLongMap.create();2: map.getAndIncrement(word);

实现依然优雅,但是,但是看这名字,再看源码,好吧,线程安全的,支持并发,这不好搞了,我们场景需要吗?不需要的话直觉告诉我们这肯定是“慢”的。再找:

1: Multiset bag = HashMultiset.create();2: bag.add(word);

这个看上去合适了,bag的实现明显好很多,而且从语义理解上,这样的接口更容易让人理解。

那么这些方法,性能如何呢?做了个简单的比较,将26个英文字母做key,均匀循环若干次比较各个方法的效率(单纯时间效率),而且时间不统计构建开销。外加了一个线程安全版的concurrentMap实现,而其实google的guava里的AtomicLongMap也是包装了juc的concurrentMap而已。里面有最终极的MutableInt方法,找找看吧,性能最好的就是它了。

1: /**2: *3: */4:5:6: import gnu.trove.map.hash.TObjectIntHashMap;7:8: import java.util.HashMap;9: import java.util.Map;10: import java.util.concurrent.ConcurrentHashMap;11: import java.util.concurrent.ConcurrentMap;12: import java.util.concurrent.atomic.AtomicLong;13:14: import com.google.common.collect.HashMultiset;15: import com.google.common.collect.Multiset;16: import com.google.common.util.concurrent.AtomicLongMap;17:18: /**19: * @author Administrator20: *21: */22: public class IntMapTest {23:24: /**25: * @param args26: */27: public static void main(String[] args) {28: // TODO Auto-generated method stub29: int cycles[] = { 100, 1000, 10000, 100000 };30: Tester baseLine = new BaseLine();31: Tester testForNull = new UseNullTest();32: Tester useAtomicLong = new UseAtomicLong();33: Tester useTrove = new UseTrove();34: Tester useMutableInt = new UseMutableInt();35: Tester useGuava = new UseGuava();36: Tester useGuava2 = new UseGuava2();37:38: for (int i = 0; i < cycles.length; i++) {39: System.out.println("-----With " + cycles[i] + " cycles-----");40: baseLine.test(cycles[i]);41: testForNull.test(cycles[i]);42: useAtomicLong.test(cycles[i]);43: useTrove.test(cycles[i]);44: useMutableInt.test(cycles[i]);45: useGuava.test(cycles[i]);46: useGuava2.test(cycles[i]);47: System.out.println("------------------------");48: }49:50: }51:52: }53:54: abstract class Tester {55: long ms;56: static String[] strs = "abcdefghijklmnopqrstuvwxyz".split("");57:58: void pre() {59: System.out.println("====" + this.getName() + "Test Case ");60: ms = System.currentTimeMillis();61: System.out.println("start at " + ms);62: }63:64: void post() {65: ms = System.currentTimeMillis() - ms;66: System.out.println("Time used: " + ms + " ms");67: }68:69: abstract void doAction(int cycles);70:71: public void test(int cycles) {72: pre();73: doAction(cycles);74: post();75: }76:77: abstract String getName();78: }79:80: class BaseLine extends Tester {81: final Map freq = new HashMap();82:83: @Override84: void doAction(int cycles) {85: for (int i = 0; i < cycles; i++) {86: for (String word : strs) {87: int count = freq.containsKey(word) ? freq.get(word) : 0;88: freq.put(word, count + 1);89: }90: }91: }92:93: @Override94: String getName() {95: return "BaseLine";96: }97:98: }99:100: class UseNullTest extends Tester {101: final Map freq = new HashMap();102:103: @Override104: void doAction(int cycles) {105: for (int i = 0; i < cycles; i++) {106: for (String word : strs) {107: Integer count = freq.get(word);108: if (count == null) {109: freq.put(word, 1);110: } else {111: freq.put(word, count + 1);112: }113: }114: }115: }116:117: @Override118: String getName() {119: return "TestForNull";120: }121:122: }123:124: class UseAtomicLong extends Tester {125: final ConcurrentMap map = new ConcurrentHashMap();126:127: @Override128: void doAction(int cycles) {129: for (int i = 0; i < cycles; i++) {130: for (String word : strs) {131: map.putIfAbsent(word, new AtomicLong(0));132: map.get(word).incrementAndGet();133: }134: }135: }136:137: @Override138: String getName() {139: return "AtomicLong";140: }141:142: }143:144: class UseTrove extends Tester {145: final TObjectIntHashMap freq = new TObjectIntHashMap();146:147: @Override148: void doAction(int cycles) {149: for (int i = 0; i < cycles; i++) {150: for (String word : strs) {151: freq.adjustOrPutValue(word, 1, 1);152: }153: }154: }155:156: @Override157: String getName() {158: return "Trove";159: }160:161: }162:163: class MutableInt {164: int value = 1; // note that we start at 1 since we're counting165:166: public void increment() {167: ++value;168: }169:170: public int get() {171: return value;172: }173: }174:175: class UseMutableInt extends Tester {176: Map freq = new HashMap();177:178: @Override179: void doAction(int cycles) {180: for (int i = 0; i < cycles; i++) {181: for (String word : strs) {182: MutableInt count = freq.get(word);183: if (count == null) {184: freq.put(word, new MutableInt());185: } else {186: count.increment();187: }188: }189: }190: }191:192: @Override193: String getName() {194: return "MutableInt";195: }196:197: }198:199: class UseGuava extends Tester {200: AtomicLongMap map = AtomicLongMap.create();201:202: @Override203: void doAction(int cycles) {204: for (int i = 0; i < cycles; i++) {205: for (String word : strs) {206: map.getAndIncrement(word);207: }208: }209: }210:211: @Override212: String getName() {213: return "Guava AtomicLongMap";214: }215:216: }217:218: class UseGuava2 extends Tester {219: Multiset bag = HashMultiset.create();220:221: @Override222: void doAction(int cycles) {223: for (int i = 0; i < cycles; i++) {224: for (String word : strs) {225: bag.add(word);226: }227: }228: }229:230: @Override231: String getName() {232: return "Guava HashMultiSet";233: }234:235: }

输出结果如下:

1: -----With 100 cycles-----2: ====BaseLineTest Case3: start at 13586557027294: Time used: 7 ms5: ====TestForNullTest Case6: start at 13586557027367: Time used: 3 ms8: ====AtomicLongTest Case9: start at 135865570273910: Time used: 14 ms11: ====TroveTest Case12: start at 135865570275313: Time used: 2 ms14: ====MutableIntTest Case15: start at 135865570275516: Time used: 2 ms17: ====Guava AtomicLongMapTest Case18: start at 135865570275719: Time used: 4 ms20: ====Guava HashMultiSetTest Case21: start at 135865570276122: Time used: 7 ms23: ------------------------24: -----With 1000 cycles-----25: ====BaseLineTest Case26: start at 135865570276827: Time used: 17 ms28: ====TestForNullTest Case29: start at 135865570278530: Time used: 7 ms31: ====AtomicLongTest Case32: start at 135865570279233: Time used: 44 ms34: ====TroveTest Case35: start at 135865570283636: Time used: 17 ms37: ====MutableIntTest Case38: start at 135865570285339: Time used: 5 ms40: ====Guava AtomicLongMapTest Case41: start at 135865570285842: Time used: 9 ms43: ====Guava HashMultiSetTest Case44: start at 135865570286845: Time used: 50 ms46: ------------------------47: -----With 10000 cycles-----48: ====BaseLineTest Case49: start at 135865570291850: Time used: 16 ms51: ====TestForNullTest Case52: start at 135865570293453: Time used: 14 ms54: ====AtomicLongTest Case55: start at 135865570294856: Time used: 29 ms57: ====TroveTest Case58: start at 135865570297759: Time used: 10 ms60: ====MutableIntTest Case61: start at 135865570298862: Time used: 5 ms63: ====Guava AtomicLongMapTest Case64: start at 135865570299365: Time used: 15 ms66: ====Guava HashMultiSetTest Case67: start at 135865570300968: Time used: 77 ms69: ------------------------70: -----With 100000 cycles-----71: ====BaseLineTest Case72: start at 135865570308673: Time used: 124 ms74: ====TestForNullTest Case75: start at 135865570321076: Time used: 118 ms77: ====AtomicLongTest Case78: start at 135865570332979: Time used: 240 ms80: ====TroveTest Case81: start at 135865570356982: Time used: 102 ms83: ====MutableIntTest Case84: start at 135865570367185: Time used: 45 ms86: ====Guava AtomicLongMapTest Case87: start at 135865570371688: Time used: 126 ms89: ====Guava HashMultiSetTest Case90: start at 135865570384291: Time used: 98 ms92: ------------------------

一般结论:单线程使用MutableInt,多线程使用guava的AtomicLongMap,其实可以看看guava对addAndGet的实现,循环,很有趣。

最后总结一下,我们在对这个问题做优化的时候,明显的思路就是减少方法调用,而MutableInt效率最高,明显的是它将方法调用减少到最小——1次get,指针的威力顿时显现。当然实际业务代码实现的时候还要考虑到多个因素,比如代码可读性,与业务结合等等,我们现实中不一定要追求如此的效率,但是也要避免毫无思考的写下baseline里的代码,因为明显是可优化的,why not?

注:文中单个实现代码来自stackoverflow的各个回答,测试代码本人编写。

ref:

posted on 2013-01-20 12:40 changedi 阅读(4269) 评论(0)  编辑  收藏 所属分类: Java技术

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值