Java Hash Performance Optimization

云中漫步87

Category: java. Tags: JAVA, hashmap, performance optimization

Article link: https://blog.csdn.net/wangyunzhong/article/details/84032299

1 Problem description

The Java code contains the following snippet. It concatenates several strings and puts the result into a map as the key.

   // Requires java.util.HashMap, java.util.List and java.util.Map.
   public void hashCode(List<String> values) {
         long start2 = System.currentTimeMillis();
         Map<String, Object> map = new HashMap<>();
         for (int i = 0; i + 1 < values.size(); i += 2) {
                // Concatenate two adjacent strings and use the result as the map key.
                StringBuilder builder = new StringBuilder();
                builder.append(values.get(i));
                builder.append(values.get(i + 1));
                map.put(builder.toString(), new Object());
         }
         long end2 = System.currentTimeMillis();
         System.out.println("string hash cost :" + (end2 - start2));
   }

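A single call is too fast for the cost to show, but at tens of millions of calls it adds up to a lot of time.
On my laptop (i7 HQ, 8 GB RAM), ten million iterations take 2-3 s. In theory, the time goes into string concatenation and the hashCode computation. To confirm this, we first look for likely problems at the source-code level.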

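2 Source code analysis

2.1 How StringBuilder builds a string

First the StringBuilder object is initialized. It allocates a char array of the default capacity (16). This is just an initial allocation and should not take much time.
On append, if the current array is too small, StringBuilder allocates a new array of (old capacity * 2 + 2) chars, or of exactly the required length when even that is not enough. It copies all existing data into the new array, and the old array is left for the garbage collector.
If each append goes just past the current boundary, the capacity grows in the sequence [16, 17*2=34, 35*2=70, 142, ...]. Every expansion costs an allocation plus a copy, and the discarded old arrays may add GC pressure.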

public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, CharSequence
{
    public StringBuilder() {
        super(16);
    }
}

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    AbstractStringBuilder(int capacity) {
        value = new char[capacity];
    }
}

public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    int len = str.length();
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}
//Arrays
public static char[] copyOf(char[] original, int newLength) {
    char[] copy = new char[newLength];
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}
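
The growth described above can be observed from outside the class through StringBuilder.capacity(). The following is only a minimal sketch: the class name BuilderGrowthDemo and the helper capacities are made up for illustration, and the exact growth policy is an implementation detail of the JDK in use.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderGrowthDemo {
    // Records every distinct capacity a default StringBuilder passes through
    // while single characters are appended until it holds `chars` characters.
    static List<Integer> capacities(int chars) {
        StringBuilder sb = new StringBuilder();   // documented default capacity: 16
        List<Integer> seen = new ArrayList<>();
        seen.add(sb.capacity());
        for (int i = 0; i < chars; i++) {
            sb.append('x');
            int cap = sb.capacity();
            if (cap != seen.get(seen.size() - 1)) {
                seen.add(cap);                    // the backing array was reallocated and copied
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacities(100));
    }
}
```

With the (old capacity * 2 + 2) rule, 100 single-character appends should pass through the capacities 16, 34, 70 and 142, which means three reallocations and three full copies for a 100-character string.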

Besides growing its memory, StringBuilder also has to copy the characters of the appended object into its own array.

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

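From the data-loading point of view, the points to watch are: 1. if the data exceeds the allocated memory, the array must be expanded; 2. every append copies the appended data; 3. returning a String requires one more copy, into the new String object.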
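
Of these three points, the first is the easiest to avoid with the standard API alone: if the total length is known, the builder can be sized up front. This is a sketch under that assumption, and the names PresizedConcat and concat are hypothetical. It does nothing about points 2 and 3, because each append and the final toString() still copy.

```java
public class PresizedConcat {
    // Joins the parts with a builder sized up front, so append never has to grow the array.
    static String concat(String... parts) {
        int total = 0;
        for (String p : parts) {
            total += p.length();
        }
        StringBuilder sb = new StringBuilder(total);
        for (String p : parts) {
            sb.append(p);       // one copy per part into the builder's array
        }
        return sb.toString();   // one more copy into the resulting String
    }

    public static void main(String[] args) {
        System.out.println(concat("abcdefghij", "klmnopqrst", "uvwxyz1234"));
    }
}
```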

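2.2 hash

HashMap uses hashCode to locate the storage slot. If the slot already holds data, it chains a list off that slot and stores the colliding entries one after another.
A slot collision falls into one of several cases: 1. same hashCode, different key, same slot; 2. same hashCode, same key, same slot; 3. different hashCode, different key, same slot.
In cases 1 and 3 the entries are appended to the list in order. In case 2 the previous entry is overwritten.
On put, HashMap first obtains the key's hashCode. When the hashCodes are equal, it compares the keys by reference equality and then by the equals method.
The comparison logic in HashMap: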

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;

        .
        .
        .
}

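As the code above shows, put immediately computes the hash of the key and uses it to find the slot. If the slot is already occupied, the hash is the first thing compared. Only when the hashes are equal does HashMap check reference equality or equals to decide whether the keys are equal.
So overall, two methods matter: hashCode and equals.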
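
To make the addressing step concrete, here is a small sketch of how a slot index follows from a key. The method spread mirrors the hash(key) step of JDK 8's HashMap (folding the high 16 bits into the low 16), and indexFor repeats the (n - 1) & hash expression from putVal. The class and method names are made up for illustration.

```java
public class BucketIndexDemo {
    // Same spreading step as HashMap.hash(Object) in JDK 8.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Slot index for a table whose length is a power of two.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" share a hashCode; "Aq" has a different hashCode but the same low 4 bits.
        for (String key : new String[] { "Aa", "BB", "Aq", "Ab" }) {
            System.out.println(key + " hashCode=" + key.hashCode()
                    + " slot=" + indexFor(key, 16));
        }
    }
}
```

In a table of length 16, "Aa" and "BB" land in the same slot because their hashCodes are equal (case 1), "Aq" lands there too although its hashCode differs (case 3), and "Ab" goes to the neighbouring slot.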
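String's hashCode algorithm is shown below. It walks the char array, multiplying the running value by 31 and adding each new element. Some online sources say this algorithm is prone to collisions, but in practice it makes little difference.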

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
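
As a worked example of the loop above, the sketch below recomputes the polynomial by hand and shows a well-known colliding pair, "Aa" and "BB". The class name StringHashDemo and the helper poly31 are hypothetical. The map part illustrates cases 1 and 2 from the list of collision cases.

```java
import java.util.HashMap;
import java.util.Map;

public class StringHashDemo {
    // Re-implements the loop above: h = 31 * h + c for every char.
    static int poly31(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'A' = 65, 'a' = 97, 'B' = 66: 65 * 31 + 97 = 2112 and 66 * 31 + 66 = 2112.
        System.out.println(poly31("Aa") + " " + poly31("BB"));

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);   // same hashCode, different key: both entries are kept
        map.put("Aa", 3);   // same hashCode, equal key: the old value is overwritten
        System.out.println(map.size() + " " + map.get("Aa") + " " + map.get("BB"));
    }
}
```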

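String's equals algorithm. It walks the char array of this string and of the target and compares them char by char. One thing that puzzled me at first: the while loop is controlled by n, while the array elements are indexed by i. Here n just counts down the number of chars left to compare, and i moves forward through both arrays.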

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

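3 HashKey implementation

Based on the analysis above, I wrote a new class to serve as the map key.
The main optimization is in memory copying: the data is copied only once.
For the hash, it uses the FNV hash algorithm, based on an implementation found online.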

package org.yunzhong.test.stream;
import java.util.Arrays;
public class HashKey {
   // FNV 32-bit prime and offset basis.
   private static final int HASH_PARAM = 16777619;
   private static final int HASH_INIT = (int) 2166136261L;
   
   private int hashCode;
   private char[] values;
   private int count;
   public HashKey() {
         values = new char[64];
         count = 0;
   }
   public void append(String value) {
         int minLength = 0;
         if ((minLength = value.length() + count) > values.length) {
                values = Arrays.copyOf(values, minLength * 2);
         }
         value.getChars(0, value.length(), values, count);
         count += value.length();
   }
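   // Alternative hash using the same 31-based polynomial as String.hashCode (not called by hashCode()).
   public void hash1() {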
         for (int i = 0; i < count; ++i) {
                hashCode = 31 * hashCode + values[i];
         }
   }
   public void hash() {
         // Seed with the FNV offset basis, then mix in each char.
         hashCode = HASH_INIT;
         for (int i = 0; i < count; ++i) {
                hashCode = (hashCode ^ values[i]) * HASH_PARAM;
         }
         hashCode += hashCode << 13;
         hashCode ^= hashCode >> 7;
         hashCode += hashCode << 3;
         hashCode ^= hashCode >> 17;
         hashCode += hashCode << 5;
   }
   @Override
   public int hashCode() {
         if(this.hashCode == 0) {
                hash();
         }
         return hashCode;
   }
   public int getHashCode() {
         return hashCode;
   }
   public void setHashCode(int hashCode) {
         this.hashCode = hashCode;
   }
   public char[] getValues() {
         return values;
   }
   public void setValues(char[] values) {
         this.values = values;
   }
   public int getEnd() {
         return count;
   }
   public void setEnd(int end) {
         this.count = end;
   }
   @Override
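   public boolean equals(Object target) {
         if (this == target) {
                return true;
         }
         // Guard against null and foreign types instead of throwing ClassCastException.
         if (!(target instanceof HashKey)) {
                return false;
         }
         HashKey key = (HashKey) target;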
         int length = this.count;
         if (length == key.count) {
                int i = 0;
                char[] v1 = this.values;
                char[] v2 = key.values;
                while (length-- != 0) {
                       if (v1[i] != v2[i]) {
                             return false;
                       }
                       i++;
                }
                return true;
         }
         return false;
   }
   @Override
   public String toString() {
         return String.copyValueOf(this.values, 0, count);
   }
}
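
For reference, the core of the FNV family is just an XOR and a multiplication per input unit. The sketch below is a plain 32-bit FNV-1a, not the HashKey method above: it leaves out the extra shift-and-add mixing at the end, and it feeds one char per step where the reference algorithm works on bytes (the two agree for single-byte characters). The class name Fnv1aDemo is hypothetical.

```java
public class Fnv1aDemo {
    private static final int FNV_PRIME = 16777619;            // 0x01000193
    private static final int FNV_OFFSET_BASIS = 0x811C9DC5;   // 2166136261 as a signed int

    // Plain 32-bit FNV-1a over the first `count` chars, one char per step.
    static int fnv1a(char[] chars, int count) {
        int h = FNV_OFFSET_BASIS;
        for (int i = 0; i < count; i++) {
            h ^= chars[i];      // mix in the next unit
            h *= FNV_PRIME;     // int overflow wraps, i.e. arithmetic modulo 2^32
        }
        return h;
    }

    public static void main(String[] args) {
        char[] input = "a".toCharArray();
        System.out.println(Integer.toHexString(fnv1a(input, input.length)));
    }
}
```

For the single character "a" this prints e40c292c, which matches the published FNV-1a 32-bit test vector for that input.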

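4 Performance comparison

Tested with 4 million records. My laptop: i7 HQ, 8 GB RAM.
Overall the average time goes down, but the improvement never reaches a multiple. My knowledge is limited, so this is as far as I got.
StringBuilder test case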

@Test
   public void testHashPut() {
         String[] characters = new String[] { "a", "b", "c", "d", "e", "f",  "g", "h", "i", "j", "k", "l", "m", "n", "o",
                       "p", "q", "r", "s", "t", "u", "v", "w", "x", "y",  "z", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
         Random random = new Random();
         List<String> values = Lists.newArrayList();
         // Build 4,000,000 random 10-character strings as input (not timed).
         for (int i = 0; i < 4000000; i++) {
                StringBuilder builder = new StringBuilder();
                for (int j = 0; j < 10; j++) {
                       int nextInt = random.nextInt(characters.length);
                       builder.append(characters[nextInt]);
                }
                values.add(builder.toString());
         }
         long start = System.currentTimeMillis();
         Map<String, Object> map = new HashMap<String, Object>();
         for (int i = 3; i < values.size(); i++) {
                StringBuilder builder = new StringBuilder();
                builder.append(values.get(i - 3));
                builder.append(values.get(i - 2));
                builder.append(values.get(i - 1));
                builder.append(values.get(i));
                map.put(builder.toString(), new Object());
         }
         System.out.println("hash init cost:" + (System.currentTimeMillis()  - start));
   }

HashKey test case

@Test
   public void testHashPutOnceCopy() {
         String[] characters = new String[] { "a", "b", "c", "d", "e", "f",  "g", "h", "i", "j", "k", "l", "m", "n", "o",
                       "p", "q", "r", "s", "t", "u", "v", "w", "x", "y",  "z", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
         Random random = new Random();
         List<String> values = Lists.newArrayList();
         // Build 4,000,000 random 10-character strings as input (not timed).
         for (int i = 0; i < 4000000; i++) {
                StringBuilder builder = new StringBuilder();
                for (int j = 0; j < 10; j++) {
                       int nextInt = random.nextInt(characters.length);
                       builder.append(characters[nextInt]);
                }
                values.add(builder.toString());
         }
         long start = System.currentTimeMillis();
         // Pre-sized to 1,000,000, whereas the StringBuilder test uses the default capacity.
         Map<HashKey, Object> map = new HashMap<HashKey, Object>(1000000);
         for (int i = 3; i < values.size(); i++) {
                HashKey key = new HashKey();
                key.append(values.get(i - 3));
                key.append(values.get(i - 2));
                key.append(values.get(i - 1));
                key.append(values.get(i));
                map.put(key, new Object());
         }
         System.out.println("once hash init cost:" +  (System.currentTimeMillis() - start));
   }

HashKey, 2 attributes (times in ms)

once hash init cost:7437
once hash init cost:3588
once hash init cost:3593
once hash init cost:1599
once hash init cost:4285
once hash init cost:1597
once hash init cost:1763
once hash init cost:1607
once hash init cost:1526
once hash init cost:1519

StringBuilder, 2 attributes (times in ms)

hash init cost:4588
hash init cost:2890
hash init cost:3226
hash init cost:2963
hash init cost:1743
hash init cost:1695
hash init cost:1729
hash init cost:1748
hash init cost:1641
hash init cost:1859

HashKey, 4 attributes (times in ms)

once hash init cost:7561
once hash init cost:4270
once hash init cost:3726
once hash init cost:4334
once hash init cost:4330
once hash init cost:1936
once hash init cost:1914
once hash init cost:2025
once hash init cost:1926
once hash init cost:2068

StringBuilder, 4 attributes (times in ms)

hash init cost:6841
hash init cost:3479
hash init cost:3590
hash init cost:3897
hash init cost:3676
hash init cost:4806
hash init cost:3460
hash init cost:3661
hash init cost:3512
hash init cost:3466
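
To compare the four series more directly, the timings listed above can be averaged. The sketch below only does that arithmetic. The class name BenchmarkMeans and its constants are hypothetical, and the numbers are copied from the runs above.

```java
import java.util.Arrays;

public class BenchmarkMeans {
    // Timings in milliseconds, copied from the ten runs of each series above.
    static final long[] HASHKEY_2 = { 7437, 3588, 3593, 1599, 4285, 1597, 1763, 1607, 1526, 1519 };
    static final long[] STRINGBUILDER_2 = { 4588, 2890, 3226, 2963, 1743, 1695, 1729, 1748, 1641, 1859 };
    static final long[] HASHKEY_4 = { 7561, 4270, 3726, 4334, 4330, 1936, 1914, 2025, 1926, 2068 };
    static final long[] STRINGBUILDER_4 = { 6841, 3479, 3590, 3897, 3676, 4806, 3460, 3661, 3512, 3466 };

    // Arithmetic mean of one series, NaN for an empty one.
    static double mean(long[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println("HashKey 2:       " + mean(HASHKEY_2));
        System.out.println("StringBuilder 2: " + mean(STRINGBUILDER_2));
        System.out.println("HashKey 4:       " + mean(HASHKEY_4));
        System.out.println("StringBuilder 4: " + mean(STRINGBUILDER_4));
    }
}
```

Over all ten runs the means are about 2851 ms against 2408 ms for two attributes and 3409 ms against 4039 ms for four. On these figures the gain appears mainly in the four-attribute case and stays well below a factor of two. The first runs in every series are much slower than the later ones, so a mean over only the later runs would look different.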

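5 Multithreading

I did not really want to use multithreading here. It means coordination between threads and competition for CPU, and when the system is under heavy load it does not improve performance at all.
Also, initializing a map is only a small piece of functionality, and starting threads for it feels like using a sledgehammer to crack a nut.
Finally, initializing millions of entries is rare, and whether it takes 1 s or 10 s hardly matters for overall performance.
Still, it is one possible approach, and I tested it on my machine. With 4 million records and three-string concatenation, the test code and data are as follows: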

   private ExecutorService threadPool = Executors.newFixedThreadPool(8, new  ThreadFactory() {
         private int threadNum;
         public Thread newThread(Runnable r) {
                Thread th = new Thread(r);
                th.setName("hashThread" + threadNum++);
                return th;
         }
   });
   @Test
   public void testHashPutOnceCopyMultiTrhead() throws InterruptedException,  ExecutionException {
         String[] characters = new String[] { "a", "b", "c", "d", "e", "f",  "g", "h", "i", "j", "k", "l", "m", "n", "o",
                       "p", "q", "r", "s", "t", "u", "v", "w", "x", "y",  "z", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
         int batch = 100000;
         Random random = new Random();
         final List<String> values = Lists.newArrayList();
         // Build 4,000,000 random 10-character strings as input (not timed).
         for (int i = 0; i < 4000000; i++) {
                StringBuilder builder = new StringBuilder();
                for (int j = 0; j < 10; j++) {
                       int nextInt = random.nextInt(characters.length);
                       builder.append(characters[nextInt]);
                }
                values.add(builder.toString());
         }
         final Map<HashKey, Object> map = new ConcurrentHashMap<HashKey,  Object>(1000000);
         long start = System.currentTimeMillis();
         List<Future<Object>> futures = Lists.newArrayList();
         for (int j = 3; j < values.size(); j += batch) {
                final int bottom = j;
                final int top = values.size() > j + batch ? (j + batch) :  values.size();
                Future<Object> future = threadPool.submit(new  Callable<Object>() {
                       public Object call() throws Exception {
                             for (int i = bottom; i < top; i++) {
                                    HashKey key = new HashKey();
                                    key.append(values.get(i - 3));
                                    key.append(values.get(i - 2));
                                    key.append(values.get(i - 1));
                                    key.append(values.get(i));
                                    map.put(key, new Object());
                             }
                             return null;
                       }
                });
                futures.add(future);
         }
         for (Future<Object> future : futures) {
                future.get();
         }
         System.out.println("once hash init cost:" +  (System.currentTimeMillis() - start));
   }

Test results (times in ms)

once hash init cost:7832
once hash init cost:3056
once hash init cost:2762
once hash init cost:3482
once hash init cost:3611
once hash init cost:3804
once hash init cost:1185
once hash init cost:1211
once hash init cost:1189
once hash init cost:1146
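
The batching scheme used in the multithreaded test can also be written as a self-contained sketch without JUnit or Guava. This is only an illustration under stated assumptions: it uses plain String keys instead of HashKey so that it compiles on its own, the names ParallelFillDemo and fill are hypothetical, and it shuts the pool down when done, which the test above does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFillDemo {
    // Fills a concurrent map with keys built from each window of four adjacent values,
    // splitting the index range [3, size) into batches that run on a fixed pool.
    static Map<String, Object> fill(List<String> values, int batch, int threads) {
        Map<String, Object> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int j = 3; j < values.size(); j += batch) {
                final int bottom = j;
                final int top = Math.min(j + batch, values.size());
                futures.add(pool.submit(() -> {
                    for (int i = bottom; i < top; i++) {
                        String key = values.get(i - 3) + values.get(i - 2)
                                + values.get(i - 1) + values.get(i);
                        map.put(key, Boolean.TRUE);
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get();   // wait for the batch and surface any failure
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            pool.shutdown();   // otherwise the worker threads keep the JVM alive
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            values.add("v" + i + "|");
        }
        System.out.println(fill(values, 100, 4).size());
    }
}
```

Each batch only reads the shared list and writes to the ConcurrentHashMap, so no extra locking is needed. The cost is the contention inside the map itself, which is one reason the speed-up stays modest.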