NO.2 A Brief Look at HashMap

零蚀zero eclipse

Category: Java

Article link: https://blog.csdn.net/qq_38315348/article/details/107214901

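Preface

Arrays and linked lists
- In interviews we have all probably been asked how arrays and linked lists differ. The standard answer is that a linked list is slow to look up but fast to insert into and delete from, while an array is fast to look up but slow to insert into and delete from. Why? The behavior follows from how each one is laid out in memory:
  - Array: its elements occupy one contiguous region of memory. It needs a single block large enough to hold all of its data before it can settle its bulky body there, so it places a constraint on how memory is laid out. The payoff is that locating a position is cheap: the element at index lives at start address + index * element size. The cost is that, to keep indexes stable, removing an element may force the remaining elements to be rearranged. That is why arrays are slow to insert into and delete from but fast to look up.
  - Linked list: unlike an array, its elements are not tied to index positions. They are free, scattered through memory, and need no single block large enough for the whole list. Each element only needs a small free slot plus a one-way link (a pointer) to the next element, and chaining these links forms the list. This makes a linked list friendlier to memory than an array. Because there is no address-based index, finding the k-th element means traversing from the head, since you never know where the next element lives. Inserting needs only a free slot anywhere, with no index constraint; deleting only requires redirecting the next pointer of the previous node. That is why linked lists are slow to look up and fast to insert into and delete from.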
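The contrast above can be sketched in a few lines of Java. This is only an illustration: the `Node` class and the helper names (`arrayGet`, `linkedGet`, `removeAfter`, `arrayRemove`) are made up for this example and are not taken from any library.

```java
import java.util.Arrays;

class ArrayVsLinkedSketch {
    /** Hypothetical singly linked node, for illustration only. */
    static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    /** Array lookup: a single indexed access, no traversal. */
    static int arrayGet(int[] data, int index) {
        return data[index];
    }

    /** Linked-list lookup: follow `next` from the head, `index` steps. */
    static int linkedGet(Node head, int index) {
        Node cur = head;
        for (int i = 0; i < index; i++) {
            cur = cur.next;
        }
        return cur.value;
    }

    /** Linked-list delete of the node after `prev`: only one pointer changes. */
    static void removeAfter(Node prev) {
        if (prev.next != null) {
            prev.next = prev.next.next;
        }
    }

    /** Array delete: build a new array, copying later elements one slot to the left. */
    static int[] arrayRemove(int[] data, int index) {
        int[] out = new int[data.length - 1];
        System.arraycopy(data, 0, out, 0, index);
        System.arraycopy(data, index + 1, out, index, data.length - index - 1);
        return out;
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 30, 40};
        Node head = new Node(10);
        head.next = new Node(20);
        head.next.next = new Node(30);

        System.out.println(arrayGet(arr, 2));                    // 30
        System.out.println(linkedGet(head, 2));                  // 30
        removeAfter(head);                                       // unlink the node holding 20
        System.out.println(linkedGet(head, 1));                  // 30
        System.out.println(Arrays.toString(arrayRemove(arr, 1))); // [10, 30, 40]
    }
}
```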

hashcode

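Before going further, we need to understand what a hash actually is. We use HashMap all the time without really knowing what hashCode is. Every object has its own hashCode(), which acts a bit like an id for the object (although two different objects may share the same hash code). The hash of a String is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]; for a String holding a single English character, that is simply the character's ASCII code.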
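As a quick check of the formula, here is a minimal sketch that evaluates the polynomial with Horner's rule and compares it with String.hashCode(). The class and method names (`StringHashSketch`, `polyHash`) are made up for this example.

```java
class StringHashSketch {
    /**
     * Evaluates s[0]*31^(n-1) + ... + s[n-1] using Horner's rule.
     * int overflow wraps around, which matches how String.hashCode() behaves.
     */
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // A single ASCII character hashes to its character code.
        System.out.println(polyHash("a") + " " + "a".hashCode());   // 97 97
        // 'a' * 31 + 'b' = 97 * 31 + 98 = 3105
        System.out.println(polyHash("ab") + " " + "ab".hashCode()); // 3105 3105
    }
}
```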
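For a plain object, hashCode is declared as public native int hashCode() in java.lang.Object, so the logic lives inside the JVM and is presumably more involved. Let's look at the source: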

//java.lang.Object 

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
    JVMWrapper("JVM_IHashCode");
    // as implemented in the classic virtual machine; return 0 if object is NULL
    return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

Through jvm.cpp we have found the call behind hashCode. Next, let's look at the actual implementation in ObjectSynchronizer:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Box and unbox the raw reference just in case we cause a STW safepoint.
      Handle hobj (Self, obj) ;
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
              !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ;
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;
  assert (Universe::verify_in_progress() ||
          Self->is_Java_thread() , "invariant") ;
  assert (Universe::verify_in_progress() ||
         ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

  // object should remain ineligible for biased locking
  assert (!mark->has_bias_pattern(), "invariant") ;

  if (mark->is_neutral()) {
    hash = mark->hash();              // this is a normal header
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // allocate a new hash code
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

Tracing the hash method further, we find it in markOop.hpp. Here hash_shift is an enum constant defined in that file, computed from several other constants; the same goes for hash_mask.

// hash operations
  intptr_t hash() const {
    return mask_bits(value() >> hash_shift, hash_mask);
  }

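The last piece is mask_bits(), which simply ANDs its two arguments. So after all of this we can see that reading the hash comes down to a few bit operations: shift the object's mark word and mask out the hash field. The value itself is generated once by get_next_hash and merged into the header, as FastHashCode above shows.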

//globalDefinitions.hpp
inline intptr_t mask_bits 
(intptr_t  x, intptr_t m) { return x & m; }
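The same shift-and-mask idea can be sketched in Java. The field position and width below (a 31-bit field starting at bit 8) are assumptions chosen for illustration, not values read from the HotSpot source shown here, and `withHash` and `extractHash` are hypothetical helper names.

```java
class MaskBitsSketch {
    // Assumed layout, for illustration only: a 31-bit hash field starting at bit 8.
    static final int HASH_SHIFT = 8;
    static final long HASH_MASK = (1L << 31) - 1;

    /** Java counterpart of mask_bits: a plain bitwise AND. */
    static long maskBits(long x, long m) {
        return x & m;
    }

    /** Clears the hash field of `word` and stores `hash` there, leaving other bits untouched. */
    static long withHash(long word, long hash) {
        long cleared = word & ~(HASH_MASK << HASH_SHIFT);
        return cleared | ((hash & HASH_MASK) << HASH_SHIFT);
    }

    /** Shifts the field down to bit 0, then masks away everything above it. */
    static long extractHash(long word) {
        return maskBits(word >>> HASH_SHIFT, HASH_MASK);
    }

    public static void main(String[] args) {
        long word = 0b001L;                       // pretend the low bits hold other state
        long stored = withHash(word, 0x1234ABCDL);
        System.out.println(Long.toHexString(extractHash(stored))); // 1234abcd
        System.out.println(stored & 0xFF);                         // 1 (low bits preserved)
    }
}
```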

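How Map works

How insert, delete, update, and lookup are handled
- We saw above that arrays are slow to insert into and delete from but fast to look up, and linked lists are the opposite. A Map has to serve insert, delete, update, and lookup alike, so it combines the two: key-value objects are stored in an array, and objects whose hashes map to the same array slot but whose keys differ are chained into a linked list. This gives a compromise between an array and a linked list.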
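To make the array-plus-linked-list idea concrete, here is a deliberately small sketch. It is not HashMap's implementation: the class name `ChainedMapSketch` is invented, the table size is fixed at 16, there is no resizing or tree conversion, and new nodes are prepended to the chain purely for brevity.

```java
import java.util.Objects;

class ChainedMapSketch<K, V> {
    /** One key-value pair plus a link to the next pair in the same bucket. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    /** Maps a key to an array slot; the table length is a power of two. */
    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return (table.length - 1) & h;
    }

    /** Inserts or updates; returns the previous value, or null if the key was absent. */
    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {   // same key: overwrite the value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // same slot, different key: chain it
        return null;
    }

    /** Jumps to the slot by index, then walks the chain comparing keys. */
    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMapSketch<Integer, String> m = new ChainedMapSketch<>();
        m.put(1, "one");
        m.put(17, "seventeen");          // 17 & 15 == 1, so it shares a slot with key 1
        System.out.println(m.get(1));    // one
        System.out.println(m.get(17));   // seventeen
    }
}
```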

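Source code

Creating the object

When we create a HashMap with new HashMap(), the constructor sets loadFactor to its default value, 0.75. The load factor caps how full the container may get: it is the upper bound on the ratio of stored key-value entries to the table's capacity, and once the entry count exceeds that bound the table is resized.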

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

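Besides the load factor, this code also shows a threshold field. tableSizeFor rounds the requested capacity up to the nearest power of two, so every capacity is a power of two. According to its comment in the source, threshold equals capacity * load factor, that is, the limit on the number of entries in the map; in this constructor it temporarily holds the rounded initial capacity until the table is first allocated.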
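The rounding can be sketched with the usual bit-smearing trick. The code below follows the shape of the Java 8 version of tableSizeFor as I recall it; later JDKs may implement it differently, so treat it as an illustration rather than the current source.

```java
class TableSizeSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Rounds cap up to the nearest power of two, clamped to [1, MAXIMUM_CAPACITY]. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;      // copy the highest set bit into every lower position...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...so n becomes 2^k - 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
        // With capacity 16 and load factor 0.75, the entry limit is 12.
        System.out.println((int) (16 * 0.75f)); // 12
    }
}
```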

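Insert

Next, let's look at put, which adds an entry. What put stores is a Node object, which implements the Map.Entry interface. Node is simple: it holds the hash, the key, the value, and a reference to the next node. The array (the field table, local variable tab) is an array of Node. ANDing the hash with the table length minus one, (n - 1) & hash, gives the array index for the entry. If that slot is empty, the new node is placed there directly. If it is not empty, put first checks whether the head node already holds the same key; otherwise it checks whether the head is a red-black tree node (TreeNode) and, if so, inserts into the tree. If the bin has not yet met the condition for becoming a red-black tree, the new node is appended to the tail of the linked list, and the bin is treeified once the list reaches TREEIFY_THRESHOLD.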

if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The slot at index i is empty, so insert a new Node directly.
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            // The head of this bin is a red-black tree node, so insert into the tree.
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise the bin is a linked list: walk it and append at the tail.
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
            .....
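The index expression (n - 1) & hash in the excerpt above works because n is always a power of two: the AND keeps the low bits of the hash, which is the same as a non-negative remainder modulo n. A small sketch (the class and method names are made up for this example):

```java
class BucketIndexSketch {
    /** Same expression as in put(): valid when n is a power of two. */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int[] hashes = {5, 21, -7};
        for (int h : hashes) {
            // The AND matches floorMod even for negative hashes.
            System.out.println(h + " -> " + indexFor(h, 16)
                    + " (floorMod: " + Math.floorMod(h, 16) + ")");
        }
        // 5 and 21 collide in a table of 16 slots but separate in a table of 32.
        System.out.println(indexFor(5, 16) == indexFor(21, 16)); // true
        System.out.println(indexFor(5, 32) == indexFor(21, 32)); // false
    }
}
```

Keeping the capacity a power of two lets the table replace a division with a single AND, and it means that after the table doubles, an entry either stays at its old index or moves exactly n slots up.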

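Delete

Before talking about deletion, there is an interesting method worth mentioning: entrySet(). It returns a Set view of the map's entries, and it is used (through putMapEntries) by the HashMap(Map m) constructor. Operations on the HashMap are reflected in this Set and vice versa. It is also why the keys of a HashMap behave like a Set: putting an existing key replaces its value instead of adding a duplicate entry.
The removal code below is short: it walks the map's entry set with an iterator until it reaches the matching entry and then removes it. Note that this is the generic remove inherited from AbstractMap; HashMap itself overrides remove and locates the bucket by hash instead of scanning every entry.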

public V remove(Object key) {
    Iterator<Entry<K,V>> i = entrySet().iterator();
    Entry<K,V> correctEntry = null;
    if (key==null) {
        while (correctEntry==null && i.hasNext()) {
            Entry<K,V> e = i.next();
            if (e.getKey()==null)
                correctEntry = e;
        }
    } else {
        while (correctEntry==null && i.hasNext()) {
            Entry<K,V> e = i.next();
            if (key.equals(e.getKey()))
                correctEntry = e;
        }
    }

    V oldValue = null;
    if (correctEntry !=null) {
        oldValue = correctEntry.getValue();
        i.remove();
    }
    return oldValue;
}
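The key point of the method above is that the entry set is a live view, so removing through its iterator removes from the map itself. A minimal sketch of the same scan-and-remove pattern against a real HashMap (`removeByScan` is a hypothetical helper written for this example):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

class EntrySetRemoveSketch {
    /** Linear scan over the entry set; removes the first entry whose key matches. */
    static <K, V> V removeByScan(Map<K, V> map, Object key) {
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, V> e = it.next();
            if (Objects.equals(key, e.getKey())) {
                V old = e.getValue(); // read the value before removing the entry
                it.remove();          // removes the entry from the backing map
                return old;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(removeByScan(m, "a")); // 1
        System.out.println(m.containsKey("a"));   // false
        System.out.println(m.size());             // 1
    }
}
```

In everyday code, calling map.remove(key) directly is preferable on a HashMap, since it avoids the full scan.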

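There is not much to say about update and lookup, so that is all on HashMap for now.

Addendum

I originally meant to work through a red-black tree by hand, but that got put off. I have read up on it somewhat, but I won't tear into it here. A red-black tree is a self-balancing binary search tree. The benefit is that the tree stays fairly "full", so for the same number of nodes its height is smaller, and each walk down the tree follows a shorter path. A quick search will turn up the details.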
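As a rough illustration of the height argument, the sketch below does not implement a red-black tree. It only compares the height of a plain, unbalanced binary search tree built from sorted keys with the smallest height any binary tree of the same size could have. All names here are made up for the example.

```java
class TreeHeightSketch {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Plain BST insert with no rebalancing; returns the (possibly new) root. */
    static Node insert(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) {
            return fresh;
        }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = fresh; break; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; break; }
                cur = cur.right;
            }
        }
        return root;
    }

    /** Number of nodes on the longest root-to-leaf path. */
    static int height(Node node) {
        return node == null ? 0 : 1 + Math.max(height(node.left), height(node.right));
    }

    /** Smallest possible height of a binary tree with n nodes: floor(log2 n) + 1. */
    static int minHeight(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 100; k++) {
            root = insert(root, k);          // sorted input degenerates into a chain
        }
        System.out.println(height(root));    // 100
        System.out.println(minHeight(100));  // 7
    }
}
```

A red-black tree does not reach the minimum height, but its balancing rules keep the height within a small constant factor of it, so lookups stay logarithmic even for sorted input.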

