Java Collections Study Notes

Java Collections Framework Explained: ArrayList, LinkedList, HashSet and Map Implementations

Original post, published 2022-08-01 23:42:25


License: CC 4.0 BY-SA

Tags:

#java #learning #programming-language

Java Collections

Collection interface (parent interface)

Iterator

Every class that implements the Iterable interface can obtain an iterator through its iterator() method

Note: an iterator only moves forward; to traverse again, call iterator() again to get a fresh one;
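A minimal sketch of the note above: hasNext()/next() walk the collection once, an exhausted iterator throws NoSuchElementException from next(), and asking the collection for a new iterator starts over. IteratorDemo and its helper methods are illustrative names, not JDK API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class IteratorDemo {
    // Drains an iterator into a list using only hasNext()/next().
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Returns true if next() on an exhausted iterator throws NoSuchElementException.
    static boolean nextAfterEndThrows(Iterator<String> it) {
        try {
            it.next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));

        Iterator<String> it = list.iterator();
        System.out.println(drain(it));              // [a, b, c]
        System.out.println(nextAfterEndThrows(it)); // true: this iterator is used up

        // "Resetting" simply means requesting a new iterator from the collection.
        it = list.iterator();
        System.out.println(drain(it));              // [a, b, c]
    }
}
```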

Enhanced for loop

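Works on both arrays and collections;
For collections it is still an Iterator under the hood (for arrays the compiler generates an indexed loop);
In IntelliJ IDEA, typing an uppercase I (live template) generates the loop quickly.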
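Because the enhanced for loop over a collection compiles to an Iterator loop, removing elements through the list inside it trips the hidden iterator's modCount check. The sketch below (hypothetical class and method names) contrasts that with Iterator.remove(), the supported way to delete while iterating. It removes the first element on purpose: the exception is not guaranteed for every position (removing the second-to-last element, for example, ends the loop silently).

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class ForEachDemo {
    // Enhanced for over an array: compiled to an indexed loop, no Iterator involved.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Removing via the list inside for-each: the hidden Iterator detects the change.
    static boolean removeInForEachThrows() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The safe alternative: an explicit Iterator and its remove() method.
    static List<String> removeWithIterator() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // 6
        System.out.println(removeInForEachThrows());  // true
        System.out.println(removeWithIterator());     // [b, c]
    }
}
```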

List interface: ordered, duplicates allowed

ArrayList

Not thread-safe
ArrayList maintains an Object array elementData: transient Object[] elementData; // transient means default serialization skips this field (ArrayList writes its elements itself in writeObject)
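Two construction paths (elementData is assigned in the constructor):
1. No-arg constructor: elementData starts as an empty array (capacity 0); the first add grows it to the default capacity 10, and every later growth goes to 1.5 times the current capacity (old + old/2). For comparison, Vector's no-arg growth factor is 2, and Vector is thread-safe because every public method is declared synchronized
2. Constructor with initialCapacity: elementData starts with the given size; when more room is needed it grows directly to 1.5 times the current capacity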
Every add performs a capacity check: if the array cannot hold size + 1 elements, it grows
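ArrayList does not expose its capacity, so the following only simulates the growth rule described above (new = old + old/2, never below the required size) for a list built with the no-arg constructor. ArrayListGrowthSketch and its methods are hypothetical names, not JDK API.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayListGrowthSketch {
    static final int DEFAULT_CAPACITY = 10;

    // Growth rule from the notes: old + old/2, but at least the required capacity.
    static int grow(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Simulates n add() calls on a no-arg list and records each new capacity.
    static List<Integer> capacitiesAfterAdds(int n) {
        List<Integer> history = new ArrayList<>();
        int capacity = 0; // no-arg constructor: empty elementData
        for (int size = 0; size < n; size++) {
            int minCapacity = size + 1;
            if (minCapacity > capacity) {
                // the first add on a no-arg list jumps straight to the default capacity
                int required = (capacity == 0) ? Math.max(DEFAULT_CAPACITY, minCapacity) : minCapacity;
                capacity = grow(capacity, required);
                history.add(capacity);
            }
        }
        return history;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesAfterAdds(50)); // [10, 15, 22, 33, 49, 73]
    }
}
```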

Vector

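Thread-safe
No-arg constructor: default capacity 10, doubled when full. With an explicit initial capacity it is also doubled each time it fills up, unless a positive capacityIncrement was supplied, in which case it grows by that increment instead
The constructor Vector(int initialCapacity, int capacityIncrement) lets you specify the growth increment
Source walkthrough: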

====================================================
public synchronized boolean add(E e) {
    modCount++;
    ensureCapacityHelper(elementCount + 1);   // capacity check
    elementData[elementCount++] = e;
    return true;
}
====================================================
private void ensureCapacityHelper(int minCapacity) {
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);    // the method that actually grows the array
}
====================================================
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    // key growth rule: add capacityIncrement if it is positive, otherwise double
    int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                     capacityIncrement : oldCapacity);
    if (newCapacity - minCapacity < 0)    // still smaller than the required minimum
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)    // larger than the maximum array size
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);
}
====================================================
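Unlike ArrayList, Vector has a public capacity() method, so the growth rules above can be observed directly. A short demo; the helper capacityAfter is a hypothetical name, and the expected values follow the grow() arithmetic shown above.

```java
import java.util.Vector;

class VectorGrowthDemo {
    // Adds n elements to the given vector and reports its capacity afterwards.
    static int capacityAfter(Vector<Integer> v, int n) {
        for (int i = 0; i < n; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        // default capacity 10, doubled when the 11th element arrives
        System.out.println(capacityAfter(new Vector<>(), 11));    // 20
        // initial capacity 4, no increment: doubled to 8
        System.out.println(capacityAfter(new Vector<>(4), 5));    // 8
        // initial capacity 2, capacityIncrement 3: grows by 3 to 5
        System.out.println(capacityAfter(new Vector<>(2, 3), 3)); // 5
    }
}
```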

LinkedList

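Backed by a doubly linked list
Accepts any element, including duplicates and null
It has two fields, first and last, pointing to the head node and the tail node
Each node (a Node object) holds three fields, prev, next and item; prev points to the previous node and next to the following one, which together form the doubly linked list
Adding or removing at either end is fast (O(1), only a few links change); looking up an element by index or value is slow (O(n), the list must be walked)
Source walkthrough (add() appends at the tail, remove() removes from the head):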

-----------------------------
public boolean add(E e) {
        linkLast(e);
        return true;
    }
-----------------------------
void linkLast(E e) {
        final Node<E> l = last;  // old tail node
        final Node<E> newNode = new Node<>(l, e, null);  // new node: prev = old tail, next = null
        last = newNode;     // the new node becomes the tail
        if (l == null)
            first = newNode; // list was empty: first -> a, last -> a
        else
            l.next = newNode;  // link at the tail: old tail.next = new node
        size++;
        modCount++;
    }
-----------------------------
public E remove() {
        return removeFirst();  // note: removes the FIRST element
    }
public E removeFirst() {
        final Node<E> f = first;
        if (f == null)
            throw new NoSuchElementException();
        return unlinkFirst(f);
    }
-----------------------------
private E unlinkFirst(Node<E> f) {
        // assert f == first && f != null;
        final E element = f.item;
        final Node<E> next = f.next;
        f.item = null;
        f.next = null; // help GC
        first = next;
        if (next == null)
            last = null;   // the list had only one node
        else
            next.prev = null;  
        size--;
        modCount++;
        return element;   // return the removed element
    }
-----------------------------
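To make the linkLast/unlinkFirst steps concrete, here is a stripped-down, hypothetical MiniLinkedList that keeps the same first/last/prev/next fields. It is a teaching sketch, not a replacement for java.util.LinkedList (no modCount, no index access); the backwards() helper walks the prev links to show the list really is doubly linked.

```java
import java.util.NoSuchElementException;

class MiniLinkedList<E> {
    private static class Node<E> {
        E item;
        Node<E> prev;
        Node<E> next;

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first;
    private Node<E> last;
    private int size;

    // Append at the tail, like LinkedList.add -> linkLast.
    void add(E e) {
        Node<E> l = last;
        Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null) {
            first = newNode;  // empty list: the node is both first and last
        } else {
            l.next = newNode; // old tail now points forward to the new node
        }
        size++;
    }

    // Remove the head, like LinkedList.remove -> removeFirst -> unlinkFirst.
    E remove() {
        Node<E> f = first;
        if (f == null) {
            throw new NoSuchElementException();
        }
        E element = f.item;
        Node<E> next = f.next;
        f.item = null;
        f.next = null;        // help GC
        first = next;
        if (next == null) {
            last = null;      // the list is now empty
        } else {
            next.prev = null; // the new head has no predecessor
        }
        size--;
        return element;
    }

    int size() {
        return size;
    }

    // Walks from last back to first through the prev links.
    String backwards() {
        StringBuilder sb = new StringBuilder();
        for (Node<E> n = last; n != null; n = n.prev) {
            sb.append(n.item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MiniLinkedList<String> list = new MiniLinkedList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(list.backwards()); // cba
        System.out.println(list.remove());    // a
        System.out.println(list.size());      // 2
        System.out.println(list.backwards()); // cb
    }
}
```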

Choosing a List implementation

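ArrayList: fast lookup by index, slow insertion and removal (later elements must be shifted)
LinkedList: slow lookup, fast insertion and removal
Generally, 80 to 90 percent of a program's operations are lookups, so ArrayList is the choice in most cases
Still, choose according to the actual workload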

Set interface

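TreeSet is sorted; HashSet has no defined order
No duplicates; at most one null (HashSet accepts one null, while a TreeSet using natural ordering rejects null with a NullPointerException)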

HashSet

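How does HashSet decide that two elements are the same?

hashCode() determines which array index (bucket) the node goes into;
- the actual index is (table.length - 1) & hash(key), where hash() mixes the bits of key.hashCode();
If the element lands in a bucket that already holds a list, it is compared with each node from head to tail (same hash and either == or equals()); if an equal element is found the add fails, otherwise the element is appended to the tail of the list;
null can be stored, but only once (the hash of null is defined as 0)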
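A small sketch of the rule above, with two hypothetical classes: RawPoint keeps Object's identity-based equals()/hashCode(), so two copies are both stored, while Point overrides both methods, so the second copy lands in the same bucket, matches via equals(), and is rejected. The last helper shows that a second null is also treated as a duplicate.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashSetEqualityDemo {
    // No equals/hashCode override: identity semantics, so equal-looking copies are distinct.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // equals/hashCode overridden: equal coordinates mean equal elements.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points must produce equal hash codes
        }
    }

    static int rawSize() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2));
        return set.size();
    }

    static int pointSize() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // same bucket, equals() is true: rejected
        return set.size();
    }

    static int nullSize() {
        Set<String> set = new HashSet<>();
        set.add(null);
        set.add(null);            // the second null is a duplicate
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(rawSize());   // 2
        System.out.println(pointSize()); // 1
        System.out.println(nullSize());  // 1
    }
}
```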

Source analysis

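Under the hood it is a HashMap
It calls HashMap's API; the value stored for every element is the placeholder PRESENT (a single shared new Object())
Iteration order is determined by the hash values and the table length, not by insertion order; it is stable for a given table but can change after a resize
No-arg constructor: default capacity 16, loadFactor 0.75
treeifyBin() is entered only when a bucket's list reaches 9 nodes (the check runs right after the 9th node is appended, because binCount counts the nodes that were already in the list!)
Inside treeifyBin(): if tab == null || (n = tab.length) < 64, the table is resized instead of treeified
- 64 (MIN_TREEIFY_CAPACITY) is the table length, i.e. the number of buckets, not the number of elements in the HashSet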
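resize() happens at three points:
- On the first add to a new HashSet: table is null, so it is allocated with length 16 (no-arg constructor; with an explicit capacity, the requested size rounded up to the next power of two)
- When HashMap.size > threshold, i.e. right after the (threshold + 1)-th element is successfully added
- When a bucket's list grows beyond 8 nodes: if table.length < 64, resize; if table.length >= 64, treeify that bucket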
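The sizing rules above can be checked with a small simulation. tableSizeFor below copies the JDK 8 rounding logic for illustration (the real HashMap.tableSizeFor is package-private), and capacityAfterPuts is a hypothetical helper that models only the first allocation and the size > threshold resizes, assuming distinct keys that never trigger a treeify-related resize.

```java
class HashMapSizingSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float LOAD_FACTOR = 0.75f;

    // Round cap up to the next power of two by smearing the highest set bit downwards.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // Table length after n successful puts of distinct keys into new HashMap().
    static int capacityAfterPuts(int n) {
        int capacity = 0;
        int thr = 0;
        for (int size = 1; size <= n; size++) {
            if (capacity == 0) {   // first put: table is null, allocate 16
                capacity = 16;
                thr = threshold(capacity);
            }
            if (size > thr) {      // ++size > threshold -> resize()
                capacity <<= 1;
                thr = threshold(capacity);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));      // 16
        System.out.println(tableSizeFor(17));      // 32
        System.out.println(threshold(16));         // 12
        System.out.println(capacityAfterPuts(12)); // 16
        System.out.println(capacityAfterPuts(13)); // 32
    }
}
```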
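Source walkthrough below
Recommended reading:
- Why HashMap's default load factor is 0.75, starting from the Poisson distribution (从泊松分布谈起HashMap为什么默认扩容因子是0.75) - Zhihu (zhihu.com)
- 2022 interview questions: an in-depth review of HashMap (2022面试题：HashMap相关问题硬核梳理) - 小牛呼噜噜's blog, CSDN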

// call chain: add -> put -> putVal (hash() calls hashCode())
---------------------------------
public boolean add(E e) {
        // PRESENT is a shared placeholder value (static final Object PRESENT = new Object())
        return map.put(e, PRESENT)==null;
    }
---------------------------------
public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
---------------------------------
static final int hash(Object key) {
        int h;
        // so the hash actually used is hashCode() with its high 16 bits XORed into the low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
---------------------------------
/**------------------------core algorithm------------------------**/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // lazily allocate the table on the first put
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // bucket i is empty: put a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            // bucket i already holds a node
            Node<K,V> e; K k;
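            // p is the first node of table[i] (a Node, or a TreeNode once the bucket is treeified; TreeNode is a static nested class of HashMap)
            // the checks below decide whether the key is already present
            // CASE 1: the first node has the same hash and the same key (== or equals())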
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                // CASE 2: the bucket is a red-black tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // CASE 3: a linked list whose first node did not match; walk the list
                for (int binCount = 0; ; ++binCount) {
                    // note: p.next is assigned to e here
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // true once this append makes the list 9 nodes long, because binCount counts the nodes before it
                        if (binCount >= TREEIFY_THRESHOLD - 1) 
                            // try to treeify; treeifyBin() checks tab == null || (n = tab.length) < 64
                            // and calls resize() instead when that holds;
                            // only otherwise is the bucket converted into a red-black tree
                            treeifyBin(tab, hash);
                        break;
                    }
                    // e = p.next
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // advance p to the next node
                    p = e;
                }
            }
            
            // key already present: replace the value
            if (e != null) {
                // overwrite the old value with the value passed in
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // checked after insertion: with the default threshold 12, adding the 13th entry triggers resize()
        if (++size > threshold)
            resize();
        // hook for subclasses such as LinkedHashMap; empty in HashMap itself
        afterNodeInsertion(evict);
        return null;
    }

/**-------------------table resizing---------------------**/
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // already at the maximum capacity: stop growing and live with more collisions
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // otherwise double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // compute the new resize threshold
    if (newThr == 0) {

        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes", "unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // linked-list bucket: split it in two while preserving order
                    Node<K,V> loHead = null, loTail = null; // list that stays at the original index j
                    Node<K,V> hiHead = null, hiTail = null; // list that moves to index j + oldCap
                    Node<K,V> next;
                    do {
                        next = e.next;
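                        /*
                         For a power-of-two length, hash % length == hash & (length - 1);
                         the identity only holds when length is a power of 2. After doubling,
                         the new index differs from the old one only in the oldCap bit.
                        */
                        // oldCap bit is 0: stays at the original index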
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e; // tail insertion keeps the original order
                            loTail = e; // advance the tail
                        } else { // oldCap bit is 1: moves to original index + oldCap
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the low list at index j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the high list at index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
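Two details of the code above can be demonstrated in isolation: hash() spreading the high bits into the low bits, and the lo/hi split in resize(), where (e.hash & oldCap) alone decides whether a node stays at j or moves to j + oldCap. spread mirrors HashMap.hash(); indexFor and indexAfterResize are hypothetical helper names.

```java
class HashBitsSketch {
    // Same mixing as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table length n.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // Index after the table doubles from oldCap, using only the oldCap bit.
    static int indexAfterResize(int hash, int oldCap) {
        int j = indexFor(hash, oldCap);
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    public static void main(String[] args) {
        // Integer keys 65536 and 131072 differ only above bit 15.
        Integer a = 65536, b = 131072;
        System.out.println(indexFor(a.hashCode(), 16) + " " + indexFor(b.hashCode(), 16)); // 0 0: raw hashCode collides
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));       // 1 2: spreading separates them

        // The lo/hi split agrees with recomputing the index against the doubled table.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(indexAfterResize(hash, 16) + " == " + indexFor(hash, 32));
        }
    }
}
```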

LinkedHashSet

Extends HashSet and implements Set

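HashMap stores Node objects; LinkedHashSet, through the LinkedHashMap that backs it, stores LinkedHashMap.Entry objects (Entry extends HashMap.Node)
Internally it keeps a hash table plus a doubly linked list
Each entry has before and after fields, which link the entries into a doubly linked list
When an element is added, its hash and then its index are computed to place it in the hash table, and the element is then appended to the doubly linked list (if it is already present it is not added, the same rule as HashSet)
This is why a LinkedHashSet iterates in the same order the elements were inserted
Source walkthrough: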

/* Entry, a static nested class of LinkedHashMap (the map behind LinkedHashSet); it replaces Node as the node type stored in the table */
static class Entry<K,V> extends HashMap.Node<K,V> {
        Entry<K,V> before, after;
        Entry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }
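A quick check of the insertion-order guarantee; iterationOrder is a hypothetical helper. The repeated "apple" is rejected and keeps its original position.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkedHashSetOrderDemo {
    // Adds the words in order and returns the set's iteration order.
    static List<String> iterationOrder(String... words) {
        Set<String> set = new LinkedHashSet<>();
        for (String w : words) {
            set.add(w); // a duplicate is rejected and keeps its original position
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder("banana", "apple", "cherry", "apple")); // [banana, apple, cherry]
    }
}
```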

TreeSet

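With the no-arg constructor, elements are sorted by their natural ordering (compareTo() of Comparable), not kept in insertion order; pass a Comparator to the constructor for a custom order
TreeSet is backed by a TreeMap, which is implemented as a red-black tree, i.e. a self-balancing binary search tree
Source walkthrough (TreeMap.put):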

public V put(K key, V value) {
        Entry<K,V> t = root;
        // first insertion; note the node type is Entry
        if (t == null) {
            compare(key, key); // type check, and null check under natural ordering
            
            root = new Entry<>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        // split comparator and comparable paths
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                // walk down the tree to find the key's position
                parent = t;
                cmp = cpr.compare(key, t.key);  // uses the supplied Comparator's compare()
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else // equal key found: overwrite its value and return
                    return t.setValue(value);
            } while (t != null);
        }
        else {
            if (key == null)
                throw new NullPointerException();
            @SuppressWarnings("unchecked")
                Comparable<? super K> k = (Comparable<? super K>) key;
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }
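A sketch of both construction styles (hypothetical class and method names). With the no-arg constructor, Integer's natural ordering sorts the elements; with a Comparator, compare() alone decides both order and duplicates, so a second word of the same length is dropped, as the return t.setValue(value) branch implies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderDemo {
    // No-arg style: natural ordering via Integer.compareTo.
    static List<Integer> natural(Integer... xs) {
        return new ArrayList<>(new TreeSet<>(List.of(xs)));
    }

    // Comparator style: strings ordered by length; compare() == 0 counts as a duplicate.
    static List<String> byLength(String... words) {
        TreeSet<String> set = new TreeSet<>(Comparator.comparingInt(String::length));
        for (String w : words) {
            set.add(w);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(natural(3, 1, 2));                             // [1, 2, 3]
        System.out.println(byLength("apple", "kiwi", "banana", "grape")); // [kiwi, apple, banana]
    }
}
```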

Map interface

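TreeMap keeps keys sorted; HashMap has no defined order
Keys must be unique (a null key, where the implementation allows one, can also appear only once); values may repeat
Each key-value pair is ultimately stored as a HashMap$Node: node = newNode(hash, key, value, null)
For convenient traversal, HashMap also offers an EntrySet collection whose elements are of type Entry, each holding one k-v pair: transient Set<Map.Entry<K,V>> entrySet; the EntrySet is a view over the table and does not copy the nodes
In entrySet the declared element type is Map.Entry, but the objects are actually HashMap$Node instances, because HashMap$Node implements Map.Entry: static class Node<K,V> implements Map.Entry<K,V>
Exposing the HashMap$Node objects through entrySet makes traversal convenient, because Map.Entry provides the key methods K getKey() and V getValue()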
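A sketch showing both points: the entries handed out by entrySet() are the map's own node objects, and setValue() on them writes through to the map. The exact class name is an OpenJDK implementation detail (expected to be java.util.HashMap$Node there, not guaranteed by the Map contract); EntrySetDemo and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class EntrySetDemo {
    // Runtime class name of the first entry seen through entrySet().
    static String entryClassName(Map<String, Integer> map) {
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            return e.getClass().getName();
        }
        return "";
    }

    // entrySet() is a view: setValue() on an entry changes the map itself.
    static Map<String, Integer> doubleAll(Map<String, Integer> map) {
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 2);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(entryClassName(map)); // java.util.HashMap$Node on OpenJDK
        System.out.println(doubleAll(map));      // {a=2, b=4}
    }
}
```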

Iterating a Map

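Enhanced for loop
Iterator
values(): returns a Collection of the values, which can be traversed with either approach above
entrySet(): returns a Set (the EntrySet, Set<Map.Entry<K,V>>), which can also be traversed with the enhanced for loop or an Iterator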
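The four combinations described above (values() or entrySet(), each with an enhanced for loop or an Iterator), as a minimal sketch. A LinkedHashMap is used only so the output order is predictable; the method names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class MapTraversalDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new LinkedHashMap<>(); // insertion order keeps output predictable
        map.put("x", 1);
        map.put("y", 2);
        map.put("z", 3);
        return map;
    }

    // values() + enhanced for
    static int sumValues(Map<String, Integer> map) {
        int sum = 0;
        for (int v : map.values()) {
            sum += v;
        }
        return sum;
    }

    // values() + Iterator
    static int sumValuesWithIterator(Map<String, Integer> map) {
        int sum = 0;
        Iterator<Integer> it = map.values().iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // entrySet() + enhanced for
    static String joinEntries(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    // entrySet() + Iterator
    static String joinEntriesWithIterator(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(sumValues(map));               // 6
        System.out.println(sumValuesWithIterator(map));   // 6
        System.out.println(joinEntries(map));             // x=1;y=2;z=3;
        System.out.println(joinEntriesWithIterator(map)); // x=1;y=2;z=3;
    }
}
```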

HashMap

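When a key-value pair is added, the key's hash gives its index in table. If that slot is empty, the node is added directly. Otherwise the existing node's key is compared with the new key: if they are equal, the value is replaced; if not, the code checks whether the bucket is a tree or a linked list and inserts accordingly. If the size exceeds the threshold after the insertion, the table is resized.
new HashMap() only sets loadFactor = 0.75; table (HashMap$Node[]) stays null until the first put
put calls putVal(); see the HashSet section for the details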
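A sketch of the return values putVal() produces, as seen through the public API: put() returns the previous value (or null), and putIfAbsent() takes the onlyIfAbsent = true path, which leaves an existing value untouched. HashMapPutDemo.trace is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapPutDemo {
    // Records each call's result, separated by commas.
    static String trace() {
        Map<String, Integer> map = new HashMap<>();
        StringBuilder sb = new StringBuilder();
        sb.append(map.put("k", 1)).append(',');         // null: new key
        sb.append(map.put("k", 2)).append(',');         // 1: same key, old value returned and replaced
        sb.append(map.putIfAbsent("k", 3)).append(','); // 2: key present, value kept (onlyIfAbsent = true)
        sb.append(map.get("k")).append(',');            // 2
        sb.append(map.put(null, 0)).append(',');        // null: one null key is allowed (hash 0)
        sb.append(map.size());                          // 2
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace()); // null,1,2,2,null,2
    }
}
```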

Hashtable

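Implements Map, so it stores key-value pairs; keys cannot repeat
Neither keys nor values may be null, otherwise a NullPointerException is thrown
Usage is essentially the same as HashMap
Hashtable is thread-safe (its public methods are synchronized)
Defaults: initialCapacity 11, loadFactor 0.75; growth rule 2 * old + 1
Source walkthrough: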

--------------------------------------------
// no-arg constructor: default capacity 11, loadFactor still 0.75, so threshold = (int)(11 * 0.75) = 8
public Hashtable() {
        this(11, 0.75f);
    }
--------------------------------------------
public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable.
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        @SuppressWarnings("unchecked")
        Entry<K,V> entry = (Entry<K,V>)tab[index];
        for(; entry != null ; entry = entry.next) {
            if ((entry.hash == hash) && entry.key.equals(key)) {
                V old = entry.value;
                entry.value = value;
                return old;
            }
        }

        addEntry(hash, key, value, index);
        return null;
    }
-------------------------------------------------------------
private void addEntry(int hash, K key, V value, int index) {
        modCount++;

        Entry<?,?> tab[] = table;
        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash();

            tab = table;
            hash = key.hashCode();
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Creates the new entry.
        @SuppressWarnings("unchecked")
        Entry<K,V> e = (Entry<K,V>) tab[index];
        tab[index] = new Entry<>(hash, key, value, e);
        count++;
    }
-------------------------------------------------------------
protected void rehash() {
        int oldCapacity = table.length;
        Entry<?,?>[] oldMap = table;

        // growth rule: 2 * oldCapacity + 1
        int newCapacity = (oldCapacity << 1) + 1;
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            if (oldCapacity == MAX_ARRAY_SIZE)
                // Keep running with MAX_ARRAY_SIZE buckets
                return;
            newCapacity = MAX_ARRAY_SIZE;
        }
        // allocate the larger array
        Entry<?,?>[] newMap = new Entry<?,?>[newCapacity];

        modCount++;
        threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
        table = newMap;

        for (int i = oldCapacity ; i-- > 0 ;) {
            for (Entry<K,V> old = (Entry<K,V>)oldMap[i] ; old != null ; ) {
                Entry<K,V> e = old;
                old = old.next;

                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = (Entry<K,V>)newMap[index];
                newMap[index] = e;
            }
        }
    }
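A minimal sketch of two Hashtable rules from above: null keys and null values are rejected, and capacity grows as 2 * old + 1. Hashtable has no public capacity getter, so capacities() only simulates that arithmetic; both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

class HashtableDemo {
    // True if the given put throws NullPointerException.
    static boolean putThrowsNpe(String key, String value) {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Simulated rehash sequence starting from the default capacity 11.
    static List<Integer> capacities(int steps) {
        List<Integer> caps = new ArrayList<>();
        int capacity = 11;
        caps.add(capacity);
        for (int i = 0; i < steps; i++) {
            capacity = (capacity << 1) + 1; // 2 * old + 1
            caps.add(capacity);
        }
        return caps;
    }

    public static void main(String[] args) {
        System.out.println(putThrowsNpe("k", null)); // true: null value
        System.out.println(putThrowsNpe(null, "v")); // true: null key (key.hashCode())
        System.out.println(putThrowsNpe("k", "v"));  // false
        System.out.println(capacities(3));           // [11, 23, 47, 95]
    }
}
```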

Properties

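Extends Hashtable and likewise stores data as key-value pairs
Used in much the same way as Hashtable
Properties can load data from a xxx.properties file into a Properties object, which can then be modified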
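A minimal sketch of loading and modifying properties. To keep it self-contained, a StringReader stands in for the xxx.properties file (for a real file you would pass a FileReader or FileInputStream to load()); PropertiesDemo.load is a hypothetical helper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Parses key=value lines into a new Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("user=root\npwd=123456\n");
        System.out.println(props.getProperty("user"));         // root
        props.setProperty("pwd", "changed");                   // modified in memory only
        System.out.println(props.getProperty("pwd"));          // changed
        System.out.println(props.getProperty("port", "3306")); // 3306: default for a missing key
    }
}
```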

TreeMap

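TreeMap does not use hashCode() or equals() to locate keys; it relies only on compareTo() or the supplied Comparator

Constructing with a comparator (TreeSet's constructor below just wraps a new TreeMap built with it):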

public TreeSet(Comparator<? super E> comparator) {
        this(new TreeMap<>(comparator));
    }

First insertion: the k-v pair is wrapped in an Entry object and stored as root

Entry<K,V> t = root;
if (t == null) {
    compare(key, key); // type (and possibly null) check

    root = new Entry<>(key, value, null);
    size = 1;
    modCount++;
    return null;
}

Subsequent insertions

int cmp;
Entry<K,V> parent;
// split comparator and comparable paths
Comparator<? super K> cpr = comparator;
if (cpr != null) {
    do {  // walk down the tree to find the key's position
        parent = t;
        cmp = cpr.compare(key, t.key); // calls the comparator passed to the constructor
        if (cmp < 0)
            t = t.left;
        else if (cmp > 0)
            t = t.right;
        else
            // an equal key already exists: overwrite the value and return
            return t.setValue(value);
    } while (t != null);
}
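Because TreeMap relies only on the comparator, two keys that are not equals() can still be the same key. In this sketch (hypothetical class name), String.CASE_INSENSITIVE_ORDER makes "Apple" and "apple" compare as 0, so the second put takes the t.setValue(value) branch: the value changes but the stored key stays "Apple".

```java
import java.util.TreeMap;

class TreeMapComparatorDemo {
    static TreeMap<String, Integer> build() {
        TreeMap<String, Integer> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        map.put("banana", 1);
        map.put("Apple", 2);
        map.put("apple", 3); // compare() == 0: setValue(3), the stored key stays "Apple"
        return map;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = build();
        System.out.println(map);              // {Apple=3, banana=1}
        System.out.println(map.get("APPLE")); // 3
    }
}
```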

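Choosing a collection

First decide what is being stored (a group of objects or a group of key-value pairs)
A group of objects: implementations of the Collection interface
1. Duplicates allowed: List
  - Frequent insertion/removal: LinkedList (backed by a doubly linked list)
  - Frequent updates/lookups: ArrayList (backed by a resizable Object array)
2. No duplicates: Set
  - Unordered: HashSet (backed by a HashMap, i.e. a hash table of array + linked lists + red-black trees)
  - Sorted: TreeSet
  - Iteration order equals insertion order: LinkedHashSet (array + doubly linked list)
A group of key-value pairs: Map
- Keys unordered: HashMap (a hash table)
- Keys sorted: TreeMap
- Keys in insertion order: LinkedHashMap
- Reading and writing configuration files: Properties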

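The Collections utility class

Sorting-related
- reverse() // reverse the order
- shuffle() // shuffle randomly
- sort() // sort; a Comparator can be supplied
- swap() // swap two elements
Searching and replacing
- max() // a Comparator can be supplied
- frequency() // how many times an element occurs
- copy() // watch out for index out of bounds! copy(dest, src) needs dest.size() >= src.size(), otherwise IndexOutOfBoundsException
- replaceAll() // replace every occurrence of one value with another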
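A sketch exercising the methods listed above (CollectionsDemo and its helpers are hypothetical names). The copy() case shows why the note warns about bounds: new ArrayList<>(n) only reserves capacity, while copy() checks size(). shuffle() is seeded, but its resulting order is not asserted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class CollectionsDemo {
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // reverse -> natural sort -> stable sort by length -> swap first and last
    static List<String> reorder(List<String> input) {
        List<String> list = new ArrayList<>(input);
        Collections.reverse(list);
        Collections.sort(list);
        Collections.sort(list, BY_LENGTH); // sort is stable: equal lengths keep their order
        Collections.swap(list, 0, list.size() - 1);
        return list;
    }

    // copy() checks dest.size(), not its capacity, so an empty dest fails.
    static boolean copyIntoEmptyThrows(List<String> src) {
        List<String> dest = new ArrayList<>(src.size()); // capacity src.size(), size 0
        try {
            Collections.copy(dest, src);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Pre-filling dest with placeholders gives it the required size.
    static List<String> copyIntoPrefilled(List<String> src) {
        List<String> dest = new ArrayList<>(Collections.nCopies(src.size(), (String) null));
        Collections.copy(dest, src);
        return dest;
    }

    public static void main(String[] args) {
        List<String> list = reorder(List.of("tom", "smith", "king", "tom"));
        System.out.println(list);                               // [smith, tom, king, tom]
        System.out.println(Collections.max(list));              // tom (natural order)
        System.out.println(Collections.max(list, BY_LENGTH));   // smith
        System.out.println(Collections.frequency(list, "tom")); // 2
        System.out.println(copyIntoEmptyThrows(list));          // true
        System.out.println(copyIntoPrefilled(list));            // [smith, tom, king, tom]

        Collections.replaceAll(list, "tom", "TOM");
        System.out.println(list);                               // [smith, TOM, king, TOM]

        Collections.shuffle(list, new Random(42)); // seeded for repeatability; order not checked here
        System.out.println(list.size());                        // 4
    }
}
```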