HashMap Study Notes

Anshay

Article link: https://blog.csdn.net/qq_27665897/article/details/90810402

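Principles

A hash table is a data structure that organizes data with a hash function to support fast insertion and search.

The key idea is to use a hash function to map keys to buckets.

When we insert a new key, the hash function decides which bucket the key belongs to, and the key is stored in that bucket.
When we search for a key, the hash table uses the same hash function to find the corresponding bucket and searches only within that bucket.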

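Keys to designing a hash table

Hash function

The hash function is the most important component of a hash table; it maps each key to a specific bucket. As a simple example, take y = x % 5 as the hash function, where x is the key and y is the index of the assigned bucket.
The choice of hash function depends on the range of the keys and the number of buckets.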
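The y = x % 5 mapping can be sketched in a few lines of Java. The class and method names are hypothetical, and the use of Math.floorMod (so that negative keys still get a valid index) is an assumption added here, not something the text requires.

```java
// Sketch: map integer keys to one of five buckets with y = x % 5.
class ModuloHashDemo {
    static final int BUCKET_COUNT = 5; // the "5" from y = x % 5

    // Math.floorMod keeps the result in [0, BUCKET_COUNT) even for
    // negative keys (assumption: plain % would return a negative index).
    static int bucketIndex(int key) {
        return Math.floorMod(key, BUCKET_COUNT);
    }
}
```

With this sketch, the keys 1987 and 2 both land in bucket 2, which is exactly the kind of collision the following sections deal with.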

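How to design a hash function is an open problem. The idea is to distribute the keys across the buckets as evenly as possible. Ideally, a perfect hash function is a one-to-one mapping between keys and buckets; in most cases, however, the hash function is not perfect, and there is a trade-off between the number of buckets and the capacity of each bucket.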

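Collision resolution

A collision resolution algorithm should answer the following questions:

How are the values within one bucket organized?
What happens if too many values are assigned to the same bucket?
How is a target value searched for within a specific bucket?

These questions depend on the capacity of a bucket and on the number of keys that may map to the same bucket.

Suppose the fullest bucket holds N keys. If N is constant and small, we can simply use an array to store the keys in the same bucket. If N is variable or large, we may need a height-balanced binary search tree instead.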
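As a rough sketch of the second option, each bucket below is a java.util.TreeSet, which is backed by a red-black tree, so a lookup inside a bucket of N keys costs O(log N). The class name TreeBucketSet and the bucket count of 16 are hypothetical choices for illustration.

```java
import java.util.TreeSet;

// Sketch: a hash set whose buckets are balanced search trees.
class TreeBucketSet {
    private static final int BUCKET_COUNT = 16; // hypothetical, kept small for illustration

    @SuppressWarnings("unchecked")
    private final TreeSet<Integer>[] buckets = new TreeSet[BUCKET_COUNT];

    // Locate the bucket for a key, optionally creating it on first use.
    private TreeSet<Integer> bucketFor(int key, boolean create) {
        int index = Math.floorMod(key, BUCKET_COUNT);
        if (buckets[index] == null && create) {
            buckets[index] = new TreeSet<>();
        }
        return buckets[index];
    }

    // Returns true if the key was not present before.
    boolean add(int key) {
        return bucketFor(key, true).add(key);
    }

    boolean contains(int key) {
        TreeSet<Integer> bucket = bucketFor(key, false);
        return bucket != null && bucket.contains(key);
    }

    // Returns true if the key was present and has been removed.
    boolean remove(int key) {
        TreeSet<Integer> bucket = bucketFor(key, false);
        return bucket != null && bucket.remove(key);
    }
}
```

For small, bounded buckets a plain array or list is usually simpler and just as fast; the tree only pays off when many keys can pile up in one bucket.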

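Practice

Insertion and search are the two basic operations of a hash table. Other operations are built on them: to delete an element, we first search for it and then, if it exists, remove it from its position.

Designing a HashSet

Here an array of LinkedList is used to implement the HashSet, together with a size field that holds the number of buckets. The bucket index is key % size. Within a single LinkedList the key itself is stored as the value, so several keys can live in one bucket. Equal keys are of course equal values; different keys with the same index go into the same bucket and are told apart by the key itself, so one bucket can hold many keys.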

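import java.util.LinkedList;

class MyHashSet {
    // Number of buckets (not the number of stored keys).
    private final int size = 10000;
    private final LinkedList<Integer>[] lists;

    /**
     * Initialize your data structure here.
     */
    @SuppressWarnings("unchecked")
    public MyHashSet() {
        lists = new LinkedList[size];
    }

    // Map a key to its bucket; floorMod keeps negative keys in range.
    private int indexOf(int key) {
        return Math.floorMod(key, size);
    }

    public void add(int key) {
        int index = indexOf(key);
        if (lists[index] == null) {
            lists[index] = new LinkedList<>();
        }
        if (!lists[index].contains(key)) {
            lists[index].addFirst(key);
        }
    }

    public void remove(int key) {
        int index = indexOf(key);
        if (lists[index] != null) {
            // Cast to Integer so remove(Object) is called, not remove(int index).
            lists[index].remove((Integer) key);
        }
    }

    /**
     * Returns true if this set contains the specified element
     */
    public boolean contains(int key) {
        int index = indexOf(key);
        return lists[index] != null && lists[index].contains(key);
    }
}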

/**
 * Your MyHashSet object will be instantiated and called as such:
 * MyHashSet obj = new MyHashSet();
 * obj.add(key);
 * obj.remove(key);
 * boolean param_3 = obj.contains(key);
 */

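Designing a HashMap

The map keeps a Node array, the capacity, the current size, and a load factor. Once the table fills up to capacity * THERESHOD, it grows to twice its capacity.
To keep the code easy to follow, the hash function here simply returns the key itself; for a more realistic one, see the hash method in the HashMap source code.
The buckets exist to store keys; each value corresponds one-to-one to its key, so only the relationship between keys and buckets needs to be considered.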

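class MyHashMap {
    private static final double THERESHOD = 0.75;
    // Marks a removed entry; safe only because values are always non-negative.
    private static final int REMOVED = -1;

    Node[] arr;
    int capacity;
    int size; // number of live mappings
    int used; // occupied slots, including removed markers

    /**
     * Initialize your data structure here.
     */
    public MyHashMap() {
        capacity = 200000;
        arr = new Node[capacity];
        size = 0;
        used = 0;
    }

    /**
     * value will always be non-negative.
     */
    public void put(int key, int value) {
        int idx = probe(arr, key);
        if (arr[idx] != null) {
            // The key already owns a slot: overwrite it, reviving it if it was removed.
            if (arr[idx].value == REMOVED) {
                size++;
            }
            arr[idx].value = value;
            return;
        }
        if (used + 1 > capacity * THERESHOD) {
            // Double the capacity, then locate the slot again in the new array.
            growCapacity();
            idx = probe(arr, key);
        }
        arr[idx] = new Node(key, value);
        used++;
        size++;
    }

    /**
     * Resolves collisions with linear probing: start at hash(key) % length and
     * step forward until the key or an empty slot is found. Re-hashing idx would
     * never advance while hash() is the identity. Removed entries keep their key,
     * so probing continues past them.
     */
    private int probe(Node[] table, int key) {
        int idx = Math.floorMod(hash(key), table.length);
        while (table[idx] != null && table[idx].key != key) {
            idx = (idx + 1) % table.length;
        }
        return idx;
    }

    private void growCapacity() {
        // Double the capacity and re-insert the live entries.
        capacity *= 2;
        Node[] newArr = new Node[capacity];
        reHash(newArr, arr);
        arr = newArr;
        used = size; // removed markers were dropped during reHash
    }

    private void reHash(Node[] newArr, Node[] arr) {
        for (Node node : arr) {
            // Removed entries are cleaned out here.
            if (node != null && node.value != REMOVED) {
                newArr[probe(newArr, node.key)] = node;
            }
        }
    }

    /**
     * Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key
     */
    public int get(int key) {
        int idx = getIdxByKey(key);
        return idx == -1 ? -1 : arr[idx].value;
    }

    // Index of the live entry for key, or -1 if the key is absent or removed.
    private int getIdxByKey(int key) {
        int idx = probe(arr, key);
        if (arr[idx] == null || arr[idx].value == REMOVED) {
            return -1;
        }
        return idx;
    }

    private int hash(int key) {
        return Integer.hashCode(key);
    }

    /**
     * Removes the mapping of the specified value key if this map contains a mapping for the key
     */
    public void remove(int key) {
        int idx = getIdxByKey(key);
        if (idx != -1) {
            arr[idx].value = REMOVED;
            size--;
        }
    }
}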

class Node {
    int key;
    int value;

    public Node(int key, int value) {
        this.key = key;
        this.value = value;
    }
}

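Complexity analysis: hash table

With M keys in total, a hash table easily reaches O(M) space complexity.
The time complexity, however, depends heavily on the design. We may use an array to store the values of one bucket; ideally the bucket size is small enough to be treated as a constant, and insertion and search both take O(1).
In the worst case, though, the largest bucket holds N keys. Insertion is then O(1) and search is O(N).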

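How built-in hash tables work
A typical design of a built-in hash table is:

The key can be of any hashable type. A value of a hashable type has a hash code, which the mapping function uses to obtain the bucket index.
Each bucket contains an array that initially stores all values assigned to that bucket.
If too many values end up in the same bucket, they are kept in a height-balanced binary search tree instead.

Average insertion and search remain O(1). With a height-balanced BST, worst-case insertion and search are O(log N). This is a trade-off between insertion and search.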
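As one concrete illustration of turning a hash code into a bucket index, the sketch below mirrors the two steps used by java.util.HashMap in OpenJDK 8 and later: the high bits of hashCode() are XOR-ed into the low bits, and the result is masked with (table length - 1). The class and method names are hypothetical, and the mask assumes the table length is a power of two.

```java
// Sketch: hash code -> bucket index, in the style of OpenJDK's HashMap.
class BuiltInIndexSketch {
    // Fold the upper 16 bits into the lower 16 so that keys differing
    // only in their high bits do not all collide in a small table.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Assumption: tableLength is a power of two, so the mask
    // (tableLength - 1) selects the low bits of the spread hash.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }
}
```

For example, in a 16-slot table the Integer keys 17 and 65536 both map to bucket 1; without the spreading step, 65536 would fall into bucket 0 together with every other multiple of 16.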

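Practical usage

Finding duplicates with a hash set

Simply iterate over the values and insert each one into the set. If a value is already in the hash set, a duplicate exists.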

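import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {
    // T stands for the actual type of your key.
    static <T> boolean findDuplicates(List<T> keys) {
        Set<T> hashset = new HashSet<>();
        for (T key : keys) {
            // add() returns false when the key is already in the set.
            if (!hashset.add(key)) {
                return true;
            }
        }
        return false;
    }
}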

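Counting occurrences with a HashMap

Use the target element as the key and its number of occurrences as the value, updating the value on each iteration.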
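A minimal sketch of this counting pattern, assuming int elements; the class and method names are hypothetical. Map.merge inserts 1 for a new key and otherwise adds 1 to the stored count.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: element -> number of occurrences.
class OccurrenceCounter {
    static Map<Integer, Integer> countOccurrences(int[] values) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : values) {
            // New key: store 1. Existing key: old count + 1.
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling countOccurrences(new int[] {1, 2, 2, 3, 2}) yields a map in which 2 is mapped to 3, while 1 and 3 are each mapped to 1.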

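Providing more information

In this example (finding two numbers in an array that add up to target), if we only wanted to return true when a solution exists, we could use a hash set to store all values seen while iterating over the array and check whether target - current_value is in the set. However, we are asked to return more information, which means we care not only about the values but also about their indices. We need to store the number as the key and its index as the value. Therefore, we should use a hash map rather than a hash set.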

// Template (pseudocode): replace Type, InfoType, ReturnType and the
// placeholder expressions with the ones your problem needs.
ReturnType aggregateByKey_hashmap(List<Type> keys) {
    Map<Type, InfoType> hashmap = new HashMap<>();
    for (Type key : keys) {
        if (hashmap.containsKey(key)) {
            if (/* hashmap.get(key) satisfies the requirement */) {
                return needed_information;
            }
        }
        // Value can be any information you need (e.g. index)
        hashmap.put(key, value);
    }
    return needed_information;
}
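The template can be made concrete for the two-sum example discussed in this section. This is a sketch under assumptions: the names are hypothetical, the first matching pair is returned, and an empty array signals that no pair exists.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: find indices of two numbers that add up to target.
class TwoSumSketch {
    static int[] twoSum(int[] nums, int target) {
        // value -> index at which that value was seen
        Map<Integer, Integer> indexByValue = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            Integer partner = indexByValue.get(target - nums[i]);
            if (partner != null) {
                return new int[] {partner, i};
            }
            // Store the index, not just the value: this is the
            // "more information" a hash set could not provide.
            indexByValue.put(nums[i], i);
        }
        return new int[0]; // assumption: empty array means "no solution"
    }
}
```

Because each element is looked up before it is stored, an element is never paired with itself, and the whole scan takes a single pass over the array.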

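Aggregating by key

Example: given a string, find the first non-repeating character in it and return its index. If there is none, return -1.

A simple approach is to first count the occurrences of each character and then scan for the first character whose count is one. So we maintain a hash map whose keys are characters and whose values are the counters for those characters. Each time we iterate over a character, we add 1 to the corresponding value.

The key to this kind of problem is deciding on a strategy for when an existing key is encountered. In the example above, the strategy is to count occurrences. Sometimes we sum up all the values; sometimes we replace the old value with the newest one. The strategy depends on the problem, and practice will help you make the right decision.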

// Template (pseudocode): replace Type, InfoType, ReturnType and the
// placeholder expressions with the ones your problem needs.
ReturnType aggregateByKey_hashmap(List<Type> keys) {
    Map<Type, InfoType> hashmap = new HashMap<>();
    for (Type key : keys) {
        if (hashmap.containsKey(key)) {
            // Existing key: apply your update strategy (count, sum, replace, ...)
            hashmap.put(key, updated_information);
        } else {
            // New key: value can be any information you need (e.g. index)
            hashmap.put(key, value);
        }
    }
    return needed_information;
}
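For the first-non-repeating-character example, the template might be filled in as follows. The class and method names are hypothetical; the strategy for an existing key is "add 1 to the counter".

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: index of the first character that occurs exactly once, or -1.
class FirstUniqueChar {
    static int firstUniqueIndex(String s) {
        // Pass 1: character -> number of occurrences.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Pass 2: the first character whose count is 1 wins.
        for (int i = 0; i < s.length(); i++) {
            if (counts.get(s.charAt(i)) == 1) {
                return i;
            }
        }
        return -1;
    }
}
```

For "loveleetcode" this returns 2, because 'l' and 'o' both repeat and 'v' is the first character that occurs only once.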

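Designing the key

When the order of the elements in a string or array does not matter, the sorted string or array can be used as the key.
If only the offset of each value matters, usually the offset from the first value, the offset can be used as the key.
In a tree, you may sometimes want to use the TreeNode as the key, but in most cases a serialization of the subtree (built recursively from node values and paths) may be a better choice.
In a matrix, the row index or the column index can be used as the key.
In Sudoku, the row index and column index can be combined to identify which block an element belongs to.
Sometimes in a matrix you want to aggregate the values that lie on the same diagonal.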
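Three of these key designs can be sketched briefly. Everything here is an illustrative assumption rather than part of the notes above: the names are hypothetical, the anagram-grouping task is just one use of a sorted-string key, the block formula assumes a standard 9x9 Sudoku with 3x3 blocks, and row - col is one common way to identify a top-left to bottom-right diagonal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: three ways to derive a hash key from the data.
class KeyDesignSketch {
    // Order does not matter, so the sorted characters serve as the key:
    // all anagrams of a word end up under the same key.
    static Map<String, List<String>> groupAnagrams(String[] words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String word : words) {
            char[] chars = word.toCharArray();
            Arrays.sort(chars);
            String key = new String(chars);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    // 9x9 Sudoku: combine row and column into a block number 0..8.
    static int blockIndex(int row, int col) {
        return (row / 3) * 3 + col / 3;
    }

    // Cells on the same top-left to bottom-right diagonal share row - col.
    static int diagonalKey(int row, int col) {
        return row - col;
    }
}
```

For the input {"eat", "tea", "tan", "ate", "nat", "bat"}, groupAnagrams produces three groups under the keys "aet", "ant" and "abt".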

Acknowledgement: LeetCode
