Data Structures 13: Hash Tables

ChengZi~

Article link: https://blog.csdn.net/qq_38339124/article/details/103213581

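1 Hash table basics
2 Hash function design
3 The hashCode method in Java
4 Hash collisions: separate chaining
5 Implementing a hash table
6 Dynamic resizing and time-complexity analysis
7 A more sophisticated resizing strategy
8 More ways to handle hash collisions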

1 Hash table basics

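Let's start with LeetCode problem 387.

Given a string, find its first non-repeating character and return its index. If no such character exists, return -1.

Examples:

s = "leetcode"
returns 0.

s = "loveleetcode",
returns 2.

Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/first-unique-character-in-a-string
Copyright belongs to LeetCode (领扣网络). For commercial reproduction, contact the official site for authorization; for non-commercial reproduction, cite the source.

Approach 1: use a map. Scan the whole string and record how often each character occurs,
		    then scan again from the first character and look up the first one whose frequency is 1.
Approach 2: use an array to store the frequency of each character: index 0 holds the count of 'a',
			index 1 holds the count of 'b', and so on. The code below takes this approach and assumes the input contains only the lowercase letters a-z.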

class Solution {
    public int firstUniqChar(String s) {

        int[] freq = new int[26];
        for(int i = 0 ; i < s.length() ; i ++)
            freq[s.charAt(i) - 'a'] ++;

        for(int i = 0 ; i < s.length() ; i ++)
            if(freq[s.charAt(i) - 'a'] == 1)
                return i;

        return -1;
    }
}
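
For comparison, Approach 1 (the map-based version) could look roughly like the sketch below. The class name FirstUniqueWithMap is hypothetical and not part of the original article; java.util.HashMap stands in for "a map".

```java
import java.util.HashMap;
import java.util.Map;

class FirstUniqueWithMap {

    // Returns the index of the first character that occurs exactly once, or -1
    static int firstUniqChar(String s) {
        Map<Character, Integer> freq = new HashMap<>();

        // First pass: count how often each character occurs
        for (int i = 0; i < s.length(); i++)
            freq.merge(s.charAt(i), 1, Integer::sum);

        // Second pass: find the first character whose count is 1
        for (int i = 0; i < s.length(); i++)
            if (freq.get(s.charAt(i)) == 1)
                return i;

        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUniqChar("leetcode"));
        System.out.println(firstUniqChar("loveleetcode"));
    }
}
```

Unlike the array version, this sketch does not depend on the input being limited to a-z, at the cost of boxing and hashing every character.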

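The basic principle of a hash table lies behind this whole problem:
int[] freq is in fact a hash table. Each character is mapped to an index by ch - 'a', and that mapping is the hash function.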

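1. A hash table embodies a classic idea in algorithm design: trading space for time. Binary search trees follow a related idea: the data is organized as it is stored, so that the actual task later runs fast.
2. There are two extremes:
	Take an ID number such as 11010819581217666. With a huge array of size 99999999999999999, one slot per possible ID number, every operation takes O(1) time.
	With only 1 unit of space, every operation takes O(n) time (a linear list).
	A hash table is a balance between these two extremes.
3. Hash function design matters: the more uniformly the "indices" produced from the "keys" are distributed, the better.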

2 Hash function design

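1. Hash function design matters: the more uniformly the "indices" produced from the "keys" are distributed, the better.
2. Some specialized domains have their own ways of designing hash functions, and there are even research papers devoted to them.
3. As software developers, we only need to know the general design rules.

Small non-negative integers: use the value directly as the index.
Small ranges that include negatives: apply an offset, e.g. -100~100 becomes 0~200.
Large integers: take the value modulo a prime to help keep the distribution uniform.
Floating-point numbers: convert to an integer and handle it as above.
Strings are a special case.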
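
One common way to handle strings is to read the string as a number in base B and reduce it modulo a prime M after every step. The sketch below is only an illustration under that assumption; the class name and the particular values of B and M are hypothetical choices, not taken from the article.

```java
class StringHashSketch {

    // Treats s as a base-B number and takes it modulo M at every step,
    // so the intermediate value never grows beyond (M - 1) * B + 65535
    static int hash(String s, int B, int M) {
        long h = 0;
        for (int i = 0; i < s.length(); i++)
            h = (h * B + s.charAt(i)) % M;
        return (int) h;
    }

    public static void main(String[] args) {
        System.out.println(hash("code", 31, 97));
        System.out.println(hash("hello", 31, 97));
    }
}
```

Reducing inside the loop rather than once at the end keeps the arithmetic within the range of long even for long strings.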

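Converting to an integer is not the only approach.
Principles:
1. Consistency: if a == b, then hash(a) == hash(b)
2. Efficiency: the hash is quick and simple to compute
3. Uniformity: hash values are evenly distributed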
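
To make the uniformity principle concrete, the following sketch counts how many distinct indices a set of keys occupies for a given modulus. The keys (multiples of 4) and the class name ModulusSpreadSketch are made up for illustration.

```java
class ModulusSpreadSketch {

    // Counts how many distinct indices are hit when every key is reduced modulo m
    static int distinctIndices(int[] keys, int m) {
        boolean[] used = new boolean[m];
        int count = 0;
        for (int key : keys) {
            int idx = (key & 0x7fffffff) % m;
            if (!used[idx]) {
                used[idx] = true;
                count++;
            }
        }
        return count;
    }

    // Builds the keys 0, 4, 8, ..., 4 * (n - 1)
    static int[] multiplesOfFour(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++)
            keys[i] = 4 * i;
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = multiplesOfFour(25);
        System.out.println("mod 8: " + distinctIndices(keys, 8));
        System.out.println("mod 7: " + distinctIndices(keys, 7));
    }
}
```

With modulus 8 the keys share a factor with the table size and pile up in a few slots, while the prime modulus 7 spreads the same keys over every slot.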

3 The hashCode method in Java

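import java.util.HashMap;
import java.util.HashSet;

public class Main {

    public static void main(String[] args) {

        int a = 42;
        System.out.println(((Integer)a).hashCode());

        // A hash code may be negative. Turning a hash code into an array index
        // is the job of the hash table class, not of hashCode itself.
        // That conversion takes the value modulo a prime.
        int b = -42;
        System.out.println(((Integer)b).hashCode());

        double c = 3.1415926;
        System.out.println(((Double)c).hashCode());

        String d = "hello";
        System.out.println(d.hashCode());

        // int overflow wraps around to Integer.MIN_VALUE
        System.out.println(Integer.MAX_VALUE + 1);
        System.out.println();

        Student student1 = new Student(3, 2, "zhangsan", "Liu");
        Student student2 = new Student(3, 2, "ZHANGSAN", "Liu");
        System.out.println(student1.hashCode());
        System.out.println(student2.hashCode());
        System.out.println("--------------------");

        HashSet<Student> set = new HashSet<>();

        // On insertion, HashSet calls the Student object's hashCode, derives an index from it,
        // and stores the element at that index of its internal array.
        // If Student did not override hashCode, Object.hashCode would be used instead.
        // Object.hashCode is identity-based (commonly described as derived from the object's memory address),
        // so two objects with identical field values created by two separate `new` calls
        // would usually get different hash codes.
        set.add(student1);

        HashMap<Student, Integer> scores = new HashMap<>();
        scores.put(student1, 100);

    }
}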

public class Student {

    int grade;
    int cls;
    String firstName;
    String lastName;

    Student(int grade, int cls, String firstName, String lastName){
        this.grade = grade;
        this.cls = cls;
        this.firstName = firstName;
        this.lastName = lastName;
    }


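    /**
     * 1. Names are hashed case-insensitively.
     * 2. The arithmetic may overflow int, but the result is still a valid int.
     * @return the hash code of this student
     */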
    @Override
    public int hashCode(){

        int B = 31;
        int hash = 0;
        hash = hash * B + ((Integer)grade).hashCode();
        hash = hash * B + ((Integer)cls).hashCode();
        hash = hash * B + firstName.toLowerCase().hashCode();
        hash = hash * B + lastName.toLowerCase().hashCode();
        return hash;
    }

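    /**
     * The hash table relies on this method to tell keys apart when a hash collision occurs.
     * @param o the object to compare with
     * @return true if both objects represent the same student
     */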
    @Override
    public boolean equals(Object o){

        if(this == o)
            return true;

        if(o == null)
            return false;

        if(getClass() != o.getClass())
            return false;

        Student another = (Student)o;
        return this.grade == another.grade &&
                this.cls == another.cls &&
                this.firstName.toLowerCase().equals(another.firstName.toLowerCase()) &&
                this.lastName.toLowerCase().equals(another.lastName.toLowerCase());
    }
}
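
The reason hashCode and equals have to be overridden together can be seen in a smaller, self-contained sketch. CaseInsensitiveName is a hypothetical stand-in for Student with a single field; it follows the same case-insensitive rule.

```java
import java.util.HashSet;

class CaseInsensitiveName {
    final String value;

    CaseInsensitiveName(String value) {
        this.value = value;
    }

    // Equal objects must produce equal hash codes, so hash the lower-cased form
    @Override
    public int hashCode() {
        return value.toLowerCase().hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        CaseInsensitiveName another = (CaseInsensitiveName) o;
        return value.toLowerCase().equals(another.value.toLowerCase());
    }
}

class HashCodeContractSketch {

    // Inserts one spelling and looks up another spelling of the same name
    static boolean foundAfterInsert() {
        HashSet<CaseInsensitiveName> set = new HashSet<>();
        set.add(new CaseInsensitiveName("zhangsan"));
        return set.contains(new CaseInsensitiveName("ZHANGSAN"));
    }

    public static void main(String[] args) {
        System.out.println(foundAfterInsert());
    }
}
```

The lookup succeeds only because both methods agree: hashCode sends the two spellings to the same bucket, and equals then confirms that they are the same key.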

4 Hash collisions: separate chaining

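The Chinese name 链地址法 (literally "chained address method") does not describe this collision-resolution technique very well. The English name is Separate Chaining: every slot of the array holds its own collection of the entries whose keys hash to that index.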

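When the amount of data is small, insert, delete, update, and lookup on a linked list are faster. A red-black tree may need various rotations
to maintain its properties, and for small data that overhead makes it slower.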
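
A minimal sketch of separate chaining with linked-list buckets is shown below. ChainedIntSet is a hypothetical class that stores int keys only and never resizes; it is meant to show the idea, not to replace the implementation in the next section.

```java
import java.util.LinkedList;

class ChainedIntSet {

    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedIntSet(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++)
            buckets[i] = new LinkedList<>();
    }

    // Clears the sign bit, then maps the hash code onto an index in [0, m)
    private int index(int key) {
        return (Integer.hashCode(key) & 0x7fffffff) % buckets.length;
    }

    // Returns false if the key was already present
    boolean add(int key) {
        LinkedList<Integer> chain = buckets[index(key)];
        if (chain.contains(key))
            return false;
        chain.add(key);
        return true;
    }

    boolean contains(int key) {
        return buckets[index(key)].contains(key);
    }

    public static void main(String[] args) {
        ChainedIntSet set = new ChainedIntSet(7);
        set.add(3);
        set.add(10);   // 10 % 7 == 3, so it shares a bucket with 3
        System.out.println(set.contains(10));
    }
}
```

Colliding keys simply sit next to each other in the same chain, so a collision costs a short list scan instead of a failed insert.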

5 Implementing a hash table

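import java.util.TreeMap;

// Each bucket is a TreeMap, so keys must be Comparable
public class HashTable<K extends Comparable<K>, V> {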

    private TreeMap<K, V>[] hashtable;
    private int size;
    private int M;

    public HashTable(int M){
        this.M = M;
        size = 0;
        hashtable = new TreeMap[M];
        for(int i = 0 ; i < M ; i ++)
            hashtable[i] = new TreeMap<>();
    }

    public HashTable(){
        this(97);
    }

    private int hash(K key){
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public int getSize(){
        return size;
    }

    public void add(K key, V value){
        TreeMap<K, V> map = hashtable[hash(key)];
        if(map.containsKey(key))
            map.put(key, value);
        else{
            map.put(key, value);
            size ++;
        }
    }

    public V remove(K key){
        V ret = null;
        TreeMap<K, V> map = hashtable[hash(key)];
        if(map.containsKey(key)){
            ret = map.remove(key);
            size --;
        }
        return ret;
    }

    public void set(K key, V value){
        TreeMap<K, V> map = hashtable[hash(key)];
        if(!map.containsKey(key))
            throw new IllegalArgumentException(key + " doesn't exist!");

        map.put(key, value);
    }

    public boolean contains(K key){
        return hashtable[hash(key)].containsKey(key);
    }

    public V get(K key){
        return hashtable[hash(key)].get(key);
    }
}

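import java.util.ArrayList;
import java.util.Collections;

// FileOperation, BST, AVLTree and RBTree are assumed to be defined elsewhere in the project
public class Main {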

    public static void main(String[] args) {

        System.out.println("Pride and Prejudice");

        ArrayList<String> words = new ArrayList<>();
        if(FileOperation.readFile("pride-and-prejudice.txt", words)) {
            System.out.println("Total words: " + words.size());

            // Sorted input makes the BST degenerate into a linked list
            Collections.sort(words);

            // Test BST
            long startTime = System.nanoTime();

            BST<String, Integer> bst = new BST<>();
            for (String word : words) {
                if (bst.contains(word))
                    bst.set(word, bst.get(word) + 1);
                else
                    bst.add(word, 1);
            }

            for(String word: words)
                bst.contains(word);

            long endTime = System.nanoTime();

            double time = (endTime - startTime) / 1000000000.0;
            System.out.println("BST: " + time + " s");


            // Test AVL
            startTime = System.nanoTime();

            AVLTree<String, Integer> avl = new AVLTree<>();
            for (String word : words) {
                if (avl.contains(word))
                    avl.set(word, avl.get(word) + 1);
                else
                    avl.add(word, 1);
            }

            for(String word: words)
                avl.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;
            System.out.println("AVL: " + time + " s");


            // Test RBTree
            startTime = System.nanoTime();

            RBTree<String, Integer> rbt = new RBTree<>();
            for (String word : words) {
                if (rbt.contains(word))
                    rbt.set(word, rbt.get(word) + 1);
                else
                    rbt.add(word, 1);
            }

            for(String word: words)
                rbt.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;
            System.out.println("RBTree: " + time + " s");


            // Test HashTable
            startTime = System.nanoTime();

            // HashTable<String, Integer> ht = new HashTable<>();
            HashTable<String, Integer> ht = new HashTable<>(131071);
            for (String word : words) {
                if (ht.contains(word))
                    ht.set(word, ht.get(word) + 1);
                else
                    ht.add(word, 1);
            }

            for(String word: words)
                ht.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;
            System.out.println("HashTable: " + time + " s");
        }

        System.out.println();
    }
}

Pride and Prejudice
Total words: 125901
BST: 13.5248957 s
AVL: 0.0538591 s
RBTree: 0.0536227 s
HashTable: 0.0476948 s


Process finished with exit code 0

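The choice of M matters a great deal for the performance of the hash table. If we do not know how much data there will be, we cannot know how large M should be.
To see how M affects performance, we need to analyze the time complexity of the hash table.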

6 Dynamic resizing and time-complexity analysis

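Because an array supports random access, locating the bucket for a given hash value is O(1).
So the time of every operation is spent working inside one particular TreeMap. TreeMap is backed by a red-black tree,
a balanced binary tree whose operations are O(log n) in the number of elements it holds. With N elements spread over M buckets, each bucket holds about N/M elements, so the overall complexity is O(log(N/M)).
This is an average-case analysis. In the worst case, all the data ends up in the same TreeMap,
and the complexity becomes O(log N).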

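Generally, with real-world data this worst case does not arise. However, in information security there is
a well-known attack called a hash collision attack: once the attacker knows how hash values are computed, they craft a set of data
in which every item collides, degrading lookups to O(N) when the buckets are linked lists and greatly slowing down the whole system.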
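
How such colliding data can be crafted is easy to show for Java strings, because String.hashCode is a documented base-31 polynomial and "Aa" and "BB" happen to hash to the same value. The sketch below is illustrative only; the class name CollidingKeysSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CollidingKeysSketch {

    // Builds 2^n strings of length 2n that all share one hashCode.
    // Appending "Aa" or "BB" to equal-hash prefixes keeps the hashes equal.
    static List<String> collidingKeys(int n) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < n; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        for (String key : collidingKeys(2))
            System.out.println(key + " -> " + key.hashCode());
    }
}
```

Every key generated this way lands in the same bucket of any table that indexes by hashCode, whatever the value of M.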

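So M should not be fixed; it should adapt as N changes. If N/M is kept within constant bounds, each bucket holds a constant number of elements on average, and operations take O(1) time on average.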

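import java.util.TreeMap;

// Each bucket is a TreeMap, so keys must be Comparable
public class HashTable<K extends Comparable<K>, V> {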

    private static final int upperTol = 10;
    private static final int lowerTol = 2;
    private static final int initCapacity = 7;

    private TreeMap<K, V>[] hashtable;
    private int size;
    private int M;

    public HashTable(int M){
        this.M = M;
        size = 0;
        hashtable = new TreeMap[M];
        for(int i = 0 ; i < M ; i ++)
            hashtable[i] = new TreeMap<>();
    }

    public HashTable(){
        this(initCapacity);
    }

    private int hash(K key){
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public int getSize(){
        return size;
    }

    public void add(K key, V value){
        TreeMap<K, V> map = hashtable[hash(key)];
        if(map.containsKey(key))
            map.put(key, value);
        else{
            map.put(key, value);
            size ++;

            if(size >= upperTol * M)
                resize(2 * M);
        }
    }

    public V remove(K key){
        V ret = null;
        TreeMap<K, V> map = hashtable[hash(key)];
        if(map.containsKey(key)){
            ret = map.remove(key);
            size --;

            if(size < lowerTol * M && M / 2 >= initCapacity)
                resize(M / 2);
        }
        return ret;
    }

    public void set(K key, V value){
        TreeMap<K, V> map = hashtable[hash(key)];
        if(!map.containsKey(key))
            throw new IllegalArgumentException(key + " doesn't exist!");

        map.put(key, value);
    }

    public boolean contains(K key){
        return hashtable[hash(key)].containsKey(key);
    }

    public V get(K key){
        return hashtable[hash(key)].get(key);
    }

    private void resize(int newM){
        TreeMap<K, V>[] newHashTable = new TreeMap[newM];
        for(int i = 0 ; i < newM ; i ++)
            newHashTable[i] = new TreeMap<>();

        int oldM = M;
        this.M = newM;
        for(int i = 0 ; i < oldM ; i ++){
            TreeMap<K, V> map = hashtable[i];
            for(K key: map.keySet())
                newHashTable[hash(key)].put(key, map.get(key));
        }

        this.hashtable = newHashTable;
    }
}

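import java.util.ArrayList;

// FileOperation, BST, AVLTree and RBTree are assumed to be defined elsewhere in the project
public class Main {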

    public static void main(String[] args) {

        System.out.println("Pride and Prejudice");

        ArrayList<String> words = new ArrayList<>();
        if(FileOperation.readFile("pride-and-prejudice.txt", words)) {
            System.out.println("Total words: " + words.size());

//             Collections.sort(words);

            // Test BST
            long startTime = System.nanoTime();

            BST<String, Integer> bst = new BST<>();
            for (String word : words) {
                if (bst.contains(word))
                    bst.set(word, bst.get(word) + 1);
                else
                    bst.add(word, 1);
            }

            for(String word: words)
                bst.contains(word);

            long endTime = System.nanoTime();

            double time = (endTime - startTime) / 1000000000.0;
            System.out.println("BST: " + time + " s");


            // Test AVL
            startTime = System.nanoTime();

            AVLTree<String, Integer> avl = new AVLTree<>();
            for (String word : words) {
                if (avl.contains(word))
                    avl.set(word, avl.get(word) + 1);
                else
                    avl.add(word, 1);
            }

            for(String word: words)
                avl.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;
            System.out.println("AVL: " + time + " s");


            // Test RBTree
            startTime = System.nanoTime();

            RBTree<String, Integer> rbt = new RBTree<>();
            for (String word : words) {
                if (rbt.contains(word))
                    rbt.set(word, rbt.get(word) + 1);
                else
                    rbt.add(word, 1);
            }

            for(String word: words)
                rbt.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;
            System.out.println("RBTree: " + time + " s");


            // Test HashTable
            startTime = System.nanoTime();

            HashTable<String, Integer> ht = new HashTable<>();
            //HashTable<String, Integer> ht = new HashTable<>(131071);
            for (String word : words) {
                if (ht.contains(word))
                    ht.set(word, ht.get(word) + 1);
                else
                    ht.add(word, 1);
            }

            for(String word: words)
                ht.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;
            System.out.println("HashTable: " + time + " s");
        }

        System.out.println();
    }
}

Pride and Prejudice
Total words: 125901
BST: 0.1365324 s
AVL: 0.0800731 s
RBTree: 0.076629 s
HashTable: 0.0699105 s


Process finished with exit code 0

7 A more sophisticated resizing strategy

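import java.util.TreeMap;

// Each bucket is a TreeMap, so keys must be Comparable
public class HashTable<K extends Comparable<K>, V> {

    // Primes used as successive table sizes, each roughly double the previous one
    private final int[] capacity = {53,97,193,389,769,1543,3079,6151,12289,24593,
            49157,98317,196613,393241,786433,1572869,3145739,6291469,
            12582917,25165843,50331653,100663319};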

    private static final int upperTol = 10;
    private static final int lowerTol = 2;
    private  int capacityIndex = 0;

    private TreeMap<K, V>[] hashtable;
    private int size;
    private int M;

    public HashTable(){
        // The table size always comes from the capacity table, starting at its first prime
        this.M = capacity[capacityIndex];
        size = 0;
        hashtable = new TreeMap[M];
        for(int i = 0 ; i < M ; i ++)
            hashtable[i] = new TreeMap<>();
    }


    private int hash(K key){
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public void add(K key, V value){
        TreeMap<K, V> map = hashtable[hash(key)];
        if(map.containsKey(key))
            map.put(key, value);   // existing key: overwrite the value, as in the earlier versions
        else{
            map.put(key, value);
            size ++;

            // Move to the next prime once the average bucket size reaches upperTol
            if(size >= upperTol * M && capacityIndex + 1 < capacity.length){
                capacityIndex++;
                resize(capacity[capacityIndex]);
            }
        }
    }

    public V remove(K key){
        V ret = null;
        TreeMap<K, V> map = hashtable[hash(key)];
        if(map.containsKey(key)){
            ret = map.remove(key);
            size --;

            if(size <= lowerTol * M && capacityIndex - 1 >= 0) {
                capacityIndex--;
                resize(capacity[capacityIndex]);
            }

        }
        return ret;
    }

    public void set(K key, V value){
        TreeMap<K, V> map = hashtable[hash(key)];
        if(!map.containsKey(key))
            throw new IllegalArgumentException(key + " doesn't exist!");

        map.put(key, value);
    }

    public boolean contains(K key){
        return hashtable[hash(key)].containsKey(key);
    }

    public V get(K key){
        return hashtable[hash(key)].get(key);
    }

    private void resize(int newM){
        TreeMap<K, V>[] newHashTable = new TreeMap[newM];
        for(int i = 0 ; i < newM ; i ++)
            newHashTable[i] = new TreeMap<>();

        // hash() reads this.M, so switch to the new capacity before rehashing
        int oldM = M;
        this.M = newM;
        for(int i = 0 ; i < oldM ; i ++)
            for(K key: hashtable[i].keySet())
                newHashTable[hash(key)].put(key, hashtable[i].get(key));

        this.hashtable = newHashTable;
    }
}

Using a red-black tree for a bucket has a prerequisite: K must implement the Comparable interface so that keys can be ordered.

8 More ways to handle hash collisions

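Besides separate chaining, there are other ways to handle hash collisions.

Open addressing: each index is not reserved for a single hash value but is open to several. When a
collision occurs, move to index + 1; if that slot is taken too, add 1 again.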

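The scheme above is called linear probing. Collisions are likely, and an element may have to keep adding 1 many times before it finds a free slot.
Another scheme is quadratic probing, with offsets +1, +4, +9, ..., which avoids building up one long contiguous run of occupied slots.
Yet another is double hashing. Open-addressing tables also need to resize: once the load factor reaches a certain threshold the table is expanded, and with a well-chosen load factor operations can still be O(1). All of the above are forms of open addressing.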
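
Linear probing as described above can be sketched as follows. LinearProbingIntSet is a hypothetical class with a fixed capacity and no deletion or resizing, which a real implementation would need; slotOf exists only to make the probing visible.

```java
class LinearProbingIntSet {

    private final Integer[] slots;   // null marks an empty slot
    private int size;

    LinearProbingIntSet(int capacity) {
        slots = new Integer[capacity];
    }

    private int home(int key) {
        return (key & 0x7fffffff) % slots.length;
    }

    // Returns the slot holding key, or -1 if the key is absent
    int slotOf(int key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i] == key)
                return i;
            i = (i + 1) % slots.length;   // collision: try the next index
        }
        return -1;
    }

    boolean contains(int key) {
        return slotOf(key) != -1;
    }

    // Returns false if the key was already present
    boolean add(int key) {
        if (contains(key))
            return false;
        if (size == slots.length)
            throw new IllegalStateException("table is full");
        int i = home(key);
        while (slots[i] != null)
            i = (i + 1) % slots.length;
        slots[i] = key;
        size++;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingIntSet set = new LinearProbingIntSet(7);
        set.add(3);
        set.add(10);
        set.add(17);
        System.out.println(set.slotOf(10) + " " + set.slotOf(17));
    }
}
```

The keys 3, 10 and 17 all have home index 3 in a table of size 7, so each later key is pushed one slot further, which is the clustering effect that quadratic probing tries to reduce.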

Rehashing
