＜C++＞哈希表模拟实现STL_unordered_set/map

ikun66666

已于 2023-10-02 10:49:34 修改

阅读量184

点赞数 1

分类专栏： C++ # STL 文章标签： c++ stl

于 2023-10-02 00:10:04 首次发布

本文链接：https://blog.csdn.net/ikun66666/article/details/133472155

版权

C++ 同时被 2 个专栏收录

29 篇文章 1 订阅

订阅专栏

STL

10 篇文章 0 订阅

订阅专栏

哈希表模板参数的控制

首先需要明确的是，unordered_set是K模型的容器，而unordered_map是KV模型的容器。

要想只用一份哈希表代码同时封装出K模型和KV模型的容器，我们必定要对哈希表的模板参数进行控制。

为了与原哈希表的模板参数进行区分，这里将哈希表的第二个模板参数的名字改为T。

template<class K, class T>
class HashTable

如果上层使用的是unordered_set容器，那么传入哈希表的模板参数就是key和key。

template<class K>
class unordered_set {
public:
    //...
private:
    HashTable<K, K> _ht;//传入底层哈希表的是K和K
};

但如果上层使用的是unordered_map容器，那么传入哈希表的模板参数就是key以及key和value构成的键值对。

template<class K, class V>
class unordered_map {
public:
    //...
private:
    HashTable<K, pair<K, V>> _ht;//传入底层哈希表的是K以及K和V构成的键值对
};

也就是说，哈希表中的模板参数T的类型到底是什么，完全却决于上层所使用容器的种类。

在这里插入图片描述

而哈希结点的模板参数也应该由原来的K、V变为T：

上层容器是unordered_set时，传入的T是键值，哈希结点中存储的就是键值。
上层容器是unordered_map时，传入的T是键值对，哈希结点中存储的就是键值对。

更改模板参数后，哈希结点的定义如下：

template<class T>
struct HashNode {
    HashNode<T> *_next;
    T _data;

    HashNode(const T &data)
        : _data(data), _next(nullptr) {}
};

在哈希映射过程中，我们需要获得元素的键值，然后通过哈希函数计算出对应的哈希地址进行映射。

现在由于我们在哈希结点当中存储的数据类型是T，这个T可能就是一个键值，也可能是一个键值对，对于底层的哈希表来说，它并不知道哈希结点当中存储的数据究竟是什么类型，因此需要由上层容器提供一个仿函数，用于获取T类型数据当中的键值。

因此，unordered_map容器需要向底层哈希表提供一个仿函数，该仿函数返回键值对当中的键值。

template<class K, class V>
class unordered_map {
public:
    struct MapKeyOft {
        const K &operator()(const pair<K, V> &kv) {
            return kv.first;
        }
    };

private:
    HashTable<K, pair<const K, V>, MapKeyOft> _ht;
};

而虽然unordered_set容器传入哈希表的T就是键值，但是底层哈希表并不知道上层容器的种类，底层哈希表在获取键值时会统一通过传入的仿函数进行获取，因此unordered_set容器也需要向底层哈希表提供一个仿函数。

template<class K>
class unordered_set {
public:
    struct SetKeyOfT {
        const K &operator()(const K &key) {
            return key;
        }
    };

private:
    HashTable<K, K, SetKeyOfT> _ht;
};

因此，底层哈希表的模板参数现在需要增加一个，用于接收上层容器提供的仿函数。

template<class K, class T, class KeyOfT>
class HashTable

string类型无法取模问题

经过上面的分析后，我们让哈希表增加了一个模板参数，此时无论上层容器是unordered_set还是unordered_map，我们都能够通过上层容器提供的仿函数获取到元素的键值。

但是在我们日常编写的代码中，用字符串去做键值key是非常常见的事，比如我们用unordered_map容器统计水果出现的次数时，就需要用各个水果的名字作为键值。

而字符串并不是整型，也就意味着字符串不能直接用于计算哈希地址，我们需要通过某种方法将字符串转换成整型后，才能代入哈希函数计算哈希地址。

但遗憾的是，我们无法找到一种能实现字符串和整型之间一对一转换的方法，因为在计算机中，整型的大小是有限的，比如用无符号整型能存储的最大数字是4294967295，而众多字符能构成的字符串的种类却是无限的。

鉴于此，无论我们用什么方法将字符串转换成整型，都会存在哈希冲突，只是产生冲突的概率不同而已。

因此，现在我们需要在哈希表的模板参数中再增加一个仿函数，用于将键值key转换成对应的整型。

template<class K, class T, class KeyOfT, class HashFunc = Hash<K>>
class HashTable

若是上层没有传入该仿函数，我们则使用默认的仿函数，该默认仿函数直接返回键值key即可，但是用字符串作为键值key是比较常见的，因此我们可以针对string类型写一个类模板的特化

template<class K>
struct HashFunc {
    size_t operator()(const K &key) {
        return key;
    }
};

// 特化模板，传string的话，就走这个
template<>
struct HashFunc<string> {
    size_t operator()(const string &s) {
        size_t hash = 0;
        for (auto ch: s) {
            hash += ch;
            hash *= 31;
        }
        return hash;
    }
};

哈希表正向迭代器的实现

哈希表的正向迭代器实际上就是对哈希结点指针进行了封装，但是由于在实现++运算符重载时，可能需要在哈希表中去寻找下一个非空哈希桶，因此每一个正向迭代器中都应该存储哈希表的地址。

代码：

//前置声明
template<class K, class T, class KeyOft, class Hash = HashFunc<K>>
class HashTable;

template<class K, class T, class Ref, class Ptr, class KeyOft, class Hash>
struct HashIterator {
    typedef HashNode<T> Node;
    typedef HashTable<K, T, KeyOft, Hash> HT;
    //Ref和Ptr可能是T&和T*，也可能是const T&/const T*，需要创建一个支持普通转换为const的迭代器
    typedef HashIterator<K, T, Ref, Ptr, KeyOft, Hash> Self;
    typedef HashIterator<K, T, T &, T *, KeyOft, Hash> iterator;//正向迭代器

    HashIterator(Node *node, HT *ht)
        : _node(node), _ht(ht) {}

    //正向迭代器实现反向迭代器,不能只靠self，如果self传的就是const迭代器，再加上const就有问题了
    HashIterator(const iterator &it)
        : _node(it._node), _ht(it._ht) {}

    Ref operator*() {
        return _node->_data;
    }

    Ptr operator->() {
        return &_node->_data;
    }

    bool operator!=(const Self &s) {
        return _node != s._node;
    }

    bool operator==(const Self &s) {
        return _node == s._node;
    }

    Self &operator++() {
        if (_node->_next != nullptr) {
            _node = _node->_next;
        } else {
            //找下一个不为空的桶
            KeyOft kot;
            Hash hash;
            // 算出我当前的桶位置
            size_t hashi = hash(kot(_node->_data)) % _ht->_tables.size();
            ++hashi;
            while (hashi < _ht->_tables.size()) {
                if (_ht->_tables[hashi] != nullptr) {
                    _node = _ht->_tables[hashi];
                    break;
                } else {
                    ++hashi;
                }
            }

            //没有找到的话，返回_node为空
            if (hashi == _ht->_tables.size()) {
                _node = nullptr;
            }
            return *this;
        }
        return *this;
    }
    Node *_node;//迭代器指针
    HT *_ht;    //哈希表，用于定位下一个桶
};

注意： 哈希表的迭代器类型是单向迭代器，没有反向迭代器，即没有实现–运算符的重载，若是想让哈希表支持双向遍历，可以考虑将哈希桶中存储的单链表结构换为双链表结构。

正向迭代器实现后，我们需要在哈希表的实现当中进行如下操作：

进行正向迭代器类型的typedef，需要注意的是，为了让外部能够使用typedef后的正向迭代器类型iterator，我们需要在public区域进行typedef。
由于正向迭代器中++运算符重载函数在寻找下一个结点时，会访问哈希表中的成员变量_table，而_table成员变量是哈希表的私有成员，因此我们需要将正向迭代器类声明为哈希表类的友元。
将哈希表中查找函数返回的结点指针，改为返回由结点指针和哈希表地址构成的正向迭代器。
将哈希表中插入函数的返回值类型，改为由正向迭代器类型和布尔类型所构成的键值对。

完整的HashTable

#pragma once
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

template<class K>
struct HashFunc {
    size_t operator()(const K &key) {
        return key;
    }
};

// 特化模板，传string的话，就走这个
template<>
struct HashFunc<string> {
    size_t operator()(const string &s) {
        size_t hash = 0;
        for (auto ch: s) {
            hash += ch;
            hash *= 31;
        }
        return hash;
    }
};

template<class T>
struct HashNode {
    HashNode<T> *_next;
    T _data;

    HashNode(const T &data)
        : _data(data), _next(nullptr) {}
};


//前置声明
template<class K, class T, class KeyOft, class Hash = HashFunc<K>>
class HashTable;

template<class K, class T, class Ref, class Ptr, class KeyOft, class Hash>
struct HashIterator {
    typedef HashNode<T> Node;
    typedef HashTable<K, T, KeyOft, Hash> HT;
    //Ref和Ptr可能是T&和T*，也可能是const T&/const T*，需要创建一个支持普通转换为const的迭代器
    typedef HashIterator<K, T, Ref, Ptr, KeyOft, Hash> Self;
    typedef HashIterator<K, T, T &, T *, KeyOft, Hash> iterator;//正向迭代器

    HashIterator(Node *node, HT *ht)
        : _node(node), _ht(ht) {}

    //正向迭代器实现反向迭代器,不能只靠self，如果self传的就是const迭代器，再加上const就有问题了
    HashIterator(const iterator &it)
        : _node(it._node), _ht(it._ht) {}

    Ref operator*() {
        return _node->_data;
    }

    Ptr operator->() {
        return &_node->_data;
    }

    bool operator!=(const Self &s) {
        return _node != s._node;
    }

    bool operator==(const Self &s) {
        return _node == s._node;
    }

    Self &operator++() {
        if (_node->_next != nullptr) {
            _node = _node->_next;
        } else {
            //找下一个不为空的桶
            KeyOft kot;
            Hash hash;
            // 算出我当前的桶位置
            size_t hashi = hash(kot(_node->_data)) % _ht->_tables.size();
            ++hashi;
            while (hashi < _ht->_tables.size()) {
                if (_ht->_tables[hashi] != nullptr) {
                    _node = _ht->_tables[hashi];
                    break;
                } else {
                    ++hashi;
                }
            }

            //没有找到的话，返回_node为空
            if (hashi == _ht->_tables.size()) {
                _node = nullptr;
            }
            return *this;
        }
        return *this;
    }
    Node *_node;//迭代器指针
    HT *_ht;    //哈希表，用于定位下一个桶
};

template<class K, class T, class KeyOft, class Hash>// Hash用于将key转换成可以取模的类型
class HashTable {
public:
    typedef HashNode<T> Node;
    typedef HashIterator<K, T, T &, T *, KeyOft, Hash> iterator;
    typedef HashIterator<K, T, const T &, const T *, KeyOft, Hash> const_iterator;

    template<class K1, class T1, class Ref1, class Ptr1, class KeyOft1, class Hash1>
    friend struct HashIterator;//用于迭代器访问HashTable中的private成员变量，即_tables、

public:
    ~HashTable() {
        for (auto &cur: this->_tables) {
            while (cur) {
                Node *next = cur->_next;
                delete cur;
                cur = next;
            }
            cur = nullptr;
        }
    }

    iterator begin() {
        Node *cur = nullptr;
        for (size_t i = 0; i < _tables.size(); i++) {
            cur = _tables[i];
            if (cur != nullptr) {
                break;
            }
        }
        return iterator(cur, this);
    }

    iterator end() {
        return iterator(nullptr, this);
    }

    const_iterator begin() const {
        Node *cur = nullptr;
        for (size_t i = 0; i < _tables.size(); i++) {
            cur = _tables[i];
            if (cur != nullptr) {
                break;
            }
        }
        return const_iterator(cur, this);
    }

    const_iterator end() const {
        return const_iterator(nullptr, this);
    }

    //查找Key也是K类型
    iterator Find(const K &key) {
        if (this->_tables.size() == 0) {
            return iterator(nullptr, this);
        }

        KeyOft kot;//模板参数，用来区分是kv，还是v由上层map、set传模板参数过来(通过仿函数实现)
        Hash hash;
        size_t hashi = hash(key) % this->_tables.size();
        Node *cur = this->_tables[hashi];
        while (cur) {
            if (kot(cur->_data) == key) {
                return iterator(cur, this);
            }
            cur = cur->_next;
        }
        return iterator(nullptr, this);
    }

    //删除的值key为K类型
    bool Erase(const K &key) {
        Hash hash;
        KeyOft kot;
        size_t hashi = hash(key) % this->_tables.size();
        Node *prev = nullptr;
        Node *cur = this->_tables[hashi];
        while (cur) {
            if (kot(cur->_data) == key) {
                if (prev == nullptr) {
                    this->_tables[hashi] = cur->_next;
                } else {
                    prev->_next = cur->_next;
                }
                delete cur;
                return true;
            } else {
                prev = cur;
                cur = cur->_next;
            }
        }
        return false;
    }

    // 扩容优化，使用素数扩容
    size_t GetNextPrime(size_t prime) {
        // SGI
        static const int _stl_num_primes = 28;
        static const uint64_t _stl_prime_list[_stl_num_primes] = {
                53, 97, 193, 389, 769, 1543,
                3079, 6151, 12289, 24593, 49157, 98317,
                196613, 393241, 786433, 1572869, 3145739, 6291469,
                12582917, 25165843, 50331653, 100663319, 201326611, 402653189,
                805306457, 1610612741, 3221225473, 4294967291};

        size_t i = 0;
        for (; i < _stl_num_primes; ++i) {
            if (_stl_prime_list[i] > prime)
                return _stl_prime_list[i];
        }

        return _stl_prime_list[_stl_num_primes - 1];
    }

    //插入的类型是T类型，可能是K可能是pair<K,V> 通过模板参数传过来
    pair<iterator, bool> Insert(const T &data) {
        Hash hash;// 仿函数用于不能取模的值
        KeyOft kot;
        // 已经有这个数，就不用插入了
        iterator it = Find(kot(data));
        //如果it不是end()，说明找到了数，就不用插入，返回迭代器和false
        if (it != end()) {
            return make_pair(it, false);
        }


        // 负载因子 == 1时扩容
        if (this->n == this->_tables.size()) {
            // size_t newsize = _tables.size() == 0 ? 10 : _tables.size() * 2;
            size_t newsize = this->GetNextPrime(_tables.size());
            vector<Node *> newtables(newsize, nullptr);
            for (auto &cur: this->_tables) {// cur是Node*
                while (cur) {
                    // 保存下一个
                    Node *next = cur->_next;
                    // 头插到新表
                    size_t hashi = hash(kot(cur->_data)) % newtables.size();
                    cur->_next = newtables[hashi];
                    newtables[hashi] = cur;

                    cur = next;
                }
            }
            _tables.swap(newtables);
        }

        size_t hashi = hash(kot(data)) % this->_tables.size();
        // 头插
        Node *newnode = new Node(data);
        newnode->_next = _tables[hashi];
        _tables[hashi] = newnode;
        this->n++;

        //插入成功返回，通过newnode，和this构造迭代器，返回true。
        return make_pair(iterator(newnode, this), true);
    }

    // 获取哈希表索引最大长度(哈希桶长度)
    size_t MaxBucketSize() {
        size_t max = 0;
        for (int i = 0; i < _tables.size(); ++i) {
            auto cur = _tables[i];
            size_t size = 0;
            while (cur) {
                ++size;
                cur = cur->_next;
            }

            printf("[%d]->%d\n", i, size);

            if (size > max) {
                max = size;
            }
            if (max == 5121) {
                printf("%d", i);
                break;
            }
        }

        return max;
    }

private:
    vector<Node *> _tables;
    size_t n = 0;// 存储有效数据的个数
};

封装unordered_set的代码

#pragma once
#include "HashTable.h"

template<class K, class Hash = HashFunc<K>>
class unordered_set {
public:
    struct SetKeyOfT {
        const K &operator()(const K &key) {
            return key;
        }
    };

public:
    typedef typename HashTable<K, K, SetKeyOfT, Hash>::const_iterator iterator;
    typedef typename HashTable<K, K, SetKeyOfT, Hash>::const_iterator const_iterator;


    iterator begin() {
        return _ht.begin();
    }

    iterator end() {
        return _ht.end();
    }

    const_iterator begin() const {
        return _ht.begin();
    }

    const_iterator end() const {
        return _ht.end();
    }

    //这里的pair<iterator,bool>中的iterator是const类型的，而Insert返回的是普通迭代器
    pair<iterator, bool> insert(const K &key) {
        return _ht.Insert(key);
    }

    iterator find(const K &key) {
        return _ht.Find(key);
    }

    bool erase(const K &key) {
        return _ht.Erase(key);
    }

private:
    HashTable<K, K, SetKeyOfT, Hash> _ht;
};

封装unordered_map的代码

#pragma once

#include "HashTable.h"
template<class K, class V, class Hash = HashFunc<K>>
class unordered_map {
public:
    struct MapKeyOft {
        const K &operator()(const pair<K, V> &kv) {
            return kv.first;
        }
    };

    //typename 告诉编译器引入的是一个类型，而不是成员
    typedef typename HashTable<K, pair<const K, V>, MapKeyOft, Hash>::iterator iterator;
    typedef typename HashTable<K, pair<const K, V>, MapKeyOft, Hash>::const_iterator const_iterator;

    iterator begin() {
        return _ht.begin();
    }

    iterator end() {
        return _ht.end();
    }

    const_iterator begin() const {
        return _ht.begin();
    }

    const_iterator end() const {
        return _ht.end();
    }

    pair<iterator, bool> insert(const pair<K, V> kv) {
        return _ht.Insert(kv);
    }

    V &operator[](const K &key) {
        pair<iterator, bool> ret = insert(make_pair(key, V()));
        return ret.first->second;
    }

    iterator find(const K &key) {
        return _ht.Find(key);
    }

    bool erase(const K &key) {
        return _ht.Erase(key);
    }

private:
    HashTable<K, pair<const K, V>, MapKeyOft, Hash> _ht;
};

测试

#include "unordered_map.h"
#include "unordered_set.h"
#include <iostream>
class Date {
    friend struct HashDate;

public:
    Date(int year = 1900, int month = 1, int day = 1)
        : _year(year), _month(month), _day(day) {}

    bool operator<(const Date &d) const {
        return (_year < d._year) ||
               (_year == d._year && _month < d._month) ||
               (_year == d._year && _month == d._month && _day < d._day);
    }

    bool operator>(const Date &d) const {
        return (_year > d._year) ||
               (_year == d._year && _month > d._month) ||
               (_year == d._year && _month == d._month && _day > d._day);
    }

    bool operator==(const Date &d) const {
        return _year == d._year && _month == d._month && _day == d._day;
    }

    friend ostream &operator<<(ostream &_cout, const Date &d);

private:
    int _year;
    int _month;
    int _day;
};

ostream &operator<<(ostream &_cout, const Date &d) {
    _cout << d._year << "-" << d._month << "-" << d._day;
    return _cout;
}

//自定义Hash，模板最后一个参数，传自定义类型的话，需要自己写
struct HashDate {
    size_t operator()(const Date &d) {
        size_t hash = 0;
        hash += d._year;
        hash *= 31;

        hash += d._month;
        hash *= 31;

        hash += d._day;
        hash *= 31;

        return hash;
    }
};

struct unordered_map_Test {
    static void unordered_map_Test1() {
        unordered_map<int, int> mp;
        mp.insert(make_pair(1, 1));
        mp.insert(make_pair(2, 2));
        mp.insert(make_pair(3, 3));

        unordered_map<int, int>::iterator it = mp.begin();
        while (it != mp.end()) {
            cout << it->first << " " << it->second << endl;
            ++it;
        }
        cout << endl;
    }

    static void unordered_map_Test2() {
        string arr[] = {"西瓜", "西瓜", "苹果", "西瓜", "苹果", "苹果", "西瓜", "苹果", "香蕉", "苹果", "香蕉", "梨"};
        unordered_map<string, int> countMap;
        for (auto &e: arr) {
            countMap[e]++;
        }

        for (auto &kv: countMap) {
            cout << kv.first << "" << kv.second << endl;
        }
    }

    static void unordered_map_Test3() {
        Date d1(2023, 3, 13);
        Date d2(2023, 3, 13);
        Date d3(2023, 3, 12);
        Date d4(2023, 3, 11);
        Date d5(2023, 3, 12);
        Date d6(2023, 3, 13);

        Date a[] = {d1, d2, d3, d4, d5, d6};

        unordered_map<Date, int, HashDate> countMap;
        for (auto e: a) {
            countMap[e]++;
        }

        for (auto &kv: countMap) {
            cout << kv.first << ":" << kv.second << endl;
        }
    }
};

struct unordered_set_Test {
    static void unordered_set_Test1() {
        unordered_set<int> s;
        s.insert(1);
        s.insert(3);
        s.insert(2);
        s.insert(7);
        s.insert(8);

        unordered_set<int>::iterator it = s.begin();
        while (it != s.end()) {
            cout << *it << " ";
            //(*it) = 1;
            ++it;
        }
        cout << endl;
    }
};

int main() {
    unordered_set_Test::unordered_set_Test1();
    unordered_map_Test::unordered_map_Test1();
    unordered_map_Test::unordered_map_Test2();
    unordered_map_Test::unordered_map_Test3();
    return 0;
}