【C++模拟实现】哈希与unorder_set和unorder_map关联式容器的模拟实现-CSDN博客

本文链接：https://blog.csdn.net/m0_74215144/article/details/133555599

【C++模拟实现】哈希与unorder_set和unorder_map关联式容器的模拟实现

哈希概念

顺序结构以及平衡树中，元素关键码与其存储位置之间没有对应的关系，因此在查找一个元素时，必须要经过关键码的多次比较。顺序查找时间复杂度为O(N)，平衡树中为树的高度，即O( logN)，搜索的效率取决于搜索过程中元素的比较次数。

理想的搜索方法:可以不经过任何比较，一次直接从表中得到要搜索的元素。 如果构造一种存储结构，通过某种函数(hashFunc)****使元素的存储位置与它的关键码之间能够建立一一映射的关系，那么在查找时通过该函数可以很快找到该元素。

插入元素

根据待插入元素的关键码，以此函数计算出该元素的存储位置并按此位置进行存放

搜索元素

对元素的关键码进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若关键码相等，则搜索成功

该方式即为哈希(散列)方法，哈希方法中使用的转换函数称为哈希(散列)函数，构造出来的结构称为哈希表 **(Hash Table)(**或者称散列表)

哈希的闭散列法（开放定址法）

线性探测法：

enum STATE{
    EXIST,
    EMPTY,
    DELETE
};

template<class K,class V>
struct HashDate
{
    pair<K,V> _kv;
    STATE _state=EMPTY;
 
};

template<class K>
struct HashFunc
{
    size_t operator()(const K&key)
    {
        return (size_t)key;
    }
};   

template<>
struct HashFunc<string>
{
    size_t operator()(const string&str)
    {
        size_t size=0;
        for(auto e: str)
        {
            size*=131;
            size+=e;
        }
        return size;
    }
};   


template<class K,class V,class HashDefault=HashFunc<K> >
class __HashTable
{
public:
    __HashTable()
    {
        _table.resize(10);
    }


    bool insert(const pair<K,V>& kv)
    {
        //扩容
        if(_n*10/_table.size()>=7)
        {

            size_t newsize= _table.size()*2;
            //重新映射到新表
            __HashTable<K,V> newhash;
            newhash._table.resize(newsize);

            for(int i=0;i<_table.size();i++)
            {
                newhash.insert(_table[i]._kv);
            }

            _table.swap(newhash._table);

        }
        HashDefault hf;

        size_t hashi=hf(kv.first)%_table.size();
        while(_table[hashi]._state==EXIST)
        {
            ++hashi; 
            hashi%=_table.size();
        }
        _table[hashi]._kv=kv;
        _table[hashi]._state=EXIST;
        ++_n;
        return true;
    }


    HashDate<const K,V>* find(const K&key)
    {
        HashDefault hf;
        size_t hashi=hf(key)%_table.size();
        while(_table[hashi]._state!=EMPTY)
        {
            if(_table[hashi]._state!=EMPTY&&hf(_table[hashi]._kv.first)==hf(key))
            {
                return (HashDate<const K,V>*)&_table[hashi];
            }

            ++hashi; 
            hashi%=_table.size();
        }
        return nullptr;
    }

    bool erase(const K&key)
    {
        HashDate<const K,V>* ret=find(key);
        if(ret)
        {
            ret->_state=DELETE;
            --_n;
            return true;
        }
        else{
            return false;
        }
    }

private: 
    vector<HashDate<K,V> > _table;
    size_t _n=0;//C++11
};

二次探测法：

与线性探测法类似，将插入和查找的探测元素的顺序变成hashi，hashi+1^2，hashi+22,hashi+3^2…即可，计算好的哈希地址对哈希表的总大小取模

将线性探测法中的hashi++的逻辑改成hashi=hashi+i*i;hashi %= _ht.size();即可

研究表明:当表的长度为质数且表装载因子a不超过0.5时，新的表项一定能够插入，而且任何一个位置都不会被探查两次。因此只要表中有一半的空位置，就不会存在表满的问题。在搜索时可以不考虑表装满的情况，但在插入时必须确保表的装载因子a不超过0.5，如果超出必须考虑增容。

哈希的开散列法（哈希桶）

开散列概念

开散列法又叫链地址法(开链法)，首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。

因为扩容时桶的总数改变了，所以每个元素需要重新映射地址，所以有大的性能消耗。理想情况下是每个桶一个元素，可以在桶的个数等于存储的元素个数的时候进行扩容操作。（将原链表节点一个个插入到新的哈希桶）

开散列的模拟实现：

namespace Hash_bucket
{
    template<class K,class V>
    struct HashNode
    {
        HashNode(const pair<K,V>& kv)
        :_kv(kv)
        ,_next(nullptr)
        {}
    
        pair<K,V> _kv;
        HashNode<K,V>* _next;
    };


    template<class K>
    struct HashFunc
    {
        size_t operator()(const K&key)
        {
            return (size_t)key;
        }
    };   

    template<>
    struct HashFunc<string>
    {
        size_t operator()(const string&str)
        {
            size_t size=0;
            for(auto e: str)
            {
                size*=131;
                size+=e;
            }
            return size;
        }
    };   



    template<class K,class V,class DefaultHashFunc = HashFunc<K> >
    class HashTable
    {
        typedef HashNode<K,V> Node;
    public:
        HashTable()
        :_n(0)
        {
            _table.resize(10,nullptr);
        }

        ~HashTable()//不要复用erase函数，不然很繁琐
        {
            for(int i=0;i<_table.size();i++)
            {
                Node* cur=_table[i];
                while(cur)
                {
                    Node* nextnode = cur->_next;
                    delete cur;
                    cur = nextnode;
                }
                _table[i]=nullptr;
            }
            _n=0;
        }


        bool insert(const pair<K,V>& kv)
        {
            DefaultHashFunc hf;


            if(_n==_table.size())
            {
                size_t newsize=_table.size()*2;
                HashTable<K,V> _hs;
                _hs._table.resize(newsize,nullptr);

                printf("扩容成功\n");
                for(int i=0;i<_table.size();i++)//不是单纯的将整个链表牵下来
                {
                    Node* cur=_table[i];

                    while(cur)
                    {
                        size_t hashi =hf(_table[i]->_kv.first)%_hs._table.size();
                        Node* next=cur->_next;
                        
                        cur->_next=_hs._table[hashi];
                        _hs._table[hashi]=cur;

                        ++_n;
                        cur=next;

                    }
                    _table[i]=nullptr;
                }

                _table.swap(_hs._table);

            }


            size_t hashi=hf(kv.first)%_table.size();
            //进行头插
            Node* newnode=new Node(kv);
            newnode->_next=_table[hashi];
            _table[hashi]=newnode;

            ++_n;

            return true;
        }

        void Print() const
        {
            for(int i=0;i<_table.size();i++)
            {
                if(_table[i]==nullptr)
                {
                    printf("NULLPTR\n");
                }
                else{
                    Node* cur=_table[i];
                    printf("[%d]",i);
                    while(cur)
                    {
                        cout<<"->"<<cur->_kv.first;
                        cur=cur->_next;
                    }
                    cout<<endl;

                }
            }
        }

        Node* Find(const K&key) const
        {
            DefaultHashFunc hf;


            size_t hashi =hf(key)%_table.size();
            Node* cur=_table[hashi];
            while(cur)
            {
                if(hf(cur->_kv.first)==hf(key))
                {
                    return cur;
                }
                cur=cur->_next;
            }
            return nullptr;
          
        }


        bool erase(const K&key)
        {
            DefaultHashFunc hf;

            size_t hashi = hf(key)%_table.size();

            Node* prev = nullptr;
            Node* cur = _table[hashi];

            while(cur)
            {
                if(hf(cur->_kv.first)==hf(key))
                {
                    if(prev==nullptr)
                    {
                        delete cur;
                        return true;
                    }
                    else{
                        prev->_next = prev->_next->_next;
                        delete cur;
                    }
                        return true;
                }
                
                prev=cur;
                cur=cur ->_next;
            }
            return false;
        }

    private:
        vector<Node*> _table;
        size_t _n;//有效数据
    };
}

对哈希的开散列式封装unordered系列容器，加上迭代器

#include <iostream>
#include <vector>
using namespace std;

namespace Hash_bucket
{
    template<class K,class T>
    struct HashNode
    {
        HashNode(const T& data)
                :_data(data)
                ,_next(nullptr)
        {}

        T _data;
        HashNode<K,T>* _next;
    };


    template<class K>
    struct DefaultHashFunc
    {
        size_t operator()(const K&key)
        {
            return (size_t)key;
        }
    };

    template<>
    struct DefaultHashFunc<string>
    {
        size_t operator()(const string&str)
        {
            size_t size=0;
            for(auto e: str)
            {
                size*=131;
                size+=e;
            }
            return size;
        }
    };

iterator,需要将HashTable传进去
/同时迭代器和hash表存在一个相互依赖，迭代器用到hash表，hash表用到迭代器
    template<class K,class T,class KeyOfT,class HashFunc>//前置声明
    class HashTable;

//迭代器用到了HashTable中的私有成员_table,所以要在HashTable将迭代器模版声明为友元
    template<class K,class T,class Ref,class Ptr,class KeyOfT,class HashFunc>//这里不要给缺省值
    struct __HashTable_iterator
    {
        typedef HashNode<K,T> Node;
        typedef __HashTable_iterator<K,T,Ref,Ptr,KeyOfT,HashFunc> Self;
        typedef __HashTable_iterator<K,T,T&,T*,KeyOfT,HashFunc> Iterator;


        Node* _node;
        const HashTable<K,T,KeyOfT,HashFunc>* _pht;

        __HashTable_iterator(Node* node,const HashTable<K,T,KeyOfT,HashFunc>* pht)
                :_node(node)
                ,_pht(pht)
        {}


        __HashTable_iterator(const Iterator& it)
                :_node(it._node)
                ,_pht(it._pht)
        {}//如果这个对象是const迭代器，实现的是普通迭代器转为const迭代器，
        //如果这个对象是普通迭代器，实现的是普通迭代器的拷贝构造。


        Self& operator++()
        {
            if(_node->_next)
            {
                _node=_node->_next;
            }
            else{
                HashFunc hf;
                KeyOfT kot;

                size_t hashi =hf(kot(_node->_data))%_pht->_table.size();
                ++hashi;
                while (hashi<_pht->_table.size())
                {
                    if(_pht->_table[hashi])
                    {
                        _node=_pht->_table[hashi];
                        return *this;
                    }
                    else{
                        ++hashi;
                    }
                }

                _node=nullptr;

            }
            return *this;


        }

        bool operator!=(const Self& self) const
        {
            return _node!=self._node;
        }


        bool operator==(const Self& self) const
        {
            return _node==self._node;
        }

        Ptr operator->()
        {
            return &_node->_data;
        }

        Ref operator*()//T&
        {
            return _node->_data;
        }

    };



    template<class K,class T,class KeyOfT,class HashFunc = DefaultHashFunc<K> >
    class HashTable
    {
        template<class U,class I,class Y,class Z,class A,class B>//这里不要给缺省值,两个类不同，最好使用不同的模版参数
        friend struct __HashTable_iterator;
    public:
        typedef HashNode<K,T> Node;
        typedef __HashTable_iterator<K,T,T&,T*,KeyOfT,HashFunc> iterator;
        typedef __HashTable_iterator<K,T,const T&,const T*,KeyOfT,HashFunc> const_iterator;



        iterator begin()
        {
            for(int i=0;i<_table.size();i++)
            {
                Node* cur=_table[i];
                if(cur)
                {
                    return iterator(cur,this);
                }
            }
            return iterator(nullptr,this);
        }
        iterator end()
        {
            return iterator(nullptr,this);
        }


        const_iterator begin()  const
        {
            for(int i=0;i<_table.size();i++)
            {
                Node* cur=_table[i];
                if(cur)
                {
                    return iterator(cur,this);
                }
            }
            return const_iterator(nullptr,this);
        }
        const_iterator end() const
        {
            return const_iterator(nullptr,this);
        }


        HashTable()
                :_n(0)
        {
            _table.resize(10,nullptr);
        }

        ~HashTable()//不要复用erase函数，不然很繁琐
        {
            for(int i=0;i<_table.size();i++)
            {
                Node* cur=_table[i];
                while(cur)
                {
                    Node* nextnode = cur->_next;
                    delete cur;
                    cur = nextnode;
                }
                _table[i]=nullptr;
            }
            _n=0;
        }


        pair<iterator,bool> insert(const T& data)
        {
            HashFunc hf;
            KeyOfT kot;


            if(_n==_table.size())
            {
                size_t newsize=_table.size()*2;
                HashTable<K,T,KeyOfT> _hs;
                _hs._table.resize(newsize,nullptr);

                printf("扩容成功\n");
                for(int i=0;i<_table.size();i++)//不是单纯的将整个链表牵下来
                {
                    Node* cur=_table[i];

                    while(cur)
                    {
                        size_t hashi =hf(kot(_table[i]->_data))%_hs._table.size();
                        Node* next=cur->_next;

                        cur->_next=_hs._table[hashi];
                        _hs._table[hashi]=cur;

                        ++_n;
                        cur=next;

                    }
                    _table[i]=nullptr;
                }

                _table.swap(_hs._table);

            }


            size_t hashi=hf(kot(data))%_table.size();
            //进行头插
            Node* newnode=new Node(data);
            newnode->_next=_table[hashi];
            _table[hashi]=newnode;

            ++_n;

            return make_pair(iterator(newnode,this),true);
        }

        void Print() const
        {
            KeyOfT kot;
            for(int i=0;i<_table.size();i++)
            {
                if(_table[i]==nullptr)
                {
                    printf("NULLPTR\n");
                }
                else{
                    Node* cur=_table[i];
                    printf("[%d]",i);
                    while(cur)
                    {
                        cout<<"->"<<kot(cur->_data);
                        cur=cur->_next;
                    }
                    cout<<endl;

                }
            }
        }

        //库里面的返回值为迭代器
        iterator Find(const K&key) const //注意，传入的模版参数K不能省略，即使有KeyOfT，但KeyOfT不能取出类型
        {
            HashFunc hf;
            KeyOfT kot;

            size_t hashi =hf(key)%_table.size();
            Node* cur=_table[hashi];
            while(cur)
            {
                if(hf(kot(cur->_data))==hf(key))
                {
                    return iterator(cur,this);
                }
                cur=cur->_next;
            }
            return iterator(nullptr,this);

        }

        /*pair<iterator,bool> Find(const K&key) const //注意，传入的模版参数K不能省略，即使有KeyOfT，但KeyOfT不能取出类型
        {
            HashFunc hf;
            KeyOfT kot;

            size_t hashi =hf(key)%_table.size();
            Node* cur=_table[hashi];
            while(cur)
            {
                if(hf(kot(cur->_data))==hf(key))
                {
                    return make_pair(iterator(cur,this),true);
                }
                cur=cur->_next;
            }
            return make_pair(iterator(nullptr,this),false);

        }*/


        bool erase(const K& key)
        {
            HashFunc hf;
            KeyOfT kot;
            size_t hashi = hf(key) % _table.size();
            Node* prev = nullptr;
            Node* cur = _table[hashi];
            while (cur)
            {
                if (kot(cur->_data) == key)
                {
                    if (prev == nullptr)
                    {
                        _table[hashi] = cur->_next;
                    }
                    else
                    {
                        prev->_next = cur->_next;
                    }

                    delete cur;
                    return true;
                }

                prev = cur;
                cur = cur->_next;
            }

            --_n;
            return false;
        }

    private:
        vector<Node*> _table;
        size_t _n;//有效数据
    };


///unordered_set
    template<class K>
    class unordered_set
    {
        struct KeyOfT
        {
            const K& operator()(const K&key)
            {
                return key;
            }
        };
    public:
        typedef typename HashTable<K,K,KeyOfT>::const_iterator iterator;
        typedef typename HashTable<K,K,KeyOfT>::const_iterator const_iterator;

        iterator begin()
        {
            return _hs.begin();
        }

        iterator end()
        {
            return _hs.end();
        }


        const_iterator begin() const
        {
            return _hs.begin();
        }

        const_iterator end() const
        {
            return _hs.end();
        }

        pair<iterator,bool> insert(const K&key)
        {
            pair<typename HashTable<K,K,KeyOfT>::const_iterator,bool> ret= _hs.insert(key);
            return pair<iterator,bool>(ret.first,ret.second);
        }

        bool erase(const K&key)
        {
            return _hs.erase(key);
        }

        iterator Find(const K&key) const
        {
            return _hs.Find(key);
        }

        void Print() const
        {
            _hs.Print;
        }




    private:
        HashTable<K,K,KeyOfT> _hs;
    };

//unordered_map
    template<class K,class V>
    class unordered_map
    {
        struct KeyOfT
        {
            const K& operator()(const pair<const K,V>& kv)
            {
                return kv.first;
            }
        };
    public:
        typedef typename HashTable<K,pair<const K,V>,KeyOfT>::iterator iterator;
        typedef typename HashTable<K,pair<const K,V>,KeyOfT>::const_iterator const_iterator;
        iterator begin()
        {
            return _hs.begin();
        }

        iterator end()
        {
            return _hs.end();
        }

        const_iterator begin() const
        {
            return _hs.begin();
        }

        const_iterator end() const
        {
            return _hs.end();
        }




        pair<iterator,bool> insert(const pair<K,V>& kv)
        {
            return _hs.insert(kv);
        }

        bool erase(const K&key)
        {
            return _hs.erase(key);
        }

        iterator Find(const K&key) const
        {
            return _hs.Find(key);
        }

        void Print() const
        {
            _hs.Print();
        }


    private:
        HashTable<K,pair<const K,V>,KeyOfT> _hs;
    };


}

//unordered_map中vector中挂的节点并不会释放，因为是内置类型（指针），但如果是vector<list>则会释放,所以我们写的unordered_map需要手动写析构函数

//注意unordered_set和unordered_map中的模版参数第一个K最好不要加上const不然在template<class K>
//struct DefaultHashFunc中会发生报错