The Annotated STL Sources (STL源码剖析): hashtable

MY CUP OF TEA

Article link: https://blog.csdn.net/CHYabc123456hh/article/details/121539649

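A binary search tree offers logarithmic average time, but only under the assumption that the input data is sufficiently random.
A hashtable (hash table) offers constant average time for insertion, deletion, and lookup. This performance rests on statistics and does not depend on the randomness of the input elements.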

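Overview of hashtable

A hashtable supports access to, and deletion of, any named item.
Because the objects it operates on are named items, a hashtable can serve as a dictionary structure.
A function that maps an element to an index of acceptable size is called a hash function.
Because there can be more elements than the array has slots, different elements may be mapped to the same position. This is called a collision.
There are several ways to resolve collisions: linear probing, quadratic probing, and separate chaining.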

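Linear probing

Load factor: the number of elements divided by the table size. The load factor always lies between 0 and 1, unless separate chaining is used.
With linear probing, if the position computed by the hash function is already occupied, the search moves forward one slot at a time. On reaching the end of the array it wraps around to the beginning, and it continues until a free slot is found.
Lookup proceeds in the same way.
Deletion is lazy: the slot is only marked as deleted, and the element is actually removed later, when the table is reorganized.

The analysis relies on two assumptions: (1) the table is large enough; (2) the elements are sufficiently independent of one another. (If they are not, for example when the hash function maps every element to the same position, the average insertion cost grows much faster than the load factor does.)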
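
As an illustration of the steps above, here is a minimal sketch of linear probing with lazy deletion. It is not SGI code (SGI uses separate chaining): the class name `LinearProbeTable`, the restriction to non-negative `int` keys, and the `key % size` hash are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical open-addressing table for non-negative int keys.
class LinearProbeTable {
    enum State { EMPTY, OCCUPIED, DELETED };
    struct Slot { int key; State state; };
    std::vector<Slot> slots_;

    // Assumed hash function: the key modulo the table size.
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

public:
    explicit LinearProbeTable(std::size_t n) : slots_(n, Slot{0, EMPTY}) {}

    // Returns the slot index holding key, or -1 if it is absent.
    int find(int key) const {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            if (slots_[i].state == EMPTY) return -1;  // a never-used slot ends the search
            if (slots_[i].state == OCCUPIED && slots_[i].key == key)
                return static_cast<int>(i);
            i = (i + 1) % slots_.size();              // wrap around at the end
        }
        return -1;
    }

    // Returns the slot used, or -1 if the key already exists or the table is full.
    int insert(int key) {
        if (find(key) != -1) return -1;
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            if (slots_[i].state != OCCUPIED) {        // EMPTY or DELETED slots are reusable
                slots_[i] = Slot{key, OCCUPIED};
                return static_cast<int>(i);
            }
            i = (i + 1) % slots_.size();
        }
        return -1;
    }

    // Lazy deletion: only mark the slot, so later lookups can probe past it.
    bool erase(int key) {
        int i = find(key);
        if (i < 0) return false;
        slots_[static_cast<std::size_t>(i)].state = DELETED;
        return true;
    }
};
```

The `DELETED` mark is what makes lazy deletion work: a lookup stops only at a slot that was never used, so elements placed beyond an erased slot remain reachable.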

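Quadratic probing

F(i) = i^2. If the hash function places a new element at position H and that position is already occupied, the positions H+1^2, H+2^2, H+3^2, and so on are tried in order, rather than H+1, H+2, and so on.

If the table size is a prime number and the load factor is kept below 0.5, then inserting an element requires no more than 2 probes.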
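
The probe sequence can be sketched as follows. This is only an illustration: the helper names `quadratic_probe` and `quadratic_insert`, the use of -1 as an empty marker, and the `key % size` hash are assumptions of the example, not part of SGI STL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Slots hold non-negative keys; -1 marks an empty slot (an assumption of this sketch).
const int kEmpty = -1;

// Position tried on the i-th probe: (home + i^2) modulo the table size.
inline std::size_t quadratic_probe(std::size_t home, std::size_t i,
                                   std::size_t table_size) {
    return (home + i * i) % table_size;
}

// Returns the slot used, or -1 if no free slot is found within table.size() probes.
inline int quadratic_insert(std::vector<int>& table, int key) {
    const std::size_t home = static_cast<std::size_t>(key) % table.size();
    for (std::size_t i = 0; i < table.size(); ++i) {
        const std::size_t pos = quadratic_probe(home, i, table.size());
        if (table[pos] == kEmpty) {
            table[pos] = key;
            return static_cast<int>(pos);
        }
    }
    return -1;
}
```

With a table of 11 slots, the keys 3, 14, 25 and 36 all have home position 3, and they end up in slots 3, 4, 7 and 1 (3+0, 3+1, 3+4, and 3+9 modulo 11).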

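Separate chaining

Each table slot maintains a list, and insertion, deletion, and other operations are performed on that list.
The SGI STL hashtable uses separate chaining.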

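Buckets and nodes of hashtable

Each slot of the hashtable is called a bucket. The name reflects the fact that a slot may hold not just one node but a whole "bucketful" of nodes.

template <class Value>
struct __hashtable_node{
    __hashtable_node* next;
    Value val;
};

The linked list in each bucket is not the STL list or slist; the hashtable maintains the __hashtable_node shown above by itself.
The collection of buckets is implemented with a vector, which gives it the ability to grow.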
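
To make the layout concrete, here is a small sketch of a bucket vector whose entries are heads of singly linked chains. The names `Node`, `chain_insert`, `chain_length` and `chain_clear` are hypothetical, and the sketch assumes non-negative `int` values hashed with `val % buckets.size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type, mirroring __hashtable_node for int values.
struct Node {
    Node* next;
    int   val;
};

// Pushes a new node at the head of the chain of bucket (val % buckets.size()).
// Returns the bucket index. Assumes val >= 0 and a non-empty bucket vector.
inline std::size_t chain_insert(std::vector<Node*>& buckets, int val) {
    const std::size_t b = static_cast<std::size_t>(val) % buckets.size();
    buckets[b] = new Node{buckets[b], val};
    return b;
}

// Counts the nodes chained in bucket b.
inline std::size_t chain_length(const std::vector<Node*>& buckets, std::size_t b) {
    std::size_t n = 0;
    for (const Node* cur = buckets[b]; cur; cur = cur->next) ++n;
    return n;
}

// Frees every node and resets every bucket to a null pointer.
inline void chain_clear(std::vector<Node*>& buckets) {
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        while (Node* cur = buckets[b]) {
            buckets[b] = cur->next;
            delete cur;
        }
    }
}
```

Inserting at the head of the chain keeps insertion constant-time regardless of how long the chain already is.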

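The hashtable iterator

The hashtable iterator keeps a link to the whole buckets vector and records the node it currently points to.
Advancing moves one position forward from the current node. Because the nodes live in a list, the next pointer is used to advance.
If the current node is the tail of its list, the iterator jumps to the next non-empty bucket, that is, to the head of the next list.
The hashtable iterator has no decrement operation, and hashtable defines no reverse iterator.
(See also: 董哥的黑板报, CSDN blog, "一篇足矣，带你吃透STL源码中hash table(哈希表)与关联式容器hash_set、hash_map".)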
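
The advance step can be sketched on its own. The functions `first_node` and `next_in_table` below are hypothetical helpers (not the SGI iterator itself), and the sketch assumes non-negative `int` values hashed with `val % buckets.size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for int values.
struct Node {
    Node* next;
    int   val;
};

// Returns the head of the first non-empty bucket, or nullptr for an empty table.
inline const Node* first_node(const std::vector<Node*>& buckets) {
    for (std::size_t b = 0; b < buckets.size(); ++b)
        if (buckets[b]) return buckets[b];
    return nullptr;
}

// Returns the node that follows cur, or nullptr when cur was the last node.
inline const Node* next_in_table(const std::vector<Node*>& buckets, const Node* cur) {
    const Node* next = cur->next;            // stay in the same chain if possible
    if (!next) {
        // Recompute the bucket of cur, then scan forward for a non-empty bucket.
        std::size_t b = static_cast<std::size_t>(cur->val) % buckets.size();
        while (!next && ++b < buckets.size())
            next = buckets[b];
    }
    return next;
}
```

Because a node stores no back pointer and no bucket index, the bucket has to be recomputed from the value, which is also why only forward traversal is offered.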

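Data structure of hashtable

The collection of buckets is implemented with a vector so that it can grow dynamically.
<stl_hash_fun.h> defines several ready-made hash functions, all of them functors. The hash function determines the position of an element, that is, the bucket it belongs to. The function actually called is bkt_num(), which calls the hash function to obtain a value on which a modulus operation can be performed.
The size of the vector is always a prime number. 28 primes are prepared in advance, together with a function that looks up the smallest of them that is not less than a given number.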
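
The prime lookup can be sketched as a binary search over a sorted table. The prime values are the ones listed in the complete code of this article; the names `kPrimes` and `next_prime`, and the use of `unsigned long long` (chosen so that the sketch does not depend on the width of `long`), are assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>

// The 28 primes used for bucket counts, in ascending order.
static const unsigned long long kPrimes[28] = {
    53ULL,         97ULL,         193ULL,       389ULL,       769ULL,
    1543ULL,       3079ULL,       6151ULL,      12289ULL,     24593ULL,
    49157ULL,      98317ULL,      196613ULL,    393241ULL,    786433ULL,
    1572869ULL,    3145739ULL,    6291469ULL,   12582917ULL,  25165843ULL,
    50331653ULL,   100663319ULL,  201326611ULL, 402653189ULL, 805306457ULL,
    1610612741ULL, 3221225473ULL, 4294967291ULL
};

// Returns the smallest listed prime that is not less than n,
// or the largest listed prime when n exceeds all of them.
inline unsigned long long next_prime(unsigned long long n) {
    const unsigned long long* first = kPrimes;
    const unsigned long long* last  = kPrimes + 28;
    const unsigned long long* pos   = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}
```

Note that `std::lower_bound` finds the first entry that is not less than `n`, so asking for 53 yields 53 itself, not 97.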

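Construction and memory management of hashtable

(See also: Zero's Zone, CSDN blog, "vector的reserve的使用（避免内存重新分配以及内存分配的方式）", on using vector's reserve to avoid reallocation.)

Which bucket does an element fall into? Answering this is the job of the hash function, but SGI STL wraps it: the request first goes to bkt_num(), which then calls the hash function to obtain a value on which a modulus operation can be performed.
The reason for this indirection is that some element types, such as strings, cannot be used directly as the operand of a modulus operation.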

    // Version 1: takes a value and the number of buckets
    size_type bkt_num(const value_type& obj, size_t n) const
    {
        return bkt_num_key(get_key(obj), n); // calls version 4
    }

    // Version 2: takes only a value
    size_type bkt_num(const value_type& obj) const
    {
        return bkt_num_key(get_key(obj)); // calls version 3
    }

    // Version 3: takes only a key
    size_type bkt_num_key(const key_type& key) const
    {
        return bkt_num_key(key, buckets.size()); // calls version 4
    }

    // Version 4: takes a key and the number of buckets
    size_type bkt_num_key(const key_type& key, size_t n) const
    {
        return hash(key) % n; // SGI's built-in hash() functors are covered in the hash functions section
    }
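
The chain "extract the key, hash it, take the modulus" can be shown as a free function. This is a sketch under assumptions: `SelectFirst`, `IntHash` and `bucket_index` are hypothetical names invented for the example, standing in for ExtractKey, HashFcn and version 1 of bkt_num().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical key extractor: the key of a pair is its first member.
struct SelectFirst {
    template <class Pair>
    const typename Pair::first_type& operator()(const Pair& p) const {
        return p.first;
    }
};

// Hypothetical hash for int keys: the value itself.
struct IntHash {
    std::size_t operator()(int x) const { return static_cast<std::size_t>(x); }
};

// Same layering as bkt_num(obj, n): key extraction, then hash, then modulus.
template <class Value, class ExtractKey, class Hash>
std::size_t bucket_index(const Value& obj, std::size_t n,
                         ExtractKey get_key, Hash hash) {
    return hash(get_key(obj)) % n;
}
```

For example, a pair whose key is 108 lands in bucket 2 of a 53-bucket table and in bucket 11 of a 97-bucket table, which is why every element has to be redistributed when the table grows.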

Copying and clearing

A hash table is composed of a vector and linked lists, so copying and clearing the whole table both require care to release memory properly.

    void clear(){
        // For each bucket
        for(size_type i = 0;i < buckets.size();++i){
            node * cur = buckets[i];
            // Delete every node in the bucket's list
            while(cur != 0){
                node* next = cur->next;
                delete_node(cur);
                cur = next;
            }
            buckets[i] = 0; // set the bucket to a null pointer
        }
        num_elements = 0; // the total number of nodes is now 0
        // Note: the buckets vector itself is not released and keeps its previous size
    }

    void copy_from(const hashtable& ht){
        // First clear our own buckets vector; vector::clear() removes all of its elements
        buckets.clear();
        // Reserve space in our buckets vector to match the other table
        // If we already have more capacity nothing changes; if we have less, it grows
        buckets.reserve(ht.buckets.size());
        // Insert n null pointers at the end of our buckets vector
        // The vector is empty at this point, so its end is also its beginning
        buckets.insert(buckets.end(),ht.buckets.size(),(node*)0);
        __STL_TRY{
            // For each entry of the buckets vector
            for (size_type i = 0;i<ht.buckets.size();++i) {
                // Copy the entry (a pointer to a hashtable node)
                if (const node* cur = ht.buckets[i]){
                    node* copy = new_node(cur->val);
                    buckets[i] = copy;
                    // Copy every remaining node of the same bucket list
                    for (node* next = cur->next;next ; cur = next,next = cur->next) {
                        copy->next = new_node(next->val);
                        copy = copy->next;
                    }
                }
            }
            // Record the number of nodes (the size of the hashtable)
            num_elements = ht.num_elements;
        }
        __STL_UNWIND(clear());
    }
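
The inner loop of copy_from() copies one chain while preserving its order. The sketch below isolates that step; `Node`, `copy_chain` and `free_chain` are hypothetical names, and unlike the original it makes no attempt at exception safety.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type for int values.
struct Node {
    Node* next;
    int   val;
};

// Deep-copies the chain starting at src, preserving order. Returns the new head.
inline Node* copy_chain(const Node* src) {
    if (!src) return nullptr;
    Node* head = new Node{nullptr, src->val};
    Node* tail = head;                        // last node copied so far
    for (const Node* cur = src->next; cur; cur = cur->next) {
        tail->next = new Node{nullptr, cur->val};
        tail = tail->next;
    }
    return head;
}

// Releases every node of a chain created by copy_chain.
inline void free_chain(Node* head) {
    while (head) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```

Appending at the tail (rather than pushing at the head) is what keeps the copied chain in the same order as the source.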

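Complete code

#include <algorithm>   // std::lower_bound
#include <climits>     // UINT_MAX
#include <cstddef>     // std::size_t, ptrdiff_t
#include <cstdlib>     // exit
#include <iostream>
#include <iterator>    // std::forward_iterator_tag
#include <new>         // std::set_new_handler, placement new
#include <utility>     // std::pair
#include <vector>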

#ifdef __STL_USE_EXCEPTIONS
#define __STL_TRY   try
#define __STL_UNWIND(action)   catch(...) { action; throw; }
#else
#define __STL_TRY
#define __STL_UNWIND(action)
#endif

template<class T,class Alloc>
class simple_alloc{
public:
    static T* allocate(std::size_t n){
        return 0==n?0:(T*)Alloc::allocate(n * sizeof(T));
    }
    static T* allocate(void){
        return (T*)Alloc::allocate(sizeof (T));
    }

    static void deallocate(T* p,size_t n){
        if (n!=0){
            Alloc::deallocate(p,n * sizeof(T));
        }
    }
    static void deallocate(T* p){
        Alloc::deallocate(p,sizeof(T));
    }
};

namespace Chy{
    template <class T>
    inline T* _allocate(ptrdiff_t size,T*){
        std::set_new_handler(0);
        T* tmp = (T*)(::operator new((std::size_t)(size * sizeof (T))));
        if (tmp == 0){
            std::cerr << "out of memory" << std::endl;
            exit(1);
        }
        return tmp;
    }

    template<class T>
    inline void _deallocate(T* buffer){
        ::operator delete (buffer);
    }

    template<class T1,class T2>
    inline void _construct(T1 *p,const T2& value){
        new(p) T1 (value);  // placement new: constructs a T1 from value in the storage at p, without allocating
    }

    template <class T>
    inline void _destroy(T* ptr){
        ptr->~T();
    }

    template <class T>
    class allocator{
    public:
        typedef T           value_type;
        typedef T*          pointer;
        typedef const T*    const_pointer;
        typedef T&          reference;
        typedef const T&    const_reference;
        typedef std::size_t size_type;
        typedef ptrdiff_t   difference_type;

        template<class U>
        struct rebind{
            typedef allocator<U>other;
        };

        pointer allocate(size_type n,const void * hint = 0){
            return _allocate((difference_type)n,(pointer)0);
        }

        void deallocate(pointer p,size_type n){
            _deallocate(p);
        }

        void construct(pointer p,const T& value){
            _construct(p,value);
        }

        void destroy(pointer p){
            _destroy(p);
        }

        pointer address(reference x){
            return (pointer)&x;
        }

        const_pointer const_address(const_reference x){
            return (const_pointer)&x;
        }

        size_type max_size()const{
            return size_type(UINT_MAX/sizeof (T));
        }
    };
}

template <class Value>
struct __hashtable_node{
    __hashtable_node* next;
    Value val;
};

// Forward declaration, so that hashtable can name its iterator type
template <class Value,class Key,class HashFcn,class ExtractKey,class EqualKey,class Alloc>
struct __hashtable_iterator;

/*
 * Value:       type of the value stored in a node
 * Key:         type of the key of a node
 * HashFcn:     type of the hash function
 * ExtractKey:  how to extract the key from a node (function or functor)
 * EqualKey:    how to test two keys for equality (function or functor)
 * Alloc:       allocator; SGI defaults to its own alloc
 */

template <class Value,class Key,class HashFcn,class ExtractKey,class EqualKey,class Alloc>
class hashtable{
public:
    typedef Key key_type;
    typedef Value value_type;
    typedef HashFcn hasher;    // another name for the template type parameter
    typedef EqualKey key_equal;// another name for the template type parameter
    typedef std::size_t size_type;
    typedef ptrdiff_t difference_type;

private:
    // The following three are function objects
    // <stl_hash_fun.h> defines hashers for several standard types (int, C-style strings, and so on)
    hasher hash;        // hash function
    key_equal equals;   // tests whether two keys are equal
    ExtractKey get_key; // extracts the key from a node
    typedef __hashtable_node<Value>node;
    // Dedicated node allocator
    typedef simple_alloc<node,Alloc>node_allocator;

    // Allocates and constructs a node
    node* new_node(const value_type& obj){
        node* n = node_allocator::allocate();
        n->next = 0;
        __STL_TRY{
            Chy::_construct(&n->val,obj); // construct the value in place
            return n;
        }
        __STL_UNWIND(node_allocator::deallocate(n));
    }
    // Destroys and releases a node
    void delete_node(node* n){
        Chy::_destroy(&n->val);
        node_allocator::deallocate(n);
    }

public:

    // The collection of buckets is a vector whose elements are node*
    // Note: std::vector expects a standard-style allocator, while simple_alloc expects
    // an SGI-style one; the original SGI code used its own vector here
    std::vector<node*,Alloc>buckets;
    size_type num_elements;  // number of nodes
public:
    // Number of buckets, that is, the size of the buckets vector
    size_type bucket_count() const{
        return buckets.size();
    }

    // Note: assumes that long has at least 32 bits
    // As written, the in-class array relies on C++17 (constexpr static members are implicitly inline)
    static const int __stl_num_primes = 28;
    constexpr static const unsigned long __stl_prime_list[__stl_num_primes] =
    {
        53,         97,         193,       389,       769,
        1543,       3079,       6151,      12289,     24593,
        49157,      98317,      196613,    393241,    786433,
        1572869,    3145739,    6291469,   12582917,  25165843,
        50331653,   100663319,  201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };
    // Among the 28 primes above, finds the smallest one that is not less than n
    // (static, so that const member functions such as next_size() can call it)
    static unsigned long __stl_next_prime(unsigned long n){
        const unsigned long *first = __stl_prime_list;
        const unsigned long *last = __stl_prime_list + __stl_num_primes;
        const unsigned long *pos = std::lower_bound(first,last,n);
        // lower_bound() requires a sorted range; the list above is in ascending order
        return pos == last ? *(last-1) : *pos;
    }
    // Maximum possible number of buckets; a member function of hashtable
    size_type max_bucket_count()const{
        // The value is 4294967291
        return __stl_prime_list[__stl_num_primes - 1];
    }

    // Constructor
    hashtable(size_type n,const HashFcn& hf,const EqualKey& eql)
    :hash(hf),equals(eql),get_key(ExtractKey()),num_elements(0){
        initialize_buckets(n);
    }

    // Initialization
    void initialize_buckets(size_type n){
        // Example: passing 50 yields 53
        // Space for 53 elements is reserved and then filled entirely with 0
        const size_type n_buckets = next_size(n);
        buckets.reserve(n_buckets);
        // Set the initial value of every bucket to 0 (a null node*)
        buckets.insert(buckets.begin(),n_buckets,(node*)0);
    }

public:
    // Version 1: takes a value and the number of buckets
    size_type bkt_num(const value_type& obj, size_t n) const
    {
        return bkt_num_key(get_key(obj), n); // calls version 4
    }

    // Version 2: takes only a value
    size_type bkt_num(const value_type& obj) const
    {
        return bkt_num_key(get_key(obj)); // calls version 3
    }

    // Version 3: takes only a key
    size_type bkt_num_key(const key_type& key) const
    {
        return bkt_num_key(key, buckets.size()); // calls version 4
    }

    // Version 4: takes a key and the number of buckets
    size_type bkt_num_key(const key_type& key, size_t n) const
    {
        return hash(key) % n; // SGI's built-in hash() functors are covered in the hash functions section
    }

public:
    // Related helper
    // next_size() returns the smallest listed prime that is not less than n
    size_type next_size(size_type n) const {
        return __stl_next_prime(n);
    }

    typedef __hashtable_iterator<Value,Key,HashFcn,ExtractKey,EqualKey,Alloc>iterator;
    // Insertion and table rebuilding
    // Insert an element; duplicates are not allowed
    std::pair<iterator,bool>insert_unique(const value_type& obj){
        // Check whether the table must be rebuilt, and grow it if so
        resize(num_elements + 1);
        return insert_unique_noresize(obj);
    }

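    // Decides whether the table must be rebuilt; returns at once if not, rebuilds it otherwise
    void resize(size_type num_elements_hint){
        // The rule: compare the element count (including the new element) with the size of the buckets vector
        // If the former is larger, the table is rebuilt
        // It follows that the maximum capacity of each bucket (list) equals the size of the buckets vector
        const size_type old_n = buckets.size();
        if (old_n < num_elements_hint){
            // Memory must be reallocated
            // Find the next prime
            const size_type n = next_size(num_elements_hint);
            if (n > old_n){
                std::vector<node*,Alloc>tmp(n,(node*)0);
                __STL_TRY{
                    // Process every old bucket
                    for (size_type bucket=0;bucket<old_n;bucket++) {
                        // Point at the first node of the bucket's list
                        node* first = buckets[bucket];
                        // Process every node of the old bucket's list
                        while(first){
                            // The list has not ended yet
                            // Find out which new bucket the node falls into
                            size_type new_bucket = bkt_num(first->val,n);
                            // The following four steps are rather subtle
                            // (1) make the old bucket point at the next node of its list (for the next iteration)
                            buckets[bucket] = first->next;
                            // (2)(3) insert the current node into the new bucket as the first node of its list
                            first->next = tmp[new_bucket];
                            tmp[new_bucket] = first;
                            // (4) return to the old bucket's remaining list and process the next node
                            first = buckets[bucket];
                        }
                    }
                    // Swap the old and the new buckets
                    // tmp's memory is released when it goes out of scope
                    buckets.swap(tmp);
                }
#ifdef __STL_USE_EXCEPTIONS
                catch(...){
                    // Release the nodes already moved into tmp, then rethrow
                    for (size_type bucket=0;bucket<tmp.size();++bucket) {
                        while(tmp[bucket]){
                            node* next = tmp[bucket]->next;
                            delete_node(tmp[bucket]);
                            tmp[bucket] = next;
                        }
                    }
                    throw;
                }
#endif
            }
        }
    }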

    // Inserts a new node without rebuilding the table; duplicate keys are not allowed
    std::pair<iterator,bool>insert_unique_noresize(const value_type& obj){
        const size_type n = bkt_num(obj) ;// obj belongs in bucket #n
        node* first = buckets[n]; // first points at the head of that bucket's list

        // If buckets[n] is occupied, first is not 0 and the loop below is entered
        // Walk the whole list of the bucket
        for (node* cur = first;cur;cur = cur->next) {
            if (equals(get_key(cur->val),get_key(obj))){
                // A node with the same key already exists: do not insert, return at once
                return std::pair<iterator,bool>(iterator(cur, this), false);
            }
        }
        // After the loop (or if it was never entered), first still points at the head of the bucket's list
        node* tmp = new_node(obj); // create the new node
        tmp->next = first;
        buckets[n] = tmp; // the new node becomes the first node of the list
        ++num_elements;   // one more node
        return std::pair<iterator,bool>(iterator(tmp,this),true);
    }

    // Client code may use a different insertion (insert_equal instead of insert_unique)
    // Insert an element; duplicates are allowed
    iterator insert_equal(const value_type& obj){
        // Check whether the table must be rebuilt, and grow it if so
        resize(num_elements+1);
        return insert_equal_noresize(obj);
    }

    // Inserts a new node without rebuilding the table; duplicate keys are allowed
    iterator insert_equal_noresize(const value_type& obj){
        const size_type n = bkt_num(obj); // obj belongs in bucket #n
        node* first = buckets[n];// first points at the head of that bucket's list
        // If buckets[n] is occupied, first is not 0 and the loop is entered
        // Walk the whole list
        for(node* cur = first;cur;cur = cur->next){
            if (equals(get_key(cur->val),get_key(obj))){
                // A node with the same key was found: insert right after it and return
                node* tmp = new_node(obj);  // create the new node
                tmp->next = cur->next;// link the new node in at the current position
                cur->next = tmp;
                ++num_elements;
                return iterator (tmp, this); // return an iterator to the new node
            }
        }
        // Reaching this point means that no equal key was found
        node* tmp = new_node(obj);
        tmp->next = first;
        buckets[n] = tmp;
        ++num_elements;
        return iterator(tmp, this);
    }

    void clear(){
        // For each bucket
        for(size_type i = 0;i < buckets.size();++i){
            node * cur = buckets[i];
            // Delete every node in the bucket's list
            while(cur != 0){
                node* next = cur->next;
                delete_node(cur);
                cur = next;
            }
            buckets[i] = 0; // set the bucket to a null pointer
        }
        num_elements = 0; // the total number of nodes is now 0
        // Note: the buckets vector itself is not released and keeps its previous size
    }

    void copy_from(const hashtable& ht){
        // First clear our own buckets vector; vector::clear() removes all of its elements
        buckets.clear();
        // Reserve space in our buckets vector to match the other table
        // If we already have more capacity nothing changes; if we have less, it grows
        buckets.reserve(ht.buckets.size());
        // Insert n null pointers at the end of our buckets vector
        // The vector is empty at this point, so its end is also its beginning
        buckets.insert(buckets.end(),ht.buckets.size(),(node*)0);
        __STL_TRY{
            // For each entry of the buckets vector
            for (size_type i = 0;i<ht.buckets.size();++i) {
                // Copy the entry (a pointer to a hashtable node)
                if (const node* cur = ht.buckets[i]){
                    node* copy = new_node(cur->val);
                    buckets[i] = copy;
                    // Copy every remaining node of the same bucket list
                    for (node* next = cur->next;next ; cur = next,next = cur->next) {
                        copy->next = new_node(next->val);
                        copy = copy->next;
                    }
                }
            }
            // Record the number of nodes (the size of the hashtable)
            num_elements = ht.num_elements;
        }
        __STL_UNWIND(clear());
    }

};

template <class Value,class Key,class HashFcn,class ExtractKey,class EqualKey,class Alloc>
struct __hashtable_iterator{
    // Named hashtable_type rather than hashtable, so that the typedef does not
    // change the meaning of the template name inside this class
    typedef hashtable<Value,Key,HashFcn,ExtractKey,EqualKey,Alloc>hashtable_type;
    typedef __hashtable_iterator<Value,Key,HashFcn,ExtractKey,EqualKey,Alloc>iterator;
//    typedef __hash_const   const iterator
    typedef __hashtable_node<Value>node;
    typedef std::forward_iterator_tag iterator_category;
    typedef Value value_type;
    typedef ptrdiff_t difference_type;
    typedef std::size_t size_type;
    typedef Value& reference;
    typedef Value* pointer;

    node* cur;// the node the iterator currently points to
    hashtable_type* ht;// link back to the container (needed to jump from bucket to bucket)
    __hashtable_iterator(node*n,hashtable_type* tab):cur(n),ht(tab){}
    __hashtable_iterator():cur(0),ht(0){}
    reference operator*() const {return cur->val;}
    pointer operator->() const {return &(operator*());}
    iterator& operator++();
    iterator operator++(int);
    bool operator==(const iterator& it)const {return cur == it.cur;}
    bool operator!=(const iterator& it)const {return cur != it.cur;}
};




template <class V,class K,class HF,class ExK,class EqK,class A>
__hashtable_iterator<V,K,HF,ExK,EqK,A>&
__hashtable_iterator<V,K,HF,ExK,EqK,A>::operator++() {
    const node* old = cur;
    cur = cur->next; // if a next node exists, that is the result; otherwise enter the if below
    if (!cur){
        // Locate the bucket of the old element from its value, then scan forward
        // to the next non-empty bucket; its head is the destination
        size_type bucket = ht->bkt_num(old->val);
        while(!cur && ++bucket < ht->buckets.size()){
            cur = ht->buckets[bucket];
        }
    }
    return *this;
}

template <class V,class K,class HF,class ExK,class EqK,class A>
__hashtable_iterator<V,K,HF,ExK,EqK,A>
__hashtable_iterator<V,K,HF,ExK,EqK,A>::operator++(int) {
    iterator tmp = *this;
    ++*this; // calls the prefix operator++
    return tmp;
}

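Notes

hashtable is not meant to be used directly by client code; it is an internal implementation type.
Client code can use <hash_set.h> and <hash_map.h> instead.

When the number of elements exceeds the size of the buckets vector, the table is rebuilt.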
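
The rebuild rule and the relinking of nodes can be sketched in isolation. The names `Node`, `needs_rehash` and `rehash` are hypothetical, and the sketch assumes non-negative `int` values hashed with `val % bucket_count`; it moves existing nodes rather than copying them, as resize() does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for int values.
struct Node {
    Node* next;
    int   val;
};

// True when adding one more element would exceed the number of buckets.
inline bool needs_rehash(std::size_t num_elements, std::size_t bucket_count) {
    return num_elements + 1 > bucket_count;
}

// Moves every node into a table of new_size buckets, reusing the nodes.
inline void rehash(std::vector<Node*>& buckets, std::size_t new_size) {
    std::vector<Node*> tmp(new_size);          // value-initialized to null pointers
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        while (Node* first = buckets[b]) {
            const std::size_t nb = static_cast<std::size_t>(first->val) % new_size;
            buckets[b] = first->next;          // unlink from the old chain
            first->next = tmp[nb];             // push at the head of the new chain
            tmp[nb] = first;
        }
    }
    buckets.swap(tmp);                         // the old vector is released with tmp
}
```

No node is allocated or freed during the rebuild; only the `next` pointers and the bucket heads change.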

    // Element lookup
    iterator find(const key_type& key){
        size_type n = bkt_num_key(key); // first find the bucket the key falls into
        node* first;
        // Starting at the head of the bucket's list, compare the key of each element; stop on a match
        for (first = buckets[n];first && !equals(get_key(first->val),key);first = first->next) {}
        return iterator (first,this);
    }

    // Element count
    size_type count (const key_type& key)const{
        const size_type n = bkt_num_key(key);// first find the bucket the key falls into
        size_type result = 0;
        // Walk the bucket's list from its head, comparing the key of each element; add 1 for each match
        for(const node* cur = buckets[n];cur;cur = cur->next){
            if (equals(get_key(cur->val),key)){
                ++result;
            }
        }
        return result;
    }
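
The same lookup and counting logic can be tried out with standard containers. This sketch swaps the hand-written node chain for `std::forward_list`; the names `Buckets`, `insert_equal`, `find_key` and `count_key` are hypothetical, and keys are assumed to be non-negative `int` values hashed with `key % bucket_count`.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Each bucket is a singly linked list of keys (duplicates allowed).
typedef std::vector<std::forward_list<int> > Buckets;

inline std::size_t bucket_of(const Buckets& buckets, int key) {
    return static_cast<std::size_t>(key) % buckets.size();
}

inline void insert_equal(Buckets& buckets, int key) {
    buckets[bucket_of(buckets, key)].push_front(key);
}

// Returns a pointer to the first matching key, or nullptr if there is none.
inline const int* find_key(const Buckets& buckets, int key) {
    const std::forward_list<int>& chain = buckets[bucket_of(buckets, key)];
    for (std::forward_list<int>::const_iterator it = chain.begin(); it != chain.end(); ++it)
        if (*it == key) return &*it;
    return nullptr;
}

// Counts the matching keys; only one bucket has to be scanned.
inline std::size_t count_key(const Buckets& buckets, int key) {
    std::size_t result = 0;
    const std::forward_list<int>& chain = buckets[bucket_of(buckets, key)];
    for (std::forward_list<int>::const_iterator it = chain.begin(); it != chain.end(); ++it)
        if (*it == key) ++result;
    return result;
}
```

Both operations touch a single bucket, so their cost depends on the length of one chain rather than on the total number of elements.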

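hash functions

The hash functions are functors.
bkt_num() calls the hash function defined here to obtain a value on which the modulus with the bucket count can be taken.
For integer types such as char, int and long, the hash function does nothing and returns the value itself. For string types such as const char*, a conversion function has to be designed.

The SGI hashtable therefore cannot handle element types outside of these, such as string, double and float. To handle such types, you have to define a hash function yourself.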
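
As an illustration of defining a hash function yourself, the sketch below wraps the string conversion loop used in this article in a functor for std::string. The names `hash_c_string` and `StringHash` are hypothetical, and the cast to `unsigned char` is an assumption of this sketch (it gives the same result as the original loop for plain ASCII text).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same recurrence as __stl_hash_string: h = 5*h + character.
inline std::size_t hash_c_string(const char* s) {
    unsigned long h = 0;
    for (; *s; ++s)
        h = 5 * h + static_cast<unsigned char>(*s);
    return static_cast<std::size_t>(h);
}

// Hypothetical user-defined hash functor for std::string keys.
struct StringHash {
    std::size_t operator()(const std::string& s) const {
        return hash_c_string(s.c_str());
    }
};
```

A functor like this would be passed as the HashFcn template argument. For the string "Hello" it yields 60976, which selects bucket 26 in a table of 53 buckets.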

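#include <cstddef>
#include <iostream>

template <class Key> struct hash{};

inline std::size_t __stl_hash_string(const char* s){
    unsigned long h = 0;
    for(;*s;++s){
        h = 5*h + *s;
    }
    return std::size_t (h);
}


// In <stl_hash_fun.h> every specialization is introduced with __STL_TEMPLATE_NULL,
// which <stl_config.h> defines as template<>. Two of them, written out:
template<> struct hash<const char*>{
    std::size_t operator()(const char* s) const { return __stl_hash_string(s); }
};
template<> struct hash<int>{
    std::size_t operator()(int x) const { return x; } // integers hash to themselves
};


int main(){
    const char *input_string("Hello");
    std::cout << input_string << std::endl;
    std::cout << __stl_hash_string(input_string) << std::endl; // prints 60976
}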
