Advanced C++: hashtable in the STL Source Code

&动感超人

Category: C/C++, STL

Article link: https://blog.csdn.net/qq_25065595/article/details/108413195

hashtable in the STL source code

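hashtable (a hash table) is a data structure whose insertion, deletion, and lookup operations take constant time on average. This behavior rests on statistics and does not depend on the input elements being random.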

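In the SGI STL, hashtable is the underlying container for hash_map and hash_set (the libstdc++ _Hashtable quoted later in this article plays the same role for std::unordered_map and std::unordered_set). It resolves collisions by separate chaining. The node placed at each index is called a bucket node (that is, the head node for that index); a bucket node holds a value and a next pointer to the following node.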

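The table maintains a vector<node *> buckets that holds all the bucket nodes, and also a table of primes: 53, 97, 193, and so on. The size of the vector is the prime in that table nearest to, and not smaller than, the requested value. For example, if a hashtable object is constructed with 50 buckets, the nearest prime above 50, which is 53, is chosen automatically and passed to the vector member function reserve.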
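The prime lookup described above can be sketched as a binary search over a sorted table. This is a minimal sketch, not the library's code: the names kPrimes and next_prime are hypothetical, and the table is truncated to its first few entries for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// First few entries of an SGI-style prime table (truncated for illustration).
const std::size_t kPrimes[] = {53, 97, 193, 389, 769, 1543, 3079, 6151};

// Returns the smallest listed prime that is >= n, or the largest listed
// prime when n is greater than every entry.
std::size_t next_prime(std::size_t n) {
    const std::size_t* first = std::begin(kPrimes);
    const std::size_t* last = std::end(kPrimes);
    assert(first != last);
    const std::size_t* pos = std::lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}
```

With this sketch, a request for 50 buckets yields 53, and a request for 54 yields 97.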

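When the number of elements inserted into the hashtable exceeds the current size of the vector, a new vector is created. Its size is the next prime after the current size in the prime table; for example, 97 follows 53, so the new vector has 97 buckets. The data in the old vector is then moved into the new one, and finally the two vectors are exchanged with swap. If the two sides differ in size, the larger becomes the smaller and the smaller becomes the larger, so the temporary vector ends up holding the old storage and is released automatically when the function returns (note the effect of swap).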
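The rebuild-then-swap step can be sketched as follows. IntTable is a hypothetical, simplified table of int values written for this illustration only; it is not the SGI or libstdc++ class, and its grow helper stands in for the prime table with just a few entries.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, simplified chained table of ints (illustration only).
class IntTable {
    struct Node { int value; Node* next; };
    std::vector<Node*> buckets_;
    std::size_t count_ = 0;

    // Stand-in for the prime table: next listed prime greater than n.
    static std::size_t grow(std::size_t n) {
        const std::size_t primes[] = {53, 97, 193, 389, 769, 1543};
        for (std::size_t p : primes) if (p > n) return p;
        return n * 2 + 1;  // fallback once the short list is exhausted
    }

    void resize(std::size_t wanted) {
        if (wanted <= buckets_.size()) return;
        std::vector<Node*> tmp(grow(buckets_.size()), nullptr);
        for (Node*& head : buckets_) {
            while (head) {                      // relink each node, no copy
                Node* node = head;
                head = node->next;
                std::size_t idx = std::hash<int>()(node->value) % tmp.size();
                node->next = tmp[idx];
                tmp[idx] = node;
            }
        }
        buckets_.swap(tmp);  // tmp now holds the old, smaller bucket array
    }                        // tmp is destroyed here, releasing that array

public:
    IntTable() : buckets_(53, nullptr) {}
    IntTable(const IntTable&) = delete;
    IntTable& operator=(const IntTable&) = delete;
    ~IntTable() {
        for (Node* head : buckets_)
            while (head) { Node* next = head->next; delete head; head = next; }
    }

    bool contains(int v) const {
        std::size_t idx = std::hash<int>()(v) % buckets_.size();
        for (Node* n = buckets_[idx]; n; n = n->next)
            if (n->value == v) return true;
        return false;
    }

    bool insert(int v) {
        resize(count_ + 1);                     // grow before inserting
        if (contains(v)) return false;
        std::size_t idx = std::hash<int>()(v) % buckets_.size();
        buckets_[idx] = new Node{v, buckets_[idx]};
        ++count_;
        assert(contains(v));
        return true;
    }

    std::size_t size() const { return count_; }
    std::size_t bucket_count() const { return buckets_.size(); }
};
```

In this sketch the table starts with 53 buckets and moves to 97 when the 54th element arrives; the nodes themselves are relinked rather than copied.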

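Collision handling

1. Linear probing

With linear probing, when the computed position is no longer available, the search moves forward one slot at a time until a free slot is found (H+1, H+2, H+3, ...).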
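A minimal sketch of a linear-probing insert, assuming a toy hash (the key modulo the table size, with non-negative keys). Slot and linear_insert are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Slot { bool used = false; int key = 0; };

// Tries H, H+1, H+2, ... (wrapping around) until a free slot or the key
// itself is found. Returns the slot index, or table.size() if the table
// is full. Toy hash: key % size, so keys are assumed non-negative.
std::size_t linear_insert(std::vector<Slot>& table, int key) {
    assert(!table.empty());
    const std::size_t n = table.size();
    const std::size_t h = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (h + i) % n;
        if (!table[idx].used) {
            table[idx].used = true;
            table[idx].key = key;
            return idx;
        }
        if (table[idx].key == key) return idx;  // already present
    }
    return n;
}
```

In a table of 7 slots, the keys 3, 10, and 17 all hash to slot 3 and end up in slots 3, 4, and 5; a later key 4 then has to walk past them to slot 6, which is the clustering effect the next subsection refers to.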

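2. Quadratic probing

With linear probing, the average insertion cost grows much faster than the load factor (the number of elements divided by the table size). Quadratic probing addresses this: when a position collides, it tries H+1^2, H+2^2, H+3^2, ... in turn.

To make sure a quadratic probe can always insert: if the table size is a prime and the load factor is always kept below 0.5, every new element is guaranteed to find a free slot, and the number of probes needed per insertion is, on average, no more than about 2.

Quadratic probing removes the drawback of linear probing, but it brings another problem: two elements that the hash function maps to the same position also probe the same sequence of positions on insertion, which wastes effort.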
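A minimal sketch of a quadratic-probing insert under the same toy assumptions (hash is the key modulo the table size, keys non-negative). Slot and quadratic_insert are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Slot { bool used = false; int key = 0; };

// Tries H, H+1^2, H+2^2, H+3^2, ... (wrapping around). Gives up after
// table.size() probes and returns table.size() in that case.
// Toy hash: key % size, so keys are assumed non-negative.
std::size_t quadratic_insert(std::vector<Slot>& table, int key) {
    assert(!table.empty());
    const std::size_t n = table.size();
    const std::size_t h = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (h + i * i) % n;
        if (!table[idx].used) {
            table[idx].used = true;
            table[idx].key = key;
            return idx;
        }
        if (table[idx].key == key) return idx;  // already present
    }
    return n;
}
```

In a table of 11 slots, the keys 3, 14, 25, and 36 all start at slot 3 and follow the same probe sequence 3, 4, 7, 1, so each one repeats the probes of those before it. That is the repeated-sequence problem described above.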

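3. Separate chaining

This approach maintains a list in every slot of the table. The hash function selects one of the lists, and insertion, lookup, and deletion are then carried out on that list.

The STL hashtable uses this approach.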
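The idea can be sketched with standard containers. ChainedSet is a hypothetical class for illustration; it uses std::forward_list for the per-bucket list and a toy hash (the key modulo the bucket count), whereas the real hashtable manages its own nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical separate-chaining set of ints (illustration only).
class ChainedSet {
    std::vector<std::forward_list<int>> buckets_;

    // Toy hash: picks the list for a key.
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

public:
    explicit ChainedSet(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    bool contains(int key) const {
        const std::forward_list<int>& chain = buckets_[index_of(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    bool insert(int key) {
        if (contains(key)) return false;         // keys are unique
        buckets_[index_of(key)].push_front(key);
        return true;
    }

    bool erase(int key) {
        if (!contains(key)) return false;
        buckets_[index_of(key)].remove(key);
        return true;
    }
};
```

With 53 buckets, the keys 1 and 54 land in the same list, and both remain reachable because every operation walks only that list.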

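Node structure of hashtable

A hashtable node holds a value of type Value and a node pointer. In the libstdc++ code below this is split across three types: _Hash_node_base carries the next pointer, _Hash_node_value_base adds the storage for the value, and _Hash_node additionally caches the hash code.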

  struct _Hash_node_base
  {
    _Hash_node_base* _M_nxt;

    _Hash_node_base() noexcept : _M_nxt() { }

    _Hash_node_base(_Hash_node_base* __next) noexcept : _M_nxt(__next) { }
  };

  /**
   *  struct _Hash_node_value_base
   *
   *  Node type with the value to store.
   */
  template<typename _Value>
    struct _Hash_node_value_base : _Hash_node_base
    {
      typedef _Value value_type;

      __gnu_cxx::__aligned_buffer<_Value> _M_storage;

      _Value*
      _M_valptr() noexcept
      { return _M_storage._M_ptr(); }

      const _Value*
      _M_valptr() const noexcept
      { return _M_storage._M_ptr(); }

      _Value&
      _M_v() noexcept
      { return *_M_valptr(); }

      const _Value&
      _M_v() const noexcept
      { return *_M_valptr(); }
    };

  /**
   *  Primary template struct _Hash_node.
   */
  template<typename _Value, bool _Cache_hash_code>
    struct _Hash_node;

  /**
   *  Specialization for nodes with caches, struct _Hash_node.
   *
   *  Base class is __detail::_Hash_node_value_base.
   */
  template<typename _Value>
    struct _Hash_node<_Value, true> : _Hash_node_value_base<_Value>
    {
      std::size_t  _M_hash_code;

      _Hash_node*
      _M_next() const noexcept
      { return static_cast<_Hash_node*>(this->_M_nxt); }
    };
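The layering above can be reduced to a small sketch. NodeBase and Node are hypothetical names; unlike the listing, this simplified version stores the value directly instead of in an aligned buffer.

```cpp
#include <cassert>
#include <cstddef>

// Link-only base, so list code can work without knowing the value type.
struct NodeBase {
    NodeBase* next = nullptr;
};

// Value-carrying node with a cached hash code (simplified: the value is a
// plain member rather than raw aligned storage).
template <typename Value>
struct Node : NodeBase {
    Value value;
    std::size_t hash_code;

    Node(const Value& v, std::size_t code) : value(v), hash_code(code) {}

    // The base pointer is known to point at a Node, so a static_cast
    // recovers the full type, as _M_next() does in the listing.
    Node* next_node() const { return static_cast<Node*>(next); }
};
```

Caching the hash code lets a rehash or a bucket comparison reuse it instead of calling the hash function again.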

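Iterators of hashtable

In the SGI design, a hashtable iterator must keep its connection to the whole "bucket vector" and record the node it currently points to; in the libstdc++ code below, the iterator stores only the current node pointer _M_cur. In both cases the iterator defines no decrement operation, and there is no reverse iterator: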

  /// Base class for node iterators.
  template<typename _Value, bool _Cache_hash_code>
    struct _Node_iterator_base
    {
      using __node_type = _Hash_node<_Value, _Cache_hash_code>;

      __node_type*  _M_cur;

      _Node_iterator_base(__node_type* __p) noexcept
      : _M_cur(__p) { }

      void
      _M_incr() noexcept
      { _M_cur = _M_cur->_M_next(); }
    };

  template<typename _Value, bool _Cache_hash_code>
    inline bool
    operator==(const _Node_iterator_base<_Value, _Cache_hash_code>& __x,
	       const _Node_iterator_base<_Value, _Cache_hash_code >& __y)
    noexcept
    { return __x._M_cur == __y._M_cur; }

  template<typename _Value, bool _Cache_hash_code>
    inline bool
    operator!=(const _Node_iterator_base<_Value, _Cache_hash_code>& __x,
	       const _Node_iterator_base<_Value, _Cache_hash_code>& __y)
    noexcept
    { return __x._M_cur != __y._M_cur; }

  /// Node iterators, used to iterate through all the hashtable.
  template<typename _Value, bool __constant_iterators, bool __cache>
    struct _Node_iterator
    : public _Node_iterator_base<_Value, __cache>
    {
    private:
      using __base_type = _Node_iterator_base<_Value, __cache>;
      using __node_type = typename __base_type::__node_type;

    public:
      typedef _Value					value_type;
      typedef std::ptrdiff_t				difference_type;
      typedef std::forward_iterator_tag			iterator_category;

      using pointer = typename std::conditional<__constant_iterators,
						const _Value*, _Value*>::type;

      using reference = typename std::conditional<__constant_iterators,
						  const _Value&, _Value&>::type;

      _Node_iterator() noexcept
      : __base_type(0) { }

      explicit
      _Node_iterator(__node_type* __p) noexcept
      : __base_type(__p) { }

      reference
      operator*() const noexcept
      { return this->_M_cur->_M_v(); }

      pointer
      operator->() const noexcept
      { return this->_M_cur->_M_valptr(); }

      _Node_iterator&
      operator++() noexcept
      {
	this->_M_incr();
	return *this;
      }

      _Node_iterator
      operator++(int) noexcept
      {
	_Node_iterator __tmp(*this);
	this->_M_incr();
	return __tmp;
      }
    };
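The forward-only property can be checked from user code through std::unordered_set, which libstdc++ builds on this hashtable. The alias names and sum_all are hypothetical; the check uses is_base_of because the standard only promises at least a forward iterator.

```cpp
#include <iterator>
#include <type_traits>
#include <unordered_set>

using Set = std::unordered_set<int>;
using Category = std::iterator_traits<Set::iterator>::iterator_category;

// Holds for any conforming implementation: the iterator is at least a
// forward iterator.
static_assert(std::is_base_of<std::forward_iterator_tag, Category>::value,
              "unordered_set iterators are at least forward iterators");

// Walks the container with ++ only, which is all a forward iterator offers.
int sum_all(const Set& s) {
    int total = 0;
    for (Set::const_iterator it = s.begin(); it != s.end(); ++it)
        total += *it;
    return total;
}
```

Code that needs to step backwards has to copy the elements into another container first.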

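Members of hashtable

The hashtable source code is quite complex, so this section mainly walks through the ideas behind its implementation:

1. Definition of hashtable

a. Main member variables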

 using __bucket_type = __node_base*;

private:
      __bucket_type*		_M_buckets;       // array of buckets
      size_type			_M_bucket_count;  // size of the bucket array
      __node_base		_M_before_begin;  // acts as a header node
      size_type			_M_element_count; // number of stored values
      _RehashPolicy		_M_rehash_policy; // rehash policy

b. Class definition

template<typename _Key, typename _Value, typename _Alloc,
   typename _ExtractKey, typename _Equal,
   typename _H1, typename _H2, typename _Hash,
   typename _RehashPolicy, typename _Traits>
class _Hashtable
: public __detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal,
               _H1, _H2, _Hash, _Traits>,
public __detail::_Map_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
               _H1, _H2, _Hash, _RehashPolicy, _Traits>,
public __detail::_Insert<_Key, _Value, _Alloc, _ExtractKey, _Equal,
           _H1, _H2, _Hash, _RehashPolicy, _Traits>,
public __detail::_Rehash_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
               _H1, _H2, _Hash, _RehashPolicy, _Traits>,
public __detail::_Equality<_Key, _Value, _Alloc, _ExtractKey, _Equal,
               _H1, _H2, _Hash, _RehashPolicy, _Traits>,
private __detail::_Hashtable_alloc<
   typename __alloctr_rebind<_Alloc,
   __detail::_Hash_node<_Value,
           _Traits::__hash_cached::value> >::__type>

{ ... }

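Its inheritance hierarchy is complicated and is not the focus here. Let's look at the template parameters of hashtable:

_Value: the value type stored in a node
_Key: the key type of a node
_ExtractKey: how to extract the key from a node's value
_Equal: how to decide whether two keys are equal
_Alloc: the allocator
_RehashPolicy: the rehash policy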
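From user code, some of these parameters are supplied through std::unordered_set. The sketch below passes a custom hash and a custom equality functor; to my understanding libstdc++ forwards them to the _H1 and _Equal slots of _Hashtable, but treat that mapping as an assumption. The functor names and NameSet are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

inline char to_lower(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hash that ignores letter case, so "Hello" and "HELLO" hash alike.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 0;
        for (char c : s)
            h = h * 31 + static_cast<unsigned char>(to_lower(c));
        return h;
    }
};

// Equality that ignores letter case; must agree with the hash above.
struct CaseInsensitiveEqual {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (to_lower(a[i]) != to_lower(b[i])) return false;
        return true;
    }
};

using NameSet =
    std::unordered_set<std::string, CaseInsensitiveHash, CaseInsensitiveEqual>;
```

The two functors have to be consistent: keys that compare equal must produce the same hash, otherwise they can land in different buckets and never be compared.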

2. Definitions of some interfaces

__node_type*
_M_begin() const
{ return static_cast<__node_type*>(_M_before_begin._M_nxt); } // get the first node

iterator
begin() noexcept
{ return iterator(_M_begin()); }

const_iterator
begin() const noexcept
{ return const_iterator(_M_begin()); }

iterator
end() noexcept
{ return iterator(nullptr); } // end is simply nullptr

const_iterator
end() const noexcept
{ return const_iterator(nullptr); }

size_type
size() const noexcept
{ return _M_element_count; } // size is the number of values

size_type
bucket_count() const noexcept
{ return _M_bucket_count; } // number of buckets

size_type
bucket(const key_type& __k) const
{ return _M_bucket_index(__k, this->_M_hash_code(__k)); } // get the bucket index for a key

size_type
_M_bucket_index(const key_type& __k, __hash_code __c) const
{ return __hash_code_base::_M_bucket_index(__k, __c, _M_bucket_count); }

std::size_t
_M_bucket_index(const _Key&, __hash_code __c, std::size_t __n) const
{ return _M_h2()(__c, __n); }

_H2&
_M_h2() { return __ebo_h2::_S_get(*this); } // as the hashtable definition shows, _H2 is what finally computes the bucket index
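The two-step mapping (key to hash code, hash code to bucket index) can be sketched as below. bucket_index is a hypothetical stand-in for the _H2 role that assumes a plain modulo, and in_reported_bucket checks, through the public bucket interface only, that an element really sits in the bucket the container reports for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Stand-in for the _H2 step: folds a hash code into [0, bucket_count).
// Assumes a simple modulo; the real policy is whatever _H2 is bound to.
std::size_t bucket_index(std::size_t hash_code, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return hash_code % bucket_count;
}

// True if key is stored in the bucket that s.bucket(key) names.
// Requires s.bucket_count() > 0, which holds once s has elements.
bool in_reported_bucket(const std::unordered_set<int>& s, int key) {
    const std::size_t b = s.bucket(key);
    if (b >= s.bucket_count()) return false;
    return std::find(s.begin(b), s.end(b), key) != s.end(b);
}
```

For example, a hash code of 107 with 53 buckets maps to bucket 1 under the modulo assumption.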
