A First Look at Dictionaries

Latest recommended article published on 2024-06-16 11:30:12

RookieZHL

Category: C++. Tags: c++.

Article link: https://blog.csdn.net/CodeKiller123/article/details/129343538

#Linear list representation

First, here is the abstract class dictionary:

#include <iostream>
#include <string>
#include <utility>
using namespace std;

template<class K, class E>
class dictionary 
{
   public:
      virtual ~dictionary() {}
      virtual bool empty() const = 0;
                  // return true iff dictionary is empty
      virtual int size() const = 0;
                  // return number of pairs in dictionary
      virtual pair<const K, E>* find(const K&) const = 0;
                  // return pointer to matching pair
      virtual void erase(const K&) = 0;
                  // remove matching pair
      virtual void insert(const pair<const K, E>&) = 0;
                  // insert a (key, value) pair into the dictionary
};

For how to use pair, see the CSDN blog post "C++ pair的基本用法总结（整理）" (a summary of basic pair usage) by sevencheng798; I learned it from that write-up as well.

Next comes pairNode, the node class used by sortedChain:

template <class K, class E>
struct pairNode 
{
   typedef pair<const K, E> pairType;
   pairType element;
   pairNode<K,E> *next;

   // next is set to NULL so a freshly built node never holds a dangling pointer
   pairNode(const pairType& thePair):element(thePair), next(NULL) {}
   pairNode(const pairType& thePair, pairNode<K,E>* theNext)
            :element(thePair), next(theNext) {}
};

Then comes the complete definition of the sortedChain class:

template<class K, class E>
class sortedChain : public dictionary<K,E> 
{
   public:
      sortedChain() {firstNode = NULL; dSize = 0;}
      ~sortedChain();

      bool empty() const {return dSize == 0;}
      int size() const {return dSize;}
      pair<const K, E>* find(const K&) const;
      void erase(const K&);
      void insert(const pair<const K, E>&);
      void output(ostream& out) const;

   protected:
      pairNode<K,E>* firstNode;  // pointer to first node in chain
      int dSize;                 // number of elements in dictionary
};

template<class K, class E>
sortedChain<K,E>::~sortedChain()
{// Destructor.  Delete all nodes.
   while (firstNode != NULL)
   {// delete firstNode
      pairNode<K,E>* nextNode = firstNode->next;
      delete firstNode;
      firstNode = nextNode;
   }
}

template<class K, class E>
pair<const K,E>* sortedChain<K,E>::find(const K& theKey) const
{// Return pointer to matching pair.
 // Return NULL if no matching pair.
   pairNode<K,E>* currentNode = firstNode;

   // search for match with theKey
   while (currentNode != NULL && 
          currentNode->element.first != theKey)
      currentNode = currentNode->next;

   // verify match
   if (currentNode != NULL && currentNode->element.first == theKey)
      // yes, found match
      return &currentNode->element;

   // no match
   return NULL;
}

template<class K, class E>
void sortedChain<K,E>::insert(const pair<const K, E>& thePair)
{// Insert thePair into the dictionary. Overwrite existing
 // pair, if any, with same key.
   pairNode<K,E> *p = firstNode,
                *tp = NULL; // tp trails p

   // move tp so that thePair can be inserted after tp
   while (p != NULL && p->element.first < thePair.first)
   {
      tp = p;
      p = p->next;
   }

   // check if there is a matching pair
   if (p != NULL && p->element.first == thePair.first)
   {// replace old value
      p->element.second = thePair.second;
      return;
   }

   // no match, set up node for thePair
   pairNode<K,E> *newNode = new pairNode<K,E>(thePair, p);

   // insert newNode just after tp
   if (tp == NULL) firstNode = newNode;
   else tp->next = newNode;

   dSize++;
   return;
}

template<class K, class E>
void sortedChain<K,E>::erase(const K& theKey)
{// Delete the pair, if any, whose key equals theKey.
   pairNode<K,E> *p = firstNode,
                *tp = NULL; // tp trails p
   
   // search for match with theKey
   while (p != NULL && p->element.first < theKey)
   {
      tp = p;
      p = p->next;
   }

   // verify match
   if (p != NULL && p->element.first == theKey)
   {// found a match
      // remove p from the chain
      if (tp == NULL) firstNode = p->next;  // p is first node
      else tp->next = p->next;

      delete p;
      dSize--;
   }
}

template<class K, class E>
void sortedChain<K,E>::output(ostream& out) const
{// Insert the chain elements into the stream out.
   for (pairNode<K,E>* currentNode = firstNode;
                       currentNode != NULL;
                       currentNode = currentNode->next)
      out << currentNode->element.first << " "
          << currentNode->element.second << "  ";
}

// overload <<
template <class K, class E>
ostream& operator<<(ostream& out, const sortedChain<K,E>& x)
   {x.output(out); return out;}

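In brief: find returns a pointer to the matching pair (or NULL if there is none), insert puts thePair into the dictionary and overwrites any existing pair with the same key, and erase deletes the pair whose key is theKey.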
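
To make the behavior of find, insert and erase concrete, here is a minimal self-contained sketch. MiniSortedChain is a hypothetical, simplified stand-in for the sortedChain template (int keys and std::string values only), and the helper keys() exists purely for inspection; neither is part of the original code. One small difference is assumed on purpose: because the chain is sorted, this find stops at the first key that is not smaller than the target.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified sorted chain (not the sortedChain template itself):
// keys are int, values are std::string.
class MiniSortedChain {
    struct Node {
        std::pair<const int, std::string> element;
        Node* next;
    };
    Node* first;   // pointer to first node in chain
    int count;     // number of pairs in the chain

public:
    MiniSortedChain() : first(nullptr), count(0) {}
    ~MiniSortedChain() {
        while (first != nullptr) {
            Node* nextNode = first->next;
            delete first;
            first = nextNode;
        }
    }
    MiniSortedChain(const MiniSortedChain&) = delete;
    MiniSortedChain& operator=(const MiniSortedChain&) = delete;

    int size() const { return count; }

    // Return a pointer to the value stored under key, or nullptr.
    // Keys are ascending, so the scan stops at the first key >= key.
    std::string* find(int key) const {
        Node* p = first;
        while (p != nullptr && p->element.first < key) p = p->next;
        return (p != nullptr && p->element.first == key) ? &p->element.second
                                                         : nullptr;
    }

    // Insert in ascending key order; overwrite the value if the key exists.
    void insert(int key, const std::string& value) {
        Node* p = first;
        Node* tp = nullptr;   // tp trails p
        while (p != nullptr && p->element.first < key) { tp = p; p = p->next; }
        if (p != nullptr && p->element.first == key) {
            p->element.second = value;
            return;
        }
        Node* newNode = new Node{{key, value}, p};
        if (tp == nullptr) first = newNode;
        else tp->next = newNode;
        ++count;
    }

    // Delete the pair, if any, whose key equals key.
    void erase(int key) {
        Node* p = first;
        Node* tp = nullptr;
        while (p != nullptr && p->element.first < key) { tp = p; p = p->next; }
        if (p == nullptr || p->element.first != key) return;
        if (tp == nullptr) first = p->next;
        else tp->next = p->next;
        delete p;
        --count;
    }

    // Inspection helper (an addition for this sketch): keys in chain order.
    std::vector<int> keys() const {
        std::vector<int> out;
        for (Node* p = first; p != nullptr; p = p->next)
            out.push_back(p->element.first);
        return out;
    }
};
```

Inserting the keys 5, 1, 3 and then 3 again leaves three pairs in the order 1, 3, 5, with the second value for key 3 replacing the first.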

#Hash table representation

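Another way to represent a dictionary is hashing. A hash function maps the pairs of the dictionary to specific positions in a hash table. If pair p has key k and the hash function is f, then ideally p is stored at position f(k) of the table. Assume for now that each position of the table can hold at most one record. To search for the pair with key k, first compute f(k), then check whether position f(k) of the table already holds a pair. If it does, the pair has been found; if not, the dictionary does not contain it. In the former case the pair can be deleted simply by making position f(k) empty. In the latter case the pair can be inserted at position f(k).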
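
The ideal case described above can be sketched in a few lines. Everything here is an assumption made for illustration: the class name IdealTable, int keys known to lie in [0, capacity), and the hash function f(k) = k, which guarantees that no two keys ever share a position.

```cpp
#include <string>
#include <vector>

// Hypothetical ideal hash table: keys lie in [0, capacity) and f(k) = k,
// so every key has a position of its own and collisions cannot happen.
class IdealTable {
public:
    explicit IdealTable(int capacity)
        : used(capacity, false), values(capacity) {}

    static int f(int k) { return k; }   // assumed hash function

    // Search: look at position f(k); nullptr means "not in the dictionary".
    const std::string* find(int k) const {
        return used[f(k)] ? &values[f(k)] : nullptr;
    }

    // Insert: place the value at position f(k).
    void insert(int k, const std::string& v) {
        used[f(k)] = true;
        values[f(k)] = v;
    }

    // Delete: make position f(k) empty.
    void erase(int k) { used[f(k)] = false; }

private:
    std::vector<bool> used;            // which positions hold a pair
    std::vector<std::string> values;   // the stored values
};
```

All three operations touch exactly one position, which is why the ideal case costs constant time; real hash functions map many keys to the same position, and that is what the rest of the section deals with.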

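The concepts of buckets, home buckets, the division hash function, collisions and overflows, and hash functions in general are not explained here; readers should study them on their own. The following code converts a string to an integer: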

int stringToInt(string s)
{// Convert s into a nonnegative int that depends on all
 // characters of s.
   int length = (int) s.length();   // number of characters in s
   int answer = 0;
   if (length % 2 == 1)
   {// length is odd
      answer = s.at(length - 1);
      length--;
   }

   // length is now even
   for (int i = 0; i < length; i += 2)
   {// do two characters at a time
      answer += s.at(i);
      answer += ((int) s.at(i + 1)) << 8;
   }

   return (answer < 0) ? -answer : answer;
}

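The code above converts the characters two at a time into an integer (each pair of 8-bit characters gives its own 16-bit value) and accumulates the sum. The final sum is not unique to the string: different strings can produce the same value.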
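
A small worked example may help here. The sketch below repeats stringToInt so that it compiles on its own (the only change is taking the argument by const reference) and assumes plain ASCII input. For "ab" the result is 97 + (98 << 8) = 25185. Because the pair values are simply added, reordering the pairs does not change the result.

```cpp
#include <string>

// Same algorithm as stringToInt, repeated so the sketch is self-contained.
int stringToInt(const std::string& s) {
    int length = (int) s.length();
    int answer = 0;
    if (length % 2 == 1) {   // odd length: take the last character alone
        answer = s.at(length - 1);
        length--;
    }
    for (int i = 0; i < length; i += 2) {     // two characters at a time
        answer += s.at(i);                    // low byte
        answer += ((int) s.at(i + 1)) << 8;   // high byte
    }
    return (answer < 0) ? -answer : answer;
}

// True when two different strings are mapped to the same integer.
bool collide(const std::string& a, const std::string& b) {
    return a != b && stringToInt(a) == stringToInt(b);
}
```

With ASCII input, collide("abcd", "cdab") is true: both strings consist of the pairs "ab" and "cd", only in a different order.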

The version below is the specialization hash<string>:

template<>
class hash<string>
{
   public:
      size_t operator()(const string& theKey) const
      {// Convert theKey to a nonnegative integer.
         unsigned long hashValue = 0; 
         int length = (int) theKey.length();
         for (int i = 0; i < length; i++)
            hashValue = 5 * hashValue + theKey.at(i);
    
         return size_t(hashValue);
      }
};
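
To see what this hash computes, here is a sketch that applies the same recurrence (hashValue = 5 * hashValue + character) as a free function. The names hashString and homeBucket are hypothetical helpers introduced for this example, and ASCII input is assumed. For "ab" the value is 5 * 97 + 98 = 583, which falls into bucket 583 % 11 = 0 when the divisor is 11.

```cpp
#include <cstddef>
#include <string>

// Same recurrence as the hash<string> specialization, as a free function.
std::size_t hashString(const std::string& theKey) {
    unsigned long hashValue = 0;
    for (std::size_t i = 0; i < theKey.length(); i++)
        hashValue = 5 * hashValue + theKey.at(i);
    return std::size_t(hashValue);
}

// Division hashing: reduce the hash value to a bucket index in [0, divisor).
int homeBucket(const std::string& theKey, int divisor) {
    return (int) (hashString(theKey) % divisor);
}
```

Unlike a plain sum of character pairs, this recurrence depends on the position of every character, so "abcd" (15170) and "cdab" (15458) get different values.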

The hash table implementation is given below:

template<class K, class E>
class hashTable
{
   public:
      hashTable(int theDivisor = 11);
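      ~hashTable()
      {// delete every stored pair, then the bucket array itself
         for (int i = 0; i < divisor; i++)
            delete table[i];
         delete [] table;
      }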

      bool empty() const {return dSize == 0;}
      int size() const {return dSize;}
      pair<const K, E>* find(const K&) const;
      void insert(const pair<const K, E>&);
      void output(ostream& out) const;

   protected:
      int search(const K&) const;
      pair<const K, E>** table;  // hash table
      hash<K> hash;              // maps type K to nonnegative integer
      int dSize;                 // number of pairs in dictionary
      int divisor;               // hash function divisor
};

template<class K, class E>
hashTable<K,E>::hashTable(int theDivisor)
{
   divisor = theDivisor;
   dSize = 0;

   // allocate and initialize hash table array
   table = new pair<const K, E>* [divisor];
   for (int i = 0; i < divisor; i++)
      table[i] = NULL;
}

template<class K, class E>
int hashTable<K,E>::search(const K& theKey) const
{// Search an open addressed hash table for a pair with key theKey.
 // Return location of matching pair if found, otherwise return
 // location where a pair with key theKey may be inserted
 // provided the hash table is not full.

   // take the remainder first, then cast, so the index is never negative
   int i = (int) (hash(theKey) % divisor);  // home bucket
   int j = i;    // start at home bucket
   do
   {
      if (table[j] == NULL || table[j]->first == theKey)
         return j;
      j = (j + 1) % divisor;  // next bucket
   } while (j != i);          // returned to home bucket?

   return j;  // table full
}

template<class K, class E>
pair<const K,E>* hashTable<K,E>::find(const K& theKey) const
{// Return pointer to matching pair.
 // Return NULL if no matching pair.
   // search the table
   int b = search(theKey);

   // see if a match was found at table[b]
   if (table[b] == NULL || table[b]->first != theKey)
      return NULL;           // no match

   return table[b];  // matching pair
}

template<class K, class E>
void hashTable<K,E>::insert(const pair<const K, E>& thePair)
{// Insert thePair into the dictionary. Overwrite existing
 // pair, if any, with same key.
 // Throw hashTableFull exception in case table is full.
   // search the table for a matching pair
   int b = search(thePair.first);

   // check if matching pair found
   if (table[b] == NULL)
   {
      // no matching pair and table not full
      table[b] = new pair<const K,E> (thePair);
      dSize++;
   }
   else
   {// check if duplicate or table full
      if (table[b]->first == thePair.first)
      {// duplicate, change table[b]->second
         table[b]->second = thePair.second;
      }
      else // table is full
         throw hashTableFull();
   }
}

template<class K, class E>
void hashTable<K,E>::output(ostream& out) const
{// Insert the hash table into the stream out.
   // write to the stream that was passed in, not to cout
   for (int i = 0; i < divisor; i++)
      if (table[i] == NULL)
         out << "NULL" << endl;
      else
         out << table[i]->first << " "
             << table[i]->second << endl;
}

// overload <<
template <class K, class E>
ostream& operator<<(ostream& out, const hashTable<K,E>& x)
   {x.output(out); return out;}

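A few words on the data members. pair<const K, E>** table is the hash table itself: it points to a dynamically allocated array of divisor pointers, one per bucket. Each entry is either NULL (an empty bucket) or points to the pair stored in that bucket, so table[b] is a pointer to the pair held in bucket b. hash<K> hash is a function object that maps a key of type K to a nonnegative integer; dSize is the number of pairs in the dictionary, and divisor is the divisor of the hash function.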
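
The probing done by search is easier to follow on concrete numbers. The sketch below is a hypothetical, stripped-down table (ProbeTable, with nonnegative int keys stored directly and -1 marking an empty bucket); it is not the hashTable template, only the same linear-probing idea. With divisor 11, the keys 24 and 35 both have home bucket 2, and 58 has home bucket 3.

```cpp
#include <vector>

const int kEmpty = -1;   // assumed marker for an empty bucket

// Hypothetical open-addressed table for nonnegative int keys.
class ProbeTable {
public:
    explicit ProbeTable(int theDivisor)
        : slots(theDivisor, kEmpty), divisor(theDivisor) {}

    // Return the bucket holding key, or the first empty bucket on the
    // probe path, or the home bucket if the table is full.
    int search(int key) const {
        int i = key % divisor;   // home bucket
        int j = i;
        do {
            if (slots[j] == kEmpty || slots[j] == key) return j;
            j = (j + 1) % divisor;   // next bucket
        } while (j != i);
        return j;   // table full
    }

    // Insert key; return false only when the table is full.
    bool insert(int key) {
        int b = search(key);
        if (slots[b] == kEmpty) { slots[b] = key; return true; }
        return slots[b] == key;   // already present, or table full
    }

    // Bucket that currently holds key, or -1 if key is absent.
    int bucketOf(int key) const {
        int b = search(key);
        return slots[b] == key ? b : -1;
    }

private:
    std::vector<int> slots;
    int divisor;
};
```

After inserting 24, 58 and 35 in that order, 24 sits in bucket 2 and 58 in bucket 3; 35 finds buckets 2 and 3 occupied and ends up in bucket 4.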

#Hashing with chains

Hashing with chains is fairly easy to understand, so I won't go into detail here; the code follows directly:

template<class K, class E>
class hashChains : public dictionary<K,E>
{
   public:
      hashChains(int theDivisor = 11)
      {
         divisor = theDivisor;
         dSize = 0;
      
         // allocate and initialize hash table array
         table = new sortedChain<K,E> [divisor];
      }

      ~hashChains(){delete [] table;}

      bool empty() const {return dSize == 0;}
      int size() const {return dSize;}

      pair<const K, E>* find(const K& theKey) const
         {return table[hash(theKey) % divisor].find(theKey);}

      void insert(const pair<const K, E>& thePair)
      {
         // take the remainder first, then cast, so the index is never negative
         int homeBucket = (int) (hash(thePair.first) % divisor);
         int homeSize = table[homeBucket].size();
         table[homeBucket].insert(thePair);
         if (table[homeBucket].size() > homeSize)
            dSize++;
      }

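      void erase(const K& theKey)
      {// erase from the home chain and keep dSize in step
         int homeBucket = (int) (hash(theKey) % divisor);
         int homeSize = table[homeBucket].size();
         table[homeBucket].erase(theKey);
         if (table[homeBucket].size() < homeSize)
            dSize--;
      }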

      void output(ostream& out) const
      {
         // write to the stream that was passed in, not to cout
         for (int i = 0; i < divisor; i++)
            if (table[i].size() == 0)
               out << "NULL" << endl;
            else
               out << table[i] << endl;
      }


   protected:
      sortedChain<K, E>* table;  // hash table
      hash<K> hash;              // maps type K to nonnegative integer
      int dSize;                 // number of elements in list
      int divisor;               // hash function divisor
};


// overload <<
template <class K, class E>
ostream& operator<<(ostream& out, const hashChains<K,E>& x)
   {x.output(out); return out;}
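
As a quick illustration of how colliding keys share a bucket under chaining, here is a self-contained sketch. It is not the hashChains class: the hypothetical MiniHashChains uses std::list in place of sortedChain (so each chain is unsorted), nonnegative int keys, and key % divisor as the hash.

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table: one std::list per bucket.
class MiniHashChains {
    typedef std::list<std::pair<int, std::string> > Chain;
    std::vector<Chain> table;   // one chain per bucket
    int dSize;                  // total number of pairs

    int bucket(int key) const { return key % (int) table.size(); }  // assumes key >= 0

public:
    explicit MiniHashChains(int theDivisor = 11)
        : table(theDivisor), dSize(0) {}

    int size() const { return dSize; }
    int chainLength(int b) const { return (int) table[b].size(); }

    // Return a pointer to the value stored under key, or nullptr.
    const std::string* find(int key) const {
        const Chain& chain = table[bucket(key)];
        for (Chain::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) return &it->second;
        return nullptr;
    }

    // Insert, overwriting the value if the key is already in its chain.
    void insert(int key, const std::string& value) {
        Chain& chain = table[bucket(key)];
        for (Chain::iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { it->second = value; return; }
        chain.push_back(std::make_pair(key, value));
        ++dSize;
    }

    // Erase the pair with this key, if present, and update the size.
    void erase(int key) {
        Chain& chain = table[bucket(key)];
        for (Chain::iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { chain.erase(it); --dSize; return; }
    }
};
```

With the default divisor 11, the keys 24, 35 and 13 all land in bucket 2, so that one chain holds three pairs while every other bucket stays empty.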

#Associative containers in the STL

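The ordered associative containers are typically implemented with a red-black tree, a kind of balanced binary search tree (the unordered ones use a hash table instead). To stay "balanced" after insertions and deletions and keep lookups efficient, a red-black tree tags each node with a "color" and applies a set of rules to adjust the structure of the tree dynamically. Elements of an associative container are therefore looked up and accessed by their key, whereas elements of a sequence container are accessed by their position in the container.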

Associative containers come in eight kinds, which differ mainly in:

(1) whether an element is only a key or a key-value pair: the former is called a set, the latter a map;

(2) whether duplicate keys are allowed: the containers that allow them have multi in their name;

(3) whether the elements are organized by a hash function: if so, the name is prefixed with unordered.
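
The three criteria give 2 x 2 x 2 = 8 containers: set, multiset, map, multimap, and the same four with the unordered_ prefix. The sketch below shows the effect of criterion (2) on each of them; the two helper templates are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Insert the keys 3, 1, 3 into a set-like container and report its size.
template <class SetLike>
std::size_t setSizeAfterDuplicates() {
    SetLike c;
    c.insert(3);
    c.insert(1);
    c.insert(3);   // duplicate key
    return c.size();
}

// Insert (3,"c"), (1,"a"), (3,"C") into a map-like container and report its size.
template <class MapLike>
std::size_t mapSizeAfterDuplicates() {
    typedef typename MapLike::value_type Pair;
    MapLike m;
    m.insert(Pair(3, "c"));
    m.insert(Pair(1, "a"));
    m.insert(Pair(3, "C"));   // duplicate key
    return m.size();
}
```

The containers without multi report 2, because the second insertion of key 3 is rejected; the multi variants report 3, whether they are ordered or unordered.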

#set and multiset

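Both set and multiset are defined in the header <set>, which must be included before they can be used. While a set is being constructed, its elements are automatically kept sorted by key. If several elements have the same value, set keeps only the first one, whereas multiset keeps all of them.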

int a[] = {5,3,9,7,5,8,4,3,1,2};
int size = sizeof(a)/sizeof(int);
multiset<int> s1;
for(int i =0;i<size;i++)
{
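    s1.insert(a[i]); // insert an element
}
multiset<int,less<int>>s2(s1); // initialize s2 by copying the elements of s1
multiset<int>s3(a,a+size); // initialize s3 from the array a
multiset<int,greater<int>> s4(a,a+size); // s4 is ordered from high to low
set<int> s5; // s5 never holds duplicate elements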

The main member functions of set and multiset are listed below; the container is assumed to be named a.

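find(num): returns an iterator to an element whose key is num, or a.end() if there is none; dereference it (*a.find(num)) only after checking that it is not a.end()

lower_bound(key): returns an iterator to the first element >= key

upper_bound(key): returns an iterator to the first element > key

Used together, these two give the range of elements whose key equals key.

equal_range(key): returns the lower and upper bound iterators for key as a pair; access them as a.equal_range(key).first and a.equal_range(key).second, and dereference either one only if it is not a.end()

swap: exchanges the elements of two sets

count(key): returns the number of elements whose key equals key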
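
The following sketch exercises these members on the sample data 5,3,9,7,5,8,4,3,1,2, which a multiset stores as 1 2 3 3 4 5 5 7 8 9. The helper names sample, countViaBounds and contains are assumptions made for this example.

```cpp
#include <iterator>
#include <set>

// Build a multiset from the sample array.
std::multiset<int> sample() {
    int a[] = {5, 3, 9, 7, 5, 8, 4, 3, 1, 2};
    return std::multiset<int>(a, a + sizeof(a) / sizeof(int));
}

// Number of elements equal to key, computed from the two bounds;
// this matches s.count(key).
int countViaBounds(const std::multiset<int>& s, int key) {
    return (int) std::distance(s.lower_bound(key), s.upper_bound(key));
}

// find returns end() for a missing key, so compare before dereferencing.
bool contains(const std::multiset<int>& s, int key) {
    return s.find(key) != s.end();
}
```

For key 5, lower_bound points at the first 5 and upper_bound at 7, so the range between them has two elements. For key 6, which is absent, both bounds point at 7 and the range is empty.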

#More on pair

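First, pair is a class template defined in the header <utility>. It has two type parameters, one for each of its two data members; the two types may be the same or different, for example: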

#include <string>
#include <utility>
#include <vector>
using namespace std;
pair<int,int> math_arts;
pair<std::string,int> name_count;
pair<std::string,vector<int>> name_score;

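As this shows, the two data members of a pair are exactly what is needed to form a key-value pair in an associative container, and the elements of a map can be organized and manipulated by their key.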

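The two members of a pair are named first and second; both are public and can be accessed with the member access operator. The default constructor of pair value-initializes the members according to their types, and a pair object can also be created with the make_pair function, for example: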

pair<std::string,int> name_count{"rookie",830}; // 0830 would be an invalid octal literal
name_count=make_pair("rookie",830);
std::cout<<name_count.first<<" appears "<<name_count.second<<" times"<<std::endl;

Here is an example:

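typedef multimap<int,std::string,less<int>> ML;
ML m1;
int s1[]={23,34,51,26};
std::string s2[]={"Chongqing","Henan","Beijing","Chengdu"};
int size = sizeof(s1)/sizeof(int); // number of (key, value) pairs
for(int i=0;i<size;i++)
{
    m1.insert(make_pair(s1[i],s2[i])); // insert pairs to build the multimap
}
ML m2(m1); // copy-construct m2 from m1
ML m3(m1.begin(),m1.end()); // construct m3 from an iterator range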

Here is a display function template for printing such a container:

template<class T1,class T2,class T3>
void display(string name,T1& m,T2 key_type,T3 val_type)
{
    cout<<name;
    typename multimap<T2,T3>::iterator it = m.begin();
    while(it!= m.end())
    {
        cout<<(*it).first<<","<<(*it).second<<endl;
        it++;
    }
}

For example, display("m2:",m2,0,string()); // traverses and prints the elements of a container m2 whose pairs are int + string

display("m2:",m2,string(),string()); // does the same for a container m2 whose pairs are string + string

#Member functions of map and multimap

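Like set, these two containers provide members such as insert, lower_bound, upper_bound and equal_range. The difference is that the elements of a map are pair objects: to add an element you first build a <key, value> pair and then insert it with insert. Alternatively, the emplace member function constructs the element in place from the key and the value, so no pair has to be built first.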

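typedef multimap<int,string> Map_L;
int a[]={80,60,70};
string s[]={"Zhao","Qian","Sun","Li","Zhao","Sun"};
Map_L mapL;
for(int i=0;i<6;i++)
    mapL.emplace(a[i%3],s[i]); // emplace builds each pair in place from (key, value)
display("container:",mapL,0,string());
Map_L::iterator it = mapL.find(60);
if(it!=mapL.end()) // find returns end() when the key is absent
    cout<<"element at find(60): "<<(*it).first<<","<<(*it).second<<endl;
int num = mapL.count(60); // number of elements with key 60
if(it!=mapL.end())
    mapL.erase(it); // erases only the one element that it points to
int num2 = mapL.erase(60); // erases every remaining element with key 60 and returns how many were removed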
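
Since a multimap can hold several values under one key, equal_range is the usual way to visit all of them. The sketch below is a hedged illustration with hypothetical helper names (sampleScores, valuesFor) and sample data made up for the example.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sample data (an assumption for this sketch): three keys, two values each.
std::multimap<int, std::string> sampleScores() {
    std::multimap<int, std::string> m;
    m.insert(std::make_pair(80, "Zhao"));
    m.insert(std::make_pair(60, "Qian"));
    m.insert(std::make_pair(70, "Sun"));
    m.insert(std::make_pair(80, "Li"));
    m.insert(std::make_pair(60, "Zhou"));
    m.insert(std::make_pair(70, "Wu"));
    return m;
}

// Collect every value stored under key by walking equal_range(key).
std::vector<std::string> valuesFor(const std::multimap<int, std::string>& m,
                                   int key) {
    std::vector<std::string> out;
    std::pair<std::multimap<int, std::string>::const_iterator,
              std::multimap<int, std::string>::const_iterator>
        range = m.equal_range(key);
    for (; range.first != range.second; ++range.first)
        out.push_back(range.first->second);
    return out;
}
```

For key 60 the range holds two values; for a key that is absent, the two iterators are equal and the result is empty, so no check against end() is needed.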

#unordered_set and unordered_multiset

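The keyword unordered means "without order". In an unordered associative container the elements are not organized and stored according to an ordering comparison on their values; instead the container is managed with a hash function and the == operator of the key type. There is no ordering relation among the elements. When an element is inserted, the hash value of its key maps it to one of the "buckets", and each bucket can hold one or more elements. If the container allows duplicate keys, elements with the same key are also mapped to the same bucket.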

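To look up an element in an unordered associative container, first compute the hash value of its key to find the corresponding bucket. If the bucket holds more than one element, those elements are searched sequentially. So if a bucket holds many elements, many comparisons are needed and lookup performance suffers. The performance of an unordered container therefore depends heavily on the choice of hash function and on the number of buckets: the better the hash function, the more evenly it spreads the elements across the buckets, which avoids large clusters in a few buckets. Ideally, every element is mapped to a bucket of its own. The more buckets there are, the fewer elements share a bucket on average and the faster lookups become; on the other hand, more buckets mean lower space utilization, so a balance has to be found between lookup efficiency and space utilization. The bucket-management functions of the unordered containers are listed below.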

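Member function	Purpose
bucket_size(n)	returns the number of elements in bucket number n
bucket_count()	returns the number of buckets in the container
bucket(key)	returns the number of the bucket that key maps to
load_factor()	returns the average load factor = number of elements / number of buckets
max_load_factor()	returns the maximum load factor; to keep load_factor <= max_load_factor, the container increases the number of buckets when necessary
rehash(n)	rebuilds the hash table so that bucket_count >= n
hash_function()	returns the hash function object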
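
Here is a brief sketch of the bucket interface on unordered_set<int>. Which bucket a given key lands in, and how many buckets exist, are implementation details, so the example checks only relationships that always hold. The helper names totalInBuckets and shareBucket are assumptions made for this example.

```cpp
#include <cstddef>
#include <unordered_set>

// Sum bucket_size(n) over every bucket; the total equals s.size().
std::size_t totalInBuckets(const std::unordered_set<int>& s) {
    std::size_t total = 0;
    for (std::size_t n = 0; n < s.bucket_count(); ++n)
        total += s.bucket_size(n);
    return total;
}

// True when both keys map to the same bucket of s.
bool shareBucket(const std::unordered_set<int>& s, int a, int b) {
    return s.bucket(a) == s.bucket(b);
}
```

After s.rehash(64) the container has at least 64 buckets, and the elements are redistributed without any of them being lost, so totalInBuckets(s) is unchanged.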

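Note that although unordered_multimap also offers find, equal_range, erase and similar functions, it differs in that its elements are placed according to their hash values. Elements with equal keys are therefore adjacent to each other, but there is no ordering relation between different elements.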

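Another difference between unordered and ordered containers is that unordered containers are only required to provide forward iterators (Forward Iterator), which cannot be decremented, whereas ordered containers provide bidirectional iterators (Bidirectional Iterator), which support both increment and decrement.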
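
The iterator categories can be checked at compile time. In this sketch the alias Category and the function largest are hypothetical names introduced for the example; the static_assert lines only state what the standard requires (bidirectional for set, at least forward for unordered_set).

```cpp
#include <iterator>
#include <set>
#include <type_traits>
#include <unordered_set>

// Iterator category tag of a container's iterator type.
template <class Container>
using Category = typename std::iterator_traits<
    typename Container::iterator>::iterator_category;

static_assert(std::is_base_of<std::bidirectional_iterator_tag,
                              Category<std::set<int> > >::value,
              "set iterators are bidirectional");
static_assert(std::is_base_of<std::forward_iterator_tag,
                              Category<std::unordered_set<int> > >::value,
              "unordered_set iterators are at least forward");

// Stepping back from end() needs a bidirectional iterator, so this works
// for std::set. The set must not be empty.
int largest(const std::set<int>& s) {
    return *std::prev(s.end());
}
```

Portable code should not apply -- or std::prev to an unordered_set iterator, since only forward traversal is guaranteed there.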

#The load factor load_factor

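The load factor reflects how full an unordered_set is. A larger load factor means that more elements have been placed in the hash table and space utilization is higher, but collisions also become more likely; conversely, a smaller load factor means fewer elements, fewer collisions, and lower space utilization. In practice a trade-off has to be found between space utilization and the likelihood of collisions. The default maximum load factor of unordered_set, max_load_factor, is 1.0; as elements are added, the container automatically adds buckets so that load_factor stays less than or equal to max_load_factor.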
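
The automatic growth can be observed with a short experiment. This is a sketch with hypothetical helper names; it relies on the behavior of common standard-library implementations, which rehash as soon as an insertion would push load_factor() above max_load_factor().

```cpp
#include <unordered_set>

// Maximum load factor of a freshly constructed unordered_set<int>.
float defaultMaxLoadFactor() {
    return std::unordered_set<int>().max_load_factor();
}

// Insert 0..n-1 and report whether load_factor() stayed within
// max_load_factor() after every single insertion.
bool loadFactorStaysBounded(int n) {
    std::unordered_set<int> s;
    for (int i = 0; i < n; ++i) {
        s.insert(i);
        if (s.load_factor() > s.max_load_factor())
            return false;
    }
    return true;
}
```

If the number of elements is known in advance, calling reserve(n) before inserting avoids the repeated rehashing that this loop triggers as the table grows.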

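Note: elements can be inserted with either insert or emplace; while learning, emplace is convenient because it lets you pass the key and the value directly.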

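That's it for today's first look at dictionaries. Tomorrow I'll continue with algorithm practice, and I hope to stick with it to the end and earn back what I've put in!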