Advanced C++ | [4.4] Wrapping a Red-Black Tree into map and set

畋坪

Last modified: 2024-08-07 10:30:01

First published: 2024-08-06 15:57:51

Article link: https://blog.csdn.net/fantastic_13_7/article/details/140773686

Summary: implement an iterator for the red-black tree, then wrap the tree to build map and set.

Red/Black Tree Visualization (usfca.edu): a website for visualizing red-black trees.

1. Framework

Red-black tree: what the class template parameters mean:
```
template<class Key, class Type, class KeyofValue, ...>
class RBTree{...};
```
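① Key: some member functions need the key type. (The library implementation has this template parameter because some of its functions use the type. The mock implementation in this article does not implement every member function and never needs the key type, so the implementation below omits this parameter.)
② Type: the stored data type (for example, a map object stores pair<K,V>).
③ KeyofValue: a functor class used to extract the key from the stored data. A red-black tree (a kind of BST) may follow either the Key model or the Key-Value model (see the article "The two models of binary search trees"), and insertion always compares keys. For example, if the nodes store a pair, insertion must compare data to find the right position, so it has to read the pair's first member and compare that.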

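P.S. Whether a class needs a given template parameter comes down to whether the class actually uses that type. For example, if the implementation of find in class RBTree needs the Key type, the parameter must be declared; if nothing uses it, it can be left out.

The mock red-black tree (overall framework):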

enum Colour
{
	RED, BLACK
};

template<class T>
struct RBTreeNode
{
	RBTreeNode(T data = T())
		:_data(data)
		, _pLeft(nullptr)
		, _pRight(nullptr)
		, _pParent(nullptr)
		, _col(RED) // newly created nodes are red by default
	{}
	T _data;
	RBTreeNode<T>* _pLeft;
	RBTreeNode<T>* _pRight;
	RBTreeNode<T>* _pParent;
	Colour _col;
};


// T: may be a key/value pair <key, value>,
//    or may be just a key
// Whether a node stores <key, value> or key, comparisons always use the key
// KeyOfValue: extracts the key from data
template<class T, class KeyOfValue>
class RBTree
{
	typedef RBTreeNode<T> Node;
public:
	RBTree()
		: _size(0)
	{
		_pHead = new Node;
		_pHead->_pLeft = _pHead;
		_pHead->_pRight = _pHead;
		_pHead->_pParent = nullptr; // the head node's _pParent starts out null
	}

private:
	Node* _pHead;
	size_t _size; // number of nodes
	KeyOfValue _GetKey;
};

set: RBTree<K, KeyofSet<K>>

namespace RoundBottle 
{
    template<class K>
    struct KeyofSet {...};

    template<class K>
    class set
    {
    private:
        RBTree<K, KeyofSet<K>> _rb;
    };
}

P.S. As shown above, set only uses one data type (K), so it only needs one template parameter.

map: RBTree<pair<K,V>, KeyofMap<K,V>>

namespace RoundBottle
{
	template<class K, class V>
	struct KeyofMap
	{
		const K& operator()(const pair<K, V>& data)
		{
			return data.first;
		}
	};

	template<class K, class V>
	class map
	{
	private:
		RBTree<pair<K, V>, KeyofMap<K, V>> _rb;
	};
}

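2. Modifying the red-black tree's insert function

A red-black tree first inserts by the binary search tree rules, so it has to compare data to find the insertion position, and every such comparison must obtain its key through the functor. This requires the following changes:

Add a member variable: an object of the KeyOfValue template parameter type (e.g. KeyOfValue _GetKey;).
Compare data through the functor: _GetKey(data) is _GetKey.operator()(data). What operator() returns is controlled by the concrete functor you implement.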

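Functors have been used many times already, so no more detail here. The full insert implementation is long and changes little, so it is not reproduced here.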
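To make the "compare through the functor" point concrete, here is a minimal sketch of the search-and-insert step only. It is not the article's Insert: it uses a plain BST with no colours, no parent pointers, and no rebalancing, and the names Node, BSTInsert, Destroy, and DemoInsert are made up for illustration. The functors are also marked const here, which the article's versions are not.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical simplified node: no colour, no parent pointer.
template<class T>
struct Node
{
	explicit Node(const T& data) : _data(data), _pLeft(nullptr), _pRight(nullptr) {}
	T _data;
	Node* _pLeft;
	Node* _pRight;
};

// Key extractors in the style of KeyofSet / KeyofMap.
template<class K>
struct KeyofSet
{
	const K& operator()(const K& data) const { return data; }
};

template<class K, class V>
struct KeyofMap
{
	const K& operator()(const std::pair<K, V>& data) const { return data.first; }
};

// Plain BST insert: every comparison goes through the functor, so the
// same code serves a key-only tree and a key/value tree.
// Returns false if an equal key is already present.
template<class T, class KeyOfValue>
bool BSTInsert(Node<T>*& pRoot, const T& data)
{
	KeyOfValue getKey;
	Node<T>** ppCur = &pRoot;
	while (*ppCur)
	{
		if (getKey(data) < getKey((*ppCur)->_data))
			ppCur = &(*ppCur)->_pLeft;
		else if (getKey((*ppCur)->_data) < getKey(data))
			ppCur = &(*ppCur)->_pRight;
		else
			return false; // duplicate key
	}
	*ppCur = new Node<T>(data);
	return true;
}

// Frees a whole subtree.
template<class T>
void Destroy(Node<T>* p)
{
	if (!p) return;
	Destroy(p->_pLeft);
	Destroy(p->_pRight);
	delete p;
}

// Inserts into a key-only tree and a key/value tree; duplicates are rejected by key.
inline bool DemoInsert()
{
	Node<int>* pSet = nullptr;
	bool ok = BSTInsert<int, KeyofSet<int>>(pSet, 5);
	ok = ok && BSTInsert<int, KeyofSet<int>>(pSet, 3);
	ok = ok && !BSTInsert<int, KeyofSet<int>>(pSet, 5);

	typedef std::pair<int, std::string> KV;
	Node<KV>* pMap = nullptr;
	ok = ok && BSTInsert<KV, KeyofMap<int, std::string>>(pMap, KV(1, "one"));
	ok = ok && !BSTInsert<KV, KeyofMap<int, std::string>>(pMap, KV(1, "uno")); // same key, different value

	Destroy(pSet);
	Destroy(pMap);
	return ok;
}
```

The second KV insert is rejected even though its value differs, because only the extracted key takes part in the comparison.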

3. The iterator

template<class T>
struct RBTreeIterator
{
	typedef RBTreeNode<T> Node;
	typedef RBTreeIterator<T> Self;

	RBTreeIterator(Node* pNode)
		: _pNode(pNode)
	{}

	// give the iterator pointer-like behaviour
	T& operator*();
	T* operator->();
	
	Self& operator++();
	Self operator++(int);
	
	Self& operator--();
	Self operator--(int);

	// make iterators comparable
	bool operator!=(const Self& s)const;
	bool operator==(const Self& s)const;

private:
	Node* _pNode;
};

The focus here is operator++; operator-- is analogous. The other operator overloads are simple and are not discussed further.

operator++()

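A red-black tree is a BST, so it is visited in the order left subtree → root → right subtree (in-order), which yields the elements in sorted order (ascending for ++, descending for --).

First, the core idea: "move right". The key step that follows from it is finding the node's "nearest neighbour on the right".

Next, the key step splits into three cases: up-right, down-right, and already at the rightmost node.
① The node has no right subtree. Walk up through the ancestors; the first ancestor that is reached from its left child (the "↙" condition in the figure; see the 0007 → 0009 example there) is the next node to visit. (up-right)
② The node has a right subtree. The next node is the leftmost node of that right subtree. (down-right)
③ Neither case applies: there is no node up-right or down-right, so nothing lies to the right and this node is the last valid node of the tree.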

operator-- differs from operator++ only in direction: it "moves left". It likewise has three cases: up-left, down-left, and already at the leftmost node.

Code example:

Self& operator++()
{
	if (_pNode->_pRight) // right subtree is not empty → down-right → leftmost node of the right subtree
	{
		Node* pCur = _pNode->_pRight;
		while (pCur->_pLeft)
		{
			pCur = pCur->_pLeft;
		}
		*this = Self(pCur);
		return *this;
	}
	else // right subtree is empty → up-right
	{
		Node* pParent = _pNode->_pParent;
		Node* pCur = _pNode;
		while (pParent && pParent->_pRight == pCur) // keep climbing while the current node is its parent's right child; the null check stops the climb once it passes the top
		{
			pCur = pParent;
			pParent = pCur->_pParent;
		}

		//if (pParent == nullptr) // the loop ended after climbing all the way up without finding an "up-right" node
		//{
		//	return Self(nullptr);
		//}
		//else // the loop ended because pCur == pParent->_pLeft
		//{
		//	return Self(pParent);
		//}
		*this = Self(pParent);
		return *this; // if pParent is null, the result is an iterator built from a null pointer
	}
}
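As a sanity check of the two cases, the same successor logic can be run on a tiny hand-built tree. This is only a sketch under one assumption that the article's comments also make: the root's parent pointer is null, so stepping past the last node yields nullptr. TNode, Next, Link, and WalkInOrder are illustrative names, not part of the article's tree.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal node: only the links that operator++ needs.
struct TNode
{
	explicit TNode(int v) : _data(v), _pLeft(nullptr), _pRight(nullptr), _pParent(nullptr) {}
	int _data;
	TNode* _pLeft;
	TNode* _pRight;
	TNode* _pParent;
};

// Same logic as operator++ above, written as a free function.
// Assumption: the root's _pParent is nullptr.
inline TNode* Next(TNode* pNode)
{
	if (pNode->_pRight) // down-right: leftmost node of the right subtree
	{
		TNode* pCur = pNode->_pRight;
		while (pCur->_pLeft)
			pCur = pCur->_pLeft;
		return pCur;
	}
	// up-right: climb while we are our parent's right child
	TNode* pCur = pNode;
	TNode* pParent = pNode->_pParent;
	while (pParent && pParent->_pRight == pCur)
	{
		pCur = pParent;
		pParent = pCur->_pParent;
	}
	return pParent; // nullptr once we step past the last node
}

// Attaches children and fixes up their parent pointers.
inline void Link(TNode& parent, TNode* pLeft, TNode* pRight)
{
	parent._pLeft = pLeft;
	parent._pRight = pRight;
	if (pLeft) pLeft->_pParent = &parent;
	if (pRight) pRight->_pParent = &parent;
}

// Tree shape: 9 is the root with children 5 and 12; 5 has children 3 and 7.
inline std::vector<int> WalkInOrder()
{
	TNode n3(3), n5(5), n7(7), n9(9), n12(12);
	Link(n9, &n5, &n12);
	Link(n5, &n3, &n7);

	std::vector<int> out;
	for (TNode* p = &n3; p != nullptr; p = Next(p)) // start at the leftmost node
		out.push_back(p->_data);
	return out;
}
```

The step from 7 to 9 is the up-right case: 7 is the right child of 5, so the climb continues to 9, which is reached from its left child.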

Full iterator code:

template<class T>
struct RBTreeIterator
{
	typedef RBTreeNode<T> Node;
	typedef RBTreeIterator<T> Self;

	RBTreeIterator(Node* pNode)
		: _pNode(pNode)
	{}

	// give the iterator pointer-like behaviour
	T& operator*()
	{
		return _pNode->_data;
	}
	T* operator->()
	{
		return &(_pNode->_data);
	}

	// prefix / postfix ++
	Self& operator++()
	{
		if (_pNode->_pRight) // right subtree is not empty → down-right → leftmost node of the right subtree
		{
			Node* pCur = _pNode->_pRight;
			while (pCur->_pLeft)
			{
				pCur = pCur->_pLeft;
			}
			*this = Self(pCur);
			return *this;
		}
		else // right subtree is empty → up-right
		{
			Node* pParent = _pNode->_pParent;
			Node* pCur = _pNode;
			while (pParent && pParent->_pRight == pCur)
				// keep climbing while the current node is its parent's right child; the null check stops the climb once it passes the top
			{
				pCur = pParent;
				pParent = pCur->_pParent;
			}

			//if (pParent == nullptr) // the loop ended after climbing all the way up without finding an "up-right" node
			//{
			//	return Self(nullptr);
			//}
			//else // the loop ended because pCur == pParent->_pLeft
			//{
			//	return Self(pParent);
			//}
			*this = Self(pParent);
			return *this; // if pParent is null, the result is an iterator built from a null pointer
		}
	}
	Self operator++(int)
	{
		Self retval = *this;
		operator++();//this->operator++();
		return retval;
	}
	// prefix / postfix --
	Self& operator--()
	{
		if (_pNode->_pLeft) // left subtree is not empty → down-left → rightmost node of the left subtree
		{
			Node* pCur = _pNode->_pLeft;
			while (pCur->_pRight)
			{
				pCur = pCur->_pRight;
			}
			*this = Self(pCur);
			return *this;
		}
		else // left subtree is empty → up-left
		{
			Node* pParent = _pNode->_pParent, *pCur = _pNode;
			while (pParent && pParent->_pLeft == pCur)
				// check pParent for null BEFORE dereferencing it; keep climbing while the current node is its parent's left child
			{
				pCur = pParent;
				pParent = pCur->_pParent;
			}

			//if (pParent == nullptr) // the loop ended after climbing all the way up without finding an "up-left" node
			//{
			//	return Self(nullptr);
			//}
			//else // the loop ended because pCur == pParent->_pRight
			//{
			//	return Self(pParent);
			//}

			*this = Self(pParent); // if pParent is null, the result is an iterator built from a null pointer
			return *this;
		}
	}
	Self operator--(int)
	{
		Self retval = *this;
		operator--();
		return retval;
	}

	// make iterators comparable
	bool operator!=(const Self& s)const
	{
		return _pNode != s._pNode;
	}
	bool operator==(const Self& s)const
	{
		return _pNode == s._pNode;
	}

private:
	Node* _pNode;
};

4. Wrapping set and map

1) Wrapping the iterator

The wrappers hold the red-black tree as a member variable and reuse its (member) functions.

RBTree:

typedef RBTreeIterator<T> iterator;
iterator Begin() {...};
iterator End() {...};

set:

typedef typename RBTree<K, KeyofSet<K>>::iterator iterator;
iterator begin() { return _rb.Begin(); }
iterator end() { return _rb.End(); }

map:

typedef typename RBTree<pair<K, V>, KeyofMap<K, V>>::iterator iterator;
iterator begin() { return _rb.Begin(); }
iterator end() { return _rb.End(); }

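* Note: typename

The typename keyword already appeared in the article on the mock implementation of list.

Purpose: it states explicitly that an identifier names a type, so the compiler does not misread it.

For example, in the code below RBTree<K>::iterator depends on the template parameter K. Only once the template is instantiated and K is a concrete type can the compiler tell whether RBTree<K>::iterator really is a type. If K ends up being int, the compiler has to instantiate RBTree<int> and look up iterator in it before it can confirm that. typename tells the compiler that, whatever K turns out to be, RBTree<K>::iterator is to be read as a type here and not as anything else (such as a member variable or a function).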

template<class K>
class set{
    typedef typename RBTree<K>::iterator iterator;
};
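A small self-contained sketch of the same rule follows. Tree, Wrapper, and AddressOf are hypothetical stand-ins, not the article's classes; the point is only where typename has to appear. Newer standards (C++20) relax the requirement in some declaration contexts, but writing typename is always accepted, and it is still required for the local variable inside the function body.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the tree: only the nested type name matters here.
template<class K>
struct Tree
{
	typedef K* iterator; // a nested type
};

template<class K>
struct Wrapper
{
	// Tree<K>::iterator depends on K, so it is marked as a type with typename.
	typedef typename Tree<K>::iterator iterator;
};

template<class K>
typename Tree<K>::iterator AddressOf(K& value)
{
	// Inside a function body the dependent name must be prefixed with
	// typename, otherwise it is parsed as a non-type member.
	typename Tree<K>::iterator it = &value;
	return it;
}

// After instantiation with int, the nested name resolves to int*.
static_assert(std::is_same<Wrapper<int>::iterator, int*>::value,
              "Wrapper<int>::iterator should be int*");
```

Note that outside a template (as in the static_assert) no typename is needed, because Wrapper<int> is not a dependent type.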

2) Implementing map's operator[]

To implement operator[] we reuse insert, which requires insert to return a pair<iterator, bool>.

So first modify the red-black tree's insert function:
① change the return type to pair<iterator, bool>;
② in the function body: return make_pair(xxx, xxx);

With that in place, map's operator[] overload is easy to write:

namespace RoundBottle
{
	template<class K, class V>
	struct KeyofMap
	{
		const K& operator()(const pair<K, V>& data) { return data.first; }
	};

	template<class K, class V>
	class map
	{
	public:
		typedef typename RBTree<pair<K, V>, KeyofMap<K, V>>::iterator iterator;

		iterator begin() { return _rb.Begin(); }
		iterator end() { return _rb.End(); }

		pair<iterator, bool> insert(const pair<K, V>& data)
		{
			return _rb.Insert(data);
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> ret = insert(make_pair(key, V()));
			return (ret.first)->second;
		}

	private:
		RBTree<pair<K, V>, KeyofMap<K, V>> _rb;
	};
}
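The same insert-then-dereference pattern can be tried against std::map, whose insert also returns pair<iterator, bool>. The sketch below spells out what operator[] does; Subscript and CountWords are names invented for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// What operator[] does, spelled out: try to insert (key, V()) and return a
// reference to the mapped value, whether the insert happened or the key
// was already there.
template<class K, class V>
V& Subscript(std::map<K, V>& m, const K& key)
{
	std::pair<typename std::map<K, V>::iterator, bool> ret =
		m.insert(std::make_pair(key, V()));
	return ret.first->second;
}

// Word counting through Subscript: "a" appears twice, "b" once.
inline int CountWords()
{
	std::map<std::string, int> counts;
	const char* words[] = { "a", "b", "a" };
	for (const char* w : words)
		++Subscript(counts, std::string(w));
	return counts["a"] * 10 + counts["b"];
}
```

When the key already exists, insert leaves the stored value untouched and ret.first points at the existing element, which is why the second "a" increments the same counter.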

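5. The problem of modifiable keys

By specification, keys in set and map objects must not be modified, but with the iterator as implemented so far they can be. Fixing this requires refining the iterator.

First we need a const_iterator for the red-black tree. (Reminder: const_iterator is not an iterator qualified with const. What const actually qualifies is T, the template parameter that stands for the type of data stored in each node of the tree.)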

This gives: RBTreeIterator<T> → RBTreeIterator<T, T&, T*> and RBTreeIterator<T, const T&, const T*>

That is, add template parameters to class RBTreeIterator and change the return types of operator*() and operator->() accordingly (this was covered in earlier mock implementations of const iterators, so it is only sketched here).

In the red-black tree:

typedef RBTreeIterator<T, T&, T*> iterator;
typedef RBTreeIterator<T, const T&, const T*> const_iterator;

iterator Begin() {...}
iterator End() {...}
const_iterator Begin() const {...}
const_iterator End() const {...}

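Note: functions in the same scope with the same name are overloads only if they meet certain conditions, and differing only in return type is not enough. Member functions carry a hidden parameter, the this pointer. In the declarations above, the const qualifier changes the type of this, so the parameter types differ and the functions overload. Without the const on this, they could not overload. (The alternative is to use different names, e.g. Begin() and cBegin().)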
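A minimal sketch of overloading on the const-ness of this follows. Box, Which, PickOnMutable, and PickOnConst are hypothetical names; Box stands in for the tree and raw pointers stand in for the two iterator types.

```cpp
#include <cassert>

// Hypothetical container whose two Begin() overloads differ only in the
// const-ness of the implicit this pointer.
struct Box
{
	int _value;
	int* Begin() { return &_value; }             // this has type Box*
	const int* Begin() const { return &_value; } // this has type const Box*
};

// Report which pointer type came back: 1 for int*, 2 for const int*.
inline int Which(int*) { return 1; }
inline int Which(const int*) { return 2; }

// A non-const object selects the non-const overload.
inline int PickOnMutable()
{
	Box b = { 0 };
	return Which(b.Begin());
}

// A const object can only call the const overload.
inline int PickOnConst()
{
	const Box b = { 0 };
	return Which(b.Begin());
}
```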

For set: as the code below shows, set can build its iterator from const_iterator alone. That protects the key from modification.

namespace RoundBottle
{
	template<class K>
	struct KeyofSet
	{
		const K& operator()(const K& data) { return data; }
	};

	template<class K>
	class set
	{
	public:
		typedef typename RBTree<K, KeyofSet<K>>::const_iterator iterator;
		iterator begin()const { return _rb.Begin(); }
		iterator end()const { return _rb.End(); }

		pair<iterator, bool> insert(const K& data) { return _rb.Insert(data); }

	private:
		RBTree<K, KeyofSet<K>> _rb;
	};
}

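* Problem: the return type of insert does not match

The code above introduces a problem: the return type of insert no longer matches.
set's insert reuses the red-black tree's Insert. The tree's Insert returns pair<iterator, bool>, i.e. pair<RBTreeIterator<T, T&, T*>, bool>, whereas set's insert returns pair<iterator, bool>, i.e. pair<RBTree<K, KeyofSet<K>>::const_iterator, bool>, i.e. pair<RBTreeIterator<T, const T&, const T*>, bool>. RBTreeIterator<T, T&, T*> and RBTreeIterator<T, const T&, const T*> are two entirely different types, just like vector<int> and vector<string>: passing different arguments to a class template instantiates different types. With the iterator as written, no conversion exists between the two.

Solution: change the return type of the tree's Insert to pair<Node*, bool>.
Why does the conversion pair<Node*, bool> → pair<RBTreeIterator<T, const T&, const T*>, bool> work?
Answer: because pair has a templated converting constructor, a member template that looks like a copy constructor: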

template<class U, class V> pair (const pair<U,V>& pr);

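That is roughly equivalent to the code below. If a value of type U can initialise a T1 and a value of type V can initialise a T2, the converting constructor carries out the "type conversion". In our case, pair<Node*, bool> → pair<RBTreeIterator<T, const T&, const T*>, bool>, U is Node* and V is bool; T1 is RBTreeIterator<T, const T&, const T*> and T2 is bool. So what matters is that Node* → RBTreeIterator<T, const T&, const T*> is possible, and it is, because the iterator has a constructor that takes a Node* (see the implementation of class RBTreeIterator).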

template<class T1, class T2>
struct pair
{
    template<class U, class V> 
    pair (const pair<U,V>& pr)
        :first(pr.first), second(pr.second)
    {...}

    T1 first;
    T2 second;
};
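The conversion can be checked with std::pair directly. In this sketch Node, ConstIter, TreeInsert, and SetInsert are hypothetical stand-ins for the tree node, the const iterator, RBTree::Insert, and set::insert; the only thing being demonstrated is that pair<Node*, bool> converts to pair<ConstIter, bool> because Node* converts to ConstIter.

```cpp
#include <cassert>
#include <utility>

// Hypothetical node type.
struct Node
{
	int _data;
};

// Hypothetical const iterator: implicitly constructible from Node*.
struct ConstIter
{
	ConstIter(Node* pNode) : _pNode(pNode) {}
	const int& operator*() const { return _pNode->_data; }
	Node* _pNode;
};

// Stand-in for the tree's Insert: returns pair<Node*, bool>.
inline std::pair<Node*, bool> TreeInsert(Node* pNode)
{
	return std::make_pair(pNode, true);
}

// Stand-in for set::insert: the declared return type is a different pair
// instantiation, and pair's templated converting constructor bridges the
// gap member by member (Node* -> ConstIter, bool -> bool).
inline std::pair<ConstIter, bool> SetInsert(Node* pNode)
{
	return TreeInsert(pNode);
}
```

If ConstIter's constructor were marked explicit, the return statement in SetInsert would no longer compile, since the implicit pair conversion relies on each member being implicitly convertible.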

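map: map is simpler to handle. Just change RBTree<pair<K, V>, KeyofMap<K, V>> _rb to RBTree<pair<const K, V>, KeyofMap<K, V>> _rb. (KeyofMap::operator() should then take const pair<const K, V>& so that each call does not build a temporary pair<K, V>.)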

typedef typename RBTree<pair<const K, V>, KeyofMap<K, V>>::iterator iterator;
typedef typename RBTree<pair<const K, V>, KeyofMap<K, V>>::const_iterator const_iterator;

iterator begin() { return _rb.Begin(); }
iterator end() { return _rb.End(); }

const_iterator begin()const { return _rb.Begin(); }
const_iterator end()const { return _rb.End(); }

RBTree<pair<const K, V>, KeyofMap<K, V>> _rb;
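Why storing pair<const K, V> is enough can be seen with std::pair alone. The sketch below uses a hypothetical Entry typedef and Touch function; the static_asserts show that the key cannot be assigned through a reference to the stored pair while the value still can.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// The element type a map would store once the key is made const.
typedef std::pair<const int, std::string> Entry;

// Through an Entry&, first is a const int&: not assignable.
static_assert(!std::is_assignable<decltype((std::declval<Entry&>().first)), int>::value,
              "the key must be read-only");

// Through an Entry&, second is a std::string&: still assignable.
static_assert(std::is_assignable<decltype((std::declval<Entry&>().second)), const char*>::value,
              "the value must stay writable");

// Modifies the value of an entry and returns it.
inline std::string Touch()
{
	Entry e(1, "one");
	e.second = "uno"; // fine: the mapped value is mutable
	// e.first = 2;   // would not compile: first is const int
	return e.second;
}
```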

With that, the basic wrapping is complete.

END