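Reading notes on the B-tree chapter of Introduction to Algorithms.
Before we start
The chapter opens with a fair amount of material on computer storage, mainly the capacity and read/write speed of main memory versus disk. I take the point to be this: memory is fast to read but limited in capacity, disk is the opposite, so we need a good data structure to organize the data or files, and that leads to the concept of the B-tree.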
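1. Definition of a B-tree
1. Every node (Node) has the following attributes
a. n: the number of keys stored in the node
b. Key[i]: an array holding the keys in ascending order, where 0 <= i < n (Note: the book says non-decreasing order, but its algorithm descriptions never handle equal keys, so my implementation assumes all keys are distinct, which is why I write ascending order here)
c. leaf: a bool that says whether the node is a leaf
d. Children[i]: pointers to the node's children, where 0 <= i <= n; meaningless when leaf == true (stating the obvious, I know). In other words, a node has one more child than it has keys, which is easy to see.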
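2. Some properties of a B-tree
a. The keys of a node partition the key ranges of its children: if a node holds keys Key[i] and Key[i+1], then every key under Children[i+1], the child enclosed by those two keys, lies in the open interval (Key[i], Key[i+1]).
b. The number of keys in a node has a lower and an upper bound, both derived from a fixed value t (t >= 2) called the minimum degree of the B-tree. 1) Every non-root node has at least t-1 keys and at most 2t-1 keys; correspondingly, an internal node has at least t and at most 2t children. 2) The root of a non-empty tree holds at least one key.
(The book also derives the height of the tree; I'm skipping that here.)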
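The properties above can be turned into a small checker, which is handy when testing an implementation. This is only a sketch under my own assumptions: `SketchNode`, `checkSubtree` and `isValidBTree` are hypothetical names that do not appear in the code of this post, keys are plain ints stored in a `std::vector`, and a node counts as a leaf when it has no children. It checks only the properties listed above (key counts, ordering, key ranges), not that all leaves sit at the same depth.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical simplified node: int keys, children empty for a leaf.
struct SketchNode {
    std::vector<int> keys;
    std::vector<SketchNode*> children;
    bool leaf() const { return children.empty(); }
};

// True if the subtree satisfies the listed properties for minimum degree t.
// Every key in the subtree must lie strictly inside the open interval (lo, hi).
bool checkSubtree(const SketchNode* node, int t, bool isRoot,
                  long long lo, long long hi) {
    assert(t >= 2);
    const std::size_t n = node->keys.size();
    if (n > static_cast<std::size_t>(2 * t - 1)) return false;        // upper bound
    if (!isRoot && n < static_cast<std::size_t>(t - 1)) return false; // lower bound
    if (isRoot && n == 0) return node->leaf();                        // empty tree
    long long prev = lo;
    for (int k : node->keys) {            // strictly ascending and inside (lo, hi)
        if (k <= prev || k >= hi) return false;
        prev = k;
    }
    if (node->leaf()) return true;
    if (node->children.size() != n + 1) return false; // one more child than keys
    for (std::size_t i = 0; i <= n; ++i) {
        // child i is enclosed by keys[i-1] and keys[i]
        const long long childLo = (i == 0) ? lo : node->keys[i - 1];
        const long long childHi = (i == n) ? hi : node->keys[i];
        if (!checkSubtree(node->children[i], t, false, childLo, childHi)) return false;
    }
    return true;
}

bool isValidBTree(const SketchNode* root, int t) {
    return checkSubtree(root, t, true, LLONG_MIN, LLONG_MAX);
}
```

Calling `isValidBTree` after every insert and delete in a test run is one cheap way to catch a broken shift or merge early.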
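2. Basic operations on a B-tree
1. Searching a B-tree
Search is simple. If you know trees reasonably well, the code should speak for itself; if you don't, start with something like binary trees first.
Here is the code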
template<class T>
T* BTree<T>::search(int k)
{
    Node* node = root;
    int i = 0;
    while (node)
    {
        while (i < node->n && node->key[i]->key < k) i++;
        if (i < node->n && k == node->key[i]->key) // i < node->n prevents i from running past the end of the array
        {
            cout << "key" << i << endl;
            return node->key[i];
        }
        if (node->leaf)
        {
            return 0;
        }
        else
        {
            node = node->children[i];
            cout << "go to child" << i << endl;
            i = 0;
        }
    }
    return 0;
}
The book's pseudocode is tail-recursive, so I eliminated the tail recursion in my implementation. This is only a code fragment.
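To make the tail-recursion remark concrete, here are the two forms side by side. This is a sketch of my own rather than the book's pseudocode verbatim: `SketchNode`, `searchRecursive` and `searchIterative` are hypothetical names, keys are plain ints, and a node with no children is treated as a leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified node: int keys, children empty for a leaf.
struct SketchNode {
    std::vector<int> keys;
    std::vector<SketchNode*> children;
};

// Recursive form: the last action is a call on the chosen child (a tail call).
const SketchNode* searchRecursive(const SketchNode* node, int k) {
    std::size_t i = 0;
    while (i < node->keys.size() && node->keys[i] < k) ++i;
    if (i < node->keys.size() && node->keys[i] == k) return node; // found here
    if (node->children.empty()) return nullptr;                   // leaf: not found
    return searchRecursive(node->children[i], k);
}

// Iterative form: the tail call becomes "reassign node and loop again".
const SketchNode* searchIterative(const SketchNode* node, int k) {
    while (true) {
        std::size_t i = 0;
        while (i < node->keys.size() && node->keys[i] < k) ++i;
        if (i < node->keys.size() && node->keys[i] == k) return node;
        if (node->children.empty()) return nullptr;
        node = node->children[i];
    }
}
```

Both functions return the node that holds the key, or `nullptr` when the key is absent. The loop version uses constant stack space no matter how tall the tree is.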
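2. Insertion into a B-tree
Insertion is a bit more involved. While inserting we must not only maintain the B-tree properties, but also make sure a later step never runs into a situation it cannot recover from.
I drew a rough flowchart of the whole insertion algorithm.
The steps before step 2 all operate on the root, because when the root is full we have to increase the height of the tree to deal with it.
On the way down the tree, whenever we meet a full node we split it, and after the split one key moves up into the parent. Since every full node met earlier on the path has already been split, that key is guaranteed never to move up into a full parent. Steps 1 and 6 in the figure are the split operations.
The code for the split operation is below. (Switching input methods while coding was a pain, so on a whim I wrote the comments in English too. My English isn't great, so bear with it.)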
template<class T>
int BTree<T>::splitFullNode(Node* parent, int pos, Node* child)
{
    cout << "split child:" << child->keySerial() << endl;
    Node* newNode = 0;
    createNode(&newNode);
    newNode->leaf = child->leaf;
    child->n = t - 1;
    // move key[t ~ 2t-2] from the full child to the new node.
    for (int i = 0; i < t - 1; i++)
    {
        newNode->key[i] = child->key[i + t];
        child->key[i + t] = 0;
    }
    newNode->n = t - 1;
    // if the child is not a leaf, its children are split too: children[t ~ 2t-1] become the children of the new node.
    for (int i = 0; !child->leaf && i < t; i++)
    {
        newNode->children[i] = child->children[i + t];
        child->children[i + t] = 0;
    }
    // shift the parent's keys right to make room, then move the child's median key[t-1] up into key[pos] of the parent
    for (int i = parent->n; i > pos; i--)
    {
        parent->key[i] = parent->key[i - 1];
    }
    parent->key[pos] = child->key[t - 1];
    // add the new node to the parent as children[pos+1]
    for (int i = parent->n + 1; i > pos + 1; i--)
    {
        parent->children[i] = parent->children[i - 1];
    }
    parent->children[pos + 1] = newNode;
    // the parent has gained one key.
    parent->n++;
    return 0;
}
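The index arithmetic of a split is easier to see on a bare key array. The sketch below is a simplification of my own, not part of the implementation in this post: `SplitResult` and `splitKeys` are hypothetical names, and the node is reduced to a `std::vector<int>` holding its 2t-1 keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the three pieces a full node is split into.
struct SplitResult {
    std::vector<int> left;  // keys[0 .. t-2], stays in the original node
    int median;             // keys[t-1], moves up into the parent
    std::vector<int> right; // keys[t .. 2t-2], moves to the new node
};

// Splits the 2t-1 keys of a full node around the median (0-based index t-1).
SplitResult splitKeys(const std::vector<int>& full, int t) {
    assert(t >= 2);
    assert(full.size() == static_cast<std::size_t>(2 * t - 1));
    SplitResult r;
    r.left.assign(full.begin(), full.begin() + (t - 1));
    r.median = full[static_cast<std::size_t>(t - 1)];
    r.right.assign(full.begin() + t, full.end());
    return r;
}
```

For t = 3 and keys {10, 20, 30, 40, 50}, the left half is {10, 20}, the median 30 moves up, and the right half is {40, 50}. Both halves end up with exactly t-1 keys, the minimum a non-root node may hold.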
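As long as we don't meet a full node, all that remains is to pick the right child and recurse into it. The key-range property makes it easy to tell which child that is.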
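Picking the child amounts to counting how many keys in the node are not greater than the new key. A minimal sketch, assuming distinct int keys kept sorted in a `std::vector` (the helper name `childIndex` is hypothetical); it uses `std::upper_bound` where the implementation in this post uses a linear scan from the right:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the child to descend into when inserting k:
// the number of keys in the node that are <= k.
std::size_t childIndex(const std::vector<int>& keys, int k) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    return static_cast<std::size_t>(
        std::upper_bound(keys.begin(), keys.end(), k) - keys.begin());
}
```

With keys {10, 20, 30}, the key 5 goes to child 0, 15 to child 1, 25 to child 2 and 35 to child 3. A binary search only pays off when t is large; for small nodes a linear scan is just as good.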
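3. Deletion from a B-tree
The book leaves the deletion code as an exercise. As for the algorithm itself, I have summarized it in the following flowchart.
The flowchart also spells out how each case is handled. In places the book speaks only in broad terms such as "merge" or "borrow a key"; the details in the figure about which node gets merged into which are my own additions. Broadly speaking, steps 2 and 3 mirror each other, and so do steps 7 and 8.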
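One of the "borrow" cases, taking a key from the left sibling, can be sketched on simplified nodes. The names `SketchNode` and `borrowFromLeft` are hypothetical and the node layout (int keys in a `std::vector`, children empty for a leaf) is my own assumption; the point is only to show which key moves where.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified node: int keys, children empty for a leaf.
struct SketchNode {
    std::vector<int> keys;
    std::vector<SketchNode*> children;
};

// Rotates one key from children[i-1] through the parent into children[i].
void borrowFromLeft(SketchNode& parent, std::size_t i) {
    assert(i > 0 && i < parent.children.size());
    SketchNode& left = *parent.children[i - 1];
    SketchNode& child = *parent.children[i];
    assert(!left.keys.empty());
    // 1) the separator key in the parent moves down to the front of child
    child.keys.insert(child.keys.begin(), parent.keys[i - 1]);
    // 2) the largest key of the left sibling moves up to replace it
    parent.keys[i - 1] = left.keys.back();
    left.keys.pop_back();
    // 3) for internal nodes, the sibling's last child becomes child's first
    if (!left.children.empty()) {
        child.children.insert(child.children.begin(), left.children.back());
        left.children.pop_back();
    }
}
```

With a raw array instead of a vector, the insert at the front has to be done by shifting elements starting from the back; shifting from the front overwrites every slot with the first element. Borrowing from the right sibling is the mirror image.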
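The detailed code is long, so I'll just post all of it together below.
For the deletion code, I wasn't clever enough to factor out reusable pieces the way I did for insertion.
3. The complete implementation
I suspect the code still has some problems that I wasn't able to track down. As for memory leaks, my gut feeling is that there shouldn't be any left. I also found that testing the code is a chore: the insert and delete cases in the main function of Source.cpp only barely cover all the paths of the algorithm.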
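//FileName: BTree.h
//Desc: Define BTree Class
//Author: SwineX
//Date: 2014.9.3
#pragma once
#include <iostream>
#include <list>
#include <string>
using namespace std;
template<class T>
class BTree
{
public:
    BTree();
    BTree(int t);
private:
    int t; // minimum degree: a node has at most 2t children, and a non-root internal node at least t. The well-known 2-3-4 tree is the case t = 2.
    // a node of the tree
    struct Node
    {
        T** key;
        int n; // number of keys stored in the node
        Node** children;
        bool leaf;
        // concatenate the value of every key into one string (used for debug output only)
        string keySerial() const
        {
            string serial;
            for (int i = 0; i < n; i++)
            {
                serial += key[i]->value;
            }
            return serial;
        }
        Node() :key(0), n(0), children(0), leaf(false)
        {
        }
        ~Node()
        {
            // free only the two pointer arrays; the keys and child nodes they point to are owned elsewhere
            delete[] key;
            delete[] children;
        }
    };
    Node* root; // root of the tree
    int createTree(Node** root); // create an empty tree
    int createNode(Node** node); // create an empty node
    int splitFullNode(Node* parent, int pos, Node* child); // split a child that holds 2t-1 keys: the first t-1 and the last t-1 keys are separated, and the t-th key moves up to position pos of parent
    int insertNonFull(Node* node, T* k); // insert into a node that is not full
    int deleteNode(Node* &node, int key);
public:
    int insert(T* e);
    T* search(int key);
    int remove(int key);
    int printTree();
};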
//FileName: BTree.cpp
//Desc: implementation of the BTree.h
//Author: SwineX
//Date: 2014.9.4
#include "BTree.h"
template<class T>
BTree<T>::BTree() : t(2)
{
createTree(&root);
}
template<class T>
BTree<T>::BTree(int _t) : t(_t)
{
createTree(&root);
}
template<class T>
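int BTree<T>::createNode(Node** node)
{
(*node) = new Node;
// a full node holds 2t-1 keys and 2t children; the trailing () zero-initializes every slot
(*node)->key = new T*[2 * t - 1]();
(*node)->children = new Node*[2 * t]();
return 0;
}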
template<class T>
int BTree<T>::insert(T* e)
{
//if the root is full
if (root->n == (2*t - 1) )
{
Node* newNode = 0;
createNode(&newNode);
newNode->children[0] = root;
splitFullNode(newNode, 0, root);
root = newNode;
}
insertNonFull(root, e);
return 0;
}
template<class T>
int BTree<T>::createTree(Node** root)
{
createNode(root);
(*root)->leaf = true;
return 0;
}
template<class T>
int BTree<T>::splitFullNode(Node* parent, int pos, Node* child)
{
cout << "split child:" << child->keySerial() << endl;
Node* newNode = 0;
createNode(&newNode);
newNode->leaf = child->leaf;
child->n = t - 1;
//move the keys[t ~ 2t-2] from full child to new node.
for (int i = 0; i < t - 1; i++)
{
newNode->key[i] = child->key[i + t];
child->key[i + t] = 0;
}
newNode->n = t - 1;
//if the child is not leaf node, child's children should be splited too. children[t ~ 2t-1] become the children of the new node.
if (!child->leaf)
{
for (int i = 0; i < t; i++)
{
newNode->children[i] = child->children[i + t];
child->children[i + t] = 0;
}
}
//shift the parent's keys right to make room, then move the child's median key[t-1] up into key[pos] of the parent
for (int i = parent->n; i > pos; i--)
{
parent->key[i] = parent->key[i - 1];
}
parent->key[pos] = child->key[t-1];
//add new node to parent as child in children[pos+1]
for (int i = parent->n + 1; i > pos + 1; i--)
{
parent->children[i] = parent->children[i - 1];
}
parent->children[pos + 1] = newNode;
//the number of key plus 1.
parent->n++;
return 0;
}
template<class T>
int BTree<T>::insertNonFull(Node* node, T* e)
{
int i = node -> n ;
// if the node is leaf and it's not full obviously, just place the 'e'.
if (node->leaf)
{
while (i > 0 && e->key < node->key[i - 1]->key)
{
node->key[i] = node->key[i - 1];
i--;
}
node->key[i] = e;
node->n++;
}
else// if the node is not leaf. find the suitable child and insert e into it;
{
while (i > 0 && e->key < node->key[i - 1]->key) i--;
// the child i is the right child
// if the ith child is full
if (node->children[i]->n == (2*t - 1) )
{
splitFullNode(node, i, node->children[i]);
//after splited, a new key add to node and generate a new child. we should judge which child is suitable to be inserted.
if (e->key > node->key[i]->key)
{
i++;
}
insertNonFull(node->children[i], e);
}
else
{
insertNonFull(node->children[i], e);
}
}
return 0;
}
template<class T>
T* BTree<T>::search(int k)
{
Node* node = root;
int i = 0;
while (1)
{
while (i < node->n && node->key[i]->key < k) i++;
if (i < node->n && k == node->key[i]->key)// i < node->n prevent i out of array bound
{
cout << "key" << i << endl;
return node->key[i];
}
if (node->leaf)
{
return 0;
}
else
{
node = node->children[i];
cout << "go to child" << i << endl;
i = 0;
}
}
return 0;
}
template<class T>
int BTree<T>::deleteNode(Node* &node, int k)
{
int i = 0;
while (i < node->n && node->key[i]->key < k) i++;
if (i < node->n && node->key[i]->key == k) // find the target to delete
{
if (node->leaf) // if the node is leaf
{
//delete node->key[i]; // release the resource of this key
for (int j = i + 1; j < node->n; j++)
{
node->key[j - 1] = node->key[j];
}
node->n--;
node->key[node->n] = 0;
}
else // node is the inner node of the tree
{
Node* leftChild = node->children[i];
Node* rightChild = node->children[i + 1];
if (leftChild->n > t - 1) //leftChild has enough keys
{
// find the predecessor of key[i]: the largest key in the left subtree
Node* fore = leftChild;
while (!fore->leaf) fore = fore->children[fore->n];
// swap key[i] with its predecessor; k now sits at the end of that leaf, so delete it from the left subtree
T* tmp = node->key[i];
node->key[i] = fore->key[fore->n - 1];
fore->key[fore->n - 1] = tmp;
return deleteNode(leftChild, k);
}
else if (rightChild->n > t - 1)
{
// find the successor of key[i]: the smallest key in the right subtree
Node* after = rightChild;
while (!after->leaf) after = after->children[0];
// swap key[i] with its successor; k now sits at the front of that leaf, so delete it from the right subtree
T* tmp = node->key[i];
node->key[i] = after->key[0];
after->key[0] = tmp;
return deleteNode(rightChild, k); // after->key[0]->key equals k after the swap
}
else
{
// merge key[i] and right child into left child
leftChild->n += t;
leftChild->key[t-1] = node->key[i];
for (int j = i + 1; j < node->n; j++)// relocate current node
{
node->key[j - 1] = node->key[j];
node->children[j] = node->children[j + 1];
}
node->n--;
for (int j = 0; j < rightChild->n; j++)
{
leftChild->key[t + j] = rightChild->key[j];
}
if (!rightChild->leaf) // if it is not a leaf, move the right child's children over as well
{
for (int j = 0; j < rightChild->n + 1; j++)
{
leftChild->children[t + j] = rightChild->children[j];
}
}
delete rightChild; // release the source of rightChild
if (node->n == 0) // if node is empty, delete it and low down the height of the whole tree
{
delete node;
node = leftChild;
}
return deleteNode(leftChild, k);
}
}
}
else
{
if (node->leaf) // not found key, delete fail
{
return -1;
}
Node* child = node->children[i];
if (child->n > t - 1)
{
return deleteNode(child, k); // propagate the result, including -1 when the key is not found
}
else
{
Node* leftBro = (i > 0) ? node->children[i - 1] : 0;
Node* rightBro = (i < node->n) ? node->children[i + 1] : 0;
if (leftBro != 0 && leftBro->n > t - 1) // if child exist left bro and bro has at least t keys
{
//move the parent node key[i-1] to child
// shift from the back so that no key is overwritten before it has been moved
for (int j = child->n; j > 0; j--)
{
child->key[j] = child->key[j - 1];
}
child->key[0] = node->key[i-1];
// move the max key of leftBro to its parent node
node->key[i - 1] = leftBro->key[leftBro->n - 1];
leftBro->key[leftBro->n - 1] = 0;
if (!child->leaf) // if it is not leaf, the child of leftBro should be moved too
{
// child has n+1 children; shift them right by one, again from the back
for (int j = child->n + 1; j > 0; j--)
{
child->children[j] = child->children[j - 1];
}
child->children[0] = leftBro->children[leftBro->n];
leftBro->children[leftBro->n] = 0;
}
//update key number info
leftBro->n--;
child->n++;
return deleteNode(child, k);
}
else if (rightBro != 0 && rightBro->n > t - 1) // The Symmetric structure of left bro
{
//move the parent node key[i] to child
child->key[child->n] = node->key[i];
// move the min key of rightBro to its parent node
node->key[i] = rightBro->key[0];
for (int j = 0; j < rightBro->n - 1; j++)
{
rightBro->key[j] = rightBro->key[j + 1];
}
rightBro->key[rightBro->n - 1] = 0;
if (!child->leaf) // if it is not leaf, the child of rightBro should be moved too
{
child->children[child->n + 1] = rightBro->children[0];
for (int j = 0; j < rightBro->n; j++)
{
rightBro->children[j] = rightBro->children[j+1];
}
rightBro->children[rightBro->n] = 0;
}
//update key number info
rightBro->n--;
child->n++;
return deleteNode(child, k);
}
else // when his left and right bro neither has more than t-1 node, merge the child with left or right
{
if (leftBro != 0) // if left Bro exist, merge it with child first. child -> left
{
// move the separator key key[i-1] of the parent down into leftBro
leftBro->key[leftBro->n] = node->key[i - 1];
leftBro->n++;
// merge leftBro and the child
for (int j = 0; j < child->n; j++)
{
leftBro->key[leftBro->n + j] = child->key[j];
}
if (!leftBro->leaf) // if not leaf, children should be moved
{
for (int j = 0; j < child->n + 1; j++)
{
leftBro->children[leftBro->n + j] = child->children[j];
}
}
leftBro->n += child->n;
// relocate the parent node and its children: close the gap left by key[i-1] and children[i]
for (int j = i; j < node->n; j++)
{
node->key[j - 1] = node->key[j];
node->children[j] = node->children[j + 1];
}
node->n--;
if (node->n == 0) // node is empty and has only one child. let its child replace itself.
{
delete node;
node = leftBro;
}
else
{
node->children[node->n + 1] = 0;
}
delete child;
return deleteNode(leftBro, k);
}
else // if left node dont exist, merge child with the right right -> child
{
// move the separator key key[i] of the parent down into child
child->key[child->n] = node->key[i];
child->n++;
// merge rightBro and the child
for (int j = 0; j < rightBro->n; j++)
{
child->key[child->n + j] = rightBro->key[j];
}
if (!child->leaf) // if not leaf, children should be moved
{
for (int j = 0; j < rightBro->n + 1; j++)
{
child->children[child->n + j] = rightBro->children[j];
}
}
child->n += rightBro->n;
// relocate the parent node and its children
for (int j = i + 1; j < node->n; j++)
{
node->key[j - 1] = node->key[j];
node->children[j] = node->children[j + 1];
}
node->n--;
if (node->n == 0) // node is empty and has only one child. let its child replace itself.
{
delete node;
node = child;
}
else
{
node->children[node->n + 1] = 0;
}
delete rightBro;
return deleteNode(child, k);
}
}
}
}
return 0;
}
template<class T>
int BTree<T>::remove(int k)
{
return deleteNode(root, k);
}
template<class T>
int BTree<T>::printTree()
{
// define a simple list
list<Node*> nodes;
Node* emptyNode = 0;
nodes.push_back(root);
nodes.push_back(emptyNode);
while (!nodes.empty())
{
// pop a element
Node* node = nodes.front();
nodes.pop_front();
//print current node
if (node != emptyNode)
{
cout << "|";
for (int i = 0; i < node->n; i++)
{
cout << "|" << node->key[i]->key << "-" << node->key[i]->value<<"|";
}
cout << "| ";
if (!node->leaf)
{
for (int i = 0; i < node->n + 1; i++)
{
nodes.push_back(node->children[i]);
}
}
}
else
{
cout << endl;
if (nodes.empty()) return 0;
nodes.push_back(emptyNode);
}
}
return 0;
}
//FileName: Element.h
//Desc: Define Element stored in the BTree
//Author: SwineX
//Date: 2014.9.4
#pragma once
class Element
{
public:
Element(){}
Element(int _key, char _value)
{
key = _key;
value = _value;
}
int key;
char value;
};
//FileName: Source.cpp
//Desc: The entry of the program
//Author: SwineX
//Date: 2014.9.4
#include <iostream>
#include "BTree.cpp"
#include "Element.h"
using namespace std;
int main()
{
BTree<Element> bt(2);
for (int i = 0; i < 26; i = i+2)
{
Element* e = new Element;
e->key = i;
e->value = 'a' + i;
bt.insert(e);
}
for (int i = 1; i < 26; i = i+2)
{
Element* e = new Element;
e->key = i;
e->value = 'a' + i;
bt.insert(e);
}
for (int i = 25; i >=0 ; i = i-2)
{
cout << "---------"<<i<<"----------"<<endl;
Element* ep = bt.search(i);
if (ep == 0)
{
cout << "not found." << endl;
}
else
{
cout << ep->key << ":" << ep->value << endl;
}
bt.remove(i);
}
bt.printTree();
for (int i = 1; i < 26; i = i + 2)
{
Element* e = new Element;
e->key = i;
e->value = 'a' + i;
bt.insert(e);
}
bt.printTree();
for (int i = 24; i >= 0 ; i = i - 2)
{
cout << "---------"<<i<<"----------"<<endl;
Element* ep = bt.search(i);
if (ep == 0)
{
cout << "not found." << endl;
}
else
{
cout << ep->key << ":" << ep->value << endl;
}
bt.remove(i);
}
bt.printTree();
bt.remove(13);
bt.remove(17);
bt.remove(15);
bt.remove(9);
bt.printTree();
return 0;
}