Introduction
- Indexing is a process of associating a key with the location of a corresponding data record(类似于查字典). Each key is associated with a pointer to a complete record
- Might have several associated index files(一个指针可以指向多条记录)
- Index Types
- Primary Key
- A unique identifier for records
- Not meaningful
- Secondary Key
- Alternate search key
- People may be more interested in it
- e.g. Score
- Often not unique for each record
- Frequently use
- Alternate search key
- Primary Key
Linear Index
- Index file organized as a simple sequence of key/record pointer pairs with key values are in sorted order
- Second-level index can be applied to secondary key search (二级检索)
二级索引:叶子节点中存储主键值,每次查找数据时,根据索引找到叶子节点中的主键值,根据主键值再到聚簇索引中得到完整的一行记录。
- Drawbacks
- The index update due to record inserting/deleting is expensive
- 删除一个后面的都要往前移
- Records with the same (secondary) key will waste memory space意思是第二行两个“Guangzhou”没必要
- Solution1
- A drawback to this approach is that the array must be of fixed size (Memory Wasting)
- Solution2(Inverted Index) : each secondary key has its own array list
- Solution3 : Combining the lists into one array based list
- Solution1
- The index update due to record inserting/deleting is expensive
- Drawbacks
Tree Index
2-3 Tree
- Characters
- A node contains one or two keys (一个
key
–两个小孩,两个key
–三个小孩) - Every internal node has either
- Two children (if it contains one key (left))
- Three children (if it contains two keys)
- All leaves are at the same level in the tree, so the tree is always height balanced
- 左边的孩子比
root
小,中间的值在区间内,右边的值比root
大
- A node contains one or two keys (一个
- ADT
- template
template <class Elem> class TTNode { // 2-3 tree node structure public: Elem lkey; // The node's left key Elem rkey; // The node's right key TTNode* left; // Pointer to left child TTNode* center; // Pointer to middle child TTNode* right; // Pointer to right child TTNode() { center = left = right = NULL; lkey = rkey = EMPTY; } ~TTNode() { } bool isLeaf() { return left == NULL; } };
- Find
template <class Key, class Elem, class KEComp, class EEComp> bool TTTree<Key, Elem, KEComp, EEComp>:: findhelp(TTNode<Elem>* subroot, const Key& K, Elem& e) const { //检查是否为空 if (subroot == NULL) // val not found return false; //检查左右键值是否相等 if (KEComp::eq(K, subroot->lkey)) { e = subroot->lkey; //用来承接返回值 return true; } if ((subroot->rkey != EMPTY) && (KEComp::eq(K, subroot->rkey))) { e = subroot->rkey; return true; } //递归检查子节点 if (KEComp::lt(K, subroot->lkey)) // Search left return findhelp(subroot->left, K, e); else if (subroot->rkey == EMPTY) // Search center return findhelp(subroot->center, K, e); else if (KEComp::lt(K, subroot->rkey)) // Search center return findhelp(subroot->center, K, e); else return findhelp(subroot->right, K, e); // Right }
- insert
- 注意这里有一个
promotion
的操作,当要插入的地方已经满了的时候,就把几个孩子按顺序排列后的中间那个放到父节点中去,剩下两边分裂成两个孩子,如果父节点满了也是相同的操作,直至插入为止。
- 注意这里有一个
bool insert(const Elem& e) { // Insert node with value val Elem retval; // Smallest value in newly created node TTNode<Elem>* retptr = NULL; // Newly created node bool isSuccess = inserthelp(root, e, retval, retptr); if (retptr != NULL) { // Root overflowed: make new root TTNode<Elem>* temp = new TTNode<Elem>; temp->lval = retval; temp->left = root; temp->center = retptr; root = temp; root->keycount = 1; } return isSuccess; } template <class Key, class Elem, class KEComp, class EEComp> bool TTTree<Key, Elem, KEComp, EEComp>:: inserthelp(TTNode<Elem>*& subroot, const Elem& e, Elem& retval, TTNode<Elem>*& retptr) { Elem myretv; TTNode<Elem>* myretp = NULL; //对于一个空树 新建的节点放左边 if (subroot == NULL) { // Empty tree: make new node subroot = new TTNode<Elem>(); subroot->lkey = e; } else //插入位置是叶子结点 if (subroot->isLeaf()) // At leaf node: insert here //不是满的(右边节点是空) if (subroot->rkey == EMPTY) { // Easy case: not full if (EEComp::gt(e, subroot->lkey)) subroot->rkey = e; else { subroot->rkey = subroot->lkey; subroot->lkey = e; } } //是满的 else // if full splitnode(subroot, e, NULL, retval, retptr); //插入位置是中间结点 就要一直找到叶子结点再插入 else if (EEComp::lt(e, subroot->lkey)) // Find a correct position inserthelp(subroot->left, e, myretv, myretp); else if ((subroot->rkey == EMPTY) || (EEComp::lt(e, subroot->rkey))) inserthelp(subroot->center, e, myretv, myretp); else inserthelp(subroot->right, e, myretv, myretp); /* inptr : input value tval : return value retptr : return pointer */ template <class Key, class Elem, class KEComp, class EEComp> bool TTTree<Key, Elem, KEComp, EEComp>:: splitnode(TTNode<Elem>* subroot, const Elem& inval,TTNode<Elem>* inptr, Elem& retval, TTNode<Elem>*& retptr) { retptr = new TTNode<Elem>(); // Node created by split //插在最左边 if (EEComp::lt(inval, subroot->lkey)) { // Add at left retval = subroot->lkey; //这是被promote节点的值 作为返回值 subroot->lkey = inval; //把插入值当成左孩子 retptr->lkey = subroot->rkey; //把右值作为promote节点的右孩子 retptr->left = subroot->center; retptr->center = subroot->right; subroot->center = inptr; } //插在中间 else if (EEComp::lt(inval, subroot->rkey)) { // Center retval = inval; retptr->lkey = subroot->rkey; retptr->left = inptr; retptr->center = subroot->right; } //插在右边 else { // Add at right retval = subroot->rkey; retptr->lkey = inval; retptr->left = subroot->right; retptr->center = inptr; } subroot->rkey = EMPTY; }
B-Tree
- A generalization of the 2-3 Tree (order three)
- A B-Tree of order b has these properties:
- The root is either a leaf or has at least two children
- Each internal node has between
ceil(b/2)
andb
children 注意这里是对于中间结点的 - All leaves are at the same level in the tree, so the tree is always height balanced
- 这里观察到
order
是一个root
可以有的最多孩子数 实际上的格子数要比order
小一个
- Example
B±Tree
-
Characteristic
- Internal nodes do not store record 中间结点只是一个索引分类的作用 实际上储存的值都在
leaf
里- Only key values to guild the search
- Leaf nodes store records or pointers to records
- Leaf node, except the root, should store between
ceil(t/2)
andt
values,t
is the maximum number of values can be stored 注意这里和上面的不同 上面的是对于结点的孩子数而言的 而这个是对结点储存的值而言的 叶子结点能储存的最多数量和order+1
数量一样 最少储存order
的一半
B+ tree of order four
• Internal Node: 1 – 4
• Leaf Node: Max 5 indexes- Leaf node may store more or less records than an internal node stores keys
- Internal nodes do not store record 中间结点只是一个索引分类的作用 实际上储存的值都在
-
Insert
- 当要做promote操作但叶子节点为偶数时,选择中间偏右的结点promote.
- 相同的数放在该节点的右边.
- 对于internal node 和 leaf node插入的原则不一样。internal node同上,而leaf node 要copy一份数据值留在下面。
-
B+ Trees nodes are always at least half full.
-
Space Analysis
Example: Consider a B±Tree of order 100 with leaf nodes containing 100 records.
就是要算一个叶子节点能储存100个值的B+树 最少/最多能储存多少值
Tips : Min. number of records in N level must be smaller than or equal to Max. number of records in N-1 level.
- 1st level B+tree :
Min: 1
Max: 100- 2nd level B+tree :
Min : 2 leaves of 50 for 100 records
Max: 100 leaves with 100 for 10,000 records- 3rd level B+tree :
Min : 100 leaves of 50 for 5000 records
Max: 10000 leaves with 100 for 1000,000 records- 4th level B+tree :
Min : 100 leaves of 50 for 5000 records
Max: 10000 leaves with 100 for 1000,000 records
最小的一直是按50个(最大数量的一半)算,最大的一直是100个