2019.02.24
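Preface
I wanted to build a small tool in my spare time. The design stores data in the file system instead of a database. To reduce the number of file opens and speed up lookups, it builds an in-memory index with a B-tree. I won't cover how B-trees work here; the links below are for further reading: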
- Princeton algorithms course slides: https://www.cs.princeton.edu/~rs/AlgsDS07/09BalancedTrees.pdf
- Princeton's Java implementation of a B-tree (usable as-is; this article adds explanatory figures and code comments): https://algs4.cs.princeton.edu/code/edu/princeton/cs/algs4/BTree.java.html
- B-tree visualization: https://www.cs.usfca.edu/~galles/visualization/BTree.html
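Many blogs copy the Princeton code verbatim. However, the B-tree that the Princeton source builds carries sentinels on its leaf nodes, so it looks somewhat different from a hand-drawn B-tree, which is why I wrote this article. In addition, to build a B-tree that looks more like the one in the visualization, I modified the Princeton source slightly and implemented my own version. This article includes a number of figures that illustrate the differences among the three.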
B-tree example
All of the Java source below builds the BTree with the following insertion order:
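Understanding the Princeton source
Sentinel
The Princeton BTree implementation uses a sentinel so that, for any node containing key1, …, keyi, …, keym, every key in the subtree under keyi is >= keyi and < key(i+1); the subtree under the last key, keym, has no upper bound. Using a sentinel simplifies the code considerably.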
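The routing rule implied by this invariant can be sketched on its own. The following is a minimal illustration under stated assumptions, not the Princeton code itself: the class `SentinelRouting`, the method `childIndex`, and the sample keys are hypothetical names made up for this example, and `"*"` simply stands in for the sentinel in slot 0.

```java
// Hypothetical demo class, not part of the Princeton source.
final class SentinelRouting {

    // Returns the index of the child subtree that may contain `key`.
    // Assumption: keys[0] is the sentinel slot. It is never compared,
    // so it behaves as if it were smaller than every real key.
    static <K extends Comparable<K>> int childIndex(K[] keys, int m, K key) {
        for (int j = 0; j < m; j++) {
            // Take child j if it is the last child, or if key < keys[j + 1].
            if (j + 1 == m || key.compareTo(keys[j + 1]) < 0) {
                return j;
            }
        }
        throw new IllegalArgumentException("node has no children");
    }

    public static void main(String[] args) {
        // Sample internal node: sentinel, then two real separator keys.
        String[] keys = {"*", "F", "M"};
        for (String k : new String[] {"A", "F", "K", "M", "Z"}) {
            System.out.println(k + " -> child " + childIndex(keys, keys.length, k));
        }
    }
}
```

Because slot 0 is never inspected, the loop needs no special case for keys smaller than every separator: they fall into child 0 by default. That is the simplification the sentinel buys.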
For the B-tree example in the previous section, the Princeton source produces the B-tree shown in the figure below:
Source code
public class PrincetonBTree<Key extends Comparable<Key>, Value> {
    // max children per B-tree node = M-1
    // (must be even and greater than 2)
    private static final int M = 4;

    private Node root;       // root of the B-tree
    private int height;      // height of the B-tree
    private int n;           // number of key-value pairs in the B-tree

    // helper B-tree node data type
    private static final class Node {
        private int m;                             // number of children
        private Entry[] children = new Entry[M];   // the array of children

        // create a node with k children
        private Node(int k) {
            m = k;
        }
    }

    // internal nodes: only use key and next
    // external nodes: only use key and value
    // so when looking up a key, its value can only be obtained from an external node
    private static class Entry {
        private Comparable key;
        private final Object val;
        private Node next;     // helper field to iterate over array entries

        public Entry(Comparable key, Object val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }
    /**
     * Initializes an empty B-tree.
     */
    public PrincetonBTree() {
        root = new Node(0);
    }

    /**
     * Returns true if this symbol table is empty.
     * @return {@code true} if this symbol table is empty; {@code false} otherwise
     */
    public boolean isEmpty() {
        return size() == 0;
    }

    /**
     * Returns the number of key-value pairs in this symbol table.
     * @return the number of key-value pairs in this symbol table
     */
    public int size() {
        return n;
    }

    /**
     * Returns the height of this B-tree (for debugging).
     *
     * @return the height of this B-tree
     */
    public int height() {
        return height;
    }

    /**
     * Returns the value associated with the given key.
     *
     * @param key the key
     * @return the value associated with the given key if the key is in the symbol table
     * and {@code null} if the key is not in the symbol table
     * @throws IllegalArgumentException if {@code key} is {@code null}
     */
    public Value get(Key key) {
        if (key == null) throw new IllegalArgumentException("argument to get() is null");
        return search(root, key, height);
    }

    private Value search(Node x, Key key, int ht) {
        Entry[] children = x.children;

        // external node
        if (ht == 0) {
            for (int j = 0; j < x.m; j++) {
                if (eq(key, children[j].key)) return (Value) children[j].val;
            }
        }

        // internal node
        else {
            for (int j = 0; j < x.m; j++) {
                if (j+1 == x.m || less(key, children[j+1].key))
                    return search(children[j].next, key, ht-1);
            }
        }
        return null;
    }
    /**
     * Inserts the key-value pair into the symbol table, overwriting the old value
     * with the new value if the key is already in the symbol table.
     * If the value is {@code null}, this effectively deletes the key from the symbol table.
     *
     * @param key the key
     * @param val the value
     * @throws IllegalArgumentException if {@code key} is {@code null}
     */
    public void put(Key key, Value val) {
        if (key == null) throw new IllegalArgumentException("argument key to put() is null");
        Node node = insert(root, key, val, height);
        n++;
        if (node == null) return;

        // need to split root
        Node newRoot = new Node(2