Project B+tree
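(GitHub link at the end of the post)
This is the first project that builds on an existing code base, so the most important step is to read and understand the rest of the code in the repository.
First, recall that a database keeps its data on disk. A table is stored row by row; the rows look consecutive in the table, but they are not necessarily contiguous on disk, because data on disk is stored in units of pages and two rows may sit on two unrelated pages. Our B+ tree indexes the data in a table so that we can quickly find where on disk a given record lives.
So understanding B+ trees is a prerequisite for this project.
Task 1: implement fromBytes in index/LeafNode, i.e. read a leaf node back from disk
A B+ tree leaf node differs from an inner node: a leaf has to record its keys, the matching rids (each holding a page number and an offset within that page), and the next leaf, rightSibling (used to implement range queries on the B+ tree).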
/**
* Loads a leaf node from page `pageNum`.
*/
public static LeafNode fromBytes(BPlusTreeMetadata metadata, BufferManager bufferManager,
LockContext treeContext, long pageNum) {
// TODO(proj2): implement
// Note: LeafNode has two constructors. To implement fromBytes be sure to
// use the constructor that reuses an existing page instead of fetching a
// brand new one.
//Get the page corresponding to the current node
Page page = bufferManager.fetchPage(treeContext, pageNum);
Buffer buffer = page.getBuffer();
int index = 0;
byte isLeaf = buffer.get(index);
if(isLeaf == 1) {
index++;
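long rightSiblingPage = buffer.getLong(index);
// toBytes stores -1 when the leaf has no right sibling, so map it back to Optional.empty()
Optional<Long> rightSibling = rightSiblingPage == -1L ? Optional.empty() : Optional.of(rightSiblingPage);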
index += Long.BYTES;
int nodeSize = buffer.getInt(index);
index += Integer.BYTES;
// start to read keys and rids
int keySize = metadata.getKeySchema().getSizeInBytes();
int ridSize = RecordId.getSizeInBytes();
List<DataBox> keys = new ArrayList<>();
List<RecordId> rids = new ArrayList<>();
buffer.position(index);
byte[] keysAndRidsByteArray = new byte[nodeSize * (keySize + ridSize)];
buffer.get(keysAndRidsByteArray);
// reset pointer of byte array
index = 0;
for(int i=0; i < nodeSize; i++) {
keys.add(DataBox.fromBytes(edu.berkeley.cs186.database.common.ByteBuffer.wrap(keysAndRidsByteArray, index, keySize), metadata.getKeySchema()));
index += keySize;
rids.add(RecordId.fromBytes(edu.berkeley.cs186.database.common.ByteBuffer.wrap(keysAndRidsByteArray, index, ridSize)));
index += ridSize;
}
return new LeafNode(metadata, bufferManager, page, keys, rids, rightSibling, treeContext);
}
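// the page does not start with the leaf marker, so fail loudly instead of returning null
throw new BPlusTreeException("page " + pageNum + " does not hold a leaf node");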
}
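To make the byte layout that fromBytes walks through easier to picture, here is a minimal round-trip sketch using plain java.nio.ByteBuffer. It is not the project's code: LeafLayoutSketch, the 4-byte int keys and the 8-byte long "rids" are simplifications I made up for illustration (the real code uses DataBox and RecordId), and -1 stands for "no right sibling".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy layout: [1-byte leaf flag][8-byte right sibling, -1 if none][4-byte n][n x (4-byte key, 8-byte rid)]
final class LeafLayoutSketch {
    static final long NO_SIBLING = -1L;

    // Serializes a toy leaf into a byte array.
    static byte[] toBytes(long rightSibling, List<Integer> keys, List<Long> rids) {
        ByteBuffer buf = ByteBuffer.allocate(
                1 + Long.BYTES + Integer.BYTES + keys.size() * (Integer.BYTES + Long.BYTES));
        buf.put((byte) 1);             // leaf marker
        buf.putLong(rightSibling);     // page number of the next leaf, or -1
        buf.putInt(keys.size());       // number of (key, rid) pairs
        for (int i = 0; i < keys.size(); i++) {
            buf.putInt(keys.get(i));
            buf.putLong(rids.get(i));
        }
        return buf.array();
    }

    // Reads the pairs back in the same order; keys are returned, rids are appended to ridsOut.
    static List<Integer> readKeys(byte[] page, List<Long> ridsOut) {
        ByteBuffer buf = ByteBuffer.wrap(page);
        if (buf.get() != 1) {
            throw new IllegalArgumentException("not a leaf page");
        }
        buf.getLong();                 // skip the sibling pointer, see readRightSibling
        int n = buf.getInt();
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(buf.getInt());
            ridsOut.add(buf.getLong());
        }
        return keys;
    }

    // The sibling pointer sits right after the 1-byte marker.
    static Optional<Long> readRightSibling(byte[] page) {
        long raw = ByteBuffer.wrap(page).getLong(1);
        return raw == NO_SIBLING ? Optional.<Long>empty() : Optional.of(raw);
    }
}
```

The point to take away is that the reader must consume fields in exactly the order the writer produced them, and that the "no sibling" sentinel has to be translated back into an empty Optional.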
Task 2: implement get for the B+ tree. get returns a leaf node, so InnerNode and LeafNode implement it differently, but both return a LeafNode
InnerNode:
@Override
public LeafNode get(DataBox key) {
// TODO(proj2): implement
// judge is there a child node
if(keys.size() > 0) {
// find next child by numLessThanEqual(can replace by binary search)
int targetChildrenIndex = numLessThanEqual(key, keys);
return getChild(targetChildrenIndex).get(key);
}
return null;
}
LeafNode:
@Override
public LeafNode get(DataBox key) {
// TODO(proj2): implement
return this;
}
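The InnerNode version leans on numLessThanEqual to pick the child to descend into. As a rough sketch of what that helper computes, and of the binary-search replacement mentioned in the comment, here is a standalone version. ChildIndexSketch and both method bodies are my own illustration, not the helper shipped with the project.

```java
import java.util.List;

final class ChildIndexSketch {
    // Linear scan: the number of keys <= x is exactly the index of the child to follow.
    static <T extends Comparable<T>> int numLessThanEqual(T x, List<T> sortedKeys) {
        int n = 0;
        for (T k : sortedKeys) {
            if (k.compareTo(x) <= 0) {
                n++;
            } else {
                break; // keys are sorted, nothing further can be <= x
            }
        }
        return n;
    }

    // Binary search for the same value: the first index whose key is > x.
    static <T extends Comparable<T>> int numLessThanEqualBinary(T x, List<T> sortedKeys) {
        int lo = 0;
        int hi = sortedKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys.get(mid).compareTo(x) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

With keys [10, 20, 30], a lookup for 25 gives index 2, i.e. the child between 20 and 30, and a lookup for 10 gives index 1, because a key equal to a separator lives in the child to its right.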
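Task 3: implement getLeftmostLeaf. The leaves of a B+ tree are chained together as a linked list, so to traverse every leaf we first fetch the head of that list
Each leaf also holds several keys of its own, so collecting all the keys stored in the leaves starts from that head leaf
The detailed explanation, with diagrams, is in the comments in BPlusNode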
InnerNode:
@Override
public LeafNode getLeftmostLeaf() {
assert(children.size() > 0);
// TODO(proj2): implement
return getChild(0).getLeftmostLeaf();
}
LeafNode:
@Override
public LeafNode getLeftmostLeaf() {
// TODO(proj2): implement
return this;
}
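To show why the leftmost leaf is all we need for a full traversal, here is a toy model of the leaf chain. The Node, Leaf and Inner types below are hypothetical in-memory stand-ins (the real nodes live on pages and link through page numbers), so treat this as a sketch of the idea only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeafChainSketch {
    interface Node {
        Leaf leftmostLeaf();
    }

    static final class Leaf implements Node {
        final List<Integer> keys;
        Leaf next; // null means "no right sibling"

        Leaf(Integer... keys) {
            this.keys = Arrays.asList(keys);
        }

        @Override
        public Leaf leftmostLeaf() {
            return this; // a leaf is its own leftmost leaf
        }
    }

    static final class Inner implements Node {
        final List<Node> children;

        Inner(Node... children) {
            this.children = Arrays.asList(children);
        }

        @Override
        public Leaf leftmostLeaf() {
            return children.get(0).leftmostLeaf(); // always descend into child 0
        }
    }

    // Walks the chain from the head leaf and collects every key in order.
    static List<Integer> allKeys(Node root) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = root.leftmostLeaf(); leaf != null; leaf = leaf.next) {
            out.addAll(leaf.keys);
        }
        return out;
    }
}
```

Once the head leaf is found, the inner nodes are never touched again; the sibling links alone carry the scan to the end.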
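Task 4: implement put, i.e. node.put(k, r), which inserts the record r under key k into the B+ tree rooted at node. This is where node overflow comes in. In this code base a tree of order d (metadata.getOrder()) lets a node hold at most 2d keys, and every node other than the root should hold at least d. An insert therefore has two cases: either the node still has room and the key is inserted directly, or the node overflows to 2d + 1 keys and has to be split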
InnerNode:
// See BPlusNode.put.
@Override
public Optional<Pair<DataBox, Long>> put(DataBox key, RecordId rid) {
// TODO(proj2): implement
// get next node from children node by scan from the minimal node
int nextNodeIndex = numLessThanEqual(key, keys);
BPlusNode childNode = getChild(nextNodeIndex);
// insert node, this is a recursion
Optional<Pair<DataBox, Long>> curNode = childNode.put(key, rid);
if (curNode.isPresent()){
// after insert, if child node is split, so this node should add
// key from child node
// find the place for added child node
DataBox curKey = curNode.get().getFirst();
Long curChild = curNode.get().getSecond();
int targetIndex = 0;
for (; targetIndex < keys.size(); targetIndex++){
if (curKey.compareTo(keys.get(targetIndex)) < 0){
break;
}
}
keys.add(targetIndex, curKey);
children.add(targetIndex + 1, curChild);
if (keys.size() > 2 * metadata.getOrder()) {
// after adding, this node need to be split
// return half key, this is the middle key
DataBox retKey = keys.get(metadata.getOrder());
// create another node with keys half behind
List<DataBox> nKeys = new ArrayList<>();
List<Long> nChildren = new ArrayList<>();
for (int i = metadata.getOrder() + 1; i < keys.size(); ++i){
nKeys.add(keys.get(i));
nChildren.add(children.get(i));
}
// split created node get one more keys
nChildren.add(children.get(2 * metadata.getOrder() + 1));
// remove the d + 1 keys and d + 1 children that moved out, so this node keeps
// the first d keys and the first d + 1 children
for (int i = 0; i <= metadata.getOrder(); ++i){
keys.remove(metadata.getOrder());
children.remove(metadata.getOrder() + 1);
}
InnerNode nInnerNode = new InnerNode(metadata, bufferManager, nKeys, nChildren, treeContext);
sync();
nInnerNode.sync();
return Optional.of(new Pair<>(retKey, nInnerNode.page.getPageNum()));
}
}
sync();
return Optional.empty();
}
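The index arithmetic in the inner-node split is easy to get off by one, so here is the same split on plain lists. InnerSplitSketch and its Split holder are hypothetical names for this sketch; integer keys and long child page numbers stand in for DataBox and the real children list.

```java
import java.util.ArrayList;
import java.util.List;

final class InnerSplitSketch {
    static final class Split {
        final int pushedUp;              // middle key, moves up to the parent
        final List<Integer> rightKeys;   // the d keys after the middle key
        final List<Long> rightChildren;  // the d + 1 children after child d

        Split(int pushedUp, List<Integer> rightKeys, List<Long> rightChildren) {
            this.pushedUp = pushedUp;
            this.rightKeys = rightKeys;
            this.rightChildren = rightChildren;
        }
    }

    // On entry: keys.size() == 2d + 1 and children.size() == 2d + 2.
    // On exit: keys holds the first d keys, children the first d + 1 children.
    static Split split(List<Integer> keys, List<Long> children, int d) {
        int middle = keys.get(d);
        List<Integer> rightKeys = new ArrayList<>(keys.subList(d + 1, keys.size()));
        List<Long> rightChildren = new ArrayList<>(children.subList(d + 1, children.size()));
        keys.subList(d, keys.size()).clear();             // drops keys d .. 2d (d + 1 of them)
        children.subList(d + 1, children.size()).clear(); // drops children d + 1 .. 2d + 1
        return new Split(middle, rightKeys, rightChildren);
    }
}
```

Unlike a leaf split, the middle key is pushed up rather than copied: it ends up in neither half, which is why d + 1 keys leave the left node but only d arrive in the right one.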
LeafNode:
// See BPlusNode.put.
@Override
public Optional<Pair<DataBox, Long>> put(DataBox key, RecordId rid) {
// TODO(proj2): implement
if (keys.isEmpty()){
keys.add(key);
rids.add(rid);
sync();
return Optional.empty();
}else{
int insertIndex = 0;
for (; insertIndex < keys.size(); ++insertIndex){
if (key.equals(keys.get(insertIndex))){
// find duplicate key
throw new BPlusTreeException("found duplicate key in leaf node");
}else if (key.compareTo(keys.get(insertIndex)) < 0){
// insert place at right node
break;
}
}
keys.add(insertIndex, key);
rids.add(insertIndex, rid);
if (keys.size() <= 2 * metadata.getOrder()){
sync();
return Optional.empty();
}else {
// overflow need to split
List<DataBox> nKeys = new ArrayList<>();
List<RecordId> nRids = new ArrayList<>();
// move d+1 pairs to new leaf node right
int beginIndex = metadata.getOrder();
int pointer = beginIndex;
for (; pointer < keys.size(); ++pointer){
nKeys.add(keys.get(pointer));
nRids.add(rids.get(pointer));
}
// delete d+1 pairs from current node
for (int i = 0; i < beginIndex + 1; ++i){
keys.remove(beginIndex);
rids.remove(beginIndex);
}
// new right sibling
LeafNode nNode = new LeafNode(metadata, bufferManager,nKeys, nRids,rightSibling, treeContext);
// link current node and right sibling
rightSibling = Optional.of(nNode.page.getPageNum());
// persist current node and new node to disk
sync();
nNode.sync();
return Optional.of(new Pair<>(nKeys.get(0), nNode.page.getPageNum()));
}
}
}
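For comparison with the inner-node case, the leaf split boils down to the following list manipulation. LeafSplitSketch is a made-up name and integer keys replace DataBox; the rids list would be cut at the same index.

```java
import java.util.ArrayList;
import java.util.List;

final class LeafSplitSketch {
    // On entry keys.size() == 2d + 1.
    // Trims `keys` to its first d entries and returns the last d + 1 for the new right leaf.
    static List<Integer> splitRight(List<Integer> keys, int d) {
        List<Integer> right = new ArrayList<>(keys.subList(d, keys.size()));
        keys.subList(d, keys.size()).clear();
        return right;
    }
}
```

The split key handed to the parent is the first key of the right leaf, and it stays in that leaf as well: a leaf split copies the key up, because the record itself must remain reachable at leaf level.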
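Task: implement node.remove(key), which deletes a key. A LeafNode just finds the key and removes it; an InnerNode first has to locate the LeafNode that holds the key and then removes it there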
InnerNode:
// See BPlusNode.remove.
@Override
public void remove(DataBox key) {
// TODO(proj2): implement
// find the key in child node, return the leafNode
LeafNode targetChild = get(key);
targetChild.remove(key);
}
LeafNode:
// See BPlusNode.remove.
@Override
public void remove(DataBox key) {
// TODO(proj2): implement
if (!keys.isEmpty()){
for (int i = 0; i < keys.size(); ++i){
if (keys.get(i).equals(key)){
keys.remove(i);
rids.remove(i);
sync();
return;
}
}
}
}
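Task5: implement scanAll and scanGreaterEqual, both in the BPlusTree class
In a B+ tree the leaves together hold all the data and are chained into a linked list, so for scanAll we only need to start from the leftmost leaf
scanGreaterEqual scans every record whose key is greater than or equal to the input key; because the list is ordered, this is simple too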
ScanAll:
public Iterator<RecordId> scanAll() {
// TODO(proj4_integration): Update the following line
LockUtil.ensureSufficientLockHeld(lockContext, LockType.NL);
// TODO(proj2): Return a BPlusTreeIterator.
if (root != null){
LeafNode leafMostLeafNode = root.getLeftmostLeaf();
return new BPlusTreeIterator(leafMostLeafNode, -1);
}
return Collections.emptyIterator();
}
scanGreaterEqual:
public Iterator<RecordId> scanGreaterEqual(DataBox key) {
typecheck(key);
// TODO(proj4_integration): Update the following line
LockUtil.ensureSufficientLockHeld(lockContext, LockType.NL);
// TODO(proj2): Return a BPlusTreeIterator.
if (root != null){
LeafNode leafNode = root.get(key);
List<DataBox> keys = leafNode.getKeys();
int targetIndex = 0;
for (; targetIndex < keys.size(); ++targetIndex){
if (key.compareTo(keys.get(targetIndex)) < 1){
break;
}
}
return new BPlusTreeIterator(leafNode, targetIndex - 1);
}
return Collections.emptyIterator();
}
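BPlusTreeIterator itself is not shown in this post, so as a rough picture of what such an iterator has to do, here is a toy version over in-memory leaves. ScanSketch, its Leaf type and lowerBound are assumptions of mine for illustration; the real iterator returns RecordIds and follows page numbers, and may index its start position differently.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ScanSketch {
    static final class Leaf {
        final List<Integer> keys;
        Leaf next; // null means "no right sibling"

        Leaf(Integer... keys) {
            this.keys = Arrays.asList(keys);
        }
    }

    // First index whose key is >= target (sortedKeys.size() if there is none).
    static int lowerBound(List<Integer> sortedKeys, int target) {
        int lo = 0;
        int hi = sortedKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys.get(mid) < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // `start` is the leaf that a lookup for `target` lands on.
    static Iterator<Integer> scanGreaterEqual(final Leaf start, final int target) {
        return new Iterator<Integer>() {
            private Leaf leaf = start;
            private int idx = lowerBound(start.keys, target);

            // Hop over leaves that have nothing left to offer.
            private void skipExhaustedLeaves() {
                while (leaf != null && idx >= leaf.keys.size()) {
                    leaf = leaf.next;
                    idx = 0;
                }
            }

            @Override
            public boolean hasNext() {
                skipExhaustedLeaves();
                return leaf != null;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return leaf.keys.get(idx++);
            }
        };
    }
}
```

One case worth keeping in mind: the leaf found by the lookup may contain no key that is large enough (searching for 4 in a leaf holding 1 and 3), in which case the scan has to begin at the first key of the next leaf.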
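Task6: implement bulk loading, node.bulkLoad(data, fillFactor). Think of it as an upgraded put: the input is a batch of records, data, plus a parameter 0 < fillFactor < 1. InnerNode and LeafNode differ here as well. With a normal overflow threshold of m = 2d keys, a LeafNode is only filled up to m * fillFactor keys (rounded up) during bulk loading, while the threshold for an InnerNode stays at 2d
When bulkLoad is called the tree is normally assumed to be empty, so the data can be inserted in order
So an InnerNode always descends into its current rightmost child when inserting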
InnerNode:
// See BPlusNode.bulkLoad.
@Override
public Optional<Pair<DataBox, Long>> bulkLoad(Iterator<Pair<DataBox, RecordId>> data,
float fillFactor) {
// TODO(proj2): implement
if (children.size() > 1){
BPlusNode rightMostChild = getChild(children.size() - 1);
Optional<Pair<DataBox, Long>> nChild = rightMostChild.bulkLoad(data, fillFactor);
while (nChild.isPresent()){
keys.add(nChild.get().getFirst());
children.add(nChild.get().getSecond());
if (keys.size() > 2 * metadata.getOrder()){
DataBox retKey = keys.get(metadata.getOrder());
List<DataBox> nKeys = new ArrayList<>();
List<Long> nChildren = new ArrayList<>();
for (int i = metadata.getOrder() + 1; i < keys.size(); ++i){
nKeys.add(keys.get(i));
nChildren.add(children.get(i));
}
nChildren.add(children.get(2*metadata.getOrder() + 1));
for (int i = 0; i < metadata.getOrder() + 1; ++i){
keys.remove(metadata.getOrder());
// keep children[0..d]; the children that moved to the new node start at index d + 1
children.remove(metadata.getOrder() + 1);
}
InnerNode nInnerNode = new InnerNode(metadata, bufferManager, nKeys, nChildren, treeContext);
sync();
nInnerNode.sync();
return Optional.of(new Pair<>(retKey, nInnerNode.page.getPageNum()));
}
rightMostChild = getChild(children.size() - 1);
nChild = rightMostChild.bulkLoad(data, fillFactor);
}
}
sync();
return Optional.empty();
}
LeafNode:
// See BPlusNode.bulkLoad.
@Override
public Optional<Pair<DataBox, Long>> bulkLoad(Iterator<Pair<DataBox, RecordId>> data,
float fillFactor) {
// TODO(proj2): implement
int maxCapacity = (int) Math.ceil(2 * metadata.getOrder() * fillFactor);
while (data.hasNext()){
Pair<DataBox, RecordId> nextPair = data.next();
keys.add(nextPair.getFirst());
rids.add(nextPair.getSecond());
if (keys.size() >= maxCapacity){
break;
}
}
if (data.hasNext()){
// if current node overflow, create next leafNode with one recordId, and return
// so other data will fill with new node
List<DataBox> nKeys = new ArrayList<>();
List<RecordId> nRids = new ArrayList<>();
Pair<DataBox, RecordId> nextPair = data.next();
nKeys.add(nextPair.getFirst());
nRids.add(nextPair.getSecond());
LeafNode nLeafNode = new LeafNode(metadata, bufferManager, nKeys, nRids, Optional.empty(),treeContext);
rightSibling = Optional.of(nLeafNode.page.getPageNum());
sync();
nLeafNode.sync();
return Optional.of(new Pair<>(nKeys.get(0), nLeafNode.page.getPageNum()));
}
sync();
return Optional.empty();
}
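To see what the fill factor does to the leaves, here is a small sketch that packs a sorted stream of keys into leaf-sized chunks. BulkLoadSketch, leafCapacity and packLeaves are hypothetical helpers of mine; they only mimic the leaf-level effect of the code above and do not build any inner nodes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BulkLoadSketch {
    // Number of entries a leaf is allowed to reach during bulk loading.
    static int leafCapacity(int d, float fillFactor) {
        return (int) Math.ceil(2 * d * fillFactor);
    }

    // Fills a leaf up to the capacity; the next record then opens a fresh leaf.
    static List<List<Integer>> packLeaves(Iterator<Integer> sorted, int d, float fillFactor) {
        int capacity = leafCapacity(d, fillFactor);
        List<List<Integer>> leaves = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        while (sorted.hasNext()) {
            if (current.size() >= capacity) {
                leaves.add(current);
                current = new ArrayList<>();
            }
            current.add(sorted.next());
        }
        if (!current.isEmpty()) {
            leaves.add(current);
        }
        return leaves;
    }
}
```

With d = 2 and fillFactor = 0.75 a leaf takes 3 of its 4 possible entries, so seven keys end up in leaves of 3, 3 and 1 entries. The spare slot in each full leaf is what leaves room for later put calls without an immediate split.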