h2database source code analysis: how the undo log is inserted into the B-tree

Latest recommended article published on 2023-02-11 13:38:31

龚厂长

Category: h2database. Tags: B-tree, java, data structures, databases

Article link: https://blog.csdn.net/weixin_38308374/article/details/128877594

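The article 《h2database源码解析-如何插入一条行记录》 (how a row record is inserted) described how a row is inserted into the B+ tree and how its undo log is built. This article explains how the undo log, once built, is inserted into the B+ tree.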

1. Preparation

Once h2 has built the undo log object, it calls MVMap.append() to save it:

    public void append(K key, V value) {
        // singleWriter will be covered in a later article; its default value is true,
        // and this article only looks at the true case
        if (singleWriter) {
            // beforeWrite() performs some checks, such as whether the database has been stopped,
            // and then checks whether in-memory pages need to be flushed to disk, flushing them if so
            beforeWrite();
            // Lock the root page
            RootReference<K,V> rootReference = lockRoot(getRoot(), 1);
            // Number of entries tracked by RootReference that are not yet saved to a B+ tree page
            int appendCounter = rootReference.getAppendCounter();
            try {
                // keysPerPage is the maximum number of entries a page can hold; the default is 48
                if (appendCounter >= keysPerPage) {
                    // The buffer is full, so first write the buffered entries into the B+ tree
                    rootReference = flushAppendBuffer(rootReference, false);
                    appendCounter = rootReference.getAppendCounter();
                    assert appendCounter < keysPerPage;
                }
                // Put the undo log entry being inserted into the keysBuffer and valuesBuffer arrays
                keysBuffer[appendCounter] = key;
                if (valuesBuffer != null) {
                    valuesBuffer[appendCounter] = value;
                }
                ++appendCounter;
            } finally {
                // Unlock the root page; unlocking also saves appendCounter into RootReference
                unlockRoot(appendCounter);
            }
        } else {
            put(key, value);
        }
    }

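The arguments of MVMap.append() were covered in 《h2database源码解析-如何插入一条行记录》.
As MVMap.append() shows, every undo log entry is first saved into the keysBuffer and valuesBuffer arrays of MVMap. These two arrays only hold entries temporarily; once certain conditions are met, their contents are written into the B+ tree. The number of entries currently held in the arrays, appendCounter, is stored in the RootReference object, and locking also relies on RootReference, so let's look at RootReference next.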
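The buffering idea described above can be sketched in isolation. The class below is a hypothetical stand-in, not H2 code: `AppendBufferSketch`, its `tree` list and the `flush()` method are assumptions made for illustration, and the flush here always empties the whole buffer, whereas the real flushAppendBuffer() may leave a remainder behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MVMap's append path; not H2 code.
final class AppendBufferSketch {
    private final int keysPerPage;
    private final long[] keysBuffer;
    private int appendCounter;                          // plays the role of RootReference.appendCounter
    private final List<Long> tree = new ArrayList<>();  // stands in for the B+ tree

    AppendBufferSketch(int keysPerPage) {
        this.keysPerPage = keysPerPage;
        this.keysBuffer = new long[keysPerPage];
    }

    // Buffer the key; flush first if the buffer already holds a full page of keys.
    void append(long key) {
        if (appendCounter >= keysPerPage) {
            flush();
        }
        keysBuffer[appendCounter++] = key;
    }

    // Simplified flush: move every buffered key into the "tree" in array order.
    void flush() {
        for (int i = 0; i < appendCounter; i++) {
            tree.add(keysBuffer[i]);
        }
        appendCounter = 0;
    }

    int buffered() { return appendCounter; }

    int stored() { return tree.size(); }
}
```

With a page size of 4, the first four appends only touch the buffer; the fifth triggers a flush and then occupies slot 0 of the emptied buffer.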

RootReference

As its name suggests, RootReference is a reference to the root page. Its fields are described below:

public final class RootReference<K,V> {
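    /**
     * The root page
     */
    public final Page<K,V> root;
    /**
     * Version number, related to write operations
     */
    public final long version;
    /**
     * Number of reentrant lock holds
     */
    private final byte holdCount;
    /**
     * Id of the thread holding the lock
     */
    private final long ownerId;
    /**
     * The previous root reference
     */
    volatile RootReference<K,V> previous;
    /**
     * Number of times the root page was locked (updated) successfully
     */
    final long updateCounter;
    /**
     * Number of attempts made to lock (update) the root page
     */
    final long updateAttemptCounter;
    /**
     * Size of the occupied part of the append buffer
     */
    private final byte appendCounter;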
}

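RootReference is not a complicated class; it is mainly concerned with locking.
After h2 creates a RootReference object, it stores the object in the MVMap.root field, which is defined as follows: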

    /**
     * Reference to the current root page.   
     */
    private final AtomicReference<RootReference<K,V>> root;

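MVMap.root is an AtomicReference, so the root page of an MVMap can be updated atomically with Java's CAS. h2 also relies on CAS to implement locking of the root page.

An MVMap object represents one B+ tree, so it holds a reference to the tree's root page. The tree may be a table, an index, or the store for undo log records.
Locking the root page effectively locks the whole tree, which amounts to a table-level lock.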

Locking

Next, let's break down the following call to see how h2 locks the root page:

      lockRoot(getRoot(), 1)

getRoot() is simple: it reads the RootReference object from the MVMap.root field:

    public RootReference<K,V> getRoot() {
        return root.get();
    }

Now for lockRoot():

    private RootReference<K,V> lockRoot(RootReference<K,V> rootReference, int attempt) {
        // Keep trying until the lock is acquired
        while(true) {
            // Try to lock; returns a non-null object on success and null otherwise
            RootReference<K,V> lockedRootReference = tryLock(rootReference, attempt++);
            if (lockedRootReference != null) {
                return lockedRootReference;
            }
            // Another thread may have updated the root page in the meantime,
            // so fetch the latest RootReference after every failed attempt
            rootReference = getRoot();
        }
    }
    // Try to lock the root page (code abridged)
    protected RootReference<K,V> tryLock(RootReference<K,V> rootReference, int attempt) {
        RootReference<K,V> lockedRootReference = rootReference.tryLock(attempt);
        if (lockedRootReference != null) {
            // Locked successfully: return the latest RootReference object
            return lockedRootReference;
        }

        // The code omitted here puts the thread to sleep or makes it wait once certain conditions
        // are met, so that it does not keep retrying the lock and burden the system
        return null;
    }

Locking ultimately calls rootReference.tryLock():

    RootReference<K,V> tryLock(int attemptCounter) {
        // First check whether locking is possible; if so, try to lock, otherwise return null
        return canUpdate() ? tryUpdate(new RootReference<>(this, attemptCounter)) : null;
    }
    // canUpdate() first checks whether another thread has already locked this RootReference object;
    // if so, it checks whether the id of the locking thread equals the current thread id,
    // in which case the lock is held by the current thread
    private boolean canUpdate() {
        return isFree() || ownerId == Thread.currentThread().getId();
    }
    private boolean isFree() {
        return holdCount == 0;
    }
    private RootReference<K,V> tryUpdate(RootReference<K,V> updatedRootReference) {
        // Calls MVMap.compareAndSetRoot()
        return root.map.compareAndSetRoot(this, updatedRootReference) ? updatedRootReference : null;
    }
    // MVMap.compareAndSetRoot() is shown below:
    final boolean compareAndSetRoot(RootReference<K,V> expectedRootReference,
                                    RootReference<K,V> updatedRootReference) {
        return root.compareAndSet(expectedRootReference, updatedRootReference);
    }

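MVMap.compareAndSetRoot() uses Java's CAS to try to replace the RootReference object held in the root field; if the update succeeds, the current thread holds the lock.
With locking covered, let's turn to unlocking.

Unlocking

Unlocking also uses CAS, but some RootReference fields are updated in the process.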

    // appendCounter is the number of undo log entries stored in keysBuffer and valuesBuffer
    private void unlockRoot(int appendCounter) {
        unlockRoot(null, appendCounter);
    }
    private RootReference<K,V> unlockRoot(Page<K,V> newRootPage, int appendCounter) {
        RootReference<K,V> updatedRootReference;
        do {
            RootReference<K,V> rootReference = getRoot();
            updatedRootReference = rootReference.updatePageAndLockedStatus(
                                        newRootPage == null ? rootReference.root : newRootPage,
                                        false,
                                        appendCounter == -1 ? rootReference.getAppendCounter() : appendCounter
            );
        } while(updatedRootReference == null);
        notifyWaiters();
        return updatedRootReference;
    }

The RootReference.updatePageAndLockedStatus() method is shown below:

    RootReference<K,V> updatePageAndLockedStatus(Page<K,V> page, boolean keepLocked, int appendCounter) {
        // keepLocked drives the RootReference.holdCount field: when keepLocked is false the lock is released and holdCount is decremented by one
        return canUpdate() ? tryUpdate(new RootReference<>(this, page, keepLocked, appendCounter)) : null;
    }

Once locking is understood, the unlocking code is easy to follow: just as with locking, a new RootReference object is created and then installed in the root field via CAS.
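To make the pattern concrete, here is a minimal sketch of a reentrant lock built on an immutable state object swapped through an AtomicReference. It is a simplified analogue, not H2's implementation: `LockState` and `CasLockSketch` are hypothetical names, and the back-off that H2 applies between failed attempts is replaced by a plain spin.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified analogue of RootReference: an immutable lock state.
final class LockState {
    final int holdCount;      // 0 means unlocked
    final long ownerId;       // id of the owning thread, 0 when unlocked
    final int appendCounter;  // carried along with the lock state, as in the article

    LockState(int holdCount, long ownerId, int appendCounter) {
        this.holdCount = holdCount;
        this.ownerId = ownerId;
        this.appendCounter = appendCounter;
    }

    // Free, or already held by the calling thread (reentrant case).
    boolean canUpdate() {
        return holdCount == 0 || ownerId == Thread.currentThread().getId();
    }
}

final class CasLockSketch {
    private final AtomicReference<LockState> root =
            new AtomicReference<>(new LockState(0, 0L, 0));

    // One attempt: returns the new locked state, or null if the lock is
    // held by another thread or the CAS lost a race.
    LockState tryLock() {
        LockState current = root.get();
        if (!current.canUpdate()) {
            return null;
        }
        LockState locked = new LockState(current.holdCount + 1,
                Thread.currentThread().getId(), current.appendCounter);
        return root.compareAndSet(current, locked) ? locked : null;
    }

    // Retry until the lock is acquired (plain spin; no back-off in this sketch).
    LockState lock() {
        LockState locked;
        while ((locked = tryLock()) == null) {
            Thread.onSpinWait();
        }
        return locked;
    }

    // Release one hold and publish the new appendCounter in the same CAS.
    LockState unlock(int appendCounter) {
        while (true) {
            LockState current = root.get();
            if (current.holdCount == 0 || !current.canUpdate()) {
                throw new IllegalMonitorStateException("lock not held by this thread");
            }
            int holds = current.holdCount - 1;
            LockState next = new LockState(holds, holds == 0 ? 0L : current.ownerId, appendCounter);
            if (root.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```

Because every state change replaces the whole object, a reader that calls `root.get()` always sees a consistent combination of hold count, owner and counter, which is the property the article relies on.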

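2. Page

The rest of this article uses the Page and CursorPos classes. This section covers Page and the next covers CursorPos. A Page object represents one node of the B+ tree, that is, one page on disk. The code below shows only the fields relevant to this article; the others have been omitted: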

public abstract class Page<K,V> implements Cloneable {

    /**
     * The MVMap this page belongs to; think of it as the B+ tree that owns the page
     */
    public final MVMap<K,V> map;

    /**
     * Holds the keys of this page
     */
    private K[] keys;
}

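Page is an abstract class with two subclasses.
The Leaf class represents a leaf node of the B+ tree:

    private static class Leaf<K,V> extends Page<K,V> {
        private V[] values; // the values that correspond to the keys
    }

The NonLeaf class represents a non-leaf node:

    private static class NonLeaf<K,V> extends Page<K,V> {
        /**
         * The children of this node
         */
        private PageReference<K,V>[] children;

        /**
         * Total number of entries across all child nodes
         */
        private long totalCount;
    }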

3. CursorPos

A CursorPos can be thought of as the current position in a traversal of the B+ tree. Its parent field links the path from the root node to a leaf node into a chain.

public final class CursorPos<K,V> {

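    /**
     * The page currently being traversed
     */
    public Page<K,V> page;

    /**
     * For a leaf node, the index into the keys array of the Page being traversed;
     * a negative index indicates the position in the keys array where a key can be inserted
     */
    public int index;

    /**
     * Points to the parent of the node being traversed; in the parent's CursorPos object,
     * index is the position of the child node in the Page.keys array
     */
    public CursorPos<K,V> parent;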
}

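The parent and index fields of CursorPos make it possible to traverse the whole tree and to represent the access path from the root node to a leaf node.
The role of CursorPos is illustrated in the figure below:
[Figure: CursorPos objects chained along a path through a three-level tree]
As the figure shows, when the tree has three levels, a traversal uses three CursorPos objects; together, the three fields of CursorPos make it possible to traverse the whole tree.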
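The chain can be illustrated with a small sketch. `Node`, `Pos` and `CursorPathSketch` are hypothetical, stripped-down types invented for this example; the real Page and CursorPos classes carry much more state. The leaf position uses the encoding `-keyCount - 1`, chosen so that it decodes with the `-pos.index - 1` expression that appears in flushAppendBuffer() later in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down tree node; not H2's Page.
final class Node {
    final String name;
    final int keyCount;                        // only meaningful for leaves in this sketch
    final List<Node> children = new ArrayList<>();

    Node(String name, int keyCount) {
        this.name = name;
        this.keyCount = keyCount;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

// Hypothetical analogue of CursorPos: a page, an index, and a link to the parent position.
final class Pos {
    final Node page;
    final int index;   // inner node: index of the child that was followed; leaf: encoded insert position
    final Pos parent;

    Pos(Node page, int index, Pos parent) {
        this.page = page;
        this.index = index;
        this.parent = parent;
    }
}

final class CursorPathSketch {
    // Walk down the right-most branch, creating one Pos per level.
    static Pos rightmostPath(Node root) {
        Pos pos = null;
        Node node = root;
        while (!node.isLeaf()) {
            int idx = node.children.size() - 1;
            pos = new Pos(node, idx, pos);
            node = node.children.get(idx);
        }
        // Negative index: the next free slot in the leaf, encoded as -(slot) - 1.
        return new Pos(node, -node.keyCount - 1, pos);
    }

    // Follow parent links from the leaf position back up to the root.
    static List<String> pathToRoot(Pos leafPos) {
        List<String> names = new ArrayList<>();
        for (Pos p = leafPos; p != null; p = p.parent) {
            names.add(p.page.name);
        }
        return names;
    }
}
```

For a three-level tree this produces three linked `Pos` objects, and following `parent` from the leaf position visits the leaf, its parent, and finally the root.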

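4. Updating the undo log B+ tree

The undo log is first written to temporary arrays, namely keysBuffer and valuesBuffer. Their contents are written into the B+ tree when any of the following happens:

a transaction is rolled back;
the number of entries in keysBuffer and valuesBuffer reaches the per-page limit;
a transaction is committed;
in some scenarios, traversing an index also triggers a write to the B+ tree.

Other situations also trigger a write to the B+ tree; they are not all listed here.
One point before the code: the undo log is written to the B+ tree in append mode. Within one B+ tree, the key of a later undo log entry is always greater than that of an earlier one, so the array can be written to the tree in index order, and every write lands in the newest leaf node.
The write is performed by MVMap.flushAppendBuffer(). The method is long, so we first outline its steps and then read the code. It does the following: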

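Decide whether to insert data into the B+ tree; the rule is whether the number of pending entries exceeds a threshold;
If data needs to be inserted, lock the root node;
Insert the undo log entries into the newest page of the B+ tree; if the entries exceed the page's capacity, create a new page and put the data there;
If no new page was created, update the MVMap.root field, unlock, and return;
If a new page was created, add it to the B+ tree and walk up its ancestors, splitting each one that needs it; a node produced by such a split holds only one key. Once all ancestors have been handled, update the MVMap.root field, unlock, and return.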
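The third step, deciding where the buffered keys go, can be sketched as a small pure function. `FlushPlanSketch` and `plan()` are hypothetical names introduced for illustration; the branch structure follows the description above and the fullFlush parameter explained with the code below, under the assumption that a full leaf always leads to a new leaf regardless of fullFlush.

```java
// Hypothetical helper that mirrors the branching described in the steps above; not H2 code.
final class FlushPlanSketch {
    final int intoExistingLeaf;  // keys appended to the current right-most leaf
    final int intoNewLeaf;       // keys placed in a newly created leaf
    final int leftInBuffer;      // keys kept in keysBuffer/valuesBuffer for later

    FlushPlanSketch(int intoExistingLeaf, int intoNewLeaf, int leftInBuffer) {
        this.intoExistingLeaf = intoExistingLeaf;
        this.intoNewLeaf = intoNewLeaf;
        this.leftInBuffer = leftInBuffer;
    }

    static FlushPlanSketch plan(int buffered, int leafKeyCount, int keysPerPage, boolean fullFlush) {
        int available = keysPerPage - leafKeyCount;
        if (available <= 0) {
            // Leaf is full: everything goes into a new leaf.
            return new FlushPlanSketch(0, buffered, 0);
        }
        if (buffered <= available) {
            // Everything fits into the existing leaf.
            return new FlushPlanSketch(buffered, 0, 0);
        }
        int rest = buffered - available;
        // Overflow: fullFlush decides between a new leaf and keeping the rest buffered.
        return fullFlush
                ? new FlushPlanSketch(available, rest, 0)
                : new FlushPlanSketch(available, 0, rest);
    }
}
```

For instance, with 5 buffered keys and a leaf that has room for 3, a full flush puts 3 keys in the existing leaf and 2 in a new leaf, while a partial flush leaves those 2 keys in the buffer.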

The code of the method follows. It is long, but what it does is simple; read it alongside the steps above:

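    // rootReference refers to the root page of the undo log tree.
    // fullFlush serves two purposes.
    // First, it decides whether this call writes to the B+ tree at all: when true, anything held in keysBuffer and valuesBuffer is written; otherwise the write happens only once the arrays hold a full page of entries, that is, 48 keys.
    // Second, when there are more keys to write than fit in one page, it decides whether the remaining keys stay in keysBuffer and valuesBuffer or go into a newly created page: true means a new page, false means they stay in the arrays. For example, if 5 keys need to be written and the leaf page only has room for 3, fullFlush determines what happens to the remaining 2.
    // In most cases fullFlush is true. It is false when a transaction needs to add an undo log entry while RootReference.appendCounter is already greater than or equal to 48.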
    private RootReference<K,V> flushAppendBuffer(RootReference<K,V> rootReference, boolean fullFlush) {
        boolean preLocked = rootReference.isLockedByCurrentThread();
        boolean locked = preLocked;
        int keysPerPage = store.getKeysPerPage(); // maximum number of keys per page, 48 by default
        try {
            IntValueHolder unsavedMemoryHolder = new IntValueHolder();
            int attempt = 0;
            int keyCount;
            // Here fullFlush plays its first role
            int availabilityThreshold = fullFlush ? 0 : keysPerPage - 1;
            while ((keyCount = rootReference.getAppendCounter()) > availabilityThreshold) {
                if (!locked) {
                    // Lock the root page
                    rootReference = tryLock(rootReference, ++attempt);
                    if (rootReference == null) {
                        rootReference = getRoot();
                        continue;
                    }
                    locked = true;
                }
                // Get the root page of the B+ tree
                Page<K,V> rootPage = rootReference.root;
                long version = rootReference.version;
                // Find the newest leaf of the B+ tree, i.e. the bottom right-most leaf
                CursorPos<K,V> pos = rootPage.getAppendCursorPos(null);
                // index is the position in the Page's keys and values arrays where the next undo log entry goes,
                // i.e. the smallest index in the keys array that holds no data yet
                int index = -pos.index - 1;
                Page<K,V> p = pos.page;
                CursorPos<K,V> tip = pos;
                // Move pos up to the parent CursorPos object
                pos = pos.parent;

                int remainingBuffer = 0;
                Page<K,V> page = null;
                // available is the number of keys that still fit in the current page
                int available = keysPerPage - p.getKeyCount();
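                // When available is less than or equal to 0 the page cannot take any more data, so a new child node is created;
                // otherwise compare the number of entries to insert with the remaining space: if they do not all fit, fullFlush decides what happens, else they go straight into the current page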
                if (available > 0) {
                    p = p.copy();
                    if (keyCount <= available) {
                        // Enough room: write the data directly into the Page's keys and values arrays
                        p.expand(keyCount, keysBuffer, valuesBuffer);
                    } else {
                        // Not enough room for everything: write what fits, then handle the rest according to fullFlush
                        p.expand(available, keysBuffer, valuesBuffer);
                        keyCount -= available;
                        if (fullFlush) {
                            // fullFlush is true: create a new child page and write the remaining data into it
                            K[] keys = p.createKeyStorage(keyCount);
                            V[] values = p.createValueStorage(keyCount);
                            System.arraycopy(keysBuffer, available, keys, 0, keyCount);
                            if (valuesBuffer != null) {
                                System.arraycopy(valuesBuffer, available, values, 0, keyCount);
                            }
                            page = Page.createLeaf(this, keys, values, 0);
                        } else {
                            // fullFlush is false: move the remaining data to the front of MVMap's keysBuffer and valuesBuffer arrays
                            System.arraycopy(keysBuffer, available, keysBuffer, 0, keyCount);
                            if (valuesBuffer != null) {
                                System.arraycopy(valuesBuffer, available, valuesBuffer, 0, keyCount);
                            }
                            remainingBuffer = keyCount;
                        }
                    }
                } else {
                    tip = tip.parent;
                    page = Page.createLeaf(this,
                            Arrays.copyOf(keysBuffer, keyCount),
                            valuesBuffer == null ? null : Arrays.copyOf(valuesBuffer, keyCount),
                            0);
                }

                unsavedMemoryHolder.value = 0;
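                // If the code above had to create a new child page, page is non-null; otherwise it is null.
                // With a new child page, every ancestor page must be updated, and pages holding too many keys must be split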
                if (page != null) {
                    K key = page.getKey(0);
                    unsavedMemoryHolder.value += page.getMemory();
                    // The loop below walks from the new child page up through its ancestors, updating each ancestor's keys array
                    while (true) {
                        // pos points to the parent of the node being visited; pos == null means the current node is the root
                        // In the code below, p is the page currently being visited
                        if (pos == null) {
                            if (p.getKeyCount() == 0) {
                                // p.getKeyCount() is 0 only when a page can hold at most 1 key (leaf) or 2 keys (non-leaf);
                                // in that case the new page becomes the root page
                                p = page;
                            } else {
                                // The old root page was split as well: create a new root with the two resulting nodes as its children
                                K[] keys = p.createKeyStorage(1);
                                keys[0] = key;
                                Page.PageReference<K,V>[] children = Page.createRefStorage(2);
                                children[0] = new Page.PageReference<>(p);
                                children[1] = new Page.PageReference<>(page);
                                unsavedMemoryHolder.value += p.getMemory();
                                p = Page.createNode(this, keys, children, p.getTotalCount() + page.getTotalCount(), 0);
                            }
                            break;
                        }
                        // The node being visited is not the root: run the logic below
                        Page<K,V> c = p;
                        p = pos.page;
                        index = pos.index;
                        pos = pos.parent;
                        p = p.copy();
                        // page is the newly added page; set it as the parent's children[index]
                        p.setChild(index, page);
                        // Insert the smallest key of the new page into the parent's keys[index]
                        p.insertNode(index, key, c);
                        keyCount = p.getKeyCount();
                        int at = keyCount - (p.isLeaf() ? 1 : 2);
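                        // If the page holds no more than 48 keys, and either its memory footprint is below 4K or it is above 4K but the page holds no more than 1 key (leaf) or 2 keys (non-leaf),
                        // the page is not split and the loop exits; otherwise the current page is split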
                        if (keyCount <= keysPerPage &&
                                (p.getMemory() < store.getMaxPageSize() || at <= 0)) {
                            break;
                        }
                        
                        key = p.getKey(at);
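                        // Split the non-leaf node into two nodes.
                        // The first node keeps at keys and the second gets keys.length-at-1 keys; the minus one is because the node being split is a non-leaf node.
                        // The first node's data stays in the current Page object; the second node is a new Page object, which split() returns.
                        // Because undo log entries are stored in order and a new entry always goes to the end of the B+ tree, the split is not an even one: according to the code, the new node holds only one key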
                        page = p.split(at);
                        unsavedMemoryHolder.value += p.getMemory() + page.getMemory();
                    }
                }
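                // At this point the data has been added and every node that needed splitting has been split.
                // replacePage() installs the current page p in its parent node. This step is needed
                // because data is always inserted into a copy of the original page while the old page is left untouched,
                // so without it the parent would still reference the old page.
                // The call also locates the root page and returns it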
                p = replacePage(pos, p, unsavedMemoryHolder);
                // Install the root page in MVMap's root field, which updates the root node of the data
                rootReference = rootReference.updatePageAndLockedStatus(p, preLocked || isPersistent(),
                        remainingBuffer);
                if (rootReference != null) {
                    locked = preLocked || isPersistent();
                    if (isPersistent() && tip != null) {
                        registerUnsavedMemory(unsavedMemoryHolder.value + tip.processRemovalInfo(version));
                    }
                    break;
                }
                // Fetch the new root reference and loop again to check whether more data must be inserted into the B+ tree
                rootReference = getRoot();
            }
        } finally {
            if (locked && !preLocked) {
                rootReference = unlockRoot(); // unlock
            }
        }
        // Return the latest RootReference, which points to the new root page
        return rootReference;
    }
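The claim in the comments that a split in append mode leaves only one key in the new node can be checked with a little arithmetic. The sketch below is a hypothetical helper, not H2 code, and it assumes conventional split semantics: splitting a leaf at `at` moves keys[at..] to the new page, while splitting a non-leaf at `at` moves keys[at+1..] to the new page and promotes keys[at] to the parent.

```java
// Hypothetical arithmetic helper; the split semantics are an assumption stated in the text above.
final class AppendSplitSketch {
    // Split index used in the listing: keyCount - 1 for a leaf, keyCount - 2 for a non-leaf.
    static int splitIndex(int keyCount, boolean leaf) {
        return keyCount - (leaf ? 1 : 2);
    }

    // Number of keys that end up in the new right-hand page after the split.
    static int keysInNewPage(int keyCount, boolean leaf) {
        int at = splitIndex(keyCount, leaf);
        return leaf ? keyCount - at : keyCount - at - 1;
    }

    // Number of keys that stay in the original left-hand page.
    static int keysInOldPage(int keyCount, boolean leaf) {
        return splitIndex(keyCount, leaf);
    }
}
```

Under these assumptions a non-leaf page with 49 keys splits at index 47: 47 keys stay in place, one key moves up to the parent, and the new page receives a single key, which keeps the left side of the tree densely packed.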