2023-12-18 A Bare-Bones B-Tree in C

不停感叹的老林_<C 语言编程核心突破>

Last modified: 2023-12-21 12:56:36

Category: Notes. Tags: C, B-tree

First published: 2023-12-18 19:55:57

Article link: https://blog.csdn.net/m0_54206076/article/details/135050727

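Preface

Problem to solve:

Implement a bare-bones B-tree and study its properties.

I have long been drawn to the B-tree because it is the cornerstone of databases. The textbook description seems easy enough to follow, but I could not fully understand it without building one myself, so I built one.

Approach:

Start from a code skeleton produced by an AI and modify it. AI is a handy tool these days: the code it produces is not always reliable, but debugging it forces you to understand the details, so it is still very useful.

Additional notes:

I also have a C++ B-tree, ported from the Java code in Algorithms, 4th Edition. C++ memory management taught me a lesson there; it was painful enough that I switched everything to smart pointers, which made the headache go away. That code is just as bare-bones, supporting only insertion and lookup with no deletion or update, so I am not publishing it for now.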

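I. B-Tree in C

Basic architecture:

To support B-tree nodes of different sizes, the macro BTREE_ORDER_SIZE sets the number of children per node, and typedef int keyOfBTree; defines the key type so that it can be changed as needed.

Inside BTreeNode, the keys and child pointers are stored in fixed-size arrays embedded in the struct rather than in separately allocated memory. This makes initialization simple and freeing hard to get wrong: a single free of the BTreeNode releases everything, with no need to release the keys one by one.

The downside is less flexibility and some wasted space: every leaf still carries a child-pointer array that it never uses.

Printing the nodes and freeing the tree are done recursively, since recursion makes both trivial.

The most involved parts of the code are splitting a node and inserting a value into the tree. They take some time to absorb but are not too hard once you work through them.

Deleting a key is more complicated still and takes patience.

Updating a key is not implemented here; it can be done as a deletion followed by an insertion.

The `BTree.h` header.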

#ifndef BTREE_
#define BTREE_

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

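Figuratively, a B-tree is a balanced binary search tree that has been flattened: each node holds up to N keys and, if it is not a leaf, one more child than it has keys (at most N + 1). This is a very important property. The keys within a node are stored in sorted order and the children are ordered to match, so the whole structure is fully ordered.

Order of the B-tree

The order is normally an even number. This implementation depends on that: splitNode divides a full node around BTREE_ORDER_SIZE / 2, and with an odd order the last key of the node being split would be dropped.

// Order of the B-tree: the maximum number of children per node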
#define BTREE_ORDER_SIZE 6
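
The limits that the rest of the code derives from the order can be written out explicitly. The following is only a sketch: the macros BTREE_MAX_KEYS and BTREE_MIN_KEYS are illustrative names that do not appear in the article's code, and the compile-time check assumes a C11 compiler.

```c
#define BTREE_ORDER_SIZE 6

/* Hypothetical helper macros, not part of BTree.h */
#define BTREE_MAX_KEYS (BTREE_ORDER_SIZE - 1)     /* keys in a full node */
#define BTREE_MIN_KEYS (BTREE_ORDER_SIZE / 2 - 1) /* keys in each half after a split */

/* Reject orders that the split logic cannot handle (C11) */
_Static_assert(BTREE_ORDER_SIZE % 2 == 0 && BTREE_ORDER_SIZE >= 4,
               "BTREE_ORDER_SIZE must be even and at least 4");
```

With an order of 6 this gives at most 5 keys per node and 2 keys in each half after a split.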

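Comparison function

To keep the tree generic, keys are compared through a comparison function pointer, since C has no operator overloading.

// Comparison function pointer type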
typedef int (*cmpFuncPtr)(void *, void *);
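
As an illustration of what such a callback might look like, here is a comparator for int keys whose signature matches cmpFuncPtr exactly, so it can be passed without a cast. The name cmpIntKeys is made up for this sketch. It returns a negative, zero, or positive value in the style of strcmp, and avoids subtraction so that it cannot overflow.

```c
/* Same typedef as in BTree.h */
typedef int (*cmpFuncPtr)(void *, void *);

/* Hypothetical comparator for int keys: <0, 0 or >0 like strcmp */
int cmpIntKeys(void *lhs, void *rhs)
{
    int a = *(int *)lhs;
    int b = *(int *)rhs;
    return (a > b) - (a < b);
}

/* Assigning without a cast proves the signature matches */
cmpFuncPtr defaultCmp = cmpIntKeys;
```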

`keyOfBTree`

Changing keyOfBTree lets the BTree use a different key type.

// Key type of the B-tree; change it to store other kinds of keys
typedef int keyOfBTree;

// Print function pointer type
typedef void (*printFun)(keyOfBTree);

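B-tree node

The layout of a node determines how the tree behaves. Each node holds an array of keys and an array of child pointers. Because the arrays are not always full, a num field records how many keys are in use; an internal node then has num + 1 children.

// B-tree node structure
typedef struct BTreeNode
{
    keyOfBTree keys[BTREE_ORDER_SIZE - 1];      // key array
    struct BTreeNode *childs[BTREE_ORDER_SIZE]; // child pointer array
    uint32_t num; // number of keys currently in the node
    int is_leaf;  // nonzero if the node is a leaf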
} BTreeNode;

typedef BTreeNode *BTree;
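
To put a number on the space that a leaf wastes with this layout, the size of the unused child-pointer array can be computed with sizeof. This is a small sketch that repeats the node definition so that it compiles on its own; the two helper functions are illustrative and not part of the article's code, and the actual byte counts depend on the platform.

```c
#include <stddef.h>
#include <stdint.h>

#define BTREE_ORDER_SIZE 6
typedef int keyOfBTree;

typedef struct BTreeNode
{
    keyOfBTree keys[BTREE_ORDER_SIZE - 1];
    struct BTreeNode *childs[BTREE_ORDER_SIZE];
    uint32_t num;
    int is_leaf;
} BTreeNode;

/* Bytes taken by the child-pointer array, which a leaf never uses */
size_t unusedBytesPerLeaf(void)
{
    return sizeof(((BTreeNode *)0)->childs);
}

/* Total bytes allocated for every node, leaf or not */
size_t bytesPerNode(void)
{
    return sizeof(BTreeNode);
}
```

Comparing the two values shows what fraction of each leaf is dead weight; a layout that allocates leaves without the child array would save that space at the cost of more complicated allocation and freeing.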

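Interface

A B-tree needs a handful of operations that cannot be trimmed any further: creating a node, finding a key's index, inserting a key into a node, splitting a node, inserting into the tree, and freeing the tree. Printing the tree is only there to show its structure; a real implementation usually would not have it.

// Create a node
BTreeNode *createNode(int is_leaf);

// Insert a key into the B-tree
void insert(BTreeNode **root, keyOfBTree key, cmpFuncPtr cmp);

// Print the keys of the B-tree
void printBTree(BTreeNode *node, printFun printKey, int left, int *cnt);

// Free the BTree
void freeBTree(BTreeNode **node);

// Find key in the BTree
keyOfBTree *search(BTreeNode *root, keyOfBTree key, cmpFuncPtr cmp);

// Delete key from the B-tree
void deleteKey(struct BTreeNode **root, keyOfBTree key, cmpFuncPtr cmp);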

#endif

The `BTree.c` implementation.

#include "BTree.h"

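Creating a node

Creating a node is simple. It takes one argument saying whether the node is a leaf. A leaf holds only keys and no children; an internal node holds both keys and children.

The node is allocated with malloc, zero-filled, and then marked as leaf or not.

// Create a node
BTreeNode *createNode(int is_leaf)
{
    BTreeNode *node = (BTreeNode *)malloc(sizeof(BTreeNode));
    if (node == NULL)
    {
        // Out of memory: the callers cannot recover, so stop here
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    memset(node, 0, sizeof(BTreeNode));
    node->is_leaf = is_leaf;
    return node;
}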

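Finding the index

Finding the index is the basic building block of the B-tree: it compares key against the values in the node's key array to determine a position.

For example, with key 5 and a node holding {1,3,8}, the index starts at 0. Since 5 is greater than 1 the index goes up by 1; 5 is greater than 3, so it goes up again; 5 is not greater than 8, so the final index is 2.

This index matters a great deal: it selects the correct child, and following it level by level leads down to the target leaf.

// Find the index of key within the node: the first position whose key is not less than key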
static int searchKeyIndex(BTreeNode *node, keyOfBTree key, cmpFuncPtr cmp)
{
    int index = 0;
    while (index < node->num && cmp(&key, &(node->keys[index])) > 0)
    {
        index++;
    }
    return index;
}
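
The index rule can be tried out on a plain int array, outside the tree. The sketch below has two illustrative functions that are not part of the article's code: keyIndexLinear mirrors the loop in searchKeyIndex, and keyIndexBinary is a swapped-in alternative (a standard lower-bound binary search) that should return the same index and may be worth considering for larger orders.

```c
/* Linear scan, mirroring searchKeyIndex on a plain sorted int array */
int keyIndexLinear(const int *keys, int num, int key)
{
    int index = 0;
    while (index < num && key > keys[index])
    {
        index++;
    }
    return index;
}

/* Alternative: lower-bound binary search, same result on sorted input */
int keyIndexBinary(const int *keys, int num, int key)
{
    int lo = 0;
    int hi = num;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] < key)
        {
            lo = mid + 1; /* answer lies to the right of mid */
        }
        else
        {
            hi = mid; /* keys[mid] >= key, so mid is still a candidate */
        }
    }
    return lo;
}
```

For the keys {1,3,8}, both functions return 2 for key 5, 1 for key 3 (the scan stops at an equal key), and 3 for key 9, which selects the rightmost child.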

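Inserting a key into a node

This function is used once the node that should receive the key has been determined. It shifts array elements to make room.

// Insert key into the node at its sorted position; the node must not be full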
static void insertKey(BTreeNode *node, keyOfBTree key, cmpFuncPtr cmp)
{
    int index = (int)node->num - 1;
    while (index >= 0 && cmp(&key, &(node->keys[index])) < 0)
    {
        node->keys[index + 1] = node->keys[index];
        index--;
    }
    node->keys[index + 1] = key;

    node->num++;
}
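
The shifting can be seen in isolation on a plain int array. This is a sketch that follows the same loop as insertKey; the function name insertSorted is illustrative, and the caller is assumed to guarantee that the array has room for one more element.

```c
/* Insert key into the sorted array keys[0..*num-1], keeping it sorted.
   Assumes the array has capacity for at least *num + 1 elements. */
void insertSorted(int *keys, int *num, int key)
{
    int index = *num - 1;
    /* Shift every element greater than key one slot to the right */
    while (index >= 0 && key < keys[index])
    {
        keys[index + 1] = keys[index];
        index--;
    }
    keys[index + 1] = key;
    (*num)++;
}
```

Starting from {1,3,8}, inserting 5 moves 8 one slot to the right and gives {1,3,5,8}.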

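Splitting a node

Splitting is more involved, so it is worth spelling out:

The node being split is a child of some parent, so the function takes the parent pointer and the child's index.
A new node is created with the same kind as the child, that is, leaf or internal.
If the child is a leaf there are no grandchildren to move; otherwise the key array and the child-pointer array are split together.
The split promotes the child's middle key into the parent. For example, a full node {1,2,3,4,5} becomes the two nodes {1,2} and {4,5}, and 3 moves up to the parent.
The num of the split child and the num of the parent are both updated.

// Split a full child: promote its middle key into the parent and move its upper half into a new sibling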
static void splitNode(BTreeNode *parent, int child_index)
{
    BTreeNode *child = parent->childs[child_index];

    BTreeNode *new_node = createNode(child->is_leaf);
    new_node->num = BTREE_ORDER_SIZE / 2 - 1;

    for (int i = 0; i < new_node->num; i++)
    {
        new_node->keys[i] = child->keys[BTREE_ORDER_SIZE / 2 + i];
    }

    if (!child->is_leaf)
    {
        for (int i = 0; i < BTREE_ORDER_SIZE / 2; i++)
        {
            new_node->childs[i] = child->childs[BTREE_ORDER_SIZE / 2 + i];
        }
    }

    child->num = BTREE_ORDER_SIZE / 2 - 1;

    for (int i = (int)parent->num; i > child_index; i--)
    {
        parent->childs[i + 1] = parent->childs[i];
    }

    parent->childs[child_index + 1] = new_node;

    for (int i = (int)parent->num - 1; i >= child_index; i--)
    {
        parent->keys[i + 1] = parent->keys[i];
    }

    parent->keys[child_index] = child->keys[BTREE_ORDER_SIZE / 2 - 1];
    parent->num++;
}
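
The index arithmetic of the split is easier to check on bare arrays. The following sketch uses the same offsets as splitNode for the keys only; the function splitKeys and the macro ORDER are illustrative names, not part of the article's code.

```c
#define ORDER 6 /* stands in for BTREE_ORDER_SIZE */

/* Split a full key array of ORDER - 1 keys.
   left and right each receive ORDER / 2 - 1 keys;
   the return value is the middle key that moves up to the parent. */
int splitKeys(const int *full, int *left, int *right)
{
    int half = ORDER / 2;
    for (int i = 0; i < half - 1; i++)
    {
        left[i] = full[i];         /* lower half stays in the old node */
        right[i] = full[half + i]; /* upper half goes to the new sibling */
    }
    return full[half - 1];
}
```

With {1,2,3,4,5} this yields {1,2} on the left, {4,5} on the right, and 3 as the promoted key, matching the example in the text.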

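Inserting into the B-tree

Insertion is also fairly involved, so here is the outline:

The root may be split, so the function takes a pointer to the root pointer; this way the caller never loses track of the tree.
There are three cases. If the root is NULL, which is the simplest, create a leaf, insert the key into it, and point the root at it.
If the root is full it must be split. To split it, give it a new parent, point root at that parent, and then split.
If the root is neither empty nor full: when the current node is a leaf, insert directly; otherwise compute the index and check whether the child at that index is full. If it is, split it first; then step into the appropriate child and repeat.
As you may have noticed, every key ends up being inserted into a leaf.

// Insert a key into the B-tree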
void insert(BTreeNode **root, keyOfBTree key, cmpFuncPtr cmp)
{
    BTreeNode *node = *root;

    // Root is NULL: create a new root
    if (node == NULL)
    {
        *root = createNode(1);
        insertKey(*root, key, cmp);
        return;
    }

    // Root is full: create a new root above it and split the old one
    if (node->num == BTREE_ORDER_SIZE - 1)
    {
        BTreeNode *new_root = createNode(0);
        new_root->childs[0] = node;
        *root = new_root;
        splitNode(new_root, 0);
        insert(root, key, cmp); // recursive insert; the new root is not full
        return;
    }

    // Root is neither empty nor full: walk down to a leaf, splitting full children on the way
    while (1)
    {
        if (node->is_leaf)
        {
            insertKey(node, key, cmp);
            break;
        }

        int index = searchKeyIndex(node, key, cmp);
        if (node->childs[index]->num == BTREE_ORDER_SIZE - 1)
        {
            splitNode(node, index);
            if (cmp(&key, &(node->keys[index])) > 0)
            {
                index++;
            }
        }
        node = node->childs[index];
    }
}
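
One way to gain confidence in insertion is to check the tree's invariants after every step. Below is a sketch of such a checker. The function checkSubtree is hypothetical and not part of the article's code. It verifies that each node has between 1 and BTREE_ORDER_SIZE - 1 keys, that keys are sorted and fall within the bounds inherited from the parent, and that all leaves sit at the same depth. It assumes int keys, treats the bounds as inclusive because insert accepts duplicates, and does not check the minimum key count of non-root nodes.

```c
#include <stddef.h>
#include <stdint.h>

#define BTREE_ORDER_SIZE 6
typedef int keyOfBTree;

typedef struct BTreeNode
{
    keyOfBTree keys[BTREE_ORDER_SIZE - 1];
    struct BTreeNode *childs[BTREE_ORDER_SIZE];
    uint32_t num;
    int is_leaf;
} BTreeNode;

/* Returns the height of the subtree if it is well formed, -1 otherwise.
   lo and hi are optional inclusive bounds; pass NULL for "no bound". */
int checkSubtree(const BTreeNode *node, const int *lo, const int *hi)
{
    if (node == NULL || node->num == 0 || node->num > BTREE_ORDER_SIZE - 1)
    {
        return -1;
    }
    for (uint32_t i = 0; i < node->num; i++)
    {
        if (i > 0 && node->keys[i - 1] > node->keys[i])
            return -1; /* keys out of order */
        if (lo != NULL && node->keys[i] < *lo)
            return -1; /* below the parent's left separator */
        if (hi != NULL && node->keys[i] > *hi)
            return -1; /* above the parent's right separator */
    }
    if (node->is_leaf)
    {
        return 1;
    }
    int height = -1;
    for (uint32_t i = 0; i <= node->num; i++)
    {
        /* Child i lies between separator i - 1 and separator i */
        const int *clo = (i == 0) ? lo : &node->keys[i - 1];
        const int *chi = (i == node->num) ? hi : &node->keys[i];
        int h = checkSubtree(node->childs[i], clo, chi);
        if (h < 0 || (height >= 0 && h != height))
            return -1; /* broken child, or leaves at different depths */
        height = h;
    }
    return height + 1;
}
```

Calling checkSubtree(root, NULL, NULL) after each insert in a test loop and checking that the result is not -1 would catch most ordering and balance mistakes early.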

Printing the B-tree

Printing the tree makes it visible, which helps in understanding how insertion behaves.

// Print the keys of the B-tree
void printBTree(BTreeNode *node, printFun printKey, int left, int *cnt)
{
    if (node)
    {
        printf("%c%.2d([", "ABCDEFG"[left++], ++*cnt);
        for (int i = 0; i < node->num; i++)
        {
            printKey(node->keys[i]);
        }
        printf("]);\n");

        if (!node->is_leaf)
        {
            int leftL = left - 1;
            int cntL = *cnt;
            for (int i = 0; i <= node->num; i++)
            {
                printf("%c%.2d==>", "ABCDEFG"[leftL], cntL);
                printBTree(node->childs[i], printKey, left, cnt);
            }
            printf("\n");
        }
    }
}

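Freeing the B-tree

The function takes a pointer to the node pointer so that the pointer can be set to NULL afterwards. It recurses over the children. Because a node contains only arrays and integers, nothing inside needs special handling, and freeing the node itself is enough.

// Free the BTree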
void freeBTree(BTreeNode **node)
{
    if (*node)
    {
        // An internal node always has children; free them recursively
        if (!(*node)->is_leaf)
        {
            // There are exactly num + 1 children, so the rest of the childs array need not be visited
            for (int i = 0; i <= (*node)->num; i++)
            {
                freeBTree(&((*node)->childs[i]));
            }
        }

        free(*node);
        *node = NULL;
    }
}

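Finding a `key` in the `BTree`

For a set, where a key sits may not matter much. But with a small change, if keyOfBTree is a struct holding a key and a value and cmp compares only the key, this BTreeSet effectively becomes a BTreeMap.

// Find key in the BTree; returns a pointer to the stored key, or NULL if it is absent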
keyOfBTree *search(BTreeNode *root, keyOfBTree key, cmpFuncPtr cmp)
{
    // An empty subtree contains nothing
    if (!root)
    {
        return NULL;
    }

    // Find the index of key within this node
    int index = searchKeyIndex(root, key, cmp);

    // The index is within the keys in use and the key stored there equals key
    if (index < root->num && cmp(&key, &(root->keys[index])) == 0)
    {
        // Return a pointer to the key stored in the node
        return &(root->keys[index]);
    }

    // Not a leaf: search the child at index recursively
    if (!root->is_leaf)
    {
        return search(root->childs[index], key, cmp);
    }

    // Reached a leaf without finding the key
    return NULL;
}
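
The set-to-map idea can be sketched as follows. The struct Entry and the comparator cmpEntryByKey are hypothetical names for illustration; in the real header the change would be to write typedef Entry keyOfBTree; and to supply a matching print function.

```c
/* Hypothetical key/value pair that would replace int as keyOfBTree */
typedef struct Entry
{
    int key;        /* the part that orders the tree */
    char value[16]; /* payload, ignored by the comparator */
} Entry;

/* Same typedef as in BTree.h */
typedef int (*cmpFuncPtr)(void *, void *);

/* Compare two entries by key only, so equal keys match regardless of value */
int cmpEntryByKey(void *lhs, void *rhs)
{
    const Entry *a = (const Entry *)lhs;
    const Entry *b = (const Entry *)rhs;
    return (a->key > b->key) - (a->key < b->key);
}
```

A lookup would then pass an Entry with only the key field filled in, and the pointer returned by search would give access to the stored value.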

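Deleting a key

Deletion is harder and has to be worked through yourself: set breakpoints, step through it in a debugger, and go over it several times.

Merging nodes

// Merge nodes: pull the separator root->keys[index] and everything in right into left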
static void mergeNodes(struct BTreeNode *left, struct BTreeNode *root,
                       struct BTreeNode *right, int index)
{
    // Move the separator key from the parent into the left node, then append the right node's keys
    left->keys[left->num] = root->keys[index];
    for (int i = 0; i < right->num; i++)
    {
        left->keys[left->num + 1 + i] = right->keys[i];
    }

    // Move the right node's children into the left node
    if (!left->is_leaf)
    {
        for (int i = 0; i < right->num + 1; i++)
        {
            left->childs[left->num + 1 + i] = right->childs[i];
        }
    }

    // Update the key count of the left node
    left->num += right->num + 1;

    // Remove the separator key and the right child pointer from the parent
    for (int i = index; i < root->num - 1; i++)
    {
        root->keys[i] = root->keys[i + 1];
    }
    for (int i = index + 1; i < root->num; i++)
    {
        root->childs[i] = root->childs[i + 1];
    }
    root->num--;

    // Free the right node
    free(right);
}
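
Merging is the inverse of splitting, which is easy to confirm on plain arrays. This sketch covers only the key handling of mergeNodes; the function name mergeKeys is illustrative, and the caller is assumed to provide a left array with enough capacity.

```c
/* Append the separator and all keys of right to left.
   Assumes left has room for *leftNum + 1 + rightNum keys. */
void mergeKeys(int *left, int *leftNum, int separator,
               const int *right, int rightNum)
{
    left[*leftNum] = separator; /* the parent's key comes down first */
    for (int i = 0; i < rightNum; i++)
    {
        left[*leftNum + 1 + i] = right[i];
    }
    *leftNum += rightNum + 1;
}
```

Merging {1,2} and {4,5} around the separator 3 restores the full node {1,2,3,4,5}.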

Borrowing a key from the left or right sibling

// Borrow a key from a sibling of node->childs[index]; returns 1 on success, 0 if neither sibling can spare one
static int borrowNode(struct BTreeNode *node, int index)
{
    // Borrow from the left sibling
    if (index != 0 && node->childs[index - 1]->num >= (BTREE_ORDER_SIZE / 2))
    {
        struct BTreeNode *leftChild = node->childs[index - 1];
        struct BTreeNode *child = node->childs[index];

        // Shift the child's keys right; the separator comes down and the left sibling's last key goes up
        for (int i = (int)child->num - 1; i >= 0; i--)
        {
            child->keys[i + 1] = child->keys[i];
        }
        child->keys[0] = node->keys[index - 1];
        node->keys[index - 1] = leftChild->keys[leftChild->num - 1];

        // Shift the child's children right and take over the left sibling's last child
        if (!child->is_leaf)
        {
            for (int i = (int)child->num; i >= 0; i--)
            {
                child->childs[i + 1] = child->childs[i];
            }
            child->childs[0] = leftChild->childs[leftChild->num];
        }
        // The child gained a key
        child->num++;
        // The left sibling lost a key
        leftChild->num--;
        return 1;
    }

    // Borrow from the right sibling
    if (index != node->num &&
        node->childs[index + 1]->num >= (BTREE_ORDER_SIZE / 2))
    {
        struct BTreeNode *rightChild = node->childs[index + 1];
        struct BTreeNode *child = node->childs[index];

        // The separator comes down and the right sibling's first key goes up
        child->keys[child->num] = node->keys[index];
        node->keys[index] = rightChild->keys[0];

        for (int i = 0; i < rightChild->num - 1; i++)
        {
            rightChild->keys[i] = rightChild->keys[i + 1];
        }

        // Take over the right sibling's first child and shift its remaining children left
        if (!child->is_leaf)
        {
            child->childs[child->num + 1] = rightChild->childs[0];
            for (int i = 0; i < rightChild->num; i++)
            {
                rightChild->childs[i] = rightChild->childs[i + 1];
            }
        }
        // The child gained a key
        child->num++;
        // The right sibling lost a key
        rightChild->num--;
        return 1;
    }
    return 0;
}
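
Borrowing is a rotation through the parent: the separator moves down into the short child and the sibling's nearest key moves up to take its place. The sketch below shows the left-sibling case for keys only, on plain arrays; borrowFromLeft is an illustrative name, not part of the article's code.

```c
/* Rotate one key from the left sibling through the separator into child.
   Assumes child has room for one more key and left has at least one key. */
void borrowFromLeft(int *left, int *leftNum, int *separator,
                    int *child, int *childNum)
{
    /* Make room at the front of the child */
    for (int i = *childNum - 1; i >= 0; i--)
    {
        child[i + 1] = child[i];
    }
    child[0] = *separator;          /* separator moves down */
    *separator = left[*leftNum - 1]; /* sibling's last key moves up */
    (*childNum)++;
    (*leftNum)--;
}
```

With a left sibling {1,2,3}, separator 4, and child {5,6}, the result is left {1,2}, separator 3, and child {4,5,6}; the overall ordering is unchanged.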

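Filling a node

// Fill node->childs[index] so that it has enough keys to delete from
static void fillNode(struct BTreeNode *node, int index)
{
    // First try to borrow a key from the left or right sibling
    if (borrowNode(node, index))
    {
        return;
    }

    // Otherwise merge with a sibling: the right one if it exists, else the left one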
    if (index != node->num)
    {
        mergeNodes(node->childs[index], node, node->childs[index + 1], index);
    }
    else
    {
        mergeNodes(node->childs[index - 1], node, node->childs[index],
                   index - 1);
    }
}

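Checking whether root is empty but still has a child

// If root has no keys left, replace it with its only child, or free it when it has none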
static void testRootIsZero(BTreeNode **root)
{
    if ((*root)->num == 0 && (*root)->childs[0])
    {
        BTreeNode *temp = *root;
        (*root) = (*root)->childs[0];
        free(temp);
    }
    if ((*root)->num == 0)
    {
        free(*root);
        *root = NULL;
    }
}

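Deleting a key in the current node


// If the key is in the current node, delete it; returns 1 if it was found here, 0 otherwise
static int deleteKeyInNode(struct BTreeNode **root, keyOfBTree key,
                           cmpFuncPtr cmp, int index)
{
    // The key is in the current node
    if (index < (*root)->num && cmp(&key, &((*root)->keys[index])) == 0)
    {
        if ((*root)->is_leaf)
        {
            // Leaf: remove the key directly by shifting the rest left
            for (int i = index; i < (*root)->num - 1; i++)
            {
                (*root)->keys[i] = (*root)->keys[i + 1];
            }
            (*root)->num--;
        }
        else
        {
            // Internal node: look at the subtrees holding the predecessor and the successor
            struct BTreeNode *predecessor = (*root)->childs[index];
            struct BTreeNode *successor = (*root)->childs[index + 1];

            // The predecessor subtree's root has at least (BTREE_ORDER_SIZE/2) keys
            if (predecessor->num >= (BTREE_ORDER_SIZE / 2))
            {
                // Take the last key of the predecessor subtree and put it in place of the deleted key
                while (!predecessor->is_leaf)
                {
                    predecessor = predecessor->childs[predecessor->num];
                }
                // keyOfBTree rather than int, so other key types keep working
                keyOfBTree lastKey = predecessor->keys[predecessor->num - 1];
                (*root)->keys[index] = lastKey;
                deleteKey(&(*root)->childs[index], lastKey, cmp);
            }
            // The successor subtree's root has at least (BTREE_ORDER_SIZE/2) keys
            else if (successor->num >= (BTREE_ORDER_SIZE / 2))
            {
                // Take the first key of the successor subtree and put it in place of the deleted key
                while (!successor->is_leaf)
                {
                    successor = successor->childs[0];
                }
                keyOfBTree firstKey = successor->keys[0];
                (*root)->keys[index] = firstKey;
                deleteKey(&(*root)->childs[index + 1], firstKey, cmp);
            }
            // Both neighbours hold only (BTREE_ORDER_SIZE/2) - 1 keys
            else
            {
                // Merge predecessor, key and successor, then delete the key from the merged node
                mergeNodes(predecessor, (*root), successor, index);
                // Pass the parent's own slot (still the merged node) rather than a local copy,
                // so any pointer update made by the recursive call is not lost
                deleteKey(&(*root)->childs[index], key, cmp);
            }
        }
        return 1;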
    }

    return 0;
}

Recursive deletion in a child

// If the key is not in the current node, recurse into the appropriate child
static void deleteKeyInChildNode(struct BTreeNode **root, keyOfBTree key,
                                 cmpFuncPtr cmp, int index)
{
    if ((*root)->is_leaf)
    {
        return;
    }
    int flag = (index == (*root)->num); // whether the key falls under the last child
    struct BTreeNode *childs = (*root)->childs[index];
    if (childs->num < (BTREE_ORDER_SIZE / 2))
    {
        fillNode((*root), index);
    }
    if (flag && index > (*root)->num)
    {
        deleteKey(&(*root)->childs[index - 1], key, cmp);
    }
    else
    {
        deleteKey(&(*root)->childs[index], key, cmp);
    }
}

Deleting a key from the B-tree

// Delete key from the B-tree
void deleteKey(struct BTreeNode **root, keyOfBTree key, cmpFuncPtr cmp)
{
    if ((*root) == NULL)
    {
        return;
    }

    // Find where the key to delete would sit in this node
    int index = searchKeyIndex((*root), key, cmp);

    if (!deleteKeyInNode(root, key, cmp, index))
    {
        deleteKeyInChildNode(root, key, cmp, index);
    }

    testRootIsZero(root);
}

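Test case

Insert 32 integers in the range 0-999 into the B-tree and print the tree as mermaid text after every step; a markdown tool with mermaid support can render it as a diagram.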

#include "BTree.h"
#include <stdlib.h>

#define SIZE 32

void printKey(keyOfBTree key)
{
    printf("%d\t", key);
}

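// Compare two int keys; the signature matches cmpFuncPtr exactly,
// because calling a function through an incompatible pointer type is undefined behavior
int cmpInt(void *lhs, void *rhs)
{
    int a = *(int *)lhs;
    int b = *(int *)rhs;
    return (a > b) - (a < b);
}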

int main()
{
    int arr[SIZE];
    for (int i = 0; i != SIZE; ++i)
    {
        arr[i] = rand() % 1000;
    }

    // Create an empty B-tree
    BTree root = NULL;

    // Insert the keys one by one
    for (int j = 0; j != SIZE; ++j)
    {
        insert(&root, arr[j], (cmpFuncPtr)cmpInt);

        printf("```mermaid\ngraph TD;\nsubgraph "
               "Insert\nInsertNum((%d));\nend\nsubgraph BTree\n",
               arr[j]);
        int cnt = 0;

        // Print the B-tree
        printBTree(root, printKey, 0, &cnt);

        printf("end\n```\n\n");
    }

    int *rest = search(root, 902, (cmpFuncPtr)cmpInt);

    if (rest)
    {
        printf("%d\n", *rest);
    }

    // Delete the keys one by one
    for (int j = 0; j != SIZE; ++j)
    {
        deleteKey(&root, arr[j], (cmpFuncPtr)cmpInt);

        printf("```mermaid\ngraph TD;\nsubgraph "
               "Delete\ndeleteNum((%d));\nend\nsubgraph BTree\n",
               arr[j]);
        int cnt = 0;

        // Print the B-tree
        printBTree(root, printKey, 0, &cnt);

        printf("end\n```\n\n");
    }

    // Free the memory
    freeBTree(&root);

    return 0;
}

#include "BTree.c"