StarRocks Source Code Reading Series (3): The Compaction Mechanism

lixiaoer666

Last modified: 2022-10-31 20:35:14

First published: 2022-10-24 19:17:47

Article link: https://blog.csdn.net/qq_35200943/article/details/127498751

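Preface

This article summarizes my reading of the StarRocks 2.3.3 source code. The source may differ considerably between versions, so treat it as a reference only.
The StarRocks BE is written in C++ and my C++ is only average, so some of my interpretations may be wrong. Corrections are welcome.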

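The Compaction Mechanism

Before reading the source, here is a brief introduction to the StarRocks compaction mechanism.
To keep writes fast, StarRocks does not write new data into existing data files. Instead, each write goes into a separate new file, called a single file in this article.
If too many single files accumulate in the data directory, query performance drops sharply, so StarRocks has two compaction mechanisms for dealing with these newly written files.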

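Cumulative compaction

This is a lightweight compaction mechanism for the problem of too many small files. It does not merge single files into the base file in the data directory, because the base file is so large that doing so could cause a spike in IO and hurt cluster performance. Instead, it merges several single files into one cumulative file.
By default, a cumulative compaction is triggered once 5 single files have been generated.
It is affected by the following settings:

Parameter	Default	Notes
cumulative_compaction_check_interval_seconds	1	Check interval of the thread, 1s by default
min_cumulative_compaction_num_singleton_deltas	5	Minimum number of singleton files needed to trigger a cumulative compaction
max_cumulative_compaction_num_singleton_deltas	1000	Maximum number of files merged in one compaction
cumulative_compaction_num_threads_per_disk	1	Number of cumulative compaction threads per disk
cumulative_compaction_skip_window_seconds	30	Time window during which the newest single files are skipped; freshly written data may be queried right away, so it is not compacted yet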
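
The way these settings interact can be sketched as follows. This is only an illustration built from the table above and not the StarRocks implementation: the struct, the function names and the exact comparison operators are assumptions, and timestamps are plain seconds.

```cpp
#include <cstdint>

// Hypothetical mirror of the settings in the table above (defaults as listed).
struct CumulativeConfig {
    int64_t min_singleton_deltas = 5;     // min_cumulative_compaction_num_singleton_deltas
    int64_t max_singleton_deltas = 1000;  // max_cumulative_compaction_num_singleton_deltas
    int64_t skip_window_seconds = 30;     // cumulative_compaction_skip_window_seconds
};

// A freshly written file is left alone until it has aged past the skip window.
// Assumption: a file exactly `skip_window_seconds` old already counts as eligible.
constexpr bool is_old_enough(int64_t now, int64_t created_at, const CumulativeConfig& cfg) {
    return now - created_at >= cfg.skip_window_seconds;
}

// Number of eligible files one compaction round would merge:
// 0 while below the minimum, otherwise everything up to the maximum.
constexpr int64_t files_to_compact(int64_t eligible_files, const CumulativeConfig& cfg) {
    if (eligible_files < cfg.min_singleton_deltas) return 0;
    return eligible_files < cfg.max_singleton_deltas ? eligible_files : cfg.max_singleton_deltas;
}
```

With the defaults, 4 eligible files trigger nothing, 5 are merged together, and a backlog of 2000 files is worked off at most 1000 at a time.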

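Base compaction

After several rounds of cumulative compaction, the fine-grained small-file problem is eased, but a new small-file problem appears: there are now too many cumulative files, which still hurts queries.
So once certain conditions are met, the system runs a base compaction to merge all the cumulative files into one file.
Base compaction is a heavy operation with high disk IO cost, so it generally does not run very often.
It is affected by the following settings:

Parameter	Default	Notes
base_compaction_check_interval_seconds	60	Check interval of the thread, 60s
min_base_compaction_num_singleton_deltas	5	Minimum number of single files, meaning files that have already gone through cumulative compaction
max_base_compaction_num_singleton_deltas	100	Maximum number of segments merged in a single base compaction
base_compaction_num_threads_per_disk	1	Number of base compaction threads per disk
base_cumulative_delta_ratio	0.3	Ratio of cumulative file size to base file size that must be reached
base_compaction_interval_seconds_since_last_operation	86400	Time elapsed since the last base compaction; one of the conditions that trigger a base compaction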
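
A rough sketch of how the three trigger settings might combine is shown below. It assumes that meeting any one condition is enough, which the table only states explicitly for the interval setting. The types, field names and comparison operators are hypothetical and not taken from the StarRocks source.

```cpp
#include <cstdint>

// Hypothetical mirror of the trigger-related settings above (defaults as listed).
struct BaseConfig {
    int64_t min_cumulative_files = 5;       // min_base_compaction_num_singleton_deltas
    double cumulative_to_base_ratio = 0.3;  // base_cumulative_delta_ratio
    int64_t max_idle_seconds = 86400;       // base_compaction_interval_seconds_since_last_operation
};

// Hypothetical per-tablet figures that the check needs.
struct TabletStats {
    int64_t cumulative_files;         // files produced by cumulative compaction
    int64_t cumulative_bytes;         // their total size
    int64_t base_bytes;               // size of the base file
    int64_t seconds_since_last_base;  // time since the last base compaction
};

// Assumption: any one of the three conditions is sufficient.
constexpr bool should_run_base_compaction(const TabletStats& t, const BaseConfig& cfg) {
    if (t.cumulative_files >= cfg.min_cumulative_files) return true;
    if (t.base_bytes > 0 &&
        static_cast<double>(t.cumulative_bytes) / static_cast<double>(t.base_bytes) >=
                cfg.cumulative_to_base_ratio) {
        return true;
    }
    return t.seconds_since_last_base >= cfg.max_idle_seconds;
}
```

Under this reading, a tablet with only 2 cumulative files is still compacted if those files have grown to 40% of the base file size, or if a full day has passed since the last base compaction.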

Reading the Base Compaction source code

Status StorageEngine::start_bg_threads() {
    _update_cache_expire_thread = std::thread([this] { _update_cache_expire_thread_callback(nullptr); });
    Thread::set_thread_name(_update_cache_expire_thread, "cache_expire");
    LOG(INFO) << "update cache expire thread started";

    _unused_rowset_monitor_thread = std::thread([this] { _unused_rowset_monitor_thread_callback(nullptr); });
    Thread::set_thread_name(_unused_rowset_monitor_thread, "rowset_monitor");
    LOG(INFO) << "unused rowset monitor thread started";

    // start thread for monitoring the snapshot and trash folder
    _garbage_sweeper_thread = std::thread([this] { _garbage_sweeper_thread_callback(nullptr); });
    Thread::set_thread_name(_garbage_sweeper_thread, "garbage_sweeper");
    LOG(INFO) << "garbage sweeper thread started";

    // start thread for monitoring the tablet with io error
    _disk_stat_monitor_thread = std::thread([this] { _disk_stat_monitor_thread_callback(nullptr); });
    Thread::set_thread_name(_disk_stat_monitor_thread, "disk_monitor");
    LOG(INFO) << "disk stat monitor thread started";

    // convert store map to vector
    std::vector<DataDir*> data_dirs;
    for (auto& tmp_store : _store_map) {
        data_dirs.push_back(tmp_store.second);
    }
    int32_t data_dir_num = data_dirs.size();

    if (!config::enable_event_based_compaction_framework) {
        // base and cumulative compaction threads
        int32_t base_compaction_num_threads_per_disk =
                std::max<int32_t>(1, config::base_compaction_num_threads_per_disk);
        int32_t cumulative_compaction_num_threads_per_disk =
                std::max<int32_t>(1, config::cumulative_compaction_num_threads_per_disk);
        int32_t base_compaction_num_threads = base_compaction_num_threads_per_disk * data_dir_num;
        int32_t cumulative_compaction_num_threads = cumulative_compaction_num_threads_per_disk * data_dir_num;

        // calc the max concurrency of compaction tasks
        int32_t max_compaction_concurrency = config::max_compaction_concurrency;
        if (max_compaction_concurrency < 0 ||
            max_compaction_concurrency > base_compaction_num_threads + cumulative_compaction_num_threads) {
            max_compaction_concurrency = base_compaction_num_threads + cumulative_compaction_num_threads;
        }
        vectorized::Compaction::init(max_compaction_concurrency);

        _base_compaction_threads.reserve(base_compaction_num_threads);
        for (uint32_t i = 0; i < base_compaction_num_threads; ++i) {
            _base_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
                _base_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
            });
            Thread::set_thread_name(_base_compaction_threads.back(), "base_compact");
        }
        LOG(INFO) << "base compaction threads started. number: " << base_compaction_num_threads;

        _cumulative_compaction_threads.reserve(cumulative_compaction_num_threads);
        for (uint32_t i = 0; i < cumulative_compaction_num_threads; ++i) {
            _cumulative_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
                _cumulative_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
            });
            Thread::set_thread_name(_cumulative_compaction_threads.back(), "cumulat_compact");
        }
        LOG(INFO) << "cumulative compaction threads started. number: " << cumulative_compaction_num_threads;
    } else {
        // new compaction framework

        // compaction_manager must init_max_task_num() before any comapction_scheduler starts
        _compaction_manager->init_max_task_num();
        _compaction_scheduler = std::thread([] {
            CompactionScheduler compaction_scheduler;
            compaction_scheduler.schedule();
        });
        Thread::set_thread_name(_compaction_scheduler, "compact_sched");
        LOG(INFO) << "compaction scheduler started";

        _compaction_checker_thread = std::thread([this] { compaction_check(); });
        Thread::set_thread_name(_compaction_checker_thread, "compact_check");
        LOG(INFO) << "compaction checker started";
    }

    int32_t update_compaction_num_threads_per_disk =
            std::max<int32_t>(1, config::update_compaction_num_threads_per_disk);
    int32_t update_compaction_num_threads = update_compaction_num_threads_per_disk * data_dir_num;
    _update_compaction_threads.reserve(update_compaction_num_threads);
    for (uint32_t i = 0; i < update_compaction_num_threads; ++i) {
        _update_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
            _update_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
        });
        Thread::set_thread_name(_update_compaction_threads.back(), "update_compact");
    }
    LOG(INFO) << "update compaction threads started. number: " << update_compaction_num_threads;

    // tablet checkpoint thread
    for (auto data_dir : data_dirs) {
        _tablet_checkpoint_threads.emplace_back([this, data_dir] { _tablet_checkpoint_callback((void*)data_dir); });
        Thread::set_thread_name(_tablet_checkpoint_threads.back(), "tablet_check_pt");
    }
    LOG(INFO) << "tablet checkpoint thread started";

    // fd cache clean thread
    _fd_cache_clean_thread = std::thread([this] { _fd_cache_clean_callback(nullptr); });
    Thread::set_thread_name(_fd_cache_clean_thread, "fd_cache_clean");
    LOG(INFO) << "fd cache clean thread started";

    // path scan and gc thread
    if (config::path_gc_check) {
        for (auto data_dir : get_stores()) {
            _path_scan_threads.emplace_back([this, data_dir] { _path_scan_thread_callback((void*)data_dir); });
            _path_gc_threads.emplace_back([this, data_dir] { _path_gc_thread_callback((void*)data_dir); });
            Thread::set_thread_name(_path_scan_threads.back(), "path_scan");
            Thread::set_thread_name(_path_gc_threads.back(), "path_gc");
        }
        LOG(INFO) << "path scan/gc threads started. number:" << get_stores().size();
    }

    LOG(INFO) << "all storage engine's background threads are started.";
    return Status::OK();
}

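Start with the function that creates the threads when the BE starts.
Begin reading at the line if (!config::enable_event_based_compaction_framework) {.
The first thing to notice is a check on enable_event_based_compaction_framework, a setting whose default value in the source is false.
As the name suggests, StarRocks appears to be moving toward a new compaction framework, so the following sections read the source of the old and the new framework in turn.

Old compaction framework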

// base and cumulative compaction threads
int32_t base_compaction_num_threads_per_disk =
        std::max<int32_t>(1, config::base_compaction_num_threads_per_disk);
int32_t cumulative_compaction_num_threads_per_disk =
        std::max<int32_t>(1, config::cumulative_compaction_num_threads_per_disk);
int32_t base_compaction_num_threads = base_compaction_num_threads_per_disk * data_dir_num;
int32_t cumulative_compaction_num_threads = cumulative_compaction_num_threads_per_disk * data_dir_num;

// calc the max concurrency of compaction tasks
int32_t max_compaction_concurrency = config::max_compaction_concurrency;
if (max_compaction_concurrency < 0 ||
    max_compaction_concurrency > base_compaction_num_threads + cumulative_compaction_num_threads) {
    max_compaction_concurrency = base_compaction_num_threads + cumulative_compaction_num_threads;
}

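These lines do little beyond reading the configuration to get the thread counts for base compaction and cumulative compaction.
From the configuration and the sum of those two thread counts, they then derive the maximum number of concurrent threads, max_compaction_concurrency.
vectorized::Compaction::init(max_compaction_concurrency);
This line stores the maximum number of concurrent compaction threads in a variable. Whenever a new compaction task is about to run, the current number of concurrent threads is checked against this value, and if the limit has been reached the task does not run yet.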
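
The arithmetic in the excerpt is easier to follow with concrete numbers. The sketch below restates it as two small standalone functions. The function names are made up for illustration; the clamping rules are the ones visible in start_bg_threads.

```cpp
#include <cstdint>

// Same rule as std::max<int32_t>(1, config::..._num_threads_per_disk) in the excerpt.
constexpr int32_t clamp_threads_per_disk(int32_t configured) {
    return configured < 1 ? 1 : configured;
}

// Hypothetical helper reproducing how max_compaction_concurrency is derived.
constexpr int32_t effective_max_concurrency(int32_t configured_max, int32_t base_per_disk,
                                            int32_t cumulative_per_disk, int32_t data_dir_num) {
    const int32_t base_threads = clamp_threads_per_disk(base_per_disk) * data_dir_num;
    const int32_t cumulative_threads = clamp_threads_per_disk(cumulative_per_disk) * data_dir_num;
    const int32_t total = base_threads + cumulative_threads;
    // A negative or too-large setting falls back to the total thread count.
    if (configured_max < 0 || configured_max > total) return total;
    return configured_max;
}
```

With 4 data directories and one thread of each kind per disk, there are 8 compaction threads in total. A configured value of -1 or 100 therefore becomes 8, while a configured value of 3 is kept and limits how many of the 8 threads can compact at the same time.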

for (uint32_t i = 0; i < base_compaction_num_threads; ++i) {
    _base_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
        _base_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
    });
    Thread::set_thread_name(_base_compaction_threads.back(), "base_compact");
}
LOG(INFO) << "base compaction threads started. number: " << base_compaction_num_threads;

_cumulative_compaction_threads.reserve(cumulative_compaction_num_threads);
for (uint32_t i = 0; i < cumulative_compaction_num_threads; ++i) {
    _cumulative_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
        _cumulative_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
    });
    Thread::set_thread_name(_cumulative_compaction_threads.back(), "cumulat_compact");
}
LOG(INFO) << "cumulative compaction threads started. number: " << cumulative_compaction_num_threads;

These loops create the actual compaction threads and assign each one the data directory it scans.
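
The expression data_dirs[i % data_dir_num] spreads the threads over the data directories in round-robin order. A small sketch of that mapping follows. The helper names are hypothetical; only the modulo rule comes from the source.

```cpp
#include <cstdint>

// Same rule as the lambdas above: thread i works on data_dirs[i % data_dir_num].
constexpr int32_t data_dir_index_for_thread(int32_t thread_index, int32_t data_dir_num) {
    return thread_index % data_dir_num;
}

// Counts how many of `total_threads` threads land on directory `dir` (hypothetical helper).
constexpr int32_t threads_on_dir(int32_t dir, int32_t total_threads, int32_t data_dir_num) {
    int32_t count = 0;
    for (int32_t i = 0; i < total_threads; ++i) {
        if (data_dir_index_for_thread(i, data_dir_num) == dir) ++count;
    }
    return count;
}
```

With 3 data directories and 2 threads per disk, 6 threads are created. Threads 0 and 3 take directory 0, threads 1 and 4 take directory 1, and threads 2 and 5 take directory 2, so each directory ends up with exactly the configured number of threads.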

New compaction framework

// compaction_manager must init_max_task_num() before any comapction_scheduler starts
_compaction_manager->init_max_task_num();
_compaction_scheduler = std::thread([] {
    CompactionScheduler compaction_scheduler;
    compaction_scheduler.schedule();
});
Thread::set_thread_name(_compaction_scheduler, "compact_sched");
LOG(INFO) << "compaction scheduler started";

_compaction_checker_thread = std::thread([this] { compaction_check(); });
Thread::set_thread_name(_compaction_checker_thread, "compact_check");
LOG(INFO) << "compaction checker started";

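The new compaction framework first calls init_max_task_num to initialize the maximum number of compaction tasks.
It then does not create all the compaction threads up front; it creates only one scheduler thread and one checker thread.
The scheduler thread periodically evaluates the compaction conditions. Each time a new compaction task qualifies, and the number of running compaction tasks is still below the maximum, it starts a temporary thread to handle that compaction.
The benefit is that idle compaction threads are eliminated when compaction is infrequent. When compaction is very frequent, however, this approach may reduce compaction performance, because only one thread schedules the compaction tasks.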
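
The admission rule described above (start a task only while the number of running tasks is below the maximum) can be illustrated with a tiny slot counter. This is a simplified sketch and not the CompactionManager code: TaskSlots and admitted are hypothetical names, and locking is left out even though a real scheduler shared with worker threads would need it.

```cpp
#include <cstdint>

// Hypothetical bookkeeping for running compaction tasks; thread safety is omitted.
struct TaskSlots {
    int32_t max_task_num;
    int32_t running = 0;

    // Claims a slot when one is free; the caller skips the candidate otherwise.
    constexpr bool try_acquire() {
        if (running >= max_task_num) return false;
        ++running;
        return true;
    }

    // Called when a compaction task finishes.
    constexpr void release() {
        if (running > 0) --running;
    }
};

// Offers `candidates` tasks to the scheduler at once and reports how many get started.
constexpr int32_t admitted(int32_t max_task_num, int32_t candidates) {
    TaskSlots slots{max_task_num};
    int32_t started = 0;
    for (int32_t i = 0; i < candidates; ++i) {
        if (slots.try_acquire()) ++started;
    }
    return started;
}
```

If 5 tablets qualify at the same moment and the limit is 2, only 2 tasks start; the other 3 have to wait for a later scheduling round, after release has freed a slot.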

void CompactionManager::init_max_task_num() {
    if (config::base_compaction_num_threads_per_disk >= 0 && config::cumulative_compaction_num_threads_per_disk >= 0) {
        _max_task_num = static_cast<int32_t>(
                StorageEngine::instance()->get_store_num() *
                (config::cumulative_compaction_num_threads_per_disk + config::base_compaction_num_threads_per_disk));
    } else {
        // When cumulative_compaction_num_threads_per_disk or config::base_compaction_num_threads_per_disk is less than 0,
        // there is no limit to _max_task_num if max_compaction_concurrency is also less than 0, and here we set maximum value to be 20.
        _max_task_num = std::