Preface
This article is based on reading the StarRocks 2.3.3 source code. The source may change significantly between versions, so treat this as a reference only.
Since the StarRocks BE is written in C++ and my C++ is only average, my reading of the source may contain mistakes; corrections from more experienced readers are welcome.
The Compaction Mechanism
Before we start reading the source code, here is a brief overview of StarRocks' compaction mechanism.
To keep data ingestion fast, StarRocks never writes newly ingested data directly into existing data files. Instead, each new batch is written to a separate new file, called a singleton file.
If too many singleton files pile up in a data directory, query performance drops sharply, so StarRocks provides two compaction mechanisms to merge these newly written files.
Cumulative Compaction
This is a lightweight compaction mechanism for the small-file problem. It does not merge the singleton files into the base file of the data directory, because the base file is large and rewriting it could cause a spike in disk IO that hurts cluster performance. Instead, it merges multiple singleton files into a single cumulative file.
By default, a cumulative compaction is triggered for every 5 singleton files generated.
It is controlled by the following configuration parameters:

Parameter | Default | Notes |
---|---|---|
cumulative_compaction_check_interval_seconds | 1 | Check interval of the compaction thread; 1s by default |
min_cumulative_compaction_num_singleton_deltas | 5 | Minimum number of singleton files required to trigger a cumulative compaction |
max_cumulative_compaction_num_singleton_deltas | 1000 | Maximum number of files merged in one compaction run |
cumulative_compaction_num_threads_per_disk | 1 | Number of cumulative compaction threads per disk |
cumulative_compaction_skip_window_seconds | 30 | Skip singleton files newer than this window; freshly written data may be queried right away, so it is not compacted immediately |
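The trigger conditions above can be sketched in isolation. This is a simplified illustration, not the actual StarRocks code; the function names and constants are hypothetical stand-ins for the config parameters in the table:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical constants standing in for the config parameters above.
constexpr int32_t kMinSingletonDeltas = 5;    // min_cumulative_compaction_num_singleton_deltas
constexpr int32_t kMaxSingletonDeltas = 1000; // max_cumulative_compaction_num_singleton_deltas

// A singleton file counts as eligible only if it is older than
// cumulative_compaction_skip_window_seconds (30s by default); freshly
// written data may be queried right away, so it is skipped for now.
bool should_run_cumulative_compaction(int32_t num_eligible_singleton_files) {
    return num_eligible_singleton_files >= kMinSingletonDeltas;
}

// One compaction run merges at most kMaxSingletonDeltas files.
int32_t num_files_to_compact(int32_t num_eligible_singleton_files) {
    return std::min(num_eligible_singleton_files, kMaxSingletonDeltas);
}
```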
Base Compaction
After several rounds of cumulative compaction, the fine-grained small-file problem is alleviated, but a new one takes its place: there are now too many cumulative files, which still hurts queries.
Therefore, once certain conditions are met, the system runs a base compaction to merge all cumulative files into a single file.
Because base compaction is a heavy operation with high disk IO, it generally does not run very often.
It is controlled by the following configuration parameters:

Parameter | Default | Notes |
---|---|---|
base_compaction_check_interval_seconds | 60 | Check interval of the compaction thread; 60s |
min_base_compaction_num_singleton_deltas | 5 | Minimum number of singleton files (i.e. files already produced by cumulative compaction) required to trigger a base compaction |
max_base_compaction_num_singleton_deltas | 100 | Maximum number of segments merged in one base compaction run |
base_compaction_num_threads_per_disk | 1 | Number of base compaction threads per disk |
base_cumulative_delta_ratio | 0.3 | Ratio of cumulative file size to base file size; one of the trigger conditions |
base_compaction_interval_seconds_since_last_operation | 86400 | Interval since the last base compaction; one of the trigger conditions |
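Similarly, the base compaction trigger can be sketched as a standalone predicate. This too is an illustrative simplification with hypothetical names; the real trigger logic combines these checks with additional tablet state:

```cpp
#include <cstdint>

// Hypothetical constants standing in for the config parameters above.
constexpr int32_t kMinBaseSingletonDeltas = 5;    // min_base_compaction_num_singleton_deltas
constexpr double kBaseCumulativeDeltaRatio = 0.3; // base_cumulative_delta_ratio
constexpr int64_t kIntervalSinceLastOp = 86400;   // base_compaction_interval_seconds_since_last_operation

// Simplified sketch: trigger when enough cumulative files have piled up,
// or when cumulative data is large relative to the base file, or when too
// much time has passed since the last base compaction.
bool should_run_base_compaction(int32_t num_cumulative_files, int64_t cumulative_bytes,
                                int64_t base_bytes, int64_t seconds_since_last) {
    if (num_cumulative_files >= kMinBaseSingletonDeltas) return true;
    if (base_bytes > 0 &&
        static_cast<double>(cumulative_bytes) / static_cast<double>(base_bytes) >=
                kBaseCumulativeDeltaRatio) {
        return true;
    }
    return seconds_since_last >= kIntervalSinceLastOp;
}
```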
Reading the Base Compaction Source Code
Status StorageEngine::start_bg_threads() {
_update_cache_expire_thread = std::thread([this] {
_update_cache_expire_thread_callback(nullptr); });
Thread::set_thread_name(_update_cache_expire_thread, "cache_expire");
LOG(INFO) << "update cache expire thread started";
_unused_rowset_monitor_thread = std::thread([this] {
_unused_rowset_monitor_thread_callback(nullptr); });
Thread::set_thread_name(_unused_rowset_monitor_thread, "rowset_monitor");
LOG(INFO) << "unused rowset monitor thread started";
// start thread for monitoring the snapshot and trash folder
_garbage_sweeper_thread = std::thread([this] {
_garbage_sweeper_thread_callback(nullptr); });
Thread::set_thread_name(_garbage_sweeper_thread, "garbage_sweeper");
LOG(INFO) << "garbage sweeper thread started";
// start thread for monitoring the tablet with io error
_disk_stat_monitor_thread = std::thread([this] {
_disk_stat_monitor_thread_callback(nullptr); });
Thread::set_thread_name(_disk_stat_monitor_thread, "disk_monitor");
LOG(INFO) << "disk stat monitor thread started";
// convert store map to vector
std::vector<DataDir*> data_dirs;
for (auto& tmp_store : _store_map) {
data_dirs.push_back(tmp_store.second);
}
int32_t data_dir_num = data_dirs.size();
if (!config::enable_event_based_compaction_framework) {
// base and cumulative compaction threads
int32_t base_compaction_num_threads_per_disk =
std::max<int32_t>(1, config::base_compaction_num_threads_per_disk);
int32_t cumulative_compaction_num_threads_per_disk =
std::max<int32_t>(1, config::cumulative_compaction_num_threads_per_disk);
int32_t base_compaction_num_threads = base_compaction_num_threads_per_disk * data_dir_num;
int32_t cumulative_compaction_num_threads = cumulative_compaction_num_threads_per_disk * data_dir_num;
// calc the max concurrency of compaction tasks
int32_t max_compaction_concurrency = config::max_compaction_concurrency;
if (max_compaction_concurrency < 0 ||
max_compaction_concurrency > base_compaction_num_threads + cumulative_compaction_num_threads) {
max_compaction_concurrency = base_compaction_num_threads + cumulative_compaction_num_threads;
}
vectorized::Compaction::init(max_compaction_concurrency);
_base_compaction_threads.reserve(base_compaction_num_threads);
for (uint32_t i = 0; i < base_compaction_num_threads; ++i) {
_base_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
_base_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
});
Thread::set_thread_name(_base_compaction_threads.back(), "base_compact");
}
LOG(INFO) << "base compaction threads started. number: " << base_compaction_num_threads;
_cumulative_compaction_threads.reserve(cumulative_compaction_num_threads);
for (uint32_t i = 0; i < cumulative_compaction_num_threads; ++i) {
_cumulative_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
_cumulative_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
});
Thread::set_thread_name(_cumulative_compaction_threads.back(), "cumulat_compact");
}
LOG(INFO) << "cumulative compaction threads started. number: " << cumulative_compaction_num_threads;
} else {
// new compaction framework
// compaction_manager must init_max_task_num() before any comapction_scheduler starts
_compaction_manager->init_max_task_num();
_compaction_scheduler = std::thread([] {
CompactionScheduler compaction_scheduler;
compaction_scheduler.schedule();
});
Thread::set_thread_name(_compaction_scheduler, "compact_sched");
LOG(INFO) << "compaction scheduler started";
_compaction_checker_thread = std::thread([this] {
compaction_check(); });
Thread::set_thread_name(_compaction_checker_thread, "compact_check");
LOG(INFO) << "compaction checker started";
}
int32_t update_compaction_num_threads_per_disk =
std::max<int32_t>(1, config::update_compaction_num_threads_per_disk);
int32_t update_compaction_num_threads = update_compaction_num_threads_per_disk * data_dir_num;
_update_compaction_threads.reserve(update_compaction_num_threads);
for (uint32_t i = 0; i < update_compaction_num_threads; ++i) {
_update_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
_update_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
});
Thread::set_thread_name(_update_compaction_threads.back(), "update_compact");
}
LOG(INFO) << "update compaction threads started. number: " << update_compaction_num_threads;
// tablet checkpoint thread
for (auto data_dir : data_dirs) {
_tablet_checkpoint_threads.emplace_back([this, data_dir] {
_tablet_checkpoint_callback((void*)data_dir); });
Thread::set_thread_name(_tablet_checkpoint_threads.back(), "tablet_check_pt");
}
LOG(INFO) << "tablet checkpoint thread started";
// fd cache clean thread
_fd_cache_clean_thread = std::thread([this] {
_fd_cache_clean_callback(nullptr); });
Thread::set_thread_name(_fd_cache_clean_thread, "fd_cache_clean");
LOG(INFO) << "fd cache clean thread started";
// path scan and gc thread
if (config::path_gc_check) {
for (auto data_dir : get_stores()) {
_path_scan_threads.emplace_back([this, data_dir] {
_path_scan_thread_callback((void*)data_dir); });
_path_gc_threads.emplace_back([this, data_dir] {
_path_gc_thread_callback((void*)data_dir); });
Thread::set_thread_name(_path_scan_threads.back(), "path_scan");
Thread::set_thread_name(_path_gc_threads.back(), "path_gc");
}
LOG(INFO) << "path scan/gc threads started. number:" << get_stores().size();
}
LOG(INFO) << "all storage engine's background threads are started.";
return Status::OK();
}
The code above is the thread-creation function that runs when the BE starts.
Start reading from the line `if (!config::enable_event_based_compaction_framework) {`.
The first thing we encounter is a check on `enable_event_based_compaction_framework`, which defaults to false in the source.
As the name suggests, StarRocks is apparently moving toward a new event-based compaction framework. Next, let's read the old and the new framework in turn.
The Old Compaction Framework
// base and cumulative compaction threads
int32_t base_compaction_num_threads_per_disk =
std::max<int32_t>(1, config::base_compaction_num_threads_per_disk);
int32_t cumulative_compaction_num_threads_per_disk =
std::max<int32_t>(1, config::cumulative_compaction_num_threads_per_disk);
int32_t base_compaction_num_threads = base_compaction_num_threads_per_disk * data_dir_num;
int32_t cumulative_compaction_num_threads = cumulative_compaction_num_threads_per_disk * data_dir_num;
// calc the max concurrency of compaction tasks
int32_t max_compaction_concurrency = config::max_compaction_concurrency;
if (max_compaction_concurrency < 0 ||
max_compaction_concurrency > base_compaction_num_threads + cumulative_compaction_num_threads) {
max_compaction_concurrency = base_compaction_num_threads + cumulative_compaction_num_threads;
}
These lines do little more than read the configuration: they determine the base compaction and cumulative compaction thread counts, then derive a maximum concurrency, max_compaction_concurrency, from the configured value and the sum of those thread counts.

vectorized::Compaction::init(max_compaction_concurrency);

This line stores the maximum number of concurrent compaction threads in a variable. Every time a new compaction task is about to run, the current number of concurrent compactions is checked against this value; if the limit has been reached, the task is deferred.
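The concurrency clamp can be reproduced in isolation (clamp_max_compaction_concurrency is a made-up helper; the logic mirrors the snippet above):

```cpp
#include <cstdint>

// Mirrors the clamp applied to max_compaction_concurrency: a negative
// configured value, or one larger than the total number of compaction
// threads, falls back to the thread-count sum.
int32_t clamp_max_compaction_concurrency(int32_t configured, int32_t base_threads,
                                         int32_t cumulative_threads) {
    int32_t total = base_threads + cumulative_threads;
    if (configured < 0 || configured > total) {
        return total;
    }
    return configured;
}
```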
for (uint32_t i = 0; i < base_compaction_num_threads; ++i) {
_base_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
_base_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
});
Thread::set_thread_name(_base_compaction_threads.back(), "base_compact");
}
LOG(INFO) << "base compaction threads started. number: " << base_compaction_num_threads;
_cumulative_compaction_threads.reserve(cumulative_compaction_num_threads);
for (uint32_t i = 0; i < cumulative_compaction_num_threads; ++i) {
_cumulative_compaction_threads.emplace_back([this, data_dir_num, data_dirs, i] {
_cumulative_compaction_thread_callback(nullptr, data_dirs[i % data_dir_num]);
});
Thread::set_thread_name(_cumulative_compaction_threads.back(), "cumulat_compact");
}
LOG(INFO) << "cumulative compaction threads started. number: " << cumulative_compaction_num_threads;
These loops create the actual compaction threads, each bound to a data directory to scan (assigned via i % data_dir_num).
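The `i % data_dir_num` indexing distributes threads over the disks round-robin, so each data directory gets exactly threads_per_disk dedicated threads. A small standalone reproduction (assign_threads is a made-up helper for illustration):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the assignment in start_bg_threads(): thread i works on
// data_dirs[i % data_dir_num], so with threads_per_disk = 2 and two
// directories, 4 threads are created, 2 per directory.
std::vector<std::string> assign_threads(const std::vector<std::string>& data_dirs,
                                        int32_t threads_per_disk) {
    int32_t data_dir_num = static_cast<int32_t>(data_dirs.size());
    int32_t num_threads = threads_per_disk * data_dir_num;
    std::vector<std::string> assignment;
    assignment.reserve(num_threads);
    for (int32_t i = 0; i < num_threads; ++i) {
        assignment.push_back(data_dirs[i % data_dir_num]);
    }
    return assignment;
}
```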
The New Compaction Framework
// compaction_manager must init_max_task_num() before any comapction_scheduler starts
_compaction_manager->init_max_task_num();
_compaction_scheduler = std::thread([] {
CompactionScheduler compaction_scheduler;
compaction_scheduler.schedule();
});
Thread::set_thread_name(_compaction_scheduler, "compact_sched");
LOG(INFO) << "compaction scheduler started";
_compaction_checker_thread = std::thread([this] {
compaction_check(); });
Thread::set_thread_name(_compaction_checker_thread, "compact_check");
LOG(INFO) << "compaction checker started";
The new framework first calls init_max_task_num to initialize the maximum number of compaction tasks.
It then does not create all the compaction threads up front; instead, it creates one scheduling thread and one checker thread.
The scheduling thread periodically evaluates the compaction conditions; whenever a new compaction task qualifies and the number of running tasks is below the maximum, it starts a temporary thread to perform that compaction.
The benefit is that in scenarios with infrequent compaction, the idle compaction threads are eliminated. In high-frequency scenarios, however, this approach may reduce compaction throughput, because only a single thread schedules compaction tasks.
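The admission behaviour described above can be sketched as a tiny scheduler. This is a hypothetical simplification: the real CompactionScheduler runs its loop on a dedicated thread and spawns worker threads, while this sketch only models the "run only while below _max_task_num" check:

```cpp
#include <cstdint>
#include <functional>

// Minimal sketch of the admission check: a task runs only while the
// number of in-flight tasks is below _max_task_num; otherwise it is
// deferred to a later scheduling round.
class TinyScheduler {
public:
    explicit TinyScheduler(int32_t max_task_num) : _max_task_num(max_task_num) {}

    // Returns true if the task was admitted and executed.
    bool try_submit(const std::function<void()>& task) {
        if (_running >= _max_task_num) {
            return false; // defer: too many compactions already in flight
        }
        ++_running;
        task();           // in the real framework this runs on its own thread
        --_running;
        return true;
    }

private:
    int32_t _max_task_num;
    int32_t _running = 0;
};
```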
void CompactionManager::init_max_task_num() {
if (config::base_compaction_num_threads_per_disk >= 0 && config::cumulative_compaction_num_threads_per_disk >= 0) {
_max_task_num = static_cast<int32_t>(
StorageEngine::instance()->get_store_num() *
(config::cumulative_compaction_num_threads_per_disk + config::base_compaction_num_threads_per_disk));
} else {
// When cumulative_compaction_num_threads_per_disk or config::base_compaction_num_threads_per_disk is less than 0,
// there is no limit to _max_task_num if max_compaction_concurrency is also less than 0, and here we set maximum value to be 20.
_max_task_num = std::