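As a key-value store, RocksDB uses a write-ahead log (WAL): before any data is modified, the operation is first written to the log and persisted, and only then is it applied. If an unexpected failure (for example, a crash) loses data that had not yet been written to disk, the log files can be used to recover it. By combining what is already on disk with the WAL, the database can be restored to its latest state: the contents of the memtable and of any immutable memtables that had not yet been flushed are read back from the log files and replayed, and eventually written into SSTables.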
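To make the write-ahead idea concrete, here is a toy sketch. It is not RocksDB code; the class ToyWalStore and its in-memory "log" are hypothetical. Each Put appends a record to the log before updating the in-memory table, and Recover() rebuilds the table purely by replaying the log. It assumes keys and values contain no spaces.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy illustration of write-ahead logging (hypothetical, not RocksDB code).
class ToyWalStore {
 public:
  void Put(const std::string& key, const std::string& value) {
    log_.push_back("PUT " + key + " " + value);  // 1. append the log record first
    table_[key] = value;                         // 2. then apply it to the "memtable"
  }

  // Simulate a crash: the in-memory table is lost, the log survives.
  void Crash() { table_.clear(); }

  // Rebuild the in-memory table by replaying every log record in order.
  void Recover() {
    table_.clear();
    for (const std::string& record : log_) {
      std::istringstream in(record);
      std::string op, key, value;
      in >> op >> key >> value;
      if (op == "PUT") table_[key] = value;
    }
  }

  bool Get(const std::string& key, std::string* value) const {
    auto it = table_.find(key);
    if (it == table_.end()) return false;
    *value = it->second;
    return true;
  }

 private:
  std::vector<std::string> log_;              // stands in for the on-disk WAL
  std::map<std::string, std::string> table_;  // stands in for the memtable
};
```

After Put("a", "1"), Crash() and Recover(), Get("a") finds "1" again. The vector only models the ordering "log first, then apply"; a real WAL is a file on disk.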
The overall recovery flow is roughly as shown in the following flowchart:
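In DBImpl::Recover(), if the CURRENT file does not exist and create_if_missing is set, NewDB() is called to create a brand-new MANIFEST file and CURRENT file. Recover() then lists the WAL directory: if it contains no log files, there is nothing to replay; otherwise it proceeds to RecoverLogFiles(). What gets recovered is the data that was in the memtable and immutable memtables but had not yet been persisted to SSTables.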
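The decisions described above can be modelled roughly as follows. This is a hypothetical, simplified sketch: OpenAction, DecideOpen, ParseLogNumber and SelectLogsToReplay are illustrative names, not RocksDB APIs. The filter keeps logs numbered at or above min_log, plus prev_log, which follows the intent of the comment in the Recover() excerpt; the exact rule in a given RocksDB version may differ. Sorting matters because logs must be replayed in the order they were written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the open decision; these are not RocksDB APIs.
enum class OpenAction { kCreateNewDb, kFailMissing, kFailExists, kOpenExisting };

OpenAction DecideOpen(bool current_exists, bool create_if_missing,
                      bool error_if_exists) {
  if (!current_exists) {  // no CURRENT file: NewDB() or InvalidArgument
    return create_if_missing ? OpenAction::kCreateNewDb : OpenAction::kFailMissing;
  }
  return error_if_exists ? OpenAction::kFailExists : OpenAction::kOpenExisting;
}

// Returns true and sets *number if name looks like "<digits>.log".
bool ParseLogNumber(const std::string& name, uint64_t* number) {
  const std::string suffix = ".log";
  if (name.size() <= suffix.size() ||
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
    return false;
  }
  const std::string digits = name.substr(0, name.size() - suffix.size());
  if (!std::all_of(digits.begin(), digits.end(),
                   [](char c) { return c >= '0' && c <= '9'; })) {
    return false;
  }
  *number = std::stoull(digits);
  return true;
}

// Pick the WALs to replay from a directory listing, oldest first.
// ASSUMPTION: keep logs with number >= min_log, plus prev_log (older DBs).
std::vector<uint64_t> SelectLogsToReplay(const std::vector<std::string>& filenames,
                                         uint64_t min_log, uint64_t prev_log) {
  std::vector<uint64_t> logs;
  for (const std::string& name : filenames) {
    uint64_t number = 0;
    if (ParseLogNumber(name, &number) &&
        (number >= min_log || number == prev_log)) {
      logs.push_back(number);
    }
  }
  std::sort(logs.begin(), logs.end());  // replay in write order
  return logs;  // empty: nothing to replay
}
```

For a listing such as {"000009.log", "CURRENT", "000004.log", "000007.log"} with min_log = 7, the sketch returns {7, 9}.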
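The log replay that follows restores the in-memory data. Replaying the log means reading the operations recorded in it and writing them into RocksDB again; whenever the memtable grows larger than the write buffer size (write_buffer_size), it is flushed into an SSTable. This is implemented in RecoverLogFiles(), which works as follows: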
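LogFileName() builds the log file name, and env_->NewSequentialFile() opens the log file. If the file opens successfully, a message is written to info_log and the records read from the file are replayed; any data flushed during recovery is written out as level-0 SSTables.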
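A rough sketch of what happens to each log during replay: build the file name, read records sequentially, re-apply them to a memtable, and flush to a level-0 table whenever the memtable exceeds the write buffer size. Only the file-name format (`<dir>/<number padded to 6 digits>.log`) mirrors RocksDB; LogReplayer and the one-"key value"-per-line record format are hypothetical. The real WAL stores serialized WriteBatches in 32 KB blocks with checksummed record headers and is decoded by log::Reader.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same naming scheme as RocksDB's LogFileName(): <dir>/<number, 6+ digits>.log
std::string MakeLogFileName(const std::string& dir, uint64_t number) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "/%06" PRIu64 ".log", number);
  return dir + buf;
}

using MemTable = std::map<std::string, std::string>;
using Level0Table = std::vector<std::pair<std::string, std::string>>;

// Hypothetical replay loop. ASSUMPTION: one "key value" record per line, read
// from any std::istream (standing in for the SequentialFile).
class LogReplayer {
 public:
  explicit LogReplayer(size_t write_buffer_size)
      : write_buffer_size_(write_buffer_size) {}

  void Replay(std::istream& log) {
    std::string line;
    while (std::getline(log, line)) {
      std::istringstream rec(line);
      std::string key, value;
      if (!(rec >> key >> value)) continue;  // skip empty or malformed lines
      Apply(key, value);
    }
  }

  // Write the memtable out as one level-0 table (std::map keeps it sorted).
  void FlushToLevel0() {
    if (mem_.empty()) return;
    level0_.emplace_back(mem_.begin(), mem_.end());
    mem_.clear();
    mem_bytes_ = 0;
  }

  const std::vector<Level0Table>& level0() const { return level0_; }
  const MemTable& mem() const { return mem_; }

 private:
  void Apply(const std::string& key, const std::string& value) {
    auto it = mem_.find(key);
    if (it != mem_.end()) mem_bytes_ -= key.size() + it->second.size();
    mem_[key] = value;
    mem_bytes_ += key.size() + value.size();
    if (mem_bytes_ > write_buffer_size_) FlushToLevel0();  // memtable is full
  }

  size_t write_buffer_size_;
  size_t mem_bytes_ = 0;
  MemTable mem_;
  std::vector<Level0Table> level0_;
};
```

In this sketch, data still in the memtable after the last log is handled by an explicit final FlushToLevel0() call.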
WriteLevel0TableForRecovery() is fairly simple. It works as follows:
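It first releases the DB mutex (mutex_) so that building the file does not block other threads, then calls BuildTable() to create the SSTable, and reacquires the mutex afterwards. If the build succeeds, it calls LogAndNotifyTableFileCreation() to record the new table file in the event logger and notify listeners. Finally, if the resulting table file's size is greater than 0, it calls edit->AddFile() to add the file to level 0 in the VersionEdit, which is later written to the MANIFEST.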
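The shape of WriteLevel0TableForRecovery() can be sketched like this. FileMeta, VersionEditSketch, BuildTableSketch and WriteLevel0TableSketch are hypothetical stand-ins (the "file size" is just the byte count of the entries); the point is the pattern: the mutex is released for the slow file build and reacquired before the VersionEdit is updated, and an empty result is not added.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for FileMetaData and VersionEdit.
struct FileMeta {
  uint64_t number = 0;
  uint64_t file_size = 0;
};

struct VersionEditSketch {
  std::vector<std::pair<int, FileMeta>> new_files;  // (level, file)
  void AddFile(int level, const FileMeta& f) { new_files.emplace_back(level, f); }
};

// Pretend BuildTable(): the "file size" is the total bytes of keys and values.
FileMeta BuildTableSketch(const std::map<std::string, std::string>& mem,
                          uint64_t file_number) {
  FileMeta meta;
  meta.number = file_number;
  for (const auto& kv : mem) meta.file_size += kv.first.size() + kv.second.size();
  return meta;
}

// Precondition (assumed): the caller holds db_mutex on entry; it is held on return.
void WriteLevel0TableSketch(std::mutex& db_mutex,
                            const std::map<std::string, std::string>& mem,
                            uint64_t file_number, VersionEditSketch* edit) {
  db_mutex.unlock();  // slow I/O: let other threads take the lock meanwhile
  FileMeta meta = BuildTableSketch(mem, file_number);
  db_mutex.lock();    // reacquire before touching shared version state
  if (meta.file_size > 0) {
    edit->AddFile(/*level=*/0, meta);  // later written to the MANIFEST
  }
}
```

Releasing the lock around file I/O keeps other threads from stalling while a table is written; the size check skips recording an empty table.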
Part of the code of Status DBImpl::Recover() is shown below:
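if (!read_only) {
// (earlier setup omitted in this excerpt; Status s is declared there)
s = env_->LockFile(LockFileName(dbname_), &db_lock_); // lock the database (LOCK file)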
if (!s.ok()) { return s; }
s = env_->FileExists(CurrentFileName(dbname_));
if (s.IsNotFound()) {
if (db_options_.create_if_missing) {
s = NewDB(); // create a brand-new MANIFEST and CURRENT file
is_new_db = true;
if (!s.ok()) { return s; }
} else {
return Status::InvalidArgument(
dbname_, "does not exist (create_if_missing is false)");
}
} else if (s.ok()) {
if (db_options_.error_if_exists) {
return Status::InvalidArgument(
dbname_, "exists (error_if_exists is true)");
}
} else {
// Unexpected error reading file
assert(s.IsIOError());
return s;
}
// Check for the IDENTITY file and create it if not there
s = env_->FileExists(IdentityFileName(dbname_));
if (s.IsNotFound()) {
s = SetIdentityFile(env_, dbname_);
if (!s.ok()) {
return s;
}
} else if (!s.ok()) {
assert(s.IsIOError());
return s;
}
}
Status s = versions_->Recover(column_families, read_only); // recover the current Version from the MANIFEST
if (db_options_.paranoid_checks && s.ok()) {
s = CheckConsistency(); // check file consistency
}
if (s.ok()) {
SequenceNumber max_sequence(kMaxSequenceNumber);
default_cf_handle_ = new ColumnFamilyHandleImpl(
versions_->GetColumnFamilySet()->GetDefault(), this, &mutex_);
default_cf_internal_stats_ = default_cf_handle_->cfd()->internal_stats();
single_column_family_mode_ =
versions_->GetColumnFamilySet()->NumberOfColumnFamilies() == 1;
// Recover from all newer log files than the ones named in the
// descriptor (new log files may have been added by the previous
// incarnation without registering them in the descriptor).
//
// Note that prev_log_number() is no longer used, but we pay
// attention to it in case we are recovering a database
// produced by an older version of rocksdb.
const uint64_t min_log = versions_->MinLogNumber();
const uint64_t prev_log = versions_->prev_log_number();
std::vector<std::string> filenames;
s = env_->GetChildren(db_options_.wal_dir, &filenames); // list the files in the WAL directory
std::vector<uint64_t> logs;
for (size_t i = 0; i < filenames.size(); i++) {
uint64_t number;
FileType type;
if (ParseFileName(filenames[i], &number, &type) && type