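To understand LevelDB snapshots, you first need to understand SequenceNumber. Every write increments the SequenceNumber: inserting key1, key2, key3, key4 one at a time assigns them the SequenceNumbers 1, 2, 3, 4. It is not always this simple, though. When writes are merged into one batch, for example key1, then a batch of key2, key3, key4, then key5, key1 gets SequenceNumber 1, the batch is stamped with the starting SequenceNumber 2 (its entries take 2, 3 and 4 as they are applied), and key5 gets SequenceNumber 5.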
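The numbering of merged writes can be illustrated with a small sketch. It assumes that each entry of a batch consumes one consecutive number starting right after the last sequence in use; `AssignSequences` is a hypothetical helper written for this post, not a LevelDB function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using SequenceNumber = uint64_t;

// Hypothetical helper: hands out sequence numbers to the keys of one batch
// and advances *last_sequence by the number of entries in the batch.
std::vector<std::pair<std::string, SequenceNumber>> AssignSequences(
    const std::vector<std::string>& batch_keys, SequenceNumber* last_sequence) {
  assert(last_sequence != nullptr);
  std::vector<std::pair<std::string, SequenceNumber>> assigned;
  SequenceNumber next = *last_sequence + 1;  // base sequence of this batch
  for (const std::string& key : batch_keys) {
    assigned.emplace_back(key, next++);
  }
  *last_sequence += batch_keys.size();
  return assigned;
}
```

Calling it with {key1}, then {key2, key3, key4}, then {key5} yields the base sequences 1, 2 and 5 for the three writes.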
A key-value pair is inserted into the memtable in the following format:

    internal_key_size | internal_key | value_size | value

The internal_key carries the SequenceNumber. Its format is:

    key | SequenceNumber | type (value type)

In other words, the SequenceNumber is stored together with the key-value pair.
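As a rough illustration of how the SequenceNumber and the type can share the tail of the internal key, the sketch below packs both into one 64-bit tag: 56 bits of sequence number and 8 bits of type. The helper names `PackTag`, `TagSequence` and `TagType` are made up for this example; LevelDB's own definitions live in db/dbformat.h.

```cpp
#include <cassert>
#include <cstdint>

// Value types as used in the compaction code: a deletion marker or a value.
enum ValueType : uint8_t { kTypeDeletion = 0x0, kTypeValue = 0x1 };

// The sequence number is limited to 56 bits so the type fits in the low byte.
constexpr uint64_t kMaxSequenceNumber = (uint64_t{1} << 56) - 1;

// Hypothetical helper: combine sequence number and type into one 64-bit tag.
inline uint64_t PackTag(uint64_t sequence, ValueType type) {
  assert(sequence <= kMaxSequenceNumber);
  return (sequence << 8) | type;
}

// Hypothetical helpers: split a tag back into its two parts.
inline uint64_t TagSequence(uint64_t tag) { return tag >> 8; }
inline ValueType TagType(uint64_t tag) {
  return static_cast<ValueType>(tag & 0xff);
}
```

Because the sequence number sits in the high bits, comparing two tags of the same user key orders the entries by sequence number first.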
Next, let's look at the snapshot API. The interface and its implementation are as follows:
const Snapshot* DBImpl::GetSnapshot() {
  MutexLock l(&mutex_);
  return snapshots_.New(versions_->LastSequence());
}

void DBImpl::ReleaseSnapshot(const Snapshot* s) {
  MutexLock l(&mutex_);
  snapshots_.Delete(reinterpret_cast<const SnapshotImpl*>(s));
}
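snapshots_ is a doubly linked list that tracks the live snapshots. Each call to GetSnapshot creates a new snapshot from the current SequenceNumber and inserts it into the list. Releasing a snapshot removes it from the list.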
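The bookkeeping can be sketched in a few lines. This is a simplified stand-in rather than LevelDB's actual list: `SnapshotRegistry` and `Handle` are names invented for this example, and std::list replaces the hand-written linked list. It relies on the assumption that snapshots are created in non-decreasing sequence order, so the oldest one always sits at the front.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

using SequenceNumber = uint64_t;

// Simplified stand-in for the snapshot list (hypothetical class).
class SnapshotRegistry {
 public:
  using Handle = std::list<SequenceNumber>::iterator;

  bool empty() const { return live_.empty(); }

  // Sequence number of the oldest live snapshot.
  SequenceNumber oldest() const {
    assert(!empty());
    return live_.front();
  }

  // Register a snapshot at the given sequence number (appended at the back).
  Handle New(SequenceNumber sequence) {
    assert(empty() || live_.back() <= sequence);
    return live_.insert(live_.end(), sequence);
  }

  // Release a snapshot; other handles stay valid because std::list
  // does not invalidate iterators to the remaining elements.
  void Delete(Handle handle) { live_.erase(handle); }

 private:
  std::list<SequenceNumber> live_;
};
```

With snapshots taken at sequences 3 and 7, oldest() returns 3; after the first one is released it returns 7.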
So how is the data that a snapshot refers to protected from deletion? In LevelDB, the only place that removes data is compaction, so let's look at the core of DBImpl::DoCompactionWork:
 1 Status DBImpl::DoCompactionWork(CompactionState* compact) {
 2   //...................
 3   if (snapshots_.empty()) {
 4     compact->smallest_snapshot = versions_->LastSequence();
 5   } else {
 6     compact->smallest_snapshot = snapshots_.oldest()->number_;
 7   }
 8
 9   // Release mutex while we're actually doing the compaction work
10   mutex_.Unlock();
11
12   Iterator* input = versions_->MakeInputIterator(compact->compaction);
13   input->SeekToFirst();
14   Status status;
15   ParsedInternalKey ikey;
16   std::string current_user_key;
17   bool has_current_user_key = false;
18   SequenceNumber last_sequence_for_key = kMaxSequenceNumber;
19   for (; input->Valid() && !shutting_down_.Acquire_Load(); ) {
20     //..............................
21     // Handle key/value, add to state, etc.
22     bool drop = false;
23     if (!ParseInternalKey(key, &ikey)) {
24       // Do not hide error keys
25       current_user_key.clear();
26       has_current_user_key = false;
27       last_sequence_for_key = kMaxSequenceNumber;
28     } else {
29       if (!has_current_user_key ||
30           user_comparator()->Compare(ikey.user_key,
31                                      Slice(current_user_key)) != 0) {
32         // First occurrence of this user key
33         current_user_key.assign(ikey.user_key.data(), ikey.user_key.size());
34         has_current_user_key = true;
35         last_sequence_for_key = kMaxSequenceNumber;
36       }
37
38       if (last_sequence_for_key <= compact->smallest_snapshot) {
39         // Hidden by an newer entry for same user key
40         drop = true;    // (A)
41       } else if (ikey.type == kTypeDeletion &&
42                  ikey.sequence <= compact->smallest_snapshot &&
43                  compact->compaction->IsBaseLevelForKey(ikey.user_key)) {
44         // For this user key:
45         // (1) there is no data in higher levels
46         // (2) data in lower levels will have larger sequence numbers
47         // (3) data in layers that are being compacted here and have
48         //     smaller sequence numbers will be dropped in the next
49         //     few iterations of this loop (by rule (A) above).
50         // Therefore this deletion marker is obsolete and can be dropped.
51         drop = true;
52       }
53
54       last_sequence_for_key = ikey.sequence;
55     }
56
57     if (!drop) {
58       //..............................
59     }
60
61     input->Next();
62   }
63 }
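On line 6, compact->smallest_snapshot is set to the SequenceNumber of the oldest snapshot (or, on line 4, to the latest SequenceNumber when no snapshot exists). An iterator over the compaction inputs is then created. For a single key key_a, the iteration may visit entries in the order
(key_a, value5)--------(key_a, value4)--------(key_a, value3)--------(key_a, value2)--------(key_a, value1)
that is, newest first.
When the iterator reaches (key_a, value5), lines 33-35 run because this is the first occurrence of the user key. last_sequence_for_key is reset to kMaxSequenceNumber, so rule (A) on line 38 cannot fire for this entry. At the end of the iteration, line 54 sets last_sequence_for_key to the SequenceNumber of (key_a, value5). On the next iteration, at (key_a, value4), last_sequence_for_key is compared with compact->smallest_snapshot. If last_sequence_for_key is less than or equal to compact->smallest_snapshot, the newer entry (key_a, value5) is already visible to the oldest snapshot, so no snapshot can ever read (key_a, value4) and it can be dropped during compaction. Otherwise, the entry can still be dropped if three conditions all hold: it is a deletion marker, its SequenceNumber is less than or equal to that of the oldest snapshot, and no level higher than the ones being compacted contains the same key. In every other case the entry must be kept.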
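This compaction logic ensures that an old snapshot keeps reading the old value and is unaffected by later updates, which is exactly what a snapshot is for.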
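To make the drop rule concrete, here is a minimal simulation of the two conditions for the entries of a single user key. It is a sketch written for this post: `Version` and `DropFlags` are hypothetical names, and the `is_base_level` flag stands in for the result of IsBaseLevelForKey.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using SequenceNumber = uint64_t;

// One entry of a user key as seen by the compaction iterator.
struct Version {
  SequenceNumber sequence;
  bool is_deletion;
};

// versions: all entries of ONE user key, newest first.
// Returns one flag per entry; true means the entry may be dropped.
std::vector<bool> DropFlags(const std::vector<Version>& versions,
                            SequenceNumber smallest_snapshot,
                            bool is_base_level) {
  std::vector<bool> flags;
  // Plays the role of kMaxSequenceNumber for the first occurrence of the key.
  SequenceNumber last_sequence_for_key =
      std::numeric_limits<SequenceNumber>::max();
  for (const Version& v : versions) {
    assert(v.sequence < last_sequence_for_key);  // input must be newest first
    bool drop = false;
    if (last_sequence_for_key <= smallest_snapshot) {
      drop = true;  // rule (A): hidden by a newer entry every snapshot can see
    } else if (v.is_deletion && v.sequence <= smallest_snapshot &&
               is_base_level) {
      drop = true;  // obsolete deletion marker
    }
    flags.push_back(drop);
    last_sequence_for_key = v.sequence;
  }
  return flags;
}
```

With sequences 5, 4, 3, 2, 1 and the oldest snapshot at 3, the entries 5, 4 and 3 are kept while 2 and 1 are dropped: the snapshot reads the entry with sequence 3, and newer readers may still need 4 or 5.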
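When calling Get, a snapshot can be passed in through the read options. Inside Get, the actual seek skips every key-value pair whose SequenceNumber is greater than the snapshot's, which guarantees that the read returns the value as of the snapshot rather than a later one.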
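The read side can be sketched with a toy multi-version store. This is an illustration under simplifying assumptions, not LevelDB's implementation: `ToyStore` is a hypothetical class, deletions are ignored, and a nested std::map replaces the memtable and the table files.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

using SequenceNumber = uint64_t;

// Toy multi-version store (hypothetical): keeps every version of every key.
class ToyStore {
 public:
  // Each write takes the next sequence number.
  void Put(const std::string& key, const std::string& value) {
    data_[key][++last_sequence_] = value;
  }

  SequenceNumber LastSequence() const { return last_sequence_; }

  // Finds the newest version of key with sequence <= snapshot.
  bool Get(const std::string& key, SequenceNumber snapshot,
           std::string* value) const {
    assert(value != nullptr);
    auto key_it = data_.find(key);
    if (key_it == data_.end()) return false;
    // Versions are ordered newest first, so lower_bound skips every entry
    // whose sequence number is greater than the snapshot.
    auto version_it = key_it->second.lower_bound(snapshot);
    if (version_it == key_it->second.end()) return false;
    *value = version_it->second;
    return true;
  }

 private:
  using Versions =
      std::map<SequenceNumber, std::string, std::greater<SequenceNumber>>;
  std::map<std::string, Versions> data_;
  SequenceNumber last_sequence_ = 0;
};
```

Taking a snapshot here is just remembering LastSequence() at that moment; a later Put to the same key does not change what a Get with that snapshot returns.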