LSM-tree最透彻的解释

最新推荐文章于 2024-06-19 14:37:03 发布

sdulibh

最新推荐文章于 2024-06-19 14:37:03 发布

阅读量5.7k

点赞数 1

分类专栏：计算机算法

计算机算法专栏收录该内容

49 篇文章 2 订阅

订阅专栏

LevelDB is an open source key-value store that originated
from Google’s BigTable [18]. It is an implementation of
LSM-tree, and it has received increased attention in both
industry and academia [6][34][2]. Figure 1 illustrates the
architecture of LevelDB, which consists of two MemTables
in main memory and a set of SSTables [18] in the disk and
other auxiliary files, such as the Manifest file which stores

the metadata of SSTables.

When the user inserts a key-value pair into LevelDB, it
will be first saved in a log file. Then it is inserted into a sorted
structure in memory, called MemTable, which holds the
most recent updates. When the size of incoming data items
reaches its full capacity, the MemTable will be transformed
into a read-only Immutable MemTable. A new MemTable
will be created to accumulate fresh updates. At the same
time, a background thread begins to dump the Immutable
MemTable into the disk and generate a new Sorted String
Table file (SSTable). Deletes are a special case of update
wherein a deletion marker is stored.
An SSTable stores a sequence of data items sorted by
their keys. The set of SSTables are organized into a series
of levels, as shown in Figure 1. The youngest level, Level 0,
is produced by writing the Immutable MemTable from main
memory to the disk. Thus SSTables in Level 0 could contain
overlapping keys. However, in other levels the key range of
SSTables are non-overlapping. Each level has a limit on the
maximum number of SSTables, or equivalently, on the total
amount of data because each SSTable has a fixed size in a
level. The limit grows at an exponential rate with the level
number. For example, the maximum amount of data in Level
1 will not exceed 10 MB, and it will not exceed 100 MB for
Level 2.
In order to keep the stored data in an optimized layout,
a compaction process will be conducted. The background
compaction thread will monitor the SSTable files. When the

total size of Level L exceeds its limit, the compaction thread

will pick one SSTable from Level L and all overlapping
ones from the next Level L+1. These files are used as inputs
to the compaction and are merged together to produce a
series of new Level L+1 files. When the output file has
reached the predefined size (2 MB by default), another new
SSTable is created. All inputs will be discarded after the
compaction. Note that the compaction from Level 0 to Level
1 is treated differently than those between other levels. When
the number of SSTables in Level 0 exceeds an upper limit
(4 by default), the compaction is triggered. The compaction
may involve more than one Level 0 file in case some of them
overlap with each other.
By conducting compaction, LevelDB eliminates overwritten
values and drops deleted markers. The compaction
operation also ensures that the freshest data reside in the
lowest level. The stale data will gradually move to the higher
levels.
The data retrieving, or read operation, is more complicated
than the insertion. When LevelDB receives a Get(Key,
Value) request, it will first do a look up in the MemTable,
then in Immutable MemTable, and finally search the SSTables
from Level 0 to higher levels in the order until a matched
KV data item is found. Once LevelDB finds the key in a certain
level, it will stop its search. As we mentioned before,
lower levels contain fresher data items. The new data will be
searched earlier than old data. Similar to compaction, more
than one Level 0 file could be searched because of their data
overlapping. A Bloom filter [14] is usually adopted to reduce
the I/O cost for reading data blocks that do not contain
requested KV items.