LSM-tree最透彻的解释

LevelDB is an open source key-value store that originated
from Google’s BigTable [18]. It is an implementation of
LSM-tree, and it has received increased attention in both
industry and academia [6][34][2]. Figure 1 illustrates the
architecture of LevelDB, which consists of two MemTables
in main memory and a set of SSTables [18] in the disk and
other auxiliary files, such as the Manifest file which stores

the metadata of SSTables.

When the user inserts a key-value pair into LevelDB, it
will be first saved in a log file. Then it is inserted into a sorted
structure in memory, called MemTable, which holds the
most recent updates. When the size of incoming data items
reaches its full capacity, the MemTable will be transformed
into a read-only Immutable MemTable. A new MemTable
will be created to accumulate fresh updates. At the same
time, a background thread begins to dump the Immutable
MemTable into the disk and generate a new Sorted String
Table file (SSTable). Deletes are a special case of update
wherein a deletion marker is stored.
An SSTable stores a sequence of data items sorted by
their keys. The set of SSTables are organized into a series
of levels, as shown in Figure 1. The youngest level, Level 0,
is produced by writing the Immutable MemTable from main
memory to the disk. Thus SSTables in Level 0 could contain
overlapping keys. However, in other levels the key range of
SSTables are non-overlapping. Each level has a limit on the
maximum number of SSTables, or equivalently, on the total
amount of data because each SSTable has a fixed size in a
level. The limit grows at an exponential rate with the level
number. For example, the maximum amount of data in Level
1 will not exceed 10 MB, and it will not exceed 100 MB for
Level 2.
In order to keep the stored data in an optimized layout,
a compaction process will be conducted. The background
compaction thread will monitor the SSTable files. When the

total size of Level L exceeds its limit, the compaction thread

will pick one SSTable from Level L and all overlapping
ones from the next Level L+1. These files are used as inputs
to the compaction and are merged together to produce a
series of new Level L+1 files. When the output file has
reached the predefined size (2 MB by default), another new
SSTable is created. All inputs will be discarded after the
compaction. Note that the compaction from Level 0 to Level
1 is treated differently than those between other levels. When
the number of SSTables in Level 0 exceeds an upper limit
(4 by default), the compaction is triggered. The compaction
may involve more than one Level 0 file in case some of them
overlap with each other.
By conducting compaction, LevelDB eliminates overwritten
values and drops deleted markers. The compaction
operation also ensures that the freshest data reside in the
lowest level. The stale data will gradually move to the higher
levels.
The data retrieving, or read operation, is more complicated
than the insertion. When LevelDB receives a Get(Key,
Value) request, it will first do a look up in the MemTable,
then in Immutable MemTable, and finally search the SSTables
from Level 0 to higher levels in the order until a matched
KV data item is found. Once LevelDB finds the key in a certain
level, it will stop its search. As we mentioned before,
lower levels contain fresher data items. The new data will be
searched earlier than old data. Similar to compaction, more
than one Level 0 file could be searched because of their data
overlapping. A Bloom filter [14] is usually adopted to reduce
the I/O cost for reading data blocks that do not contain
requested KV items.

  • 1
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值