Reading the Cassandra Leveled Compaction Source Code

Latest recommended article published on 2020-01-11 11:51:09

weixin_33720078

Original link: http://www.cnblogs.com/sing1ee/archive/2012/02/20/2765018.html

Notes are always written for one's future self.

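Why Cassandra needs compaction

Cassandra uses the BigTable-style column family storage model. It is a very flexible model: even within one table, different keys can have different columns, and columns are not stored in aligned form, which saves space. In Cassandra, insert and update are both implemented as appends at the storage layer. For example, suppose key1 has two columns, column1 and column2, and an update to key1 writes column2 and column3. Then column1 is unchanged, column2 is updated, and column3 is newly added. Because the write is an append, the original data and the updated data end up in two different sstable files. Append-only writes give Cassandra its write performance, but since the data for a single key can be spread across several sstables, read performance suffers badly, so compaction is required. A second reason for compaction is that the sstables flushed from memtables to disk are small and numerous, and compaction keeps the number of files in check. Compaction therefore has two main goals: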

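Merge updated data so that, as far as possible, each piece of data appears in only one sstable; this reduces the number of disk seeks per read and improves read performance
Merge small files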
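
The key1 example above can be sketched in a few lines of Java. This is only an illustration of the idea, not Cassandra's actual code: the `Cell` class, the `merge` helper and the use of plain maps to stand in for sstables are all hypothetical, and the sketch assumes that the version with the newer timestamp wins for each column.

```java
import java.util.Map;
import java.util.TreeMap;

class MergeSketch {
    /** Hypothetical cell: a column value plus the write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merge two versions of the same row; for each column the newer timestamp wins. */
    static Map<String, Cell> merge(Map<String, Cell> first, Map<String, Cell> second) {
        Map<String, Cell> merged = new TreeMap<>(first);
        for (Map.Entry<String, Cell> e : second.entrySet()) {
            merged.merge(e.getKey(), e.getValue(),
                    (a, b) -> b.timestamp >= a.timestamp ? b : a);
        }
        return merged;
    }

    public static void main(String[] args) {
        // The original write of key1 (first sstable): column1 and column2.
        Map<String, Cell> sstable1 = new TreeMap<>();
        sstable1.put("column1", new Cell("v1", 1L));
        sstable1.put("column2", new Cell("v2", 1L));

        // The later update of key1 (second sstable): column2 and column3.
        Map<String, Cell> sstable2 = new TreeMap<>();
        sstable2.put("column2", new Cell("v2-updated", 2L));
        sstable2.put("column3", new Cell("v3", 2L));

        // After merging, key1 lives in one place with three columns.
        for (Map.Entry<String, Cell> e : merge(sstable1, sstable2).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue().value);
        }
    }
}
```

Before the merge, a read of key1 has to consult both files; after it, a single lookup is enough, which is the first goal listed above.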

Cassandra's compaction mechanisms

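Since Cassandra 1.0 there have been two main compaction strategies: 1) SizeTieredCompactionStrategy and 2) LeveledCompactionStrategy. The first is the strategy Cassandra originally shipped with. Its only advantage is that it suits insert-heavy workloads, that is, workloads that are almost entirely inserts with very few updates. Its main drawbacks are: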

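Data for the same key appears in multiple sstables, and the problem is worse than with LeveledCompactionStrategy: duplicates occur not only across tiers but also within a tier
Compaction wastes space: free space equal to the size of the data must be reserved for the compaction to run.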

LeveledCompactionStrategy was introduced in 1.0 and follows LevelDB's compaction mechanism. Its main characteristics are:

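It needs little extra space: only about 10% of the total data size;
Within each level there are no duplicate keys, which ensures that 90% of reads touch only one sstable file
10 TB of data needs only 7 levels, so a single node storing 10 TB has to read at most 7 sstable files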
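
The 7-level figure can be checked with a little arithmetic. The sketch below assumes that level i (for i >= 1) holds up to 10^i sstables of a fixed size, and that the sstable size is 5 MB; both numbers are assumptions made for this illustration, not values taken from the text above, and `LevelCountSketch` and its methods are hypothetical names.

```java
class LevelCountSketch {
    /** Capacity of level i in MB, assuming it holds 10^i sstables of sstableSizeMb each. */
    static long levelCapacityMb(int level, long sstableSizeMb) {
        long capacity = sstableSizeMb;
        for (int i = 0; i < level; i++) {
            capacity *= 10;
        }
        return capacity;
    }

    /** Smallest top level such that L1 through that level can together hold totalMb. */
    static int levelsNeeded(long totalMb, long sstableSizeMb) {
        long cumulativeMb = 0;
        int level = 0;
        while (cumulativeMb < totalMb) {
            level++;
            cumulativeMb += levelCapacityMb(level, sstableSizeMb);
        }
        return level;
    }

    public static void main(String[] args) {
        long tenTbInMb = 10L * 1024 * 1024; // 10 TB expressed in MB
        // L1..L6 hold about 5.3 TB in total under these assumptions,
        // so a seventh level is needed for 10 TB.
        System.out.println(levelsNeeded(tenTbInMb, 5));
    }
}
```

Under these assumptions L1 through L6 together hold 5,555,550 MB, which is less than 10 TB, while adding L7 raises the total to 55,555,550 MB, so the data fits in L1 through L7.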

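LeveledCompactionStrategy is aimed mainly at workloads with a relatively high share of updates. It is not a good fit for insert-heavy workloads: when writes are very frequent, too many L0 files are produced (L0 can contain duplicate keys), compaction cannot keep up, and read performance drops. Some common ways to work around this problem: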

Increase the memtable size;
Use multithreaded compaction

Each of these approaches has its own drawbacks, and neither works well in practice.

Reading the Leveled Compaction source code

The core of the Leveled Compaction mechanism is in LeveledManifest, shown below:

public synchronized Collection<SSTableReader> getCompactionCandidates()
    {
        // LevelDB gives each level a score of how much data it contains vs its ideal amount, and
        // compacts the level with the highest score. But this falls apart spectacularly once you
        // get behind.  Consider this set of levels:
        // L0: 988 [ideal: 4]
        // L1: 117 [ideal: 10]
        // L2: 12  [ideal: 100]
        //
        // The problem is that L0 has a much higher score (almost 250) than L1 (11), so what we'll
        // do is compact a batch of MAX_COMPACTING_L0 sstables with all 117 L1 sstables, and put the
        // result (say, 120 sstables) in L1. Then we'll compact the next batch of MAX_COMPACTING_L0,
        // and so forth.  So we spend most of our i/o rewriting the L1 data with each batch.
        //
        // If we could just do *all* L0 a single time with L1, that would be ideal.  But we can't
        // -- see the javadoc for MAX_COMPACTING_L0.
        //
        // LevelDB's way around this is to simply block writes if L0 compaction falls behind.
        // We don't have that luxury.
        //
        // So instead, we force compacting higher levels first.  This may not minimize the number
        // of reads done as quickly in the short term, but it minimizes the i/o needed to compact
        // optimally which gives us a long term win.
        for (int i = generations.length - 1; i >= 0; i--)
        {
            List<SSTableReader> sstables = generations[i];
            if (sstables.isEmpty())
                continue; // mostly this just avoids polluting the debug log with zero scores
            double score = SSTableReader.getTotalBytes(sstables) / maxBytesForLevel(i);
            logger.debug("Compaction score for level {} is {}", i, score);

            // L0 gets a special case that if we don't have anything more important to do,
            // we'll go ahead and compact even just one sstable
            if (score > 1.001 || i == 0)
            {
                Collection<SSTableReader> candidates = getCandidatesFor(i);
                if (logger.isDebugEnabled())
                    logger.debug("Compaction candidates for L{} are {}", i, toString(candidates));
                return candidates;
            }
        }

        return Collections.emptyList();
    }
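
To make the difference between the two selection rules concrete, here is a small sketch that plugs in the numbers from the comment above (L0: 988 with ideal 4, L1: 117 with ideal 10, L2: 12 with ideal 100). It is a simplification under stated assumptions: it scores levels by sstable count rather than by total bytes as the real method does, and `ScoreSketch` and its methods are hypothetical names, not part of LeveledManifest.

```java
class ScoreSketch {
    /** Score of a level: how much it holds relative to its ideal amount. */
    static double score(int[] counts, int[] ideal, int level) {
        return (double) counts[level] / ideal[level];
    }

    /**
     * The rule used in the method above: scan from the highest level down and
     * pick the first level whose score exceeds 1.001; L0 always qualifies.
     * Returns -1 if every level is empty.
     */
    static int pickHighestLevelFirst(int[] counts, int[] ideal) {
        for (int i = counts.length - 1; i >= 0; i--) {
            if (counts[i] == 0) {
                continue;
            }
            if (score(counts, ideal, i) > 1.001 || i == 0) {
                return i;
            }
        }
        return -1;
    }

    /** The LevelDB-style rule described in the comment: pick the highest score. */
    static int pickHighestScore(int[] counts, int[] ideal) {
        int best = -1;
        double bestScore = 0.0;
        for (int i = 0; i < counts.length; i++) {
            double s = score(counts, ideal, i);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] counts = {988, 117, 12};
        int[] ideal = {4, 10, 100};
        System.out.println("highest level first picks L" + pickHighestLevelFirst(counts, ideal));
        System.out.println("highest score picks L" + pickHighestScore(counts, ideal));
    }
}
```

With these numbers the scores are 247 for L0, 11.7 for L1 and 0.12 for L2. The highest-score rule keeps choosing L0, while the highest-level-first rule skips L2 (below the threshold) and chooses L1, which is the behaviour the comment argues for.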

The code is fairly easy to follow, so I will only point out a few things: