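Compaction code walkthrough
This walkthrough of the compaction code is divided into the following parts:
1. Management of compaction threads
2. Selection of candidate sstables for compaction
3. Merging of sstables during compaction
Parts 1 and 3 are fairly fixed and probably do not need optimization; it is enough to understand them.
1 Management of compaction threads
1.1 Starting compaction
After Cassandra starts, a cfs (ColumnFamilyStore) is created at the end of opening each table. The cfs constructor creates the compaction strategy object that matches the configured compaction strategy. Let's look at the constructors of the two concrete compaction strategies.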
public SizeTieredCompactionStrategy(ColumnFamilyStore cfs, Map<String, String> options)
{
    super(cfs, options);
    this.estimatedRemainingTasks = 0;
    String optionValue = options.get(MIN_SSTABLE_SIZE_KEY);
    minSSTableSize = (null != optionValue) ? Long.parseLong(optionValue) : DEFAULT_MIN_SSTABLE_SIZE;
    cfs.setMaximumCompactionThreshold(cfs.metadata.getMaxCompactionThreshold());
    cfs.setMinimumCompactionThreshold(cfs.metadata.getMinCompactionThreshold());
}
public LeveledCompactionStrategy(ColumnFamilyStore cfs, Map<String, String> options)
{
    super(cfs, options);
    int configuredMaxSSTableSize = 5;
    if (options != null)
    {
        String value = options.containsKey(SSTABLE_SIZE_OPTION) ? options.get(SSTABLE_SIZE_OPTION) : null;
        if (null != value)
        {
            try
            {
                configuredMaxSSTableSize = Integer.parseInt(value);
            }
            catch (NumberFormatException ex)
            {
                logger.warn(String.format("%s is not a parsable int (base10) for %s using default value",
                                          value, SSTABLE_SIZE_OPTION));
            }
        }
    }
    maxSSTableSizeInMB = configuredMaxSSTableSize;
    cfs.getDataTracker().subscribe(this);
    logger.debug("{} subscribed to the data tracker.", this);
    manifest = LeveledManifest.create(cfs, this.maxSSTableSizeInMB);
    logger.debug("Created {}", manifest);
    // override min/max for this strategy
    cfs.setMaximumCompactionThreshold(Integer.MAX_VALUE);
    cfs.setMinimumCompactionThreshold(1);
}
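Both constructors read a tunable from the options map and fall back to a default when the value is absent; the leveled strategy also falls back when the value is not a valid integer. The standalone sketch below isolates that pattern. The class name OptionParsing, the method parseIntOption, and the key "size_mb" are hypothetical names chosen for illustration and do not come from the Cassandra source.

```java
import java.util.HashMap;
import java.util.Map;

class OptionParsing {
    // Hypothetical helper: returns the option as an int, or defaultValue when
    // the map is null, the key is missing, or the value is not a base-10 int.
    public static int parseIntOption(Map<String, String> options, String key, int defaultValue) {
        if (options == null) {
            return defaultValue;
        }
        String value = options.get(key);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException ex) {
            // Same decision as the leveled constructor: keep the default
            // instead of failing when the configured value is malformed.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<String, String>();
        options.put("size_mb", "10");
        System.out.println(parseIntOption(options, "size_mb", 5));
        options.put("size_mb", "ten");
        System.out.println(parseIntOption(options, "size_mb", 5));
    }
}
```

Note that the size-tiered constructor shown above does not guard its Long.parseLong call in the same way, so a malformed value there would presumably surface as a NumberFormatException.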
The key part is the base-class constructor below. It is simple enough that it needs no detailed explanation.
protected AbstractCompactionStrategy(ColumnFamilyStore cfs, Map<String, String> options)
{
    assert cfs != null;
    this.cfs = cfs;
    this.options = options;
    // start compactions in five minutes (if no flushes have occurred by then to do so)
    Runnable runnable = new Runnable()
    {
        public void run()
        {
            if (CompactionManager.instance.getActiveCompactions() == 0)
            {
                CompactionManager.instance.submitBackground(AbstractCompactionStrategy.this.cfs);
            }
        }
    };
    StorageService.optionalTasks.schedule(runnable, 5 * 60, TimeUnit.SECONDS); // optionalTasks is a thread pool
}
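From the three snippets above, we can see that when Cassandra starts, each compaction strategy schedules a background compaction check. The check fires five minutes after the strategy is created and submits a background compaction only if no compaction is active at that moment (for example, because no flush has triggered one by then).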
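The delayed check can be reproduced outside Cassandra with a plain ScheduledExecutorService. The sketch below is an illustration only: the class DelayedCompactionCheck and its two counters are hypothetical stand-ins for CompactionManager.instance.getActiveCompactions() and submitBackground(cfs), and the delay is a parameter so that the example finishes quickly instead of waiting five minutes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DelayedCompactionCheck {
    // Hypothetical stand-in for CompactionManager.instance.getActiveCompactions().
    final AtomicInteger activeCompactions = new AtomicInteger(0);
    // Counts how often the stand-in for submitBackground(cfs) was invoked.
    final AtomicInteger submitted = new AtomicInteger(0);

    // Schedules a one-shot check that submits work only if nothing is running.
    ScheduledFuture<?> scheduleInitialCheck(ScheduledExecutorService pool, long delay, TimeUnit unit) {
        Runnable check = new Runnable() {
            public void run() {
                if (activeCompactions.get() == 0) {
                    submitted.incrementAndGet();
                }
            }
        };
        return pool.schedule(check, delay, unit);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        DelayedCompactionCheck check = new DelayedCompactionCheck();
        // Block until the delayed task has run, then inspect the counter.
        check.scheduleInitialCheck(pool, 100, TimeUnit.MILLISECONDS).get();
        System.out.println("submitted = " + check.submitted.get());
        pool.shutdown();
    }
}
```

Because the task is one-shot (schedule rather than scheduleAtFixedRate), it only covers the startup case; later compactions have to be requested by other callers of submitBackground.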
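CompactionManager manages the threads that are currently running compactions. Compaction is generally single-threaded (per column family). The code involves synchronization, locks, and a thread pool.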
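To make the combination of a thread pool and a lock concrete, here is a minimal sketch of the general pattern: background tasks run on a single-threaded executor and hold the read side of a ReentrantReadWriteLock while they work, so that an operation needing exclusive access can take the write side. The class BackgroundTaskSketch, the pendingWork parameter, and the runExclusively method are hypothetical; this shows the locking idiom only, not CompactionManager's actual implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BackgroundTaskSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // One worker thread, so submitted tasks run one at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Returns a Future holding the amount of work done; 0 means the call was a no-op.
    Future<Integer> submitBackground(final int pendingWork) {
        Callable<Integer> task = new Callable<Integer>() {
            public Integer call() {
                lock.readLock().lock();
                try {
                    // Over-calling is harmless: with nothing pending we return 0.
                    return Math.max(pendingWork, 0);
                } finally {
                    lock.readLock().unlock();
                }
            }
        };
        return executor.submit(task);
    }

    // Hypothetical exclusive operation: waits until no task holds the read lock.
    void runExclusively(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

A caller would typically invoke submitBackground(...) and either ignore the returned Future or call get() on it to wait for the result.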
Now let's look at the main code behind the call CompactionManager.instance.submitBackground(AbstractCompactionStrategy.this.cfs);
/**
 * Call this whenever a compaction might be needed on the given columnfamily.
 * It's okay to over-call (within reason) since the compactions are single-threaded,
 * and if a call is unnecessary, it will just be no-oped in the bucketing phase.
 */
public Future<Integer> submitBackground(final ColumnFamilyStore cfs)
{
    logger.debug("Scheduling a background task check for {}.{} with {}",
                 new Object[] {cfs.table.name,
                               cfs.columnFamily,
                               cfs.getCompactionStrategy().getClass().getSimpleName()});
    Callable<Integer> callable = new Callable<Integer>()
    {
        public Integer call() throws IOException
        {
            compactionLock.readLock().lock();
            try
            {
                logger.debug("Checking {}.{}", cfs.table.name, cfs.columnFamily); // log after we get the lock so we can see delays from that if any
                if (!