In the previous post we covered the implementation of Druid's Column. This time we look at the Segment. Where a Column manages a single column, a Segment manages a group of columns, covering both dimensions and metrics. Let's start with the definition of the Segment interface:
public interface Segment extends Closeable {
public String getIdentifier();
public Interval getDataInterval();
public QueryableIndex asQueryableIndex();
public StorageAdapter asStorageAdapter();
/**
* Request an implementation of a particular interface.
*
* If the passed-in interface is {@link QueryableIndex} or {@link StorageAdapter}, then this method behaves
* identically to {@link #asQueryableIndex()} or {@link #asStorageAdapter()}. Other interfaces are only
* expected to be requested by callers that have specific knowledge of extra features provided by specific
* segment types. For example, an extension might provide a custom Segment type that can offer both
* StorageAdapter and some new interface. That extension can also offer a Query that uses that new interface.
*
* Implementations which accept classes other than {@link QueryableIndex} or {@link StorageAdapter} are limited
* to using those classes within the extension. This means that one extension cannot rely on the `Segment.as`
* behavior of another extension.
*
* @param clazz desired interface
* @param <T> desired interface
* @return instance of clazz, or null if the interface is not supported by this segment
*/
public <T> T as(Class<T> clazz);
}
Reading from the bottom up:
- as: as the Javadoc above explains, this gives custom Segment implementations a hook for exposing additional interfaces.
- asStorageAdapter: returns a StorageAdapter, which exposes cursors and thereby the ability to read the data row by row.
- asQueryableIndex: returns a QueryableIndex, the query-oriented view of the data, which provides access to each individual column.
- getDataInterval: returns an Interval describing the start and end time of the data in this segment.
- getIdentifier: returns the segment's unique identifier, in the format <datasource>_<start>_<end>_<version>_<partitionNum>.
Under the right conditions a QueryableIndex can also be adapted into a StorageAdapter. Let's look at these two interfaces in turn:
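To make the identifier format concrete, here is a minimal sketch of splitting one into its parts. The class name, datasource, and timestamps are made up for illustration; real Druid parsing is more careful (for example, a datasource name may itself contain underscores):

```java
// Hypothetical sketch: pulling the parts out of a segment identifier.
// Assumes the datasource name itself contains no underscores.
public class SegmentIdSketch {
  // <datasource>_<start>_<end>_<version>_<partitionNum>
  public static String[] parse(String identifier) {
    return identifier.split("_");
  }

  public static void main(String[] args) {
    String id = "wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_v1_0";
    String[] parts = parse(id);
    System.out.println(parts[0]); // datasource: wikipedia
    System.out.println(parts[4]); // partition number: 0
  }
}
```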
1. QueryableIndex
QueryableIndex provides access to each column and supports queries against selected columns. The interface is defined as follows:
public interface ColumnSelector
{
public Column getColumn(String columnName);
}
public interface QueryableIndex extends ColumnSelector, Closeable
{
public Interval getDataInterval();
public int getNumRows();
public Indexed<String> getColumnNames();
public Indexed<String> getAvailableDimensions();
public BitmapFactory getBitmapFactoryForDimensions();
public Metadata getMetadata();
/**
* The close method shouldn't actually be here as this is nasty. We will adjust it in the future.
* @throws java.io.IOException if an exception was thrown closing the index
*/
//@Deprecated // This is still required for SimpleQueryableIndex. It should not go away until SimpleQueryableIndex is fixed
public void close() throws IOException;
}
Here Metadata carries the segment's metadata, such as its column names. QueryableIndex also extends ColumnSelector, an interface for looking up a single column, so QueryableIndex supports queries at per-column granularity.
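The ColumnSelector pattern amounts to a name-to-column lookup. A toy in-memory version (with a stand-in column type, since a real Druid Column is backed by segment files) might look like this:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for Druid's Column type, just to show the ColumnSelector shape.
class ToyColumn {
  final String name;
  ToyColumn(String name) { this.name = name; }
}

// Minimal ColumnSelector-style lookup: a QueryableIndex resolves column names
// to Column objects in the same way, returning null for unknown columns.
public class ToyColumnSelector {
  private final Map<String, ToyColumn> columns = new HashMap<>();

  public void put(ToyColumn c) { columns.put(c.name, c); }
  public ToyColumn getColumn(String columnName) { return columns.get(columnName); }
}
```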
2. StorageAdapter
Its interface is defined as follows:
public interface CursorFactory
{
public Sequence<Cursor> makeCursors(Filter filter, Interval interval, QueryGranularity gran, boolean descending);
}
public interface StorageAdapter extends CursorFactory
{
public String getSegmentIdentifier();
public Interval getInterval();
public Indexed<String> getAvailableDimensions();
public Iterable<String> getAvailableMetrics();
/**
* Returns the number of distinct values for the given dimension column
* For dimensions of unknown cardinality, e.g. __time this currently returns
* Integer.MAX_VALUE
*
* @param column
* @return
*/
public int getDimensionCardinality(String column);
public DateTime getMinTime();
public DateTime getMaxTime();
public Comparable getMinValue(String column);
public Comparable getMaxValue(String column);
public Capabilities getCapabilities();
public ColumnCapabilities getColumnCapabilities(String column);
/**
* Like {@link ColumnCapabilities#getType()}, but may return a more descriptive string for complex columns.
* @param column column name
* @return type name
*/
public String getColumnTypeName(String column);
public int getNumRows();
public DateTime getMaxIngestedEventTime();
public Metadata getMetadata();
}
As the code shows, StorageAdapter extends CursorFactory, so it can walk every row of the data through a cursor, applying filters along the way. Example usage:
return Sequences.filter(
Sequences.map(
adapter.makeCursors(filter, queryIntervals.get(0), granularity, descending),
new Function<Cursor, Result<T>>()
{
@Override
public Result<T> apply(Cursor input)
{
log.debug("Running over cursor[%s][%s]", adapter.getInterval(), input.getTime());
return mapFn.apply(input);
}
}
),
Predicates.<Result<T>>notNull()
);
3. IncrementalIndex
IncrementalIndex is the core structure behind incremental indexing. It implements Iterable<Row> and accepts new data through its add(InputRow row) method, with the metrics of incoming rows aggregated by aggregators. The logic is: if the key of an incoming row already exists in the segment, its metric values are folded into the existing row rather than stored as a new row. The code is as follows:
public int add(InputRow row) throws IndexSizeExceededException {
TimeAndDims key = toTimeAndDims(row);
final int rv = addToFacts(
metrics,
deserializeComplexMetrics,
reportParseExceptions,
row,
numEntries,
key,
in,
rowSupplier
);
updateMaxIngestedTime(row.getTimestamp());
return rv;
}
The aggregation itself happens in addToFacts. Taking one concrete implementation as an example:
protected Integer addToFacts(
AggregatorFactory[] metrics,
boolean deserializeComplexMetrics,
boolean reportParseExceptions,
InputRow row,
AtomicInteger numEntries,
TimeAndDims key,
ThreadLocal<InputRow> rowContainer,
Supplier<InputRow> rowSupplier
) throws IndexSizeExceededException
{
final Integer priorIndex = getFacts().get(key);
Aggregator[] aggs;
if (null != priorIndex) {
aggs = indexedMap.get(priorIndex);
} else {
aggs = new Aggregator[metrics.length];
for (int i = 0; i < metrics.length; i++) {
final AggregatorFactory agg = metrics[i];
aggs[i] = agg.factorize(
makeColumnSelectorFactory(agg, rowSupplier, deserializeComplexMetrics)
);
}
Integer rowIndex;
do {
rowIndex = indexIncrement.incrementAndGet();
} while (null != indexedMap.putIfAbsent(rowIndex, aggs));
// Last ditch sanity checks
if (numEntries.get() >= maxRowCount && !getFacts().containsKey(key)) {
throw new IndexSizeExceededException("Maximum number of rows reached");
}
final Integer prev = getFacts().putIfAbsent(key, rowIndex);
if (null == prev) {
numEntries.incrementAndGet();
} else {
// We lost a race
aggs = indexedMap.get(prev);
// Free up the misfire
indexedMap.remove(rowIndex);
// This is expected to occur ~80% of the time in the worst scenarios
}
}
rowContainer.set(row);
for (Aggregator agg : aggs) {
synchronized (agg) {
try {
agg.aggregate();
}
catch (ParseException e) {
// "aggregate" can throw ParseExceptions if a selector expects something but gets something else.
if (reportParseExceptions) {
throw e;
}
}
}
}
rowContainer.set(null);
return numEntries.get();
}
As the code shows, every incoming row triggers the aggregate() method of each of the segment's aggregators to fold the row in. Which aggregators are used is determined by the segment's definition.
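The rollup behavior described above can be simulated with a plain map keyed on (timestamp, dimension values). This is a hand-rolled sketch, not Druid's actual data structure; the class and method names are invented, and a simple sum stands in for the configurable aggregators:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of IncrementalIndex-style rollup: rows sharing the same
// (timestamp, dimension values) key are folded into one entry whose metric is
// aggregated (here: summed), instead of being stored as separate rows.
public class RollupSketch {
  private final Map<String, Long> facts = new HashMap<>();

  public void add(long timestamp, String dims, long metricValue) {
    String key = timestamp + "|" + dims;      // analogous to TimeAndDims
    facts.merge(key, metricValue, Long::sum); // aggregate on key collision
  }

  public int numRows() { return facts.size(); }

  public long get(long timestamp, String dims) {
    return facts.get(timestamp + "|" + dims);
  }
}
```

Adding three rows where two share a key leaves only two physical rows, with the shared key's metric summed.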
QueryableIndexStorageAdapter adapts a QueryableIndex into a StorageAdapter. Likewise, IncrementalIndexStorageAdapter adapts an IncrementalIndex into a StorageAdapter; during the adaptation it builds a cursor and feeds every value of each column into the row.
4. Loading index files: IndexIO
IndexIO loads segments from disk: its loadIndex(File inDir) method loads a segment from a directory and returns a QueryableIndex. Its implementation is as follows:
public QueryableIndex loadIndex(File inDir) throws IOException
{
final int version = SegmentUtils.getVersionFromDir(inDir);
final IndexLoader loader = indexLoaders.get(version);
if (loader != null) {
return loader.load(inDir, mapper);
} else {
throw new ISE("Unknown index version[%s]", version);
}
}
Here IndexLoader is the object that does the actual work. Let's look at its implementation, taking the v9 format as an example:
static class V9IndexLoader implements IndexLoader
{
private final ColumnConfig columnConfig;
V9IndexLoader(ColumnConfig columnConfig)
{
this.columnConfig = columnConfig;
}
@Override
public QueryableIndex load(File inDir, ObjectMapper mapper) throws IOException
{
log.debug("Mapping v9 index[%s]", inDir);
long startTime = System.currentTimeMillis();
final int theVersion = Ints.fromByteArray(Files.toByteArray(new File(inDir, "version.bin")));
if (theVersion != V9_VERSION) {
throw new IllegalArgumentException(String.format("Expected version[9], got[%s]", theVersion));
}
SmooshedFileMapper smooshedFiles = Smoosh.map(inDir);
ByteBuffer indexBuffer = smooshedFiles.mapFile("index.drd");
/**
* Index.drd should consist of the segment version, the columns and dimensions of the segment as generic
* indexes, the interval start and end millis as longs (in 16 bytes), and a bitmap index type.
*/
final GenericIndexed<String> cols = GenericIndexed.read(indexBuffer, GenericIndexed.STRING_STRATEGY);
final GenericIndexed<String> dims = GenericIndexed.read(indexBuffer, GenericIndexed.STRING_STRATEGY);
final Interval dataInterval = new Interval(indexBuffer.getLong(), indexBuffer.getLong());
final BitmapSerdeFactory segmentBitmapSerdeFactory;
/**
* This is a workaround for the fact that in v8 segments, we have no information about the type of bitmap
* index to use. Since we cannot very cleanly build v9 segments directly, we are using a workaround where
* this information is appended to the end of index.drd.
*/
if (indexBuffer.hasRemaining()) {
segmentBitmapSerdeFactory = mapper.readValue(serializerUtils.readString(indexBuffer), BitmapSerdeFactory.class);
} else {
segmentBitmapSerdeFactory = new BitmapSerde.LegacyBitmapSerdeFactory();
}
Metadata metadata = null;
ByteBuffer metadataBB = smooshedFiles.mapFile("metadata.drd");
if (metadataBB != null) {
try {
metadata = mapper.readValue(
serializerUtils.readBytes(metadataBB, metadataBB.remaining()),
Metadata.class
);
}
catch (JsonParseException | JsonMappingException ex) {
// Any jackson deserialization errors are ignored e.g. if metadata contains some aggregator which
// is no longer supported then it is OK to not use the metadata instead of failing segment loading
log.warn(ex, "Failed to load metadata for segment [%s]", inDir);
}
catch (IOException ex) {
throw new IOException("Failed to read metadata", ex);
}
}
Map<String, Column> columns = Maps.newHashMap();
for (String columnName : cols) {
columns.put(columnName, deserializeColumn(mapper, smooshedFiles.mapFile(columnName)));
}
columns.put(Column.TIME_COLUMN_NAME, deserializeColumn(mapper, smooshedFiles.mapFile("__time")));
final QueryableIndex index = new SimpleQueryableIndex(
dataInterval, cols, dims, segmentBitmapSerdeFactory.getBitmapFactory(), columns, smooshedFiles, metadata
);
log.debug("Mapped v9 index[%s] in %,d millis", inDir, System.currentTimeMillis() - startTime);
return index;
}
private Column deserializeColumn(ObjectMapper mapper, ByteBuffer byteBuffer) throws IOException
{
ColumnDescriptor serde = mapper.readValue(
serializerUtils.readString(byteBuffer), ColumnDescriptor.class
);
return serde.read(byteBuffer, columnConfig);
}
}
This class maps all of the drd files contained in the segment's index.zip into memory and returns a QueryableIndex built from them.
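To make one step of the loading concrete: version.bin simply stores the format version as a 4-byte big-endian integer, which is what the Ints.fromByteArray call in the loader decodes. The following self-contained sketch mimics that decoding, with a byte array standing in for the file contents:

```java
import java.nio.ByteBuffer;

public class VersionBinSketch {
  // Decode a 4-byte big-endian int, as Guava's Ints.fromByteArray does
  // for the contents of version.bin.
  public static int readVersion(byte[] fileContents) {
    return ByteBuffer.wrap(fileContents).getInt();
  }

  public static void main(String[] args) {
    byte[] versionBin = {0, 0, 0, 9}; // stand-in for the bytes of version.bin
    System.out.println(readVersion(versionBin)); // prints 9
  }
}
```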
5. Index persistence
During segment generation, the in-memory segment needs to be persisted and saved to deep storage. IndexMerger is responsible for persisting the index.
6. What does the Segment storage layout look like?