Column is the base interface for the columns that make up a Druid Segment. Let's start with the Column interface itself:
public interface Column
{
  public static final String TIME_COLUMN_NAME = "__time";

  public ColumnCapabilities getCapabilities();
  public int getLength();
  public DictionaryEncodedColumn getDictionaryEncoding();
  public RunLengthColumn getRunLengthColumn();
  public GenericColumn getGenericColumn();
  public ComplexColumn getComplexColumn();
  public BitmapIndex getBitmapIndex();
  public SpatialIndex getSpatialIndex();
}
It is one of Druid's most fundamental data structures. It exposes accessors for the different column representations as well as for the indexes built on a column. Let's look at its accessors one by one:
- GenericColumn: a generic column interface for reading the value of a given row in a column. It currently supports string, float, and long values.
public interface GenericColumn extends Closeable
{
  public int length();
  public ValueType getType();
  public boolean hasMultipleValues();

  public String getStringSingleValueRow(int rowNum);
  public Indexed<String> getStringMultiValueRow(int rowNum);
  public float getFloatSingleValueRow(int rowNum);
  public IndexedFloats getFloatMultiValueRow(int rowNum);
  public long getLongSingleValueRow(int rowNum);
  public IndexedLongs getLongMultiValueRow(int rowNum);
}
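To make the single-value read path concrete, here is a minimal in-memory sketch of the same access pattern. The class name InMemoryLongColumn is hypothetical; it is not a Druid class, just an illustration of the `length()` / `getLongSingleValueRow()` contract:

```java
// Hypothetical in-memory sketch of the GenericColumn single-value read path.
class InMemoryLongColumn {
  private final long[] rows;

  InMemoryLongColumn(long[] rows) { this.rows = rows; }

  // number of rows in the column
  int length() { return rows.length; }

  // value of a single-valued row
  long getLongSingleValueRow(int rowNum) { return rows[rowNum]; }
}

public class GenericColumnSketch {
  public static void main(String[] args) {
    InMemoryLongColumn col = new InMemoryLongColumn(new long[]{10L, 20L, 30L});
    System.out.println(col.length());                 // 3
    System.out.println(col.getLongSingleValueRow(1)); // 20
  }
}
```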
- DictionaryEncodedColumn: represents a dictionary-encoded column. String columns in Druid all use this structure. It is suitable when the cardinality is not too high (internally it uses an LRUMap).
public interface DictionaryEncodedColumn extends Closeable
{
  public int length();
  public boolean hasMultipleValues();
  public int getSingleValueRow(int rowNum);
  public IndexedInts getMultiValueRow(int rowNum);
  public String lookupName(int id);
  public int lookupId(String name);
  public int getCardinality();
}
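The core idea behind dictionary encoding can be sketched in a few lines: each distinct string is assigned an integer id, rows store only the ids, and `lookupName` / `lookupId` translate in both directions. DictionaryEncoder below is a hypothetical illustration, not Druid's actual implementation (which also compresses the encoded ids):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of dictionary encoding: ids assigned in first-seen order.
class DictionaryEncoder {
  private final List<String> idToName = new ArrayList<>();
  private final Map<String, Integer> nameToId = new HashMap<>();

  // returns the id for name, assigning a new one on first sight
  int encode(String name) {
    return nameToId.computeIfAbsent(name, n -> {
      idToName.add(n);
      return idToName.size() - 1;
    });
  }

  String lookupName(int id) { return idToName.get(id); }
  int lookupId(String name) { return nameToId.getOrDefault(name, -1); }
  int getCardinality() { return idToName.size(); }
}

public class DictionaryEncodingSketch {
  public static void main(String[] args) {
    DictionaryEncoder enc = new DictionaryEncoder();
    int[] rows = {enc.encode("us"), enc.encode("cn"), enc.encode("us")};
    System.out.println(Arrays.toString(rows));   // [0, 1, 0]
    System.out.println(enc.lookupName(rows[2])); // us
    System.out.println(enc.getCardinality());    // 2
  }
}
```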
- RunLengthColumn: this interface has no implementation as of 0.9.
public interface RunLengthColumn
{
  public void thisIsAFictionalInterfaceThatWillHopefullyMeanSomethingSometime();
}
- ComplexColumn: a column of complex objects, typically used for extended complex data types such as HyperLogLog, Histogram, and so on.
public interface ComplexColumn extends Closeable
{
  public Class<?> getClazz();
  public String getTypeName();
  public Object getRowValue(int rowNum);
}
- BitmapIndex: this is one of Druid's most central data structures. It builds a bitmap for every distinct value in a column; the bitmaps are kept compressed in memory, so query scans can balance speed against memory footprint. Bitmap operations such as and, or, and not also map naturally onto query filter conditions.
public interface BitmapIndex
{
  public int getCardinality();
  public String getValue(int index);
  public boolean hasNulls();
  public BitmapFactory getBitmapFactory();

  /**
   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) - 1) if the value is not
   * present, in the manner of Arrays.binarySearch.
   *
   * @param value value to search for
   * @return index of value, or negative number equal to (-(insertion point) - 1).
   */
  public int getIndex(String value);

  public ImmutableBitmap getBitmap(int idx);
}
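The idea can be illustrated with a plain java.util.BitSet. Druid actually uses compressed bitmaps (Concise or Roaring) behind the BitmapFactory abstraction; this toy sketch only shows the one-bitmap-per-value structure and how a filter combines bitmaps with or:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy bitmap index: one BitSet per distinct value, bit N set if row N has that value.
public class BitmapIndexSketch {
  static Map<String, BitSet> build(String[] rows) {
    Map<String, BitSet> index = new TreeMap<>();
    for (int row = 0; row < rows.length; row++) {
      index.computeIfAbsent(rows[row], v -> new BitSet()).set(row);
    }
    return index;
  }

  public static void main(String[] args) {
    Map<String, BitSet> index = build(new String[]{"us", "cn", "us", "jp", "cn"});
    // filter: country = 'us' OR country = 'jp'
    BitSet match = (BitSet) index.get("us").clone();
    match.or(index.get("jp"));
    System.out.println(match);        // {0, 2, 3}
    System.out.println(index.size()); // cardinality: 3
  }
}
```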
There is one more important data structure, GenericIndexed, documented as follows:
/**
 * A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input
 * is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
 *
 * V1 Storage Format:
 *
 * byte 1: version (0x1)
 * byte 2 == 0x1 => allowReverseLookup
 * bytes 3-6 => numBytesUsed
 * bytes 7-10 => numElements
 * bytes 10-((numElements * 4) + 10): integers representing *end* offsets of byte serialized values
 * bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
 */
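As a rough illustration of the V1 layout above, the sketch below serializes a list of strings in that shape with a ByteBuffer: version byte, allowReverseLookup flag, numBytesUsed, numElements, the end offsets, and then each value prefixed by its 4-byte length. GenericIndexedV1Sketch is not Druid code, and the exact accounting of numBytesUsed here is my assumption based on the comment, not verified against the real reader:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative writer for the GenericIndexed V1 layout described above.
public class GenericIndexedV1Sketch {
  static ByteBuffer writeV1(String[] values, boolean allowReverseLookup) {
    byte[][] encoded = new byte[values.length][];
    int valueBytes = 0;
    for (int i = 0; i < values.length; i++) {
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      valueBytes += 4 + encoded[i].length; // each value is prefixed by its 4-byte length
    }
    // assumption: numBytesUsed covers numElements + end offsets + the value section
    int numBytesUsed = 4 + 4 * values.length + valueBytes;

    ByteBuffer buf = ByteBuffer.allocate(2 + 4 + numBytesUsed);
    buf.put((byte) 0x1);                               // version
    buf.put((byte) (allowReverseLookup ? 0x1 : 0x0));  // allowReverseLookup flag
    buf.putInt(numBytesUsed);
    buf.putInt(values.length);                         // numElements
    int offset = 0;
    for (byte[] e : encoded) {                         // *end* offset of each serialized value
      offset += 4 + e.length;
      buf.putInt(offset);
    }
    for (byte[] e : encoded) {                         // length-prefixed value bytes
      buf.putInt(e.length);
      buf.put(e);
    }
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    ByteBuffer buf = writeV1(new String[]{"a", "bc"}, true);
    System.out.println(buf.remaining()); // 29
  }
}
```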
Below is the implementation of LongColumn:
public class LongColumn extends AbstractColumn
{
  private static final ColumnCapabilitiesImpl CAPABILITIES = new ColumnCapabilitiesImpl()
      .setType(ValueType.LONG);

  private final CompressedLongsIndexedSupplier column;

  public LongColumn(CompressedLongsIndexedSupplier column)
  {
    this.column = column;
  }

  @Override
  public ColumnCapabilities getCapabilities()
  {
    return CAPABILITIES;
  }

  @Override
  public int getLength()
  {
    return column.size();
  }

  @Override
  public GenericColumn getGenericColumn()
  {
    return new IndexedLongsGenericColumn(column.get());
  }
}
As we can see, LongColumn is really just a wrapper around a CompressedLongsIndexedSupplier, which holds the values as a list of compressed chunks (baseLongBuffers) of up to sizePer longs each and provides the actual implementation of LongColumn's operations:
public class CompressedLongsIndexedSupplier implements Supplier<IndexedLongs>
{
  public static final byte LZF_VERSION = 0x1;
  public static final byte version = 0x2;
  public static final int MAX_LONGS_IN_BUFFER = CompressedPools.BUFFER_SIZE / Longs.BYTES;

  private final int totalSize;
  private final int sizePer;
  private final GenericIndexed<ResourceHolder<LongBuffer>> baseLongBuffers;
  private final CompressedObjectStrategy.CompressionStrategy compression;

  CompressedLongsIndexedSupplier(
      int totalSize,
      int sizePer,
      GenericIndexed<ResourceHolder<LongBuffer>> baseLongBuffers,
      CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    this.totalSize = totalSize;
    this.sizePer = sizePer;
    this.baseLongBuffers = baseLongBuffers;
    this.compression = compression;
  }

  public int size()
  {
    return totalSize;
  }

  @Override
  public IndexedLongs get()
  {
    final int div = Integer.numberOfTrailingZeros(sizePer);
    final int rem = sizePer - 1;
    final boolean powerOf2 = sizePer == (1 << div);
    if (powerOf2) {
      return new CompressedIndexedLongs() {
        @Override
        public long get(int index)
        {
          // optimize division and remainder for powers of 2
          final int bufferNum = index >> div;

          if (bufferNum != currIndex) {
            loadBuffer(bufferNum);
          }

          final int bufferIndex = index & rem;
          return buffer.get(buffer.position() + bufferIndex);
        }
      };
    } else {
      return new CompressedIndexedLongs();
    }
  }

  public long getSerializedSize()
  {
    return baseLongBuffers.getSerializedSize() + 1 + 4 + 4 + 1;
  }
  public void writeToChannel(WritableByteChannel channel) throws IOException
  {
    channel.write(ByteBuffer.wrap(new byte[]{version}));
    channel.write(ByteBuffer.wrap(Ints.toByteArray(totalSize)));
    channel.write(ByteBuffer.wrap(Ints.toByteArray(sizePer)));
    channel.write(ByteBuffer.wrap(new byte[]{compression.getId()}));
    baseLongBuffers.writeToChannel(channel);
  }

  public CompressedLongsIndexedSupplier convertByteOrder(ByteOrder order)
  {
    return new CompressedLongsIndexedSupplier(
        totalSize,
        sizePer,
        GenericIndexed.fromIterable(baseLongBuffers, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
        compression
    );
  }

  /**
   * For testing. Do not use unless you like things breaking
   */
  GenericIndexed<ResourceHolder<LongBuffer>> getBaseLongBuffers()
  {
    return baseLongBuffers;
  }

  public static CompressedLongsIndexedSupplier fromByteBuffer(ByteBuffer buffer, ByteOrder order)
  {
    byte versionFromBuffer = buffer.get();

    if (versionFromBuffer == version) {
      final int totalSize = buffer.getInt();
      final int sizePer = buffer.getInt();
      final CompressedObjectStrategy.CompressionStrategy compression = CompressedObjectStrategy.CompressionStrategy.forId(buffer.get());
      return new CompressedLongsIndexedSupplier(
          totalSize,
          sizePer,
          GenericIndexed.read(buffer, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
          compression
      );
    } else if (versionFromBuffer == LZF_VERSION) {
      final int totalSize = buffer.getInt();
      final int sizePer = buffer.getInt();
      final CompressedObjectStrategy.CompressionStrategy compression = CompressedObjectStrategy.CompressionStrategy.LZF;
      return new CompressedLongsIndexedSupplier(
          totalSize,
          sizePer,
          GenericIndexed.read(buffer, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
          compression
      );
    }

    throw new IAE("Unknown version[%s]", versionFromBuffer);
  }
  public static CompressedLongsIndexedSupplier fromLongBuffer(LongBuffer buffer, final ByteOrder byteOrder, CompressedObjectStrategy.CompressionStrategy compression)
  {
    return fromLongBuffer(buffer, MAX_LONGS_IN_BUFFER, byteOrder, compression);
  }

  public static CompressedLongsIndexedSupplier fromLongBuffer(
      final LongBuffer buffer, final int chunkFactor, final ByteOrder byteOrder, CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    Preconditions.checkArgument(
        chunkFactor <= MAX_LONGS_IN_BUFFER, "Chunks must be <= 64k bytes. chunkFactor was[%s]", chunkFactor
    );

    return new CompressedLongsIndexedSupplier(
        buffer.remaining(),
        chunkFactor,
        GenericIndexed.fromIterable(
            new Iterable<ResourceHolder<LongBuffer>>()
            {
              @Override
              public Iterator<ResourceHolder<LongBuffer>> iterator()
              {
                return new Iterator<ResourceHolder<LongBuffer>>()
                {
                  LongBuffer myBuffer = buffer.asReadOnlyBuffer();

                  @Override
                  public boolean hasNext()
                  {
                    return myBuffer.hasRemaining();
                  }

                  @Override
                  public ResourceHolder<LongBuffer> next()
                  {
                    LongBuffer retVal = myBuffer.asReadOnlyBuffer();

                    if (chunkFactor < myBuffer.remaining()) {
                      retVal.limit(retVal.position() + chunkFactor);
                    }
                    myBuffer.position(myBuffer.position() + retVal.remaining());

                    return StupidResourceHolder.create(retVal);
                  }

                  @Override
                  public void remove()
                  {
                    throw new UnsupportedOperationException();
                  }
                };
              }
            },
            CompressedLongBufferObjectStrategy.getBufferForOrder(byteOrder, compression, chunkFactor)
        ),
        compression
    );
  }

  public static CompressedLongsIndexedSupplier fromList(
      final List<Long> list, final int chunkFactor, final ByteOrder byteOrder, CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    Preconditions.checkArgument(
        chunkFactor <= MAX_LONGS_IN_BUFFER, "Chunks must be <= 64k bytes. chunkFactor was[%s]", chunkFactor
    );

    return new CompressedLongsIndexedSupplier(
        list.size(),
        chunkFactor,
        GenericIndexed.fromIterable(
            new Iterable<ResourceHolder<LongBuffer>>()
            {
              @Override
              public Iterator<ResourceHolder<LongBuffer>> iterator()
              {
                return new Iterator<ResourceHolder<LongBuffer>>()
                {
                  int position = 0;

                  @Override
                  public boolean hasNext()
                  {
                    return position < list.size();
                  }

                  @Override
                  public ResourceHolder<LongBuffer> next()
                  {
                    LongBuffer retVal = LongBuffer.allocate(chunkFactor);

                    if (chunkFactor > list.size() - position) {
                      retVal.limit(list.size() - position);
                    }
                    final List<Long> longs = list.subList(position, position + retVal.remaining());
                    for (long value : longs) {
                      retVal.put(value);
                    }
                    retVal.rewind();
                    position += retVal.remaining();

                    return StupidResourceHolder.create(retVal);
                  }

                  @Override
                  public void remove()
                  {
                    throw new UnsupportedOperationException();
                  }
                };
              }
            },
            CompressedLongBufferObjectStrategy.getBufferForOrder(byteOrder, compression, chunkFactor)
        ),
        compression
    );
  }
  private class CompressedIndexedLongs implements IndexedLongs
  {
    final Indexed<ResourceHolder<LongBuffer>> singleThreadedLongBuffers = baseLongBuffers.singleThreaded();

    int currIndex = -1;
    ResourceHolder<LongBuffer> holder;
    LongBuffer buffer;

    @Override
    public int size()
    {
      return totalSize;
    }

    @Override
    public long get(int index)
    {
      final int bufferNum = index / sizePer;
      final int bufferIndex = index % sizePer;

      if (bufferNum != currIndex) {
        loadBuffer(bufferNum);
      }

      return buffer.get(buffer.position() + bufferIndex);
    }

    @Override
    public void fill(int index, long[] toFill)
    {
      if (totalSize - index < toFill.length) {
        throw new IndexOutOfBoundsException(
            String.format(
                "Cannot fill array of size[%,d] at index[%,d]. Max size[%,d]", toFill.length, index, totalSize
            )
        );
      }

      int bufferNum = index / sizePer;
      int bufferIndex = index % sizePer;

      int leftToFill = toFill.length;
      while (leftToFill > 0) {
        if (bufferNum != currIndex) {
          loadBuffer(bufferNum);
        }

        buffer.mark();
        buffer.position(buffer.position() + bufferIndex);
        final int numToGet = Math.min(buffer.remaining(), leftToFill);
        buffer.get(toFill, toFill.length - leftToFill, numToGet);
        buffer.reset();
        leftToFill -= numToGet;
        ++bufferNum;
        bufferIndex = 0;
      }
    }

    protected void loadBuffer(int bufferNum)
    {
      CloseQuietly.close(holder);
      holder = singleThreadedLongBuffers.get(bufferNum);
      buffer = holder.get();
      currIndex = bufferNum;
    }

    @Override
    public int binarySearch(long key)
    {
      throw new UnsupportedOperationException();
    }

    @Override
    public int binarySearch(long key, int from, int to)
    {
      throw new UnsupportedOperationException();
    }

    @Override
    public String toString()
    {
      return "CompressedLongsIndexedSupplier_Anonymous{" +
             "currIndex=" + currIndex +
             ", sizePer=" + sizePer +
             ", numChunks=" + singleThreadedLongBuffers.size() +
             ", totalSize=" + totalSize +
             '}';
    }

    @Override
    public void close() throws IOException
    {
      Closeables.close(holder, false);
    }
  }
}
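A quick check of the bit-twiddling in get(): when sizePer is a power of two, a right shift by `numberOfTrailingZeros(sizePer)` reproduces the division that picks the chunk, and masking with `sizePer - 1` reproduces the modulo that picks the offset within the chunk. That is exactly the optimization the anonymous subclass applies:

```java
// Demonstrates the shift/mask equivalence used in CompressedLongsIndexedSupplier.get().
public class PowerOfTwoTrick {
  public static void main(String[] args) {
    final int sizePer = 4096; // a power of two
    final int div = Integer.numberOfTrailingZeros(sizePer); // 12
    final int rem = sizePer - 1;                            // low-bit mask

    int index = 123_456;
    System.out.println((index >> div) == (index / sizePer)); // true: chunk number
    System.out.println((index & rem) == (index % sizePer));  // true: offset in chunk
  }
}
```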