Contents
Preface
This post is mainly a source-code analysis of the two MR jobs behind a druid.io index_hadoop task; it helps to read it alongside the log of any index_hadoop task.
index_hadoop task configuration and notes
In the source, many of the implementation classes are chosen according to the task configuration; most settings do have defaults, of course.
Looking at the log of a druid index_hadoop task, you can see that it runs two jobs: the first is io.druid.indexer.DetermineHashedPartitionsJob and the second is io.druid.indexer.IndexGeneratorJob. So to understand how an index_hadoop task runs, we need to analyze these two jobs.
DetermineHashedPartitionsJob
/**
* Determines appropriate ShardSpecs for a job by determining approximate cardinality of data set using HyperLogLog
*/
public class DetermineHashedPartitionsJob implements Jobby
It uses HyperLogLog to estimate the approximate cardinality of the data set, and from that determines a suitable number of shards (the number of Druid.io segments). A minimal sketch of the idea follows.
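The sketch below is not the job's code; it only illustrates the estimate-then-divide idea using io.druid.hll.HyperLogLogCollector and Guava's murmur3 hash, both of which the job itself uses (the group-key strings and targetPartitionSize are made up):
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import io.druid.hll.HyperLogLogCollector;
import java.nio.charset.StandardCharsets;

public class ShardCountSketch
{
  public static void main(String[] args)
  {
    HashFunction hashFunction = Hashing.murmur3_128();
    HyperLogLogCollector collector = HyperLogLogCollector.makeLatestCollector();
    // Pretend these strings are the serialized (timestamp, dimensions) group keys of the input rows.
    String[] groupKeys = {"[t1,{a=1}]", "[t1,{a=2}]", "[t1,{a=1}]"};
    for (String groupKey : groupKeys) {
      collector.add(hashFunction.hashBytes(groupKey.getBytes(StandardCharsets.UTF_8)).asBytes());
    }
    // Estimated distinct (timestamp, dimensions) combinations = estimated rows after rollup.
    long estimatedRows = collector.estimateCardinalityRound();
    long targetPartitionSize = 20_000_000L; // hypothetical value from partitionsSpec
    int numberOfShards = (int) Math.ceil((double) estimatedRows / targetPartitionSize);
    System.out.println(estimatedRows + " estimated rows -> " + numberOfShards + " shard(s)");
  }
}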
The run method
@Override
public boolean run()
{
try {
/*
* Group by (timestamp, dimensions) so we can correctly count dimension values as they would appear
* in the final segment.
*/
final long startTime = System.currentTimeMillis();
groupByJob = Job.getInstance(
new Configuration(),
StringUtils.format("%s-determine_partitions_hashed-%s", config.getDataSource(), config.getIntervals())
);
Map phase
The mapper class is public static class DetermineCardinalityMapper extends HadoopDruidIndexerMapper<LongWritable, BytesWritable>, i.e. it also extends HadoopDruidIndexerMapper. The raw records are first read in HadoopDruidIndexerMapper's map method, and the real processing happens in DetermineCardinalityMapper's innerMap method.
- HadoopDruidIndexerMapper parses the raw data into List<InputRow> and does some validation and filtering; its map method is shown below.
@Override
protected void map(Object key, Object value, Context context) throws IOException, InterruptedException
{
try {
final List<InputRow> inputRows = parseInputRow(value, parser);
for (InputRow inputRow : inputRows) {
try {
if (inputRow == null) {
// Throw away null rows from the parser.
log.debug("Throwing away row [%s]", value);
context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
continue;
}
if (!Intervals.ETERNITY.contains(inputRow.getTimestamp())) {
final String errorMsg = StringUtils.format(
"Encountered row with timestamp that cannot be represented as a long: [%s]",
inputRow
);
throw new ParseException(errorMsg);
}
if (!granularitySpec.bucketIntervals().isPresent()
|| granularitySpec.bucketInterval(DateTimes.utc(inputRow.getTimestampFromEpoch()))
.isPresent()) {
innerMap(inputRow, context, reportParseExceptions);
} else {
context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
}
}
catch (ParseException pe) {
handleParseException(pe, context);
}
}
}
catch (ParseException pe) {
handleParseException(pe, context);
}
catch (RuntimeException e) {
throw new RE(e, "Failure on row[%s]", value);
}
}
DetermineCardinalityMapper's innerMap then handles each InputRow:
@Override
protected void innerMap(
InputRow inputRow,
Context context,
boolean reportParseExceptions
) throws IOException, InterruptedException
{
final List<Object> groupKey = Rows.toGroupKey(
rollupGranularity.bucketStart(inputRow.getTimestamp()).getMillis(),
inputRow
);
Interval interval;
if (determineIntervals) {
interval = config.getGranularitySpec()
.getSegmentGranularity()
.bucket(DateTimes.utc(inputRow.getTimestampFromEpoch()));
if (!hyperLogLogs.containsKey(interval)) {
hyperLogLogs.put(interval, HyperLogLogCollector.makeLatestCollector());
}
} else {
final Optional<Interval> maybeInterval = config.getGranularitySpec()
.bucketInterval(DateTimes.utc(inputRow.getTimestampFromEpoch()));
if (!maybeInterval.isPresent()) {
throw new ISE("WTF?! No bucket found for timestamp: %s", inputRow.getTimestampFromEpoch());
}
interval = maybeInterval.get();
}
hyperLogLogs
.get(interval)
.add(hashFunction.hashBytes(HadoopDruidIndexerConfig.JSON_MAPPER.writeValueAsBytes(groupKey)).asBytes());
context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_PROCESSED_COUNTER).increment(1);
}
- Counting uses private Map<Interval, HyperLogLogCollector> hyperLogLogs;, keyed by the Interval derived from the SegmentGranularity; each groupKey is added to the HyperLogLogCollector of its interval.
- The groupKey is again built by truncating the timestamp to the query granularity (QueryGranularity) and combining it with the dimension columns.
- The mapper's output is produced by iterating over hyperLogLogs and writing each entry out as a key/value pair.
Rows.toGroupKey
/**
* @param timeStamp rollup up timestamp to be used to create group key
* @param inputRow input row
*
* @return groupKey for the given input row
*/
public static List<Object> toGroupKey(long timeStamp, InputRow inputRow)
{
final Map<String, Set<String>> dims = Maps.newTreeMap();
for (final String dim : inputRow.getDimensions()) {
// Collect this dimension's values into a sorted set; dimValues.size() is the number of values the row has for this dimension
final Set<String> dimValues = ImmutableSortedSet.copyOf(inputRow.getDimension(dim));
if (dimValues.size() > 0) {
dims.put(dim, dimValues);
}
}
return ImmutableList.of(
timeStamp,
dims
);
}
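For intuition, here is a small, hypothetical usage of Rows.toGroupKey (the row contents and the HOUR granularity are invented for illustration; the classes are the ones the mapper above already uses):
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import io.druid.data.input.InputRow;
import io.druid.data.input.MapBasedInputRow;
import io.druid.data.input.Rows;
import io.druid.java.util.common.DateTimes;
import io.druid.java.util.common.granularity.Granularities;
import java.util.List;

public class GroupKeySketch
{
  public static void main(String[] args)
  {
    // A row with two dimensions and one metric field (the metric is ignored by toGroupKey).
    InputRow row = new MapBasedInputRow(
        DateTimes.of("2020-04-25T10:23:45Z").getMillis(),
        ImmutableList.of("country", "city"),
        ImmutableMap.<String, Object>of("country", "china", "city", "beijing", "clicks", 3)
    );
    // Truncate the timestamp to the query granularity (HOUR here) before building the group key.
    long truncated = Granularities.HOUR.bucketStart(row.getTimestamp()).getMillis();
    List<Object> groupKey = Rows.toGroupKey(truncated, row);
    // Prints something like: [<millis of 2020-04-25T10:00:00Z>, {city=[beijing], country=[china]}]
    System.out.println(groupKey);
  }
}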
Reduce phase
reducerClass: public static class DetermineCardinalityReducer extends Reducer<LongWritable, BytesWritable, NullWritable, NullWritable> {
@Override
protected void reduce(
LongWritable key,
Iterable<BytesWritable> values,
Context context
) throws IOException, InterruptedException
{
HyperLogLogCollector aggregate = HyperLogLogCollector.makeLatestCollector();
for (BytesWritable value : values) {
aggregate.fold(
HyperLogLogCollector.makeCollector(ByteBuffer.wrap(value.getBytes(), 0, value.getLength()))
);
}
Interval interval;
if (determineIntervals) {
interval = config.getGranularitySpec().getSegmentGranularity().bucket(DateTimes.utc(key.get()));
} else {
Optional<Interval> intervalOptional = config.getGranularitySpec().bucketInterval(DateTimes.utc(key.get()));
if (!intervalOptional.isPresent()) {
throw new ISE("WTF?! No bucket found for timestamp: %s", key.get());
}
interval = intervalOptional.get();
}
intervals.add(interval);
final Path outPath = config.makeSegmentPartitionInfoPath(interval);
final OutputStream out = Utils.makePathAndOutputStream(
context, outPath, config.isOverwriteFiles()
);
try {
HadoopDruidIndexerConfig.JSON_MAPPER.writerWithType(
new TypeReference<Long>()
{
}
).writeValue(
out,
aggregate.estimateCardinalityRound()
);
}
finally {
Closeables.close(out, false);
}
}
The reducer merges (folds) the HyperLogLog collectors emitted by the mappers, which yields the approximate total number of rows for each segment interval and writes that estimate to the partition-info path of the interval.
DetermineHashedPartitionsJob completion
When it finishes, DetermineHashedPartitionsJob logs some information:
- For each interval it logs the estimated number of rows and the number of shards that follows from it together with the configured targetPartitionSize.
- Finally the config's shardSpecs parameter is re-set with the computed shard specs.
Example log
e.g. (after rollup the data actually has 55,342,482 rows, 57,238,510 before rollup; there are many dimensions, so rollup is very poor)
2020-04-26T11:45:11,430 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Job completed, loading up partitions for intervals[Optional.of([2020-04-25T00:00:00.000+08:00/2020-04-26T00:00:00.000+08:00])].
2020-04-26T11:45:11,468 INFO [task-runner-0-priority-0] org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.deflate]
2020-04-26T11:45:11,596 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Found approximately [54,135,187] rows in data.
2020-04-26T11:45:11,596 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Creating [3] shards
2020-04-26T11:45:11,597 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-25T00:00:00.000+08:00], partition[0], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=0, partitions=3, partitionDimensions=[]}, shardNum=0}]
2020-04-26T11:45:11,598 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-25T00:00:00.000+08:00], partition[1], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=1, partitions=3, partitionDimensions=[]}, shardNum=1}]
2020-04-26T11:45:11,598 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-25T00:00:00.000+08:00], partition[2], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=2, partitions=3, partitionDimensions=[]}, shardNum=2}]
2020-04-26T11:45:11,598 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DetermineHashedPartitionsJob took 325209 millis
- The configuration at the time (which explains why there are 3 shards; see the calculation after the config):
"partitionsSpec" : {
"type" : "hashed",
"targetPartitionSize" : 20000000,
"maxPartitionSize" : 30000000,
"assumeGrouped" : false,
"numShards" : -1,
"partitionDimensions" : [ ]
},
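Plugging the logged row count into the same formula the job uses (see "final int numberOfShards = ..." in the source below) reproduces the 3 shards:
// Worked example with the numbers from the log and config above.
long numRows = 54_135_187L;              // "Found approximately [54,135,187] rows in data."
long targetPartitionSize = 20_000_000L;  // "targetPartitionSize" : 20000000
int numberOfShards = (int) Math.ceil((double) numRows / targetPartitionSize); // = ceil(2.7) = 3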
Example log 2
e.g. (after rollup the data actually has 367,313,712 rows, 359,930,085 before rollup; again there are many dimensions, rollup is very poor, and there are some distinct-count metrics). Because of the data volume, the 19 segments average about 1.7 GB each, roughly 33 GB of segments for one day.
2020-04-27T19:57:23,781 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Job completed, loading up partitions for intervals[Optional.of([2020-04-26T00:00:00.000+08:00/2020-04-27T00:00:00.000+08:00])].
2020-04-27T19:57:23,823 INFO [task-runner-0-priority-0] org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.deflate]
2020-04-27T19:57:24,149 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Found approximately [367,313,712] rows in data.
2020-04-27T19:57:24,150 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Creating [19] shards
2020-04-27T19:57:24,151 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-26T00:00:00.000+08:00], partition[0], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=0, partitions=19, partitionDimensions=[]}, shardNum=0}]
Source code
log.info("Job completed, loading up partitions for intervals[%s].", config.getSegmentGranularIntervals());
FileSystem fileSystem = null;
if (!config.getSegmentGranularIntervals().isPresent()) {
final Path intervalInfoPath = config.makeIntervalInfoPath();
fileSystem = intervalInfoPath.getFileSystem(groupByJob.getConfiguration());
if (!Utils.exists(groupByJob, fileSystem, intervalInfoPath)) {
throw new ISE("Path[%s] didn't exist!?", intervalInfoPath);
}
List<Interval> intervals = config.JSON_MAPPER.readValue(
Utils.openInputStream(groupByJob, intervalInfoPath),
new TypeReference<List<Interval>>() {}
);
config.setGranularitySpec(
new UniformGranularitySpec(
config.getGranularitySpec().getSegmentGranularity(),
config.getGranularitySpec().getQueryGranularity(),
config.getGranularitySpec().isRollup(),
intervals
)
);
log.info("Determined Intervals for Job [%s].", config.getSegmentGranularIntervals());
}
Map<Long, List<HadoopyShardSpec>> shardSpecs = Maps.newTreeMap(DateTimeComparator.getInstance());
int shardCount = 0;
for (Interval segmentGranularity : config.getSegmentGranularIntervals().get()) {
DateTime bucket = segmentGranularity.getStart();
final Path partitionInfoPath = config.makeSegmentPartitionInfoPath(segmentGranularity);
if (fileSystem == null) {
fileSystem = partitionInfoPath.getFileSystem(groupByJob.getConfiguration());
}
if (Utils.exists(groupByJob, fileSystem, partitionInfoPath)) {
final Long numRows = config.JSON_MAPPER.readValue(
Utils.openInputStream(groupByJob, partitionInfoPath),
new TypeReference<Long>() {}
);
log.info("Found approximately [%,d] rows in data.", numRows);
final int numberOfShards = (int) Math.ceil((double) numRows / config.getTargetPartitionSize());
log.info("Creating [%,d] shards", numberOfShards);
List<HadoopyShardSpec> actualSpecs = Lists.newArrayListWithExpectedSize(numberOfShards);
if (numberOfShards == 1) {
actualSpecs.add(new HadoopyShardSpec(NoneShardSpec.instance(), shardCount++));
} else {
for (int i = 0; i < numberOfShards; ++i) {
actualSpecs.add(
new HadoopyShardSpec(
new HashBasedNumberedShardSpec(
i,
numberOfShards,
null,
HadoopDruidIndexerConfig.JSON_MAPPER
),
shardCount++
)
);
log.info("DateTime[%s], partition[%d], spec[%s]", bucket, i, actualSpecs.get(i));
}
}
shardSpecs.put(bucket.getMillis(), actualSpecs);
} else {
log.info("Path[%s] didn't exist!?", partitionInfoPath);
}
}
config.setShardSpecs(shardSpecs);
log.info(
"DetermineHashedPartitionsJob took %d millis",
(System.currentTimeMillis() - startTime)
);
IndexGeneratorJob submission
public class IndexGeneratorJob implements Jobby
@Override
public boolean run()
{
try {
job = Job.getInstance(
new Configuration(),
StringUtils.format("%s-index-generator-%s", config.getDataSource(), config.getIntervals())
);
job.getConfiguration().set("io.sort.record.percent", "0.23");
JobHelper.injectSystemProperties(job);
config.addJobProperties(job);
// inject druid properties like deep storage bindings
JobHelper.injectDruidProperties(job.getConfiguration(), config.getAllowedHadoopPrefix());
job.setMapperClass(IndexGeneratorMapper.class);
job.setMapOutputValueClass(BytesWritable.class);
SortableBytes.useSortableBytesAsMapOutputKey(job);
int numReducers = Iterables.size(config.getAllBuckets().get());
if (numReducers == 0) {
throw new RuntimeException("No buckets?? seems there is no data to index.");
}
if (config.getSchema().getTuningConfig().getUseCombiner()) {
job.setCombinerClass(IndexGeneratorCombiner.class);
job.setCombinerKeyGroupingComparatorClass(BytesWritable.Comparator.class);
}
job.setNumReduceTasks(numReducers);
job.setPartitionerClass(IndexGeneratorPartitioner.class);
setReducerClass(job);
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(IndexGeneratorOutputFormat.class);
FileOutputFormat.setOutputPath(job, config.makeIntermediatePath());
config.addInputPaths(job);
config.intoConfiguration(job);
JobHelper.setupClasspath(
JobHelper.distributedClassPath(config.getWorkingPath()),
JobHelper.distributedClassPath(config.makeIntermediatePath()),
job
);
job.submit();
log.info("Job %s submitted, status available at %s", job.getJobName(), job.getTrackingURL());
boolean success = job.waitForCompletion(true);
Counters counters = job.getCounters();
if (counters == null) {
log.info("No counters found for job [%s]", job.getJobName());
} else {
Counter invalidRowCount = counters.findCounter(HadoopDruidIndexerConfig.IndexJobCounters.INVALID_ROW_COUNTER);
if (invalidRowCount != null) {
jobStats.setInvalidRowCount(invalidRowCount.getValue());
} else {
log.info("No invalid row counter found for job [%s]", job.getJobName());
}
}
return success;
}
catch (Exception e) {
throw new RuntimeException(e);
}
}
- injectSystemProperties
- injectDruidProperties
- numReducers is the number of buckets (the source of the classic error "No buckets?? seems there is no data to index.")
- whether the combiner is enabled:
if (config.getSchema().getTuningConfig().getUseCombiner()) {
  job.setCombinerClass(IndexGeneratorCombiner.class);
  job.setCombinerKeyGroupingComparatorClass(BytesWritable.Comparator.class);
}
- where to find the job's log:
log.info("Job %s submitted, status available at %s", job.getJobName(), job.getTrackingURL());
- the job input is the inputSpec property of the task configuration
- the job output path:
FileOutputFormat.setOutputPath(job, config.makeIntermediatePath());
public Path makeIntermediatePath()
{
return new Path(
StringUtils.format(
"%s/%s/%s_%s",
getWorkingPath(),
schema.getDataSchema().getDataSource(),
schema.getTuningConfig().getVersion().replace(":", ""),
schema.getUniqueId()
)
);
}
private static final String DEFAULT_WORKING_PATH = "/tmp/druid-indexing";
public String getWorkingPath()
{
final String workingPath = schema.getTuningConfig().getWorkingPath();
return workingPath == null ? DEFAULT_WORKING_PATH : workingPath;
}
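For example, with the default working path this intermediate directory ends up looking like /tmp/druid-indexing/<dataSource>/<version-with-colons-stripped>_<uniqueId> (the concrete names depend on the task; this layout simply follows makeIntermediatePath above).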
- MapperClass: job.setMapperClass(IndexGeneratorMapper.class);
- ReducerClass: setReducerClass(job); i.e. job.setReducerClass(IndexGeneratorReducer.class);
map
public abstract class HadoopDruidIndexerMapper<KEYOUT, VALUEOUT> extends Mapper<Object, Object, KEYOUT, VALUEOUT>
@Override
protected void map(Object key, Object value, Context context) throws IOException, InterruptedException
{
try {
// Parse the raw record into Druid InputRow objects
final List<InputRow> inputRows = parseInputRow(value, parser);
for (InputRow inputRow : inputRows) {
try {
// Discard null InputRows
if (inputRow == null) {
// Throw away null rows from the parser.
log.debug("Throwing away row [%s]", value);
context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
continue;
}
// Rows whose timestamp cannot be represented as a long throw a ParseException
if (!Intervals.ETERNITY.contains(inputRow.getTimestamp())) {
final String errorMsg = StringUtils.format(
"Encountered row with timestamp that cannot be represented as a long: [%s]",
inputRow
);
throw new ParseException(errorMsg);
}
// Rows inside the job's bucket intervals go through innerMap; the rest are thrown away and counted
if (!granularitySpec.bucketIntervals().isPresent()
|| granularitySpec.bucketInterval(DateTimes.utc(inputRow.getTimestampFromEpoch()))
.isPresent()) {
innerMap(inputRow, context, reportParseExceptions);
} else {
context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
}
}
catch (ParseException pe) {
handleParseException(pe, context);
}
}
}
catch (ParseException pe) {
handleParseException(pe, context);
}
catch (RuntimeException e) {
throw new RE(e, "Failure on row[%s]", value);
}
}
public static class IndexGeneratorMapper extends HadoopDruidIndexerMapper<BytesWritable, BytesWritable>
{
@Override
protected void innerMap(
InputRow inputRow,
Context context,
boolean reportParseExceptions
) throws IOException, InterruptedException
{
// Bucket the row according to the granularitySpec in the task configuration
// Group by bucket, sort by timestamp
final Optional<Bucket> bucket = getConfig().getBucket(inputRow);
// No bucket found for this row
if (!bucket.isPresent()) {
throw new ISE("WTF?! No bucket found for row: %s", inputRow);
}
// Roll up (truncate) the timestamp to the query granularity for grouping
final long truncatedTimestamp = granularitySpec.getQueryGranularity().bucketStart(inputRow.getTimestamp()).getMillis();
final byte[] hashedDimensions = hashFunction.hashBytes(
HadoopDruidIndexerConfig.JSON_MAPPER.writeValueAsBytes(
Rows.toGroupKey(
truncatedTimestamp,
inputRow
)
)
).asBytes();
// Key class: InputRowSerde serializes the row to bytes (toBytes)
// type SegmentInputRow serves as a marker that these InputRow instances have already been combined
// and they contain the columns as they show up in the segment after ingestion, not what you would see in raw
// data
InputRowSerde.SerializeResult serializeResult = inputRow instanceof SegmentInputRow ?
InputRowSerde.toBytes(
typeHelperMap,
inputRow,
aggsForSerializingSegmentInputRow
)
:
InputRowSerde.toBytes(
typeHelperMap,
inputRow,
aggregators
);
// Mapper output
context.write(
// K
new SortableBytes(
bucket.get().toGroupKey(),
// sort rows by truncated timestamp and hashed dimensions to help reduce spilling on the reducer side
ByteBuffer.allocate(Longs.BYTES + hashedDimensions.length)
.putLong(truncatedTimestamp)
.put(hashedDimensions)
.array()
).toBytesWritable(),
// V
new BytesWritable(serializeResult.getSerializedRow())
);
ParseException pe = IncrementalIndex.getCombinedParseException(
inputRow,
serializeResult.getParseExceptionMessages(),
null
);
if (pe != null) {
throw pe;
} else {
context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_PROCESSED_COUNTER).increment(1);
}
}
}
Map phase details
- The MapperClass IndexGeneratorMapper extends HadoopDruidIndexerMapper. HadoopDruidIndexerMapper parses and filters the raw data into valid InputRows and then lets the innerMap overridden by IndexGeneratorMapper do the further work; discarded rows are counted by HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER.
- innerMap receives each InputRow and first buckets it by its timestamp; the timestamp is truncated to the query granularity (QueryGranularity) and combined with the hashed dimension columns (hashedDimensions) to form the group key. Each successfully processed row increments HadoopDruidIndexerConfig.IndexJobCounters.ROWS_PROCESSED_COUNTER.
A common parser configuration looks like the following; io.druid.java.util.common.parsers.JSONPathParser is then used to extract fields from the raw JSON data, which is ultimately turned into InputRow objects.
"parser" : {
"type" : "hadoopyString",
"parseSpec" : {
"format" : "json",
"timestampSpec" : {
"column" : "timestamp",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
"dimensionExclusions" : [],
"spatialDimensions" : []
}
}
},
Combiner
public static class IndexGeneratorCombiner extends Reducer<BytesWritable, BytesWritable, BytesWritable, BytesWritable>
{
@Override
protected void reduce(
final BytesWritable key, Iterable<BytesWritable> values, final Context context
) throws IOException, InterruptedException
{
// BytesWritable values emitted by the mapper
Iterator<BytesWritable> iter = values.iterator();
BytesWritable first = iter.next();
if (iter.hasNext()) {
LinkedHashSet<String> dimOrder = Sets.newLinkedHashSet();
SortableBytes keyBytes = SortableBytes.fromBytesWritable(key);
Bucket bucket = Bucket.fromGroupKey(keyBytes.getGroupKey()).lhs;
// In-memory incremental index; when it can no longer append rows, its contents are flushed to the context and a new index is created
IncrementalIndex index = makeIncrementalIndex(bucket, combiningAggs, config, null, null);
index.add(InputRowSerde.fromBytes(typeHelperMap, first.getBytes(), aggregators));
while (iter.hasNext()) {
context.progress();
InputRow value = InputRowSerde.fromBytes(typeHelperMap, iter.next().getBytes(), aggregators);
if (!index.canAppendRow()) {
dimOrder.addAll(index.getDimensionOrder());
log.info("current index full due to [%s]. creating new index.", index.getOutOfRowsReason());
flushIndexToContextAndClose(key, index, context);
index = makeIncrementalIndex(bucket, combiningAggs, config, dimOrder, index.getColumnCapabilities());
}
index.add(value);
}
// Finally flush the remaining rows and emit them
flushIndexToContextAndClose(key, index, context);
} else {
context.write(key, first);
}
}
private void flushIndexToContextAndClose(BytesWritable key, IncrementalIndex index, Context context)
throws IOException, InterruptedException
{
final List<String> dimensions = index.getDimensionNames();
Iterator<Row> rows = index.iterator();
while (rows.hasNext()) {
context.progress();
Row row = rows.next();
InputRow inputRow = getInputRowFromRow(row, dimensions);
// reportParseExceptions is true as any unparseable data is already handled by the mapper.
InputRowSerde.SerializeResult serializeResult = InputRowSerde.toBytes(typeHelperMap, inputRow, combiningAggs);
context.write(
key,
new BytesWritable(serializeResult.getSerializedRow())
);
}
index.close();
}
// The metric aggregators are obtained from the schema in the config; each has a corresponding combining factory
aggregators = config.getSchema().getDataSchema().getAggregators();
combiningAggs = new AggregatorFactory[aggregators.length];
for (int i = 0; i < aggregators.length; ++i) {
combiningAggs[i] = aggregators[i].getCombiningFactory();
}
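For a concrete (hypothetical) aggregator, the combining factory works like this (LongSumAggregatorFactory is io.druid.query.aggregation.LongSumAggregatorFactory; the metric name and field are made up):
// Hypothetical example: for a longSum metric the combining factory aggregates the
// already-aggregated output column rather than the raw input field.
AggregatorFactory raw = new LongSumAggregatorFactory("clicks", "raw_click_count");
AggregatorFactory combining = raw.getCombiningFactory();
// combining behaves like new LongSumAggregatorFactory("clicks", "clicks")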
// Type helpers to handle the different dimension value types
typeHelperMap = InputRowSerde.getTypeHelperMap(config.getSchema()
.getDataSchema()
.getParser()
.getParseSpec()
.getDimensionsSpec());
// InputRowSerde.toBytes serializes an InputRow: the timestamp, then every dimension, then every aggregated metric, in that order
public static final SerializeResult toBytes(
final Map<String, IndexSerdeTypeHelper> typeHelperMap,
final InputRow row,
AggregatorFactory[] aggs
)
{
try {
List<String> parseExceptionMessages = new ArrayList<>();
ByteArrayDataOutput out = ByteStreams.newDataOutput();
//write timestamp
out.writeLong(row.getTimestampFromEpoch());
//writing all dimensions
List<String> dimList = row.getDimensions();
WritableUtils.writeVInt(out, dimList.size());
if (dimList != null) {
for (String dim : dimList) {
IndexSerdeTypeHelper typeHelper = typeHelperMap.get(dim);
if (typeHelper == null) {
typeHelper = STRING_HELPER;
}
writeString(dim, out);
try {
typeHelper.serialize(out, row.getRaw(dim));
}
catch (ParseException pe) {
parseExceptionMessages.add(pe.getMessage());
}
}
}
//writing all metrics
Supplier<InputRow> supplier = new Supplier<InputRow>()
{
@Override
public InputRow get()
{
return row;
}
};
WritableUtils.writeVInt(out, aggs.length);
for (AggregatorFactory aggFactory : aggs) {
String k = aggFactory.getName();
writeString(k, out);
try (Aggregator agg = aggFactory.factorize(
IncrementalIndex.makeColumnSelectorFactory(
VirtualColumns.EMPTY,
aggFactory,
supplier,
true
)
)) {
try {
agg.aggregate();
}
catch (ParseException e) {
// "aggregate" can throw ParseExceptions if a selector expects something but gets something else.
log.debug(e, "Encountered parse error, skipping aggregator[%s].", k);
parseExceptionMessages.add(e.getMessage());
}
String t = aggFactory.getTypeName();
if (t.equals("float")) {
out.writeFloat(agg.getFloat());
} else if (t.equals("long")) {
WritableUtils.writeVLong(out, agg.getLong());
} else if (t.equals("double")) {
out.writeDouble(agg.getDouble());
} else {
//its a complex metric
Object val = agg.get();
ComplexMetricSerde serde = getComplexMetricSerde(t);
writeBytes(serde.toBytes(val), out);
}
}
}
return new SerializeResult(out.toByteArray(), parseExceptionMessages);
}
catch (IOException ex) {
throw new RuntimeException(ex);
}
}
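The reverse direction is InputRowSerde.fromBytes, which the combiner and reducer use to turn these bytes back into an InputRow; a short usage sketch, reusing the typeHelperMap, inputRow and aggregators variables from the code above:
// Serialize a row and read it straight back (sketch only).
InputRowSerde.SerializeResult result = InputRowSerde.toBytes(typeHelperMap, inputRow, aggregators);
InputRow roundTripped = InputRowSerde.fromBytes(typeHelperMap, result.getSerializedRow(), aggregators);
// roundTripped carries the timestamp, the dimensions, and the already-aggregated metric values.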
Reduce phase
public static class IndexGeneratorReducer extends Reducer<BytesWritable, BytesWritable, BytesWritable, Text>
{
@Override
protected void reduce(
BytesWritable key, Iterable<BytesWritable> values, final Context context
) throws IOException, InterruptedException
{
SortableBytes keyBytes = SortableBytes.fromBytesWritable(key);
Bucket bucket = Bucket.fromGroupKey(keyBytes.getGroupKey()).lhs;
final Interval interval = config.getGranularitySpec().bucketInterval(bucket.time).get();
ListeningExecutorService persistExecutor = null;
List<ListenableFuture<?>> persistFutures = Lists.newArrayList();
IncrementalIndex index = makeIncrementalIndex(
bucket,
combiningAggs,
config,
null,
null
);
try {
File baseFlushFile = File.createTempFile("base", "flush");
baseFlushFile.delete();
baseFlushFile.mkdirs();
Set<File> toMerge = Sets.newTreeSet();
int indexCount = 0;
int lineCount = 0;
int runningTotalLineCount = 0;
long startTime = System.currentTimeMillis();
Set<String> allDimensionNames = Sets.newLinkedHashSet();
final ProgressIndicator progressIndicator = makeProgressIndicator(context);
int numBackgroundPersistThreads = config.getSchema().getTuningConfig().getNumBackgroundPersistThreads();
if (numBackgroundPersistThreads > 0) {
final BlockingQueue<Runnable> queue = new SynchronousQueue<>();
ExecutorService executorService = new ThreadPoolExecutor(
numBackgroundPersistThreads,
numBackgroundPersistThreads,
0L,
TimeUnit.MILLISECONDS,
queue,
Execs.makeThreadFactory("IndexGeneratorJob_persist_%d"),
new RejectedExecutionHandler()
{
@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor)
{
try {
executor.getQueue().put(r);
}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new RejectedExecutionException("Got Interrupted while adding to the Queue", e);
}
}
}
);
persistExecutor = MoreExecutors.listeningDecorator(executorService);
} else {
persistExecutor = MoreExecutors.sameThreadExecutor();
}
for (final BytesWritable bw : values) {
context.progress();
final InputRow inputRow = index.formatRow(InputRowSerde.fromBytes(typeHelperMap, bw.getBytes(), aggregators));
int numRows = index.add(inputRow).getRowCount();
++lineCount;
if (!index.canAppendRow()) {
allDimensionNames.addAll(index.getDimensionOrder());
log.info(index.getOutOfRowsReason());
log.info(
"%,d lines to %,d rows in %,d millis",
lineCount - runningTotalLineCount,
numRows,
System.currentTimeMillis() - startTime
);
runningTotalLineCount = lineCount;
final File file = new File(baseFlushFile, StringUtils.format("index%,05d", indexCount));
toMerge.add(file);
context.progress();
final IncrementalIndex persistIndex = index;
persistFutures.add(
persistExecutor.submit(
new ThreadRenamingRunnable(StringUtils.format("%s-persist", file.getName()))
{
@Override
public void doRun()
{
try {
persist(persistIndex, interval, file, progressIndicator);
}
catch (Exception e) {
log.error(e, "persist index error");
throw Throwables.propagate(e);
}
finally {
// close this index
persistIndex.close();
}
}
}
)
);
index = makeIncrementalIndex(
bucket,
combiningAggs,
config,
allDimensionNames,
persistIndex.getColumnCapabilities()
);
startTime = System.currentTimeMillis();
++indexCount;
}
}
allDimensionNames.addAll(index.getDimensionOrder());
log.info("%,d lines completed.", lineCount);
List<QueryableIndex> indexes = Lists.newArrayListWithCapacity(indexCount);
final File mergedBase;
if (toMerge.size() == 0) {
if (index.isEmpty()) {
throw new IAE("If you try to persist empty indexes you are going to have a bad time");
}
mergedBase = new File(baseFlushFile, "merged");
persist(index, interval, mergedBase, progressIndicator);
} else {
if (!index.isEmpty()) {
final File finalFile = new File(baseFlushFile, "final");
persist(index, interval, finalFile, progressIndicator);
toMerge.add(finalFile);
}
Futures.allAsList(persistFutures).get(1, TimeUnit.HOURS);
persistExecutor.shutdown();
for (File file : toMerge) {
indexes.add(HadoopDruidIndexerConfig.INDEX_IO.loadIndex(file));
}
log.info("starting merge of intermediate persisted segments.");
long mergeStartTime = System.currentTimeMillis();
mergedBase = mergeQueryableIndex(
indexes, aggregators, new File(baseFlushFile, "merged"), progressIndicator
);
log.info(
"finished merge of intermediate persisted segments. time taken [%d] ms.",
(System.currentTimeMillis() - mergeStartTime)
);
}
final FileSystem outputFS = new Path(config.getSchema().getIOConfig().getSegmentOutputPath())
.getFileSystem(context.getConfiguration());
// ShardSpec used for partitioning within this Hadoop job.
final ShardSpec shardSpecForPartitioning = config.getShardSpec(bucket).getActualSpec();
// ShardSpec to be published.
final ShardSpec shardSpecForPublishing;
if (config.isForceExtendableShardSpecs()) {
shardSpecForPublishing = new NumberedShardSpec(shardSpecForPartitioning.getPartitionNum(), config.getShardSpecCount(bucket));
} else {
shardSpecForPublishing = shardSpecForPartitioning;
}
final DataSegment segmentTemplate = new DataSegment(
config.getDataSource(),
interval,
config.getSchema().getTuningConfig().getVersion(),
null,
ImmutableList.copyOf(allDimensionNames),
metricNames,
shardSpecForPublishing,
-1,
-1
);
final DataSegment segment = JobHelper.serializeOutIndex(
segmentTemplate,
context.getConfiguration(),
context,
mergedBase,
JobHelper.makeFileNamePath(
new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
outputFS,
segmentTemplate,
JobHelper.INDEX_ZIP,
config.DATA_SEGMENT_PUSHER
),
JobHelper.makeFileNamePath(
new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
outputFS,
segmentTemplate,
JobHelper.DESCRIPTOR_JSON,
config.DATA_SEGMENT_PUSHER
),
JobHelper.makeTmpPath(
new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
outputFS,
segmentTemplate,
context.getTaskAttemptID(),
config.DATA_SEGMENT_PUSHER
),
config.DATA_SEGMENT_PUSHER
);
Path descriptorPath = config.makeDescriptorInfoPath(segment);
descriptorPath = JobHelper.prependFSIfNullScheme(
FileSystem.get(
descriptorPath.toUri(),
context.getConfiguration()
), descriptorPath
);
log.info("Writing descriptor to path[%s]", descriptorPath);
JobHelper.writeSegmentDescriptor(
config.makeDescriptorInfoDir().getFileSystem(context.getConfiguration()),
segment,
descriptorPath,
context
);
for (File file : toMerge) {
FileUtils.deleteDirectory(file);
}
}
catch (ExecutionException | TimeoutException e) {
throw Throwables.propagate(e);
}
finally {
index.close();
if (persistExecutor != null) {
persistExecutor.shutdownNow();
}
}
}
}
Aggregation
- Aggregation uses Druid's OnheapIncrementalIndex; a rough sketch of how such an index is built follows below.
- A persistExecutor thread pool performs the persist operations (still much like index_realtime).
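The sketch below approximates the makeIncrementalIndex helper used by this job; treat the exact builder calls as an assumption about the 0.12.x API rather than a verbatim quote (interval, combiningAggs and config are the variables from the reducer above):
// Approximate sketch of building the on-heap incremental index (assumed 0.12.x API).
IncrementalIndexSchema indexSchema = new IncrementalIndexSchema.Builder()
    .withMinTimestamp(interval.getStartMillis())
    .withQueryGranularity(config.getSchema().getDataSchema().getGranularitySpec().getQueryGranularity())
    .withMetrics(combiningAggs)
    .withRollup(config.getSchema().getDataSchema().getGranularitySpec().isRollup())
    .build();
IncrementalIndex index = new IncrementalIndex.Builder()
    .setIndexSchema(indexSchema)
    .setMaxRowCount(config.getSchema().getTuningConfig().getRowFlushBoundary())
    .buildOnheap(); // this is the OnheapIncrementalIndex mentioned above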
DataSegment creation
The DataSegment is produced by JobHelper.serializeOutIndex; its signature and the call from the reducer are shown below.
public static DataSegment serializeOutIndex(
final DataSegment segmentTemplate,
final Configuration configuration,
final Progressable progressable,
final File mergedBase,
final Path finalIndexZipFilePath,
final Path finalDescriptorPath,
final Path tmpPath,
DataSegmentPusher dataSegmentPusher
)
final DataSegment segment = JobHelper.serializeOutIndex(
segmentTemplate,
context.getConfiguration(),
context,
mergedBase,
JobHelper.makeFileNamePath(
new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
outputFS,
segmentTemplate,
JobHelper.INDEX_ZIP,
config.DATA_SEGMENT_PUSHER
),
JobHelper.makeFileNamePath(
new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
outputFS,
segmentTemplate,
JobHelper.DESCRIPTOR_JSON,
config.DATA_SEGMENT_PUSHER
),
JobHelper.makeTmpPath(
new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
outputFS,
segmentTemplate,
context.getTaskAttemptID(),
config.DATA_SEGMENT_PUSHER
),
config.DATA_SEGMENT_PUSHER
);
Determining the number of shards
- Creating the segment uses a ShardSpec; unless extendable shard specs are forced, this is the shard spec estimated earlier by DetermineHashedPartitionsJob.
// ShardSpec used for partitioning within this Hadoop job.
final ShardSpec shardSpecForPartitioning = config.getShardSpec(bucket).getActualSpec();
// ShardSpec to be published.
final ShardSpec shardSpecForPublishing;
if (config.isForceExtendableShardSpecs()) {
shardSpecForPublishing = new NumberedShardSpec(shardSpecForPartitioning.getPartitionNum(), config.getShardSpecCount(bucket));
} else {
shardSpecForPublishing = shardSpecForPartitioning;
}
Segment writing
- A segment is made up of the following files:
* descriptor.json
* index.zip
* version.bin
* meta.smoosh
* 00000.smoosh
- After the segment is created, the segment files are written out and some temporary files are deleted:
Path descriptorPath = config.makeDescriptorInfoPath(segment);
descriptorPath = JobHelper.prependFSIfNullScheme(
FileSystem.get(
descriptorPath.toUri(),
context.getConfiguration()
), descriptorPath
);
log.info("Writing descriptor to path[%s]", descriptorPath);
JobHelper.writeSegmentDescriptor(
config.makeDescriptorInfoDir().getFileSystem(context.getConfiguration()),
segment,
descriptorPath,
context
);
for (File file : toMerge) {
FileUtils.deleteDirectory(file);
}