Druid.io index_hadoop

Preface

This post walks through the source code of the two MapReduce jobs behind a druid.io index_hadoop task. Reading it alongside the log of any index_hadoop task makes it easier to follow.

index_hadoop task configuration and overview

Official documentation: index_hadoop task configuration

Many of the implementation classes used in the source are selected based on the task configuration; most options also have defaults.

Looking at the log of a Druid index_hadoop task, you can see two jobs: the first is io.druid.indexer.DetermineHashedPartitionsJob and the second is io.druid.indexer.IndexGeneratorJob. To understand how an index_hadoop task runs, we need to analyze these two jobs.

DetermineHashedPartitionsJob

/**
 * Determines appropriate ShardSpecs for a job by determining approximate cardinality of data set using HyperLogLog
 */
public class DetermineHashedPartitionsJob implements Jobby

It uses HyperLogLog to estimate the cardinality of the data set and from that determines an appropriate number of shards (the number of Druid.io segments).
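
As a minimal sketch of the idea (an assumption-laden illustration, not the job's actual code): each serialized group key is hashed into a HyperLogLogCollector and the estimate is read back out. The murmur3_128 hash and the io.druid.hll package location are assumptions, chosen to mirror the hashFunction field used in the mapper below.

// Sketch only; package names and the murmur3_128 hash are assumptions.
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import io.druid.hll.HyperLogLogCollector;
import java.util.List;

public class CardinalityEstimateSketch
{
  public static long estimate(List<byte[]> serializedGroupKeys)
  {
    final HashFunction hashFunction = Hashing.murmur3_128();
    final HyperLogLogCollector collector = HyperLogLogCollector.makeLatestCollector();
    for (byte[] groupKey : serializedGroupKeys) {
      // every distinct (truncated timestamp, dimensions) combination is counted once
      collector.add(hashFunction.hashBytes(groupKey).asBytes());
    }
    return collector.estimateCardinalityRound();
  }
}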

The run method

@Override
public boolean run()
{
  try {
    /*
     * Group by (timestamp, dimensions) so we can correctly count dimension values as they would appear
     * in the final segment.
     */
    final long startTime = System.currentTimeMillis();
    groupByJob = Job.getInstance(
        new Configuration(),
        StringUtils.format("%s-determine_partitions_hashed-%s", config.getDataSource(), config.getIntervals())
    );
Map phase

The mapper class is public static class DetermineCardinalityMapper extends HadoopDruidIndexerMapper<LongWritable, BytesWritable>; it extends HadoopDruidIndexerMapper and overrides its innerMap method, as shown below.

The raw data is first read in HadoopDruidIndexerMapper's map method; the real per-row processing happens in DetermineCardinalityMapper's innerMap.

  • HadoopDruidIndexerMapper parses the raw data into a List<InputRow> and does some validation and filtering
 @Override
  protected void map(Object key, Object value, Context context) throws IOException, InterruptedException
  {
    try {
      final List<InputRow> inputRows = parseInputRow(value, parser);

      for (InputRow inputRow : inputRows) {
        try {
          if (inputRow == null) {
            // Throw away null rows from the parser.
            log.debug("Throwing away row [%s]", value);
            context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
            continue;
          }

          if (!Intervals.ETERNITY.contains(inputRow.getTimestamp())) {
            final String errorMsg = StringUtils.format(
                "Encountered row with timestamp that cannot be represented as a long: [%s]",
                inputRow
            );
            throw new ParseException(errorMsg);
          }

          if (!granularitySpec.bucketIntervals().isPresent()
              || granularitySpec.bucketInterval(DateTimes.utc(inputRow.getTimestampFromEpoch()))
                                .isPresent()) {
            innerMap(inputRow, context, reportParseExceptions);
          } else {
            context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
          }
        }
        catch (ParseException pe) {
          handleParseException(pe, context);
        }
      }
    }
    catch (ParseException pe) {
      handleParseException(pe, context);
    }
    catch (RuntimeException e) {
      throw new RE(e, "Failure on row[%s]", value);
    }
  }
InputRow (figure omitted: the InputRow interface)

@Override
protected void innerMap(
    InputRow inputRow,
    Context context,
    boolean reportParseExceptions
) throws IOException, InterruptedException
{

  final List<Object> groupKey = Rows.toGroupKey(
      rollupGranularity.bucketStart(inputRow.getTimestamp()).getMillis(),
      inputRow
  );
  Interval interval;
  if (determineIntervals) {
    interval = config.getGranularitySpec()
                     .getSegmentGranularity()
                     .bucket(DateTimes.utc(inputRow.getTimestampFromEpoch()));

    if (!hyperLogLogs.containsKey(interval)) {
      hyperLogLogs.put(interval, HyperLogLogCollector.makeLatestCollector());
    }
  } else {
    final Optional<Interval> maybeInterval = config.getGranularitySpec()
                                                   .bucketInterval(DateTimes.utc(inputRow.getTimestampFromEpoch()));

    if (!maybeInterval.isPresent()) {
      throw new ISE("WTF?! No bucket found for timestamp: %s", inputRow.getTimestampFromEpoch());
    }
    interval = maybeInterval.get();
  }

  hyperLogLogs
      .get(interval)
      .add(hashFunction.hashBytes(HadoopDruidIndexerConfig.JSON_MAPPER.writeValueAsBytes(groupKey)).asBytes());

  context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_PROCESSED_COUNTER).increment(1);
}
  • Counting uses private Map<Interval, HyperLogLogCollector> hyperLogLogs;, keyed by the SegmentGranularity interval; the hash of each groupKey is added to the interval's HyperLogLogCollector
  • The groupKey is, again, the timestamp truncated to the query granularity (QueryGranularity) together with the dimension columns
  • The mapper output is produced by iterating over hyperLogLogs and writing each entry as a key/value pair
Rows.toGroupKey
/**
* @param timeStamp rollup up timestamp to be used to create group key
 * @param inputRow  input row
 *
 * @return groupKey for the given input row
 */
public static List<Object> toGroupKey(long timeStamp, InputRow inputRow)
{
  final Map<String, Set<String>> dims = Maps.newTreeMap();
  for (final String dim : inputRow.getDimensions()) {
    // Collect the distinct values of this dimension; dimValues.size() is its cardinality within this row
    final Set<String> dimValues = ImmutableSortedSet.copyOf(inputRow.getDimension(dim));
    if (dimValues.size() > 0) {
      dims.put(dim, dimValues);
    }
  }
  return ImmutableList.of(
      timeStamp,
      dims
  );
}
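
For a concrete feel of what a group key contains, here is a hedged example (not from the Druid source; MapBasedInputRow, DateTimes, Granularities and Rows are assumed to be the usual io.druid.data.input / io.druid.java.util.common helper classes of this Druid version):

// Hypothetical row: a page view from Shanghai with a "clicks" metric column.
InputRow row = new MapBasedInputRow(
    DateTimes.of("2020-04-25T10:15:30+08:00").getMillis(),
    ImmutableList.of("country", "city"),
    ImmutableMap.<String, Object>of("country", "CN", "city", "Shanghai", "clicks", 3)
);

// With an HOUR query granularity the timestamp is truncated to 2020-04-25T10:00:00+08:00,
// so the group key is roughly: [1587780000000, {"city": ["Shanghai"], "country": ["CN"]}]
// (metric columns are not part of the key; dimension values are sorted and de-duplicated).
List<Object> groupKey = Rows.toGroupKey(
    Granularities.HOUR.bucketStart(row.getTimestamp()).getMillis(),
    row
);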
Reduce phase

The reducer class is public static class DetermineCardinalityReducer extends Reducer<LongWritable, BytesWritable, NullWritable, NullWritable>

@Override
protected void reduce(
    LongWritable key,
    Iterable<BytesWritable> values,
    Context context
) throws IOException, InterruptedException
{
  HyperLogLogCollector aggregate = HyperLogLogCollector.makeLatestCollector();
  for (BytesWritable value : values) {
    aggregate.fold(
        HyperLogLogCollector.makeCollector(ByteBuffer.wrap(value.getBytes(), 0, value.getLength()))
    );
  }

  Interval interval;

  if (determineIntervals) {
    interval = config.getGranularitySpec().getSegmentGranularity().bucket(DateTimes.utc(key.get()));
  } else {
    Optional<Interval> intervalOptional = config.getGranularitySpec().bucketInterval(DateTimes.utc(key.get()));

    if (!intervalOptional.isPresent()) {
      throw new ISE("WTF?! No bucket found for timestamp: %s", key.get());
    }
    interval = intervalOptional.get();
  }

  intervals.add(interval);
  final Path outPath = config.makeSegmentPartitionInfoPath(interval);
  final OutputStream out = Utils.makePathAndOutputStream(
      context, outPath, config.isOverwriteFiles()
  );

  try {
    HadoopDruidIndexerConfig.JSON_MAPPER.writerWithType(
        new TypeReference<Long>()
        {
        }
    ).writeValue(
        out,
        aggregate.estimateCardinalityRound()
    );
  }
  finally {
    Closeables.close(out, false);
  }
}

The reducer folds together the HyperLogLogCollectors emitted by the mappers, which gives the approximate number of rows in each segment interval; the estimate is written out to that interval's partition-info path.

DetermineHashedPartitionsJob completion

DetermineHashedPartitionsJob then logs some information:

  • each run logs the estimated number of rows and, combined with the configured targetPartitionSize, the resulting number of shards
  • finally, the shardSpecs parameter of the config is re-set with the computed shard specs
Example log

e.g. (after rollup there were actually 55,342,482 rows, 57,238,510 before rollup; with this many dimensions, rollup is particularly poor)

2020-04-26T11:45:11,430 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Job completed, loading up partitions for intervals[Optional.of([2020-04-25T00:00:00.000+08:00/2020-04-26T00:00:00.000+08:00])].
2020-04-26T11:45:11,468 INFO [task-runner-0-priority-0] org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.deflate]
2020-04-26T11:45:11,596 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Found approximately [54,135,187] rows in data.
2020-04-26T11:45:11,596 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Creating [3] shards
2020-04-26T11:45:11,597 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-25T00:00:00.000+08:00], partition[0], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=0, partitions=3, partitionDimensions=[]}, shardNum=0}]
2020-04-26T11:45:11,598 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-25T00:00:00.000+08:00], partition[1], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=1, partitions=3, partitionDimensions=[]}, shardNum=1}]
2020-04-26T11:45:11,598 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-25T00:00:00.000+08:00], partition[2], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=2, partitions=3, partitionDimensions=[]}, shardNum=2}]
2020-04-26T11:45:11,598 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DetermineHashedPartitionsJob took 325209 millis
  • the configuration at the time (this explains why there are 3 shards; see the quick calculation after the snippet)
"partitionsSpec" : {
   "type" : "hashed",
   "targetPartitionSize" : 20000000,
   "maxPartitionSize" : 30000000,
   "assumeGrouped" : false,
   "numShards" : -1,
   "partitionDimensions" : [ ]
},
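
A quick check of the shard count, using the formula from run() (shown in the source excerpt below): ceil(estimatedRows / targetPartitionSize).

// For the example above: ceil(54,135,187 / 20,000,000) = 3 shards
final long numRows = 54_135_187L;                 // HLL estimate from the log
final long targetPartitionSize = 20_000_000L;     // from partitionsSpec
final int numberOfShards = (int) Math.ceil((double) numRows / targetPartitionSize);  // = 3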
Example log 2

e.g. (after rollup there were actually 367,313,712 rows, 359,930,085 before rollup; again many dimensions, particularly poor rollup, plus some distinct-count metrics). Because of the data volume, the 19 segments average about 1.7 GB each, roughly 33 GB of segments per day.

2020-04-27T19:57:23,781 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Job completed, loading up partitions for intervals[Optional.of([2020-04-26T00:00:00.000+08:00/2020-04-27T00:00:00.000+08:00])].
2020-04-27T19:57:23,823 INFO [task-runner-0-priority-0] org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.deflate]
2020-04-27T19:57:24,149 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Found approximately [367,313,712] rows in data.
2020-04-27T19:57:24,150 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Creating [19] shards
2020-04-27T19:57:24,151 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - DateTime[2020-04-26T00:00:00.000+08:00], partition[0], spec[HadoopyShardSpec{actualSpec=HashBasedNumberedShardSpec{partitionNum=0, partitions=19, partitionDimensions=[]}, shardNum=0}]
Source code (the rest of run(), executed after the MR job completes)
 log.info("Job completed, loading up partitions for intervals[%s].", config.getSegmentGranularIntervals());
      FileSystem fileSystem = null;
      if (!config.getSegmentGranularIntervals().isPresent()) {
        final Path intervalInfoPath = config.makeIntervalInfoPath();
        fileSystem = intervalInfoPath.getFileSystem(groupByJob.getConfiguration());
        if (!Utils.exists(groupByJob, fileSystem, intervalInfoPath)) {
          throw new ISE("Path[%s] didn't exist!?", intervalInfoPath);
        }
        List<Interval> intervals = config.JSON_MAPPER.readValue(
            Utils.openInputStream(groupByJob, intervalInfoPath),
            new TypeReference<List<Interval>>() {}
        );
        config.setGranularitySpec(
            new UniformGranularitySpec(
                config.getGranularitySpec().getSegmentGranularity(),
                config.getGranularitySpec().getQueryGranularity(),
                config.getGranularitySpec().isRollup(),
                intervals
            )
        );
        log.info("Determined Intervals for Job [%s].", config.getSegmentGranularIntervals());
      }
      Map<Long, List<HadoopyShardSpec>> shardSpecs = Maps.newTreeMap(DateTimeComparator.getInstance());
      int shardCount = 0;
      for (Interval segmentGranularity : config.getSegmentGranularIntervals().get()) {
        DateTime bucket = segmentGranularity.getStart();

        final Path partitionInfoPath = config.makeSegmentPartitionInfoPath(segmentGranularity);
        if (fileSystem == null) {
          fileSystem = partitionInfoPath.getFileSystem(groupByJob.getConfiguration());
        }
        if (Utils.exists(groupByJob, fileSystem, partitionInfoPath)) {
          final Long numRows = config.JSON_MAPPER.readValue(
              Utils.openInputStream(groupByJob, partitionInfoPath),
              new TypeReference<Long>() {}
          );

          log.info("Found approximately [%,d] rows in data.", numRows);

          final int numberOfShards = (int) Math.ceil((double) numRows / config.getTargetPartitionSize());

          log.info("Creating [%,d] shards", numberOfShards);

          List<HadoopyShardSpec> actualSpecs = Lists.newArrayListWithExpectedSize(numberOfShards);
          if (numberOfShards == 1) {
            actualSpecs.add(new HadoopyShardSpec(NoneShardSpec.instance(), shardCount++));
          } else {
            for (int i = 0; i < numberOfShards; ++i) {
              actualSpecs.add(
                  new HadoopyShardSpec(
                      new HashBasedNumberedShardSpec(
                          i,
                          numberOfShards,
                          null,
                          HadoopDruidIndexerConfig.JSON_MAPPER
                      ),
                      shardCount++
                  )
              );
              log.info("DateTime[%s], partition[%d], spec[%s]", bucket, i, actualSpecs.get(i));
            }
          }

          shardSpecs.put(bucket.getMillis(), actualSpecs);

        } else {
          log.info("Path[%s] didn't exist!?", partitionInfoPath);
        }
      }

      config.setShardSpecs(shardSpecs);
      log.info(
          "DetermineHashedPartitionsJob took %d millis",
          (System.currentTimeMillis() - startTime)
      );
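
For the 3-shard example above, the shardSpecs map set on the config ends up looking roughly like this (a sketch reconstructed from the log, not copied from the source):

// key: bucket start in millis (2020-04-25T00:00:00.000+08:00 == 1587744000000)
// value: one HadoopyShardSpec per shard of that interval
//   1587744000000 -> [ HadoopyShardSpec{HashBasedNumberedShardSpec(partitionNum=0, partitions=3), shardNum=0},
//                      HadoopyShardSpec{HashBasedNumberedShardSpec(partitionNum=1, partitions=3), shardNum=1},
//                      HadoopyShardSpec{HashBasedNumberedShardSpec(partitionNum=2, partitions=3), shardNum=2} ]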

IndexGeneratorJob submission

public class IndexGeneratorJob implements Jobby

@Override
  public boolean run()
  {
    try {
      job = Job.getInstance(
          new Configuration(),
          StringUtils.format("%s-index-generator-%s", config.getDataSource(), config.getIntervals())
      );

      job.getConfiguration().set("io.sort.record.percent", "0.23");

      JobHelper.injectSystemProperties(job);
      config.addJobProperties(job);
      // inject druid properties like deep storage bindings
      JobHelper.injectDruidProperties(job.getConfiguration(), config.getAllowedHadoopPrefix());

      job.setMapperClass(IndexGeneratorMapper.class);
      job.setMapOutputValueClass(BytesWritable.class);

      SortableBytes.useSortableBytesAsMapOutputKey(job);

      int numReducers = Iterables.size(config.getAllBuckets().get());
      if (numReducers == 0) {
        throw new RuntimeException("No buckets?? seems there is no data to index.");
      }

      if (config.getSchema().getTuningConfig().getUseCombiner()) {
        job.setCombinerClass(IndexGeneratorCombiner.class);
        job.setCombinerKeyGroupingComparatorClass(BytesWritable.Comparator.class);
      }

      job.setNumReduceTasks(numReducers);
      job.setPartitionerClass(IndexGeneratorPartitioner.class);

      setReducerClass(job);
      job.setOutputKeyClass(BytesWritable.class);
      job.setOutputValueClass(Text.class);
      job.setOutputFormatClass(IndexGeneratorOutputFormat.class);
      FileOutputFormat.setOutputPath(job, config.makeIntermediatePath());

      config.addInputPaths(job);

      config.intoConfiguration(job);

      JobHelper.setupClasspath(
          JobHelper.distributedClassPath(config.getWorkingPath()),
          JobHelper.distributedClassPath(config.makeIntermediatePath()),
          job
      );

      job.submit();
      log.info("Job %s submitted, status available at %s", job.getJobName(), job.getTrackingURL());

      boolean success = job.waitForCompletion(true);

      Counters counters = job.getCounters();
      if (counters == null) {
        log.info("No counters found for job [%s]", job.getJobName());
      } else {
        Counter invalidRowCount = counters.findCounter(HadoopDruidIndexerConfig.IndexJobCounters.INVALID_ROW_COUNTER);
        if (invalidRowCount != null) {
          jobStats.setInvalidRowCount(invalidRowCount.getValue());
        } else {
          log.info("No invalid row counter found for job [%s]", job.getJobName());
        }
      }

      return success;
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
  • injectSystemProperties
  • injectDruidProperties
  • numReducers (the source of the classic "No buckets?? seems there is no data to index." error)
  • whether the combiner is enabled:
 if (config.getSchema().getTuningConfig().getUseCombiner()) {
  job.setCombinerClass(IndexGeneratorCombiner.class);
  job.setCombinerKeyGroupingComparatorClass(BytesWritable.Comparator.class);
}
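In the ingestion spec this corresponds to the useCombiner flag of the Hadoop tuningConfig (off by default), e.g.:

"tuningConfig" : {
  "type" : "hadoop",
  "useCombiner" : true
}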
  • where to view the job log: log.info("Job %s submitted, status available at %s", job.getJobName(), job.getTrackingURL());
  • the job input is the inputSpec property of the task configuration
  • the job output path: FileOutputFormat.setOutputPath(job, config.makeIntermediatePath());
public Path makeIntermediatePath()
{
  return new Path(
      StringUtils.format(
          "%s/%s/%s_%s",
          getWorkingPath(),
          schema.getDataSchema().getDataSource(),
          schema.getTuningConfig().getVersion().replace(":", ""),
          schema.getUniqueId()
      )
  );
}

private static final String DEFAULT_WORKING_PATH = "/tmp/druid-indexing";

public String getWorkingPath()
{
   final String workingPath = schema.getTuningConfig().getWorkingPath();
   return workingPath == null ? DEFAULT_WORKING_PATH : workingPath;
 }
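So by default the intermediate output lands under a path of the form workingPath/dataSource/version_uniqueId, for example (dataSource, version and uniqueId here are hypothetical): /tmp/druid-indexing/my_datasource/2020-04-26T033946.297Z_f2a9c1d0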
  • MapperClass: job.setMapperClass(IndexGeneratorMapper.class);
  • ReducerClass: setReducerClass(job); i.e. job.setReducerClass(IndexGeneratorReducer.class);

Map phase

public abstract class HadoopDruidIndexerMapper<KEYOUT, VALUEOUT> extends Mapper<Object, Object, KEYOUT, VALUEOUT>

@Override
  protected void map(Object key, Object value, Context context) throws IOException, InterruptedException
  {
    try {
      // Parse the raw data and convert it into Druid InputRow objects
      final List<InputRow> inputRows = parseInputRow(value, parser);

      for (InputRow inputRow : inputRows) {
        try {
          // Discard null InputRows
          if (inputRow == null) {
            // Throw away null rows from the parser.
            log.debug("Throwing away row [%s]", value);
            context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
            continue;
          }

          // Rows whose timestamp cannot be represented as a long throw a ParseException
          if (!Intervals.ETERNITY.contains(inputRow.getTimestamp())) {
            final String errorMsg = StringUtils.format(
                "Encountered row with timestamp that cannot be represented as a long: [%s]",
                inputRow
            );
            throw new ParseException(errorMsg);
          }

          // Rows inside the task's configured intervals go through innerMap; the rest are discarded and counted
          if (!granularitySpec.bucketIntervals().isPresent()
              || granularitySpec.bucketInterval(DateTimes.utc(inputRow.getTimestampFromEpoch()))
                                .isPresent()) {
            innerMap(inputRow, context, reportParseExceptions);
          } else {
            context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER).increment(1);
          }
        }
        catch (ParseException pe) {
          handleParseException(pe, context);
        }
      }
    }
    catch (ParseException pe) {
      handleParseException(pe, context);
    }
    catch (RuntimeException e) {
      throw new RE(e, "Failure on row[%s]", value);
    }
  }
public static class IndexGeneratorMapper extends HadoopDruidIndexerMapper<BytesWritable, BytesWritable>
{
 @Override
    protected void innerMap(
        InputRow inputRow,
        Context context,
        boolean reportParseExceptions
    ) throws IOException, InterruptedException
    {
      // Bucket the row according to the task's granularitySpec
      // Group by bucket, sort by timestamp
      final Optional<Bucket> bucket = getConfig().getBucket(inputRow);

      // No bucket found for this row
      if (!bucket.isPresent()) {
        throw new ISE("WTF?! No bucket found for row: %s", inputRow);
      }

      // Truncate (roll up) the timestamp to the query granularity; used to build the group key
      final long truncatedTimestamp = granularitySpec.getQueryGranularity().bucketStart(inputRow.getTimestamp()).getMillis();
      final byte[] hashedDimensions = hashFunction.hashBytes(
          HadoopDruidIndexerConfig.JSON_MAPPER.writeValueAsBytes(
              Rows.toGroupKey(
                  truncatedTimestamp,
                  inputRow
              )
          )
      ).asBytes();

      // Key helper class: InputRowSerde serializes the InputRow to bytes
      // type SegmentInputRow serves as a marker that these InputRow instances have already been combined
      // and they contain the columns as they show up in the segment after ingestion, not what you would see in raw
      // data
      InputRowSerde.SerializeResult serializeResult = inputRow instanceof SegmentInputRow ?
                                                 InputRowSerde.toBytes(
                                                     typeHelperMap,
                                                     inputRow,
                                                     aggsForSerializingSegmentInputRow
                                                 )
                                                                                     :
                                                 InputRowSerde.toBytes(
                                                     typeHelperMap,
                                                     inputRow,
                                                     aggregators
                                                 );
      // The mapper output
      context.write(
          // K
          new SortableBytes(
              bucket.get().toGroupKey(),
              // sort rows by truncated timestamp and hashed dimensions to help reduce spilling on the reducer side
              ByteBuffer.allocate(Longs.BYTES + hashedDimensions.length)
                        .putLong(truncatedTimestamp)
                        .put(hashedDimensions)
                        .array()
          ).toBytesWritable(),
          // V
          new BytesWritable(serializeResult.getSerializedRow())
      );

      ParseException pe = IncrementalIndex.getCombinedParseException(
          inputRow,
          serializeResult.getParseExceptionMessages(),
          null
      );
      if (pe != null) {
        throw pe;
      } else {
        context.getCounter(HadoopDruidIndexerConfig.IndexJobCounters.ROWS_PROCESSED_COUNTER).increment(1);
      }
    }
  
}
Map phase details
  • The mapper class IndexGeneratorMapper extends HadoopDruidIndexerMapper. HadoopDruidIndexerMapper parses and filters the raw data into valid InputRows and hands each one to IndexGeneratorMapper's overridden innerMap; discarded rows are counted by HadoopDruidIndexerConfig.IndexJobCounters.ROWS_THROWN_AWAY_COUNTER
  • For each InputRow, innerMap first buckets it by time: the timestamp is truncated to the query granularity (QueryGranularity) and, together with the dimension columns, forms the groupKey (whose hash is hashedDimensions); processed rows are counted by HadoopDruidIndexerConfig.IndexJobCounters.ROWS_PROCESSED_COUNTER. The layout of the emitted key is sketched below.
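
A rough reconstruction of the map output key, taken from the context.write call above:

// Map output key (SortableBytes) as built in innerMap:
//   group key : bucket.get().toGroupKey()      -> drives partitioning/grouping (one reducer per bucket)
//   sort key  : 8-byte truncated timestamp (Longs.BYTES)
//               + hashedDimensions (the hash of the row's group key)
// so within a bucket the reducer receives rows sorted by (time bucket, hashed dimensions);
// the value is the InputRow serialized by InputRowSerde.toBytes.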

A common parser configuration looks like the following; in that case io.druid.java.util.common.parsers.JSONPathParser is used to extract fields from the raw JSON, which is again turned into InputRow objects.

"parser" : {
      "type" : "hadoopyString",
       "parseSpec" : {
         "format" : "json",
         "timestampSpec" : {
           "column" : "timestamp",
           "format" : "auto"
         },
         "dimensionsSpec" : {
           "dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
           "dimensionExclusions" : [],
           "spatialDimensions" : []
         }
       }
     },
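
With this parseSpec, a raw input line like the following (hypothetical sample data) becomes an InputRow whose timestamp comes from the "timestamp" field and whose dimensions are the twelve listed above; other fields (e.g. numeric columns referenced by the metricsSpec) remain accessible through getRaw():

{"timestamp":"2020-04-25T10:15:30.000+08:00","page":"Main_Page","language":"zh","user":"alice","unpatrolled":"false","newPage":"true","robot":"false","anonymous":"false","namespace":"article","continent":"Asia","country":"China","region":"Shanghai","city":"Shanghai","added":57,"deleted":3}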

Combiner

public static class IndexGeneratorCombiner extends Reducer<BytesWritable, BytesWritable, BytesWritable, BytesWritable>
{
 @Override
    protected void reduce(
        final BytesWritable key, Iterable<BytesWritable> values, final Context context
    ) throws IOException, InterruptedException
    {
      // BytesWritable values emitted by the mapper
      Iterator<BytesWritable> iter = values.iterator();
      BytesWritable first = iter.next();

      if (iter.hasNext()) {
        LinkedHashSet<String> dimOrder = Sets.newLinkedHashSet();
        SortableBytes keyBytes = SortableBytes.fromBytesWritable(key);
        Bucket bucket = Bucket.fromGroupKey(keyBytes.getGroupKey()).lhs;
        // The index is built in memory first and flushed once the configured row limit is exceeded
        IncrementalIndex index = makeIncrementalIndex(bucket, combiningAggs, config, null, null);
        index.add(InputRowSerde.fromBytes(typeHelperMap, first.getBytes(), aggregators));

        while (iter.hasNext()) {
          context.progress();
          InputRow value = InputRowSerde.fromBytes(typeHelperMap, iter.next().getBytes(), aggregators);

          if (!index.canAppendRow()) {
            dimOrder.addAll(index.getDimensionOrder());
            log.info("current index full due to [%s]. creating new index.", index.getOutOfRowsReason());
            flushIndexToContextAndClose(key, index, context);
            index = makeIncrementalIndex(bucket, combiningAggs, config, dimOrder, index.getColumnCapabilities());
          }

          index.add(value);
        }
        // Finally flush and emit whatever is left
        flushIndexToContextAndClose(key, index, context);
      } else {
        context.write(key, first);
      }
    }

    private void flushIndexToContextAndClose(BytesWritable key, IncrementalIndex index, Context context)
        throws IOException, InterruptedException
    {
      final List<String> dimensions = index.getDimensionNames();
      Iterator<Row> rows = index.iterator();
      while (rows.hasNext()) {
        context.progress();
        Row row = rows.next();
        InputRow inputRow = getInputRowFromRow(row, dimensions);

        // reportParseExceptions is true as any unparseable data is already handled by the mapper.
        InputRowSerde.SerializeResult serializeResult = InputRowSerde.toBytes(typeHelperMap, inputRow, combiningAggs);

        context.write(
            key,
            new BytesWritable(serializeResult.getSerializedRow())
        );
      }
      index.close();
    }
// Aggregated metrics come from the schema in the configuration; each has a corresponding combining aggregator factory
aggregators = config.getSchema().getDataSchema().getAggregators();
combiningAggs = new AggregatorFactory[aggregators.length];
      for (int i = 0; i < aggregators.length; ++i) {
        combiningAggs[i] = aggregators[i].getCombiningFactory();
      }
// Type helpers for handling dimension values of different data types
 typeHelperMap = InputRowSerde.getTypeHelperMap(config.getSchema()
                                                           .getDataSchema()
                                                           .getParser()
                                                           .getParseSpec()
                                                           .getDimensionsSpec());
// InputRowSerde.toBytes processes the InputRow in order: the timestamp, then the dimensions, then the aggregated metrics
public static final SerializeResult toBytes(
      final Map<String, IndexSerdeTypeHelper> typeHelperMap,
      final InputRow row,
      AggregatorFactory[] aggs
  )
  {
    try {
      List<String> parseExceptionMessages = new ArrayList<>();
      ByteArrayDataOutput out = ByteStreams.newDataOutput();

      //write timestamp
      out.writeLong(row.getTimestampFromEpoch());

      //writing all dimensions
      List<String> dimList = row.getDimensions();

      WritableUtils.writeVInt(out, dimList.size());
      if (dimList != null) {
        for (String dim : dimList) {
          IndexSerdeTypeHelper typeHelper = typeHelperMap.get(dim);
          if (typeHelper == null) {
            typeHelper = STRING_HELPER;
          }
          writeString(dim, out);

          try {
            typeHelper.serialize(out, row.getRaw(dim));
          }
          catch (ParseException pe) {
            parseExceptionMessages.add(pe.getMessage());
          }
        }
      }

      //writing all metrics
      Supplier<InputRow> supplier = new Supplier<InputRow>()
      {
        @Override
        public InputRow get()
        {
          return row;
        }
      };
      WritableUtils.writeVInt(out, aggs.length);
      for (AggregatorFactory aggFactory : aggs) {
        String k = aggFactory.getName();
        writeString(k, out);

        try (Aggregator agg = aggFactory.factorize(
            IncrementalIndex.makeColumnSelectorFactory(
                VirtualColumns.EMPTY,
                aggFactory,
                supplier,
                true
            )
        )) {
          try {
            agg.aggregate();
          }
          catch (ParseException e) {
            // "aggregate" can throw ParseExceptions if a selector expects something but gets something else.
            log.debug(e, "Encountered parse error, skipping aggregator[%s].", k);
            parseExceptionMessages.add(e.getMessage());
          }

          String t = aggFactory.getTypeName();

          if (t.equals("float")) {
            out.writeFloat(agg.getFloat());
          } else if (t.equals("long")) {
            WritableUtils.writeVLong(out, agg.getLong());
          } else if (t.equals("double")) {
            out.writeDouble(agg.getDouble());
          } else {
            //its a complex metric
            Object val = agg.get();
            ComplexMetricSerde serde = getComplexMetricSerde(t);
            writeBytes(serde.toBytes(val), out);
          }
        }
      }

      return new SerializeResult(out.toByteArray(), parseExceptionMessages);
    }
    catch (IOException ex) {
      throw new RuntimeException(ex);
    }
  }
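
Putting the write calls above together, the byte layout produced by InputRowSerde.toBytes is roughly:

// Serialized row layout (reconstructed from the code above):
//   long   timestamp (millis since epoch)
//   vint   number of dimensions
//   per dimension : name (string) + value, written by that dimension's IndexSerdeTypeHelper
//   vint   number of aggregators
//   per aggregator: name (string) + aggregated value
//                   (float / vlong / double, or length-prefixed bytes for complex metrics)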

Reduce phase

 public static class IndexGeneratorReducer extends Reducer<BytesWritable, BytesWritable, BytesWritable, Text>
  {
 @Override
    protected void reduce(
        BytesWritable key, Iterable<BytesWritable> values, final Context context
    ) throws IOException, InterruptedException
    {
      SortableBytes keyBytes = SortableBytes.fromBytesWritable(key);
      Bucket bucket = Bucket.fromGroupKey(keyBytes.getGroupKey()).lhs;

      final Interval interval = config.getGranularitySpec().bucketInterval(bucket.time).get();

      ListeningExecutorService persistExecutor = null;
      List<ListenableFuture<?>> persistFutures = Lists.newArrayList();
      IncrementalIndex index = makeIncrementalIndex(
          bucket,
          combiningAggs,
          config,
          null,
          null
      );
      try {
        File baseFlushFile = File.createTempFile("base", "flush");
        baseFlushFile.delete();
        baseFlushFile.mkdirs();

        Set<File> toMerge = Sets.newTreeSet();
        int indexCount = 0;
        int lineCount = 0;
        int runningTotalLineCount = 0;
        long startTime = System.currentTimeMillis();

        Set<String> allDimensionNames = Sets.newLinkedHashSet();
        final ProgressIndicator progressIndicator = makeProgressIndicator(context);
        int numBackgroundPersistThreads = config.getSchema().getTuningConfig().getNumBackgroundPersistThreads();
        if (numBackgroundPersistThreads > 0) {
          final BlockingQueue<Runnable> queue = new SynchronousQueue<>();
          ExecutorService executorService = new ThreadPoolExecutor(
              numBackgroundPersistThreads,
              numBackgroundPersistThreads,
              0L,
              TimeUnit.MILLISECONDS,
              queue,
              Execs.makeThreadFactory("IndexGeneratorJob_persist_%d"),
              new RejectedExecutionHandler()
              {
                @Override
                public void rejectedExecution(Runnable r, ThreadPoolExecutor executor)
                {
                  try {
                    executor.getQueue().put(r);
                  }
                  catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RejectedExecutionException("Got Interrupted while adding to the Queue", e);
                  }
                }
              }
          );
          persistExecutor = MoreExecutors.listeningDecorator(executorService);
        } else {
          persistExecutor = MoreExecutors.sameThreadExecutor();
        }

        for (final BytesWritable bw : values) {
          context.progress();

          final InputRow inputRow = index.formatRow(InputRowSerde.fromBytes(typeHelperMap, bw.getBytes(), aggregators));
          int numRows = index.add(inputRow).getRowCount();

          ++lineCount;

          if (!index.canAppendRow()) {
            allDimensionNames.addAll(index.getDimensionOrder());

            log.info(index.getOutOfRowsReason());
            log.info(
                "%,d lines to %,d rows in %,d millis",
                lineCount - runningTotalLineCount,
                numRows,
                System.currentTimeMillis() - startTime
            );
            runningTotalLineCount = lineCount;

            final File file = new File(baseFlushFile, StringUtils.format("index%,05d", indexCount));
            toMerge.add(file);

            context.progress();
            final IncrementalIndex persistIndex = index;
            persistFutures.add(
                persistExecutor.submit(
                    new ThreadRenamingRunnable(StringUtils.format("%s-persist", file.getName()))
                    {
                      @Override
                      public void doRun()
                      {
                        try {
                          persist(persistIndex, interval, file, progressIndicator);
                        }
                        catch (Exception e) {
                          log.error(e, "persist index error");
                          throw Throwables.propagate(e);
                        }
                        finally {
                          // close this index
                          persistIndex.close();
                        }
                      }
                    }
                )
            );

            index = makeIncrementalIndex(
                bucket,
                combiningAggs,
                config,
                allDimensionNames,
                persistIndex.getColumnCapabilities()
            );
            startTime = System.currentTimeMillis();
            ++indexCount;
          }
        }

        allDimensionNames.addAll(index.getDimensionOrder());

        log.info("%,d lines completed.", lineCount);

        List<QueryableIndex> indexes = Lists.newArrayListWithCapacity(indexCount);
        final File mergedBase;

        if (toMerge.size() == 0) {
          if (index.isEmpty()) {
            throw new IAE("If you try to persist empty indexes you are going to have a bad time");
          }

          mergedBase = new File(baseFlushFile, "merged");
          persist(index, interval, mergedBase, progressIndicator);
        } else {
          if (!index.isEmpty()) {
            final File finalFile = new File(baseFlushFile, "final");
            persist(index, interval, finalFile, progressIndicator);
            toMerge.add(finalFile);
          }

          Futures.allAsList(persistFutures).get(1, TimeUnit.HOURS);
          persistExecutor.shutdown();

          for (File file : toMerge) {
            indexes.add(HadoopDruidIndexerConfig.INDEX_IO.loadIndex(file));
          }

          log.info("starting merge of intermediate persisted segments.");
          long mergeStartTime = System.currentTimeMillis();
          mergedBase = mergeQueryableIndex(
              indexes, aggregators, new File(baseFlushFile, "merged"), progressIndicator
          );
          log.info(
              "finished merge of intermediate persisted segments. time taken [%d] ms.",
              (System.currentTimeMillis() - mergeStartTime)
          );
        }
        final FileSystem outputFS = new Path(config.getSchema().getIOConfig().getSegmentOutputPath())
            .getFileSystem(context.getConfiguration());

        // ShardSpec used for partitioning within this Hadoop job.
        final ShardSpec shardSpecForPartitioning = config.getShardSpec(bucket).getActualSpec();

        // ShardSpec to be published.
        final ShardSpec shardSpecForPublishing;
        if (config.isForceExtendableShardSpecs()) {
          shardSpecForPublishing = new NumberedShardSpec(shardSpecForPartitioning.getPartitionNum(), config.getShardSpecCount(bucket));
        } else {
          shardSpecForPublishing = shardSpecForPartitioning;
        }

        final DataSegment segmentTemplate = new DataSegment(
            config.getDataSource(),
            interval,
            config.getSchema().getTuningConfig().getVersion(),
            null,
            ImmutableList.copyOf(allDimensionNames),
            metricNames,
            shardSpecForPublishing,
            -1,
            -1
        );
        final DataSegment segment = JobHelper.serializeOutIndex(
            segmentTemplate,
            context.getConfiguration(),
            context,
            mergedBase,
            JobHelper.makeFileNamePath(
                new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
                outputFS,
                segmentTemplate,
                JobHelper.INDEX_ZIP,
                config.DATA_SEGMENT_PUSHER
            ),
            JobHelper.makeFileNamePath(
                new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
                outputFS,
                segmentTemplate,
                JobHelper.DESCRIPTOR_JSON,
                config.DATA_SEGMENT_PUSHER
            ),
            JobHelper.makeTmpPath(
                new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
                outputFS,
                segmentTemplate,
                context.getTaskAttemptID(),
                config.DATA_SEGMENT_PUSHER
            ),
            config.DATA_SEGMENT_PUSHER
        );

        Path descriptorPath = config.makeDescriptorInfoPath(segment);
        descriptorPath = JobHelper.prependFSIfNullScheme(
            FileSystem.get(
                descriptorPath.toUri(),
                context.getConfiguration()
            ), descriptorPath
        );

        log.info("Writing descriptor to path[%s]", descriptorPath);
        JobHelper.writeSegmentDescriptor(
            config.makeDescriptorInfoDir().getFileSystem(context.getConfiguration()),
            segment,
            descriptorPath,
            context
        );
        for (File file : toMerge) {
          FileUtils.deleteDirectory(file);
        }
      }
      catch (ExecutionException | TimeoutException e) {
        throw Throwables.propagate(e);
      }
      finally {
        index.close();
        if (persistExecutor != null) {
          persistExecutor.shutdownNow();
        }
      }
    }
  }
Aggregation
  • aggregation uses Druid's OnheapIncrementalIndex
  • the persistExecutor thread pool performs the persist operations (still very similar to index_realtime)
  • the DataSegment is produced by final DataSegment segment = JobHelper.serializeOutIndex(
public static DataSegment serializeOutIndex(
      final DataSegment segmentTemplate,
      final Configuration configuration,
      final Progressable progressable,
      final File mergedBase,
      final Path finalIndexZipFilePath,
      final Path finalDescriptorPath,
      final Path tmpPath,
      DataSegmentPusher dataSegmentPusher
  )
  
final DataSegment segment = JobHelper.serializeOutIndex(
            segmentTemplate,
            context.getConfiguration(),
            context,
            mergedBase,
            JobHelper.makeFileNamePath(
                new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
                outputFS,
                segmentTemplate,
                JobHelper.INDEX_ZIP,
                config.DATA_SEGMENT_PUSHER
            ),
            JobHelper.makeFileNamePath(
                new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
                outputFS,
                segmentTemplate,
                JobHelper.DESCRIPTOR_JSON,
                config.DATA_SEGMENT_PUSHER
            ),
            JobHelper.makeTmpPath(
                new Path(config.getSchema().getIOConfig().getSegmentOutputPath()),
                outputFS,
                segmentTemplate,
                context.getTaskAttemptID(),
                config.DATA_SEGMENT_PUSHER
            ),
            config.DATA_SEGMENT_PUSHER
        );
Determining the number of shards
  • creating the segment uses a ShardSpec; unless one is forced in the configuration, the shard count is the one estimated earlier by DetermineHashedPartitionsJob
// ShardSpec used for partitioning within this Hadoop job.
        final ShardSpec shardSpecForPartitioning = config.getShardSpec(bucket).getActualSpec();

        // ShardSpec to be published.
        final ShardSpec shardSpecForPublishing;
        if (config.isForceExtendableShardSpecs()) {
          shardSpecForPublishing = new NumberedShardSpec(shardSpecForPartitioning.getPartitionNum(), config.getShardSpecCount(bucket));
        } else {
          shardSpecForPublishing = shardSpecForPartitioning;
        }
Segment writing
  • segment file layout
* descriptor.json
* index.zip
	* version.bin
	* meta.smoosh
	* 00000.smoosh
  • after the segment is built, the segment files are written out and some intermediate files are deleted
Path descriptorPath = config.makeDescriptorInfoPath(segment);
        descriptorPath = JobHelper.prependFSIfNullScheme(
            FileSystem.get(
                descriptorPath.toUri(),
                context.getConfiguration()
            ), descriptorPath
        );

        log.info("Writing descriptor to path[%s]", descriptorPath);
        JobHelper.writeSegmentDescriptor(
            config.makeDescriptorInfoDir().getFileSystem(context.getConfiguration()),
            segment,
            descriptorPath,
            context
        );
        for (File file : toMerge) {
          FileUtils.deleteDirectory(file);
        }
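
For reference, the descriptor.json written here is just the DataSegment serialized to JSON; a hedged sketch of what it might look like (all field values hypothetical, and the exact shape depends on the Druid version and deep storage):

{
  "dataSource" : "my_datasource",
  "interval" : "2020-04-25T00:00:00.000+08:00/2020-04-26T00:00:00.000+08:00",
  "version" : "2020-04-26T03:39:46.297Z",
  "loadSpec" : { "type" : "hdfs", "path" : "hdfs://nameservice1/druid/segments/my_datasource/.../index.zip" },
  "dimensions" : "page,language,user,city",
  "metrics" : "count,added,deleted",
  "shardSpec" : { "type" : "hashed", "partitionNum" : 0, "partitions" : 3, "partitionDimensions" : [ ] },
  "binaryVersion" : 9,
  "size" : 1785293112
}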