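Kylin Cube Build: Interface Overview
- Every Cube must be configured with a data source, a compute engine, and a storage engine. Factory classes create the corresponding data source, compute engine, and storage engine objects.
- The three are wired together through adapters.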
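To make the factory idea concrete, here is a minimal sketch of a registry that creates an object from a configured type id. It is not Kylin's factory code: `TypeRegistrySketch`, `ToySource`, the type ids 0 and 1, and the names `toy-hive` and `toy-kafka` are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry, not Kylin's actual factory classes.
class TypeRegistrySketch {
    // Toy stand-in for a data source; the real ISource has more methods.
    interface ToySource {
        String name();
    }

    private final Map<Integer, Supplier<ToySource>> providers = new HashMap<>();

    // Register a provider under an integer type id (the ids are invented here).
    void register(int typeId, Supplier<ToySource> provider) {
        providers.put(typeId, provider);
    }

    // Create the object configured for the given type id, or fail loudly.
    ToySource create(int typeId) {
        Supplier<ToySource> provider = providers.get(typeId);
        if (provider == null) {
            throw new IllegalArgumentException("No provider for type " + typeId);
        }
        return provider.get();
    }

    // Build a registry with two toy providers.
    static TypeRegistrySketch demoRegistry() {
        TypeRegistrySketch registry = new TypeRegistrySketch();
        registry.register(0, () -> () -> "toy-hive");
        registry.register(1, () -> () -> "toy-kafka");
        return registry;
    }

    public static void main(String[] args) {
        System.out.println(demoRegistry().create(0).name());
    }
}
```

The same shape would apply to the compute engine and the storage engine: the Cube definition carries a type id, and a factory turns that id into an object.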
Data source interface (ISource)
public interface ISource extends Closeable {
    // Synchronize the metadata of the tables in the data source
    ISourceMetadataExplorer getSourceMetadataExplorer();
    // Adapt to the specified build engine interface
    <I> I adaptToBuildEngine(Class<I> engineInterface);
    // Read a table sequentially
    IReadableTable createReadableTable(TableDesc tableDesc);
    // Enrich the source partition before the build starts
    SourcePartition enrichSourcePartitionBeforeBuild(IBuildable buildable, SourcePartition srcPartition);
}
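The `adaptToBuildEngine` method takes a class token and returns an object of that type. The sketch below shows that pattern in isolation, under the assumption that a source supports exactly one engine interface. `AdapterSketch`, `ToyBatchInput`, `ToyStreamInput`, and `ToyHiveSource` are illustrative names, not Kylin classes.

```java
// Toy types only; none of these are Kylin classes.
class AdapterSketch {
    interface ToyBatchInput {
        String describe();
    }

    interface ToyStreamInput {
        String describe();
    }

    static class ToyHiveSource {
        // Same shape as ISource.adaptToBuildEngine: the engine passes the
        // interface it needs and the source returns an implementation of it.
        <I> I adaptToBuildEngine(Class<I> engineInterface) {
            if (engineInterface == ToyBatchInput.class) {
                ToyBatchInput input = () -> "batch input over a toy Hive table";
                // Class.cast performs a checked conversion to I.
                return engineInterface.cast(input);
            }
            throw new IllegalArgumentException("Cannot adapt to " + engineInterface);
        }
    }

    public static void main(String[] args) {
        ToyBatchInput input = new ToyHiveSource().adaptToBuildEngine(ToyBatchInput.class);
        System.out.println(input.describe());
    }
}
```

Asking the toy source for `ToyStreamInput.class` throws, which mirrors how a source rejects an engine interface it does not support.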
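Storage engine interface (IStorage)
public interface IStorage {
    // Create an object that queries the specified Cube
    public IStorageQuery createQuery(IRealization realization);
    // Adapt to the specified build engine interface
    public <I> I adaptToBuildEngine(Class<I> engineInterface);
}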
Compute engine interface (IBatchCubingEngine)
public interface IBatchCubingEngine {
    public IJoinedFlatTableDesc getJoinedFlatTableDesc(CubeSegment newSegment);
    // Return a workflow plan that builds the specified CubeSegment
    public DefaultChainedExecutable createBatchCubingJob(CubeSegment newSegment, String submitter);
    // Return a workflow plan that merges the specified CubeSegment
    public DefaultChainedExecutable createBatchMergeJob(CubeSegment mergeSegment, String submitter);
    // Return a workflow plan that optimizes the specified CubeSegment
    public DefaultChainedExecutable createBatchOptimizeJob(CubeSegment optimizeSegment, String submitter);
    public Class<?> getSourceInterface();
    public Class<?> getStorageInterface();
}
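The three interfaces fit together roughly as follows: the engine names the interfaces it expects, and the source and the storage each adapt to the one requested. This is a simplified sketch with toy types. Unlike the real `getSourceInterface()` and `getStorageInterface()`, which return `Class<?>`, the toy engine returns typed class tokens so that no extra cast is needed; every name in the block is hypothetical.

```java
// Toy wiring of source -> engine <- storage; not Kylin code.
class WiringSketch {
    interface ToyInput {
        String read();
    }

    interface ToyOutput {
        String write(String data);
    }

    // The engine declares which interface it needs from each side.
    static class ToyEngine {
        Class<ToyInput> getSourceInterface() {
            return ToyInput.class;
        }

        Class<ToyOutput> getStorageInterface() {
            return ToyOutput.class;
        }

        String run(ToyInput in, ToyOutput out) {
            return out.write(in.read());
        }
    }

    static class ToySource {
        <I> I adaptToBuildEngine(Class<I> engineInterface) {
            ToyInput input = () -> "rows";
            // Throws ClassCastException if another interface is requested.
            return engineInterface.cast(input);
        }
    }

    static class ToyStorage {
        <I> I adaptToBuildEngine(Class<I> engineInterface) {
            ToyOutput output = data -> "stored(" + data + ")";
            return engineInterface.cast(output);
        }
    }

    // Adapt both sides to the engine, then run one toy build.
    static String buildOnce() {
        ToyEngine engine = new ToyEngine();
        ToyInput in = new ToySource().adaptToBuildEngine(engine.getSourceInterface());
        ToyOutput out = new ToyStorage().adaptToBuildEngine(engine.getStorageInterface());
        return engine.run(in, out);
    }

    public static void main(String[] args) {
        System.out.println(buildOnce()); // prints stored(rows)
    }
}
```

The engine never refers to a concrete source or storage class, which is the point of routing everything through the adapter methods.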
Offline Cube Build call chain
- The REST API request /{cubeName}/rebuild invokes CubeController.rebuild(), which in turn calls jobService.submitJob().
- Project-level permission check: aclEvaluate.checkProjectOperationPermission(cube);
- Resolve the data source: ISource source = SourceManager.getSource(cube). The concrete ISource type is determined by the return value of CubeInstance.getSourceType():
public int getSourceType() {
    return getModel().getRootFactTable().getTableDesc().getSourceType();
}
- Allocate a new segment: newSeg = getCubeManager().appendSegment(cube, src);
- EngineFactory creates the IBatchCubingEngine object that matches the Cube's engine type, then calls createBatchCubingJob() to create the job chain. MRBatchCubingEngine2 creates a BatchCubingJobBuilder2:
public BatchCubingJobBuilder2(CubeSegment newSegment, String submitter) {
    super(newSegment, submitter);
    this.inputSide = MRUtil.getBatchCubingInputSide(seg);
    this.outputSide = MRUtil.getBatchCubingOutputSide2(seg);
}
- Adapt the input data source to the build engine:
SourceManager.createEngineAdapter(seg, IMRInput.class).getBatchCubingInputSide(flatDesc);
public static <T> T createEngineAdapter(ISourceAware table, Class<T> engineInterface) {
    return getSource(table).adaptToBuildEngine(engineInterface);
}
// HiveSource returns a HiveMRInput
public <I> I adaptToBuildEngine(Class<I> engineInterface) {
    if (engineInterface == IMRInput.class) {
        return (I) new HiveMRInput();
    } else {
        throw new RuntimeException("Cannot adapt to " + engineInterface);
    }
}
- Adapt the storage engine to the build engine:
StorageFactory.createEngineAdapter(seg, IMROutput2.class).getBatchCubingOutputSide(seg);
public static <T> T createEngineAdapter(IStorageAware aware, Class<T> engineInterface) {
    return storage(aware).adaptToBuildEngine(engineInterface);
}
// HBaseStorage returns an HBaseMROutput2Transition
public <I> I adaptToBuildEngine(Class<I> engineInterface) {
    if (engineInterface == IMROutput2.class) {
        return (I) new HBaseMROutput2Transition();
    } else {
        throw new RuntimeException("Cannot adapt to " + engineInterface);
    }
}
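- Once adaptation succeeds, new BatchCubingJobBuilder2(newSegment, submitter).build() creates the concrete execution steps and assembles them into a workflow.
- The workflow is added to the executable manager, where it waits to be scheduled: getExecutableManager().addJob(job);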
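The last two steps of the chain (assemble steps into a workflow, then hand it to a manager for later execution) can be sketched with toy classes. This is only an illustration of the idea, assuming steps run in order and the job stops at the first failing step. `ToyChainedJob`, `ToyExecutableManager`, and the step labels are invented and do not reflect the behavior of Kylin's DefaultChainedExecutable or its scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy workflow and manager; not Kylin's job engine.
class ChainedJobSketch {
    interface Step {
        // Returns true on success; may append to the shared log.
        boolean execute(List<String> log);
    }

    static class ToyChainedJob {
        private final List<Step> steps = new ArrayList<>();

        ToyChainedJob addStep(Step step) {
            steps.add(step);
            return this;
        }

        // Run steps in order and stop at the first failure.
        boolean run(List<String> log) {
            for (Step step : steps) {
                if (!step.execute(log)) {
                    return false;
                }
            }
            return true;
        }
    }

    static class ToyExecutableManager {
        private final Queue<ToyChainedJob> pending = new ArrayDeque<>();

        // Jobs only wait here until something drains the queue.
        void addJob(ToyChainedJob job) {
            pending.add(job);
        }

        // Stand-in for a scheduler: run every pending job once and
        // return how many of them succeeded.
        int runPending(List<String> log) {
            int succeeded = 0;
            ToyChainedJob job;
            while ((job = pending.poll()) != null) {
                if (job.run(log)) {
                    succeeded++;
                }
            }
            return succeeded;
        }
    }

    // Build a three-step job, queue it, run it, and return the log.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ToyChainedJob job = new ToyChainedJob()
                .addStep(l -> l.add("create flat table"))
                .addStep(l -> l.add("build cuboids"))
                .addStep(l -> l.add("load into storage"));
        ToyExecutableManager manager = new ToyExecutableManager();
        manager.addJob(job);
        manager.runPending(log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Separating "add the job" from "run the job" is what lets the REST call return as soon as the job is queued, while the actual build happens later.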