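Semantic analysis mainly converts the AST into a QueryBlock. Why convert it into a QueryBlock? As the earlier analysis showed, the AST is still quite abstract and carries no information about tables or columns. Semantic analysis splits the AST into modules, stores them in the QueryBlock together with the corresponding metadata, and so prepares for generating the logical execution plan.
A quick walk through semantic analysis.
The entry point of the SQL compiler: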
BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree);
List<HiveSemanticAnalyzerHook> saHooks =
    getHooks(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK,
        HiveSemanticAnalyzerHook.class);
// Flush the metastore cache. This assures that we don't pick up objects from a previous
// query running in this same thread. This has to be done after we get our semantic
// analyzer (this is when the connection to the metastore is made) but before we analyze,
// because at that point we need access to the objects.
Hive.get().getMSC().flushCache();
// Do semantic analysis and plan generation
if (saHooks != null && !saHooks.isEmpty()) { // Hive's hook mechanism: hooks implement methods that pre-check the statement
  HiveSemanticAnalyzerHookContext hookCtx = new HiveSemanticAnalyzerHookContextImpl();
  hookCtx.setConf(conf);
  hookCtx.setUserName(userName);
  hookCtx.setIpAddress(SessionState.get().getUserIpAddress());
  hookCtx.setCommand(command);
  for (HiveSemanticAnalyzerHook hook : saHooks) {
    tree = hook.preAnalyze(hookCtx, tree);
  }
  sem.analyze(tree, ctx);
  hookCtx.update(sem);
  for (HiveSemanticAnalyzerHook hook : saHooks) {
    hook.postAnalyze(hookCtx, sem.getAllRootTasks());
  }
} else {
  sem.analyze(tree, ctx); // no hooks configured: go straight to compilation
}
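Before SQL compilation starts, the driver checks whether the hive.semantic.analyzer.hook parameter is set. This is Hive's hook mechanism: a hook implements methods that pre-check the statement. To write one, implement the HiveSemanticAnalyzerHook interface; its preAnalyze and postAnalyze methods run before and after compilation respectively. Here we care more about the compilation step itself, sem.analyze(tree, ctx).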
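To make the hook flow concrete, here is a minimal, self-contained sketch of the same control flow: every preAnalyze runs first, the analyzer runs once, then every postAnalyze runs. It does not use Hive's classes. AnalyzerHook, Analyzer, AuditHook and compile are hypothetical stand-ins invented for illustration, and the tree is modeled as a plain String rather than an ASTNode. A real hook would implement HiveSemanticAnalyzerHook and be registered through hive.semantic.analyzer.hook.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HookPipelineSketch {

    // Hypothetical stand-in for HiveSemanticAnalyzerHook. The real interface
    // works on an ASTNode, a hook context and a list of root tasks.
    interface AnalyzerHook {
        String preAnalyze(String tree);        // may inspect or rewrite the tree
        void postAnalyze(List<String> tasks);  // sees the result of analysis
    }

    // Hypothetical stand-in for BaseSemanticAnalyzer.
    interface Analyzer {
        List<String> analyze(String tree);
    }

    // Mirrors the driver snippet: all preAnalyze calls, one analyze call,
    // then all postAnalyze calls. Without hooks, analysis runs directly.
    static List<String> compile(String tree, Analyzer sem, List<AnalyzerHook> hooks) {
        if (hooks == null || hooks.isEmpty()) {
            return sem.analyze(tree);
        }
        for (AnalyzerHook hook : hooks) {
            tree = hook.preAnalyze(tree);
        }
        List<String> tasks = sem.analyze(tree);
        for (AnalyzerHook hook : hooks) {
            hook.postAnalyze(tasks);
        }
        return tasks;
    }

    // Example hook (an assumption, not part of Hive): rejects statements whose
    // root token name starts with "TOK_DROP" and records what it observed.
    static class AuditHook implements AnalyzerHook {
        final List<String> log = new ArrayList<>();

        @Override
        public String preAnalyze(String tree) {
            if (tree.startsWith("TOK_DROP")) {
                throw new IllegalStateException("DROP statements are not allowed");
            }
            log.add("pre:" + tree);
            return tree;
        }

        @Override
        public void postAnalyze(List<String> tasks) {
            log.add("post:" + tasks.size());
        }
    }

    public static void main(String[] args) {
        AuditHook hook = new AuditHook();
        Analyzer sem = tree -> Collections.singletonList("task-for-" + tree);
        List<AnalyzerHook> hooks = Collections.singletonList(hook);
        System.out.println(compile("TOK_QUERY", sem, hooks));
        System.out.println(hook.log);
    }
}
```

Because preAnalyze can throw, a hook like this can veto a statement before any semantic analysis happens, which is the "pre-check" role described above.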
sem is obtained from BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree), which uses the factory design pattern:
public static BaseSemanticAnalyzer get(QueryState queryState, ASTNode tree)
    throws SemanticException {
  if (tree.getToken() == null) {
    throw new RuntimeException("Empty Syntax Tree");
  } else {
    HiveOperation opType = commandType.get(tree.getType());
    queryState.setCommandType(opType);
    switch (tree.getType()) {
    case HiveParser.TOK_EXPLAIN:
      return new ExplainSemanticAnalyzer(queryState);
    case HiveParser.TOK_EXPLAIN_SQ_REWRITE:
      return new ExplainSQRewriteSemanticAnalyzer(queryState);
    case HiveParser.TOK_LOAD:
      return new LoadSemanticAnalyzer(queryState);
    case HiveParser.TOK_EXPORT:
      return new ExportSemanticAnalyzer(q