语义解析主要是把AST Tree转化为QueryBlock,那为什么要转成QueryBlock呢?从之前的分析,我们可以看到AST Tree 还是很抽象,并且也不携带表、字段相关的信息,进行语义解析可以将AST Tree分模块存入QueryBlock 并携带对应的元数据信息,为生成逻辑执行计划做准备
简单串一下语义解析
sql编译器的入口:
BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree); List saHooks = getHooks(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK, HiveSemanticAnalyzerHook.class); // Flush the metastore cache. This assures that we don't pick up objects from a previous // query running in this same thread. This has to be done after we get our semantic // analyzer (this is when the connection to the metastore is made) but before we analyze, // because at that point we need access to the objects. Hive.get().getMSC().flushCache(); // Do semantic analysis and plan generation if (saHooks != null && !saHooks.isEmpty()) { //hive的hook机制,在hook中实现一些方法来对语句做预判 HiveSemanticAnalyzerHookContext hookCtx = new HiveSemanticAnalyzerHookContextImpl(); hookCtx.setConf(conf); hookCtx.setUserName(userName); hookCtx.setIpAddress(SessionState.get().getUserIpAddress()); hookCtx.setCommand(command); for (HiveSemanticAnalyzerHook hook : saHooks) {
tree = hook.preAnalyze(hookCtx, tree); } sem.analyze(tree, ctx); hookCtx.update(sem); for (HiveSemanticAnalyzerHook hook : saHooks) {
hook.postAnalyze(hookCtx, sem.getAllRootTasks()); } } else {
sem.analyze(tree, ctx); //直接进入编译 }
进入sql编译之前,先判断是不是设置了hive.semantic.analyzer.hook参数,这个是hive的hook机制,在hook中实现一些方法来对语句做预判,具体做法是实现HiveSemanticAnalyzerHook接口,preAnalyze方法和postAnalyze方法会分别在编译之前和之后执行。 在这里,我们更关心编译模块sem.analyze(tree, ctx)。
sem是由BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree) 获取。这里用到了java设计模式中的工厂模式:
public static BaseSemanticAnalyzer get(QueryState queryState, ASTNode tree) throws SemanticException {
if (tree.getToken() == null) {
throw new RuntimeException("Empty Syntax Tree"); } else {
HiveOperation opType = commandType.get(tree.getType()); queryState.setCommandType(opType); switch (tree.getType()) {
case HiveParser.TOK_E