After semantic analysis, Hive turns the logic of the query into a DAG of operators. This DAG does not map directly onto Hadoop's computation framework, however: it must be further split and trimmed according to the framework's requirements, and the pieces wrapped into corresponding Task objects.
The logic that splits this DAG lives in the genMapRedTasks method of SemanticAnalyzer.java; the core code is as follows:
Map<Rule, NodeProcessor> opRules = new LinkedHashMap<Rule, NodeProcessor>();
opRules.put(new RuleRegExp(new String("R1"), "TS%"), new GenMRTableScan1());
opRules.put(new RuleRegExp(new String("R2"), "TS%.*RS%"),
    new GenMRRedSink1());
opRules.put(new RuleRegExp(new String("R3"), "RS%.*RS%"),
    new GenMRRedSink2());
opRules.put(new RuleRegExp(new String("R4"), "FS%"), new GenMRFileSink1());
opRules.put(new RuleRegExp(new String("R5"), "UNION%"), new GenMRUnion1());
opRules.put(new RuleRegExp(new String("R6"), "UNION%.*RS%"),
    new GenMRRedSink3());
opRules.put(new RuleRegExp(new String("R6"), "MAPJOIN%.*RS%"),
    new GenMRRedSink4());
opRules.put(new RuleRegExp(new String("R7"), "TS%.*MAPJOIN%"),
    MapJoinFactory.getTableScanMapJoin());
opRules.put(new RuleRegExp(new String("R8"), "RS%.*MAPJOIN%"),
    MapJoinFactory.getReduceSinkMapJoin());
opRules.put(new RuleRegExp(new String("R9"), "UNION%.*MAPJOIN%"),
    MapJoinFactory.getUnionMapJoin());
opRules.put(new RuleRegExp(new String("R10"), "MAPJOIN%.*MAPJOIN%"),
    MapJoinFactory.getMapJoinMapJoin());
opRules.put(new RuleRegExp(new String("R11"), "MAPJOIN%SEL%"),
    MapJoinFactory.getMapJoin());
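To make the rule table above concrete, here is a hypothetical, heavily simplified sketch of how a RuleRegExp-style dispatcher could pick a processor: the names of the operators on the current walk stack are concatenated (each followed by '%'), and every rule pattern is tried as a regular expression against that string. The class name RuleDispatchSketch and the string-based rule map are illustrative assumptions; the real Hive dispatcher works on Rule and NodeProcessor objects and chooses among matching rules by a cost function rather than by registration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch (not the actual Hive implementation): rules are
// plain pattern strings mapped to processor names, and a rule fires when
// its pattern matches a suffix of the operator-stack string.
public class RuleDispatchSketch {
    public static void main(String[] args) {
        Map<String, String> opRules = new LinkedHashMap<>();
        opRules.put("TS%", "GenMRTableScan1");
        opRules.put("TS%.*RS%", "GenMRRedSink1");
        opRules.put("RS%.*RS%", "GenMRRedSink2");
        opRules.put("FS%", "GenMRFileSink1");

        // Operator stack for a simple scan -> select -> reduce-sink plan:
        // TableScan, Select, ReduceSink, each name followed by '%'.
        String stack = "TS%SEL%RS%";

        String matched = null;
        for (Map.Entry<String, String> e : opRules.entrySet()) {
            // Anchor the pattern at the end of the stack string, so the rule
            // describes the path leading up to the operator being visited.
            if (Pattern.compile(e.getKey() + "$").matcher(stack).find()) {
                matched = e.getValue(); // last matching rule wins in this sketch
            }
        }
        System.out.println(matched);
    }
}
```

Run against the TS-SEL-RS stack, only the "TS%.*RS%" pattern matches a suffix of the string, so the sketch selects GenMRRedSink1, mirroring how rule R2 would fire for a ReduceSink reached from a TableScan.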
As the code above shows, each rule