如下两个SQL语句,一般意义上大家都认为第二个执行更快,执行完大家就会发现第一个耗时149s,第二个耗时160s,可以说两个的执行计划近乎相同,有兴趣的同学可以看下下面的语法树。 注:等值连接,两者没什么差别;但是外连接,只有第二种能保证查询结果正确,因为外连接先执行join操作,reduce时候执行where操作 hql1: hive -e "use acorn_3g; explain select t.app_id,count(distinct t.uid) from tmp_user_info t join tmp_user_info n on t.uid = n.uid where t.log_date='2013-11-05' and n.log_date='2013-12-05' group by t.app_id;" hql2: hive -e "use acorn_3g; explain select t.app_id,count(distinct t.uid) from ( select app_id,uid from tmp_user_info where log_date='2013-11-05') t join ( select uid from tmp_user_info where log_date='2013-12-05') n on t.uid = n.uid group by t.app_id;" hq1抽象语法树: ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME tmp_user_info) t ) (TOK_TABREF (TOK_TABNAME tmp_user_info) n ) ( = (. (TOK_TABLE_OR_COL t) uid) (. (TOK_TABLE_OR_COL n) uid) ) ) ) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE) ) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL t) app_id) ) (TOK_SELEXPR (TOK_FUNCTIONDI count (. (TOK_TABLE_OR_COL t) uid) ) ) ) (TOK_WHERE ( and (= (. (TOK_TABLE_OR_COL t) log_date) '2013-11-05') (= (. (TOK_TABLE_OR_COL n) log_date) '2013-12-05') )) (TOK_GROUPBY (. (TOK_TABLE_OR_COL t) app_id)) ) ) hq1计划引擎 STAGE DEPENDENCIES: Stage-6 is a root stage , consists of Stage-1 Stage-1 Stage-2 depends on stages: Stage-1 Stage-0 is a root stage STAGE PLANS: Stage: Stage-6 Conditional Operator Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: n TableScan //扫描表 alias: n //表的别名 Reduce Output Operator //输出key,value 给reduce (uid,null) key expressions: expr: uid type: bigint sort order: + Map-reduce partition columns: //Hadoop partition expr: uid type: bigint tag: 1 //表标签1 t TableScan //扫描表 alias: t //表的别名 Reduce Output Operator //输出key,value给reduce key:uid value:uid,app_id key expressions: expr: uid type: bigint sort order: + Map-reduce partition columns: //hadoop partition expr: uid type: bigint tag: 0 value expressions: expr: uid type: bigint expr: app_id type: string Reduce Operator Tree: //reduce join 操作 Join Operator condition map: Inner Join 0 to 1 //表 n join 表 t condition expressions: 0 {VALUE._col1} {VALUE._col3} //表0 uid,app_id 1 handleSkewJoin: false outputColumnNames: _col1, _col3 Select Operator //选择字段 expressions: expr: _col3 //n.app_id type: string expr: _col1 //n.uid type: bigint outputColumnNames: _col3, _col1 //map输出 Group By Operator //在第一次MR reduce阶段先做一次聚合 aggregations: expr: count(DISTINCT _col1) //聚合函数 bucketGroup: false keys: expr: _col3 //app_id type: string expr: _col1 //uid type: bigint mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator //输出到文件,第一次MR落地 compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 //MR 第二阶段 Map Reduce Alias -> Map Operator Tree: hdfs://YZSJHL18-25.opi.com/tmp/hive-dp/hive_2013-12-09_21-06-05_895_5853839692383923149/-mr-10002 Reduce Output Operator key expressions: expr: _col0 //app_id type: string expr: _col1 //uid type: bigint sort order: ++ Map-reduce partition columns: //hadoop partition app_id expr: _col0 type: string tag: -1 value expressions: expr: _col2 type: bigint Reduce Operator Tree: //在reduce端聚合 Group By Operator aggregations: expr: count(DISTINCT KEY._col1:0._col0) //聚合函数,distinct 操作 uid,app_id作为联合key bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 File Output Operator //输出到文件,任务结束 compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 hql2计划引擎: STAGE DEPENDENCIES: Stage-6 is a root stage , consists of Stage-1 Stage-1 Stage-2 depends on stages: Stage-1 Stage-0 is a root stage STAGE PLANS: Stage: Stage-6 Conditional Operator Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: n:tmp_user_info //表别名 TableScan //扫描表 alias: tmp_user_info Select Operator //选择字段 expressions: expr: uid type: bigint outputColumnNames: _col0 Reduce Output Operator //输出key,value给reduce (uid,uid) key expressions: expr: _col0 type: bigint sort order: + Map-reduce partition columns: expr: _col0 type: bigint tag: 1 //表标签1 t:tmp_user_info //表别名 TableScan //扫描表 alias: tmp_user_info Select Operator expressions: expr: app_id type: string expr: uid type: bigint outputColumnNames: _col0, _col1 //map输出 app_id,uid Reduce Output Operator //key:uid value:uid,app_id key expressions: expr: _col1 type: bigint sort order: + Map-reduce partition columns: //hadoop partition expr: _col1 type: bigint tag: 0 //表标签0 value expressions: expr: _col0 type: string expr: _col1 type: bigint Reduce Operator Tree: //reduce端join Join Operator condition map: //common reduce join Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 handleSkewJoin: false outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Group By Operator aggregations: expr: count(DISTINCT _col1) bucketGroup: false keys: expr: _col0 type: string expr: _col1 type: bigint mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: hdfs://YZSJHL18-25.opi.com/tmp/hive-dp/hive_2013-12-09_21-05-50_461_737142336422281010/-mr-10002 Reduce Output Operator key expressions: expr: _col0 type: string expr: _col1 type: bigint sort order: ++ Map-reduce partition columns: expr: _col0 type: string tag: -1 value expressions: expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain |
hive sql执行计划树解析
最新推荐文章于 2024-04-19 10:23:56 发布