hive sql执行计划树解析

最新推荐文章于 2024-04-19 10:23:56 发布

仰望星空的我

最新推荐文章于 2024-04-19 10:23:56 发布

阅读量897

点赞数

分类专栏： hive etl

hive etl 专栏收录该内容

42 篇文章 0 订阅

订阅专栏

如下两个SQL语句，一般意义上大家都认为第二个执行更快，执行完大家就会发现第一个耗时149s，第二个耗时160s,可以说两个的执行计划近乎相同，有兴趣的同学可以看下下面的语法树。

注：等值连接，两者没什么差别；但是外连接，只有第二种能保证查询结果正确，因为外连接先执行join操作，reduce时候执行where操作

hql1:

hive -e "use acorn_3g;

explain select t.app_id,count(distinct t.uid)

from tmp_user_info t

join tmp_user_info n

on t.uid = n.uid

where t.log_date='2013-11-05' and n.log_date='2013-12-05'

group by t.app_id;"

hql2：

hive -e "use acorn_3g;

explain select t.app_id,count(distinct t.uid)

from ( select app_id,uid

from tmp_user_info

where log_date='2013-11-05') t

join ( select uid

from tmp_user_info

where log_date='2013-12-05') n

on t.uid = n.uid group by t.app_id;"

hq1抽象语法树：

ABSTRACT SYNTAX TREE:

(TOK_QUERY

(TOK_FROM

(TOK_JOIN

(TOK_TABREF

(TOK_TABNAME tmp_user_info) t

)

(TOK_TABREF

(TOK_TABNAME tmp_user_info) n

)

( =

(. (TOK_TABLE_OR_COL t) uid)

(. (TOK_TABLE_OR_COL n) uid)

)

(TOK_INSERT

(TOK_DESTINATION

(TOK_DIR TOK_TMP_FILE)

)

(TOK_SELECT

(TOK_SELEXPR

(. (TOK_TABLE_OR_COL t) app_id)

)

(TOK_SELEXPR

(TOK_FUNCTIONDI count

(. (TOK_TABLE_OR_COL t) uid)

)

(TOK_WHERE

( and

(= (. (TOK_TABLE_OR_COL t) log_date) '2013-11-05')

(= (. (TOK_TABLE_OR_COL n) log_date) '2013-12-05')

))

(TOK_GROUPBY (. (TOK_TABLE_OR_COL t) app_id))

)

hq1计划引擎

STAGE DEPENDENCIES:

Stage-6 is a root stage , consists of Stage-1

Stage-1

Stage-2 depends on stages: Stage-1

Stage-0 is a root stage

STAGE PLANS:

Stage: Stage-6

Conditional Operator

Stage: Stage-1

Map Reduce

Alias -> Map Operator Tree:

TableScan //扫描表

alias: n //表的别名

Reduce Output Operator //输出key,value 给reduce (uid,null)

key expressions:

expr: uid

type: bigint

sort order: +

Map-reduce partition columns: //Hadoop partition

expr: uid

type: bigint

tag: 1 //表标签1

TableScan //扫描表

alias: t //表的别名

Reduce Output Operator //输出key，value给reduce key:uid value:uid,app_id

key expressions:

expr: uid

type: bigint

sort order: +

Map-reduce partition columns: //hadoop partition

expr: uid

type: bigint

tag: 0

value expressions:

expr: uid

type: bigint

expr: app_id

type: string

Reduce Operator Tree: //reduce join 操作

Join Operator

condition map:

Inner Join 0 to 1 //表 n join 表 t

condition expressions:

0 {VALUE._col1} {VALUE._col3} //表0 uid,app_id

handleSkewJoin: false

outputColumnNames: _col1, _col3

Select Operator //选择字段

expressions:

expr: _col3 //n.app_id

type: string

expr: _col1 //n.uid

type: bigint

outputColumnNames: _col3, _col1 //map输出

Group By Operator //在第一次MR reduce阶段先做一次聚合

aggregations:

expr: count(DISTINCT _col1) //聚合函数

bucketGroup: false

keys:

expr: _col3 //app_id

type: string

expr: _col1 //uid

type: bigint

mode: hash

outputColumnNames: _col0, _col1, _col2

File Output Operator //输出到文件，第一次MR落地

compressed: false

GlobalTableId: 0

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Stage: Stage-2 //MR 第二阶段

Map Reduce

Alias -> Map Operator Tree:

hdfs://YZSJHL18-25.opi.com/tmp/hive-dp/hive_2013-12-09_21-06-05_895_5853839692383923149/-mr-10002

Reduce Output Operator

key expressions:

expr: _col0 //app_id

type: string

expr: _col1 //uid

type: bigint

sort order: ++

Map-reduce partition columns: //hadoop partition app_id

expr: _col0

type: string

tag: -1

value expressions:

expr: _col2

type: bigint

Reduce Operator Tree: //在reduce端聚合

Group By Operator

aggregations:

expr: count(DISTINCT KEY._col1:0._col0) //聚合函数,distinct 操作 uid,app_id作为联合key

bucketGroup: false

keys:

expr: KEY._col0

type: string

mode: mergepartial

outputColumnNames: _col0, _col1

Select Operator

expressions:

expr: _col0

type: string

expr: _col1

type: bigint

outputColumnNames: _col0, _col1

File Output Operator //输出到文件,任务结束

compressed: false

GlobalTableId: 0

table:

input format: org.apache.hadoop.mapred.TextInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Stage: Stage-0

Fetch Operator

limit: -1

hql2计划引擎：

STAGE DEPENDENCIES:

Stage-6 is a root stage , consists of Stage-1

Stage-1

Stage-2 depends on stages: Stage-1

Stage-0 is a root stage

STAGE PLANS:

Stage: Stage-6

Conditional Operator

Stage: Stage-1

Map Reduce

Alias -> Map Operator Tree:

n:tmp_user_info //表别名

TableScan //扫描表

alias: tmp_user_info

Select Operator //选择字段

expressions:

expr: uid

type: bigint

outputColumnNames: _col0

Reduce Output Operator //输出key,value给reduce (uid,uid)

key expressions:

expr: _col0

type: bigint

sort order: +

Map-reduce partition columns:

expr: _col0

type: bigint

tag: 1 //表标签1

t:tmp_user_info //表别名

TableScan //扫描表

alias: tmp_user_info

Select Operator

expressions:

expr: app_id

type: string

expr: uid

type: bigint

outputColumnNames: _col0, _col1 //map输出 app_id,uid

Reduce Output Operator //key:uid value:uid,app_id

key expressions:

expr: _col1

type: bigint

sort order: +

Map-reduce partition columns: //hadoop partition

expr: _col1

type: bigint

tag: 0 //表标签0

value expressions:

expr: _col0

type: string

expr: _col1

type: bigint

Reduce Operator Tree: //reduce端join

Join Operator

condition map: //common reduce join

Inner Join 0 to 1

condition expressions:

0 {VALUE._col0} {VALUE._col1}

handleSkewJoin: false

outputColumnNames: _col0, _col1

Select Operator

expressions:

expr: _col0

type: string

expr: _col1

type: bigint

outputColumnNames: _col0, _col1

Group By Operator

aggregations:

expr: count(DISTINCT _col1)

bucketGroup: false

keys:

expr: _col0

type: string

expr: _col1

type: bigint

mode: hash

outputColumnNames: _col0, _col1, _col2

File Output Operator

compressed: false

GlobalTableId: 0

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Stage: Stage-2

Map Reduce

Alias -> Map Operator Tree:

hdfs://YZSJHL18-25.opi.com/tmp/hive-dp/hive_2013-12-09_21-05-50_461_737142336422281010/-mr-10002

Reduce Output Operator

key expressions:

expr: _col0

type: string

expr: _col1

type: bigint

sort order: ++

Map-reduce partition columns:

expr: _col0

type: string

tag: -1

value expressions:

expr: _col2

type: bigint

Reduce Operator Tree:

Group By Operator

aggregations:

expr: count(DISTINCT KEY._col1:0._col0)

bucketGroup: false

keys:

expr: KEY._col0

type: string

mode: mergepartial

outputColumnNames: _col0, _col1

Select Operator

expressions:

expr: _col0

type: string

expr: _col1

type: bigint

outputColumnNames: _col0, _col1

File Output Operator

compressed: false

GlobalTableId: 0

table:

input format: org.apache.hadoop.mapred.TextInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Stage: Stage-0

Fetch Operator

limit: -1

LanguageManual Explain

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

仰望星空的我

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hive sql执行计划树解析

如下两个SQL语句，一般意义上大家都认为第二个执行更快，执行完大家就会发现第一个耗时149s，第二个耗时160s,可以说两个的执行计划近乎相同，有兴趣的同学可以看下下面的语法树。注：等值连接，两者没什么差别；但是外连接，只有第二种能保证查询结果正确，因为外连接先执行join操作，reduce时候执行where操作 hql1: hive -e "use acorn_...
复制链接

扫一扫