一位开发人员每天跑相同的任务,但是他发现某一天同样的任务处理大约差不多量的数据(都是大约4亿条,hive count的结果),处理时间却相差3倍之多,一个不到一个小时,一个运行了三个小时,记录一下分析步骤。
1. 首先用explain看一下hive语句执行步骤再去找对应的job任务。 可以看出此sql语句分4个任务去跑。
Stage-1:执行join操作
Stage-2:根据Stage-1 执行count操作
Stage-0:数据移动到新表的操作。
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
m
TableScan
alias: m
Reduce Output Operator
key expressions:
expr: rtb_label
type: string
sort order: +
Map-reduce partition columns:
expr: rtb_label
type: string
tag: 0
value expressions:
expr: label_id
type: int
expr: extract
type: string
t:rtb_requests
TableScan
alias: rtb_requests
Filter Operator
predicate:
expr: (tid <> '')
type: boolean
Lateral View Forward
Select Operator
expressions:
expr: tid
type: string
expr: opxpid
type: string
expr: request_date_i
type: string
outputColumnNames: tid, opxpid, request_date_i
Lateral View Join Operator
outputColumnNames: _col12, _col16, _col40, _col42
Select Operator
expressions:
expr: _col12
type: string
expr: _col16
type: string
expr: _col42
type: string
outputColumnNames: _col0, _col1, _col2
Reduce Output Operator
key expressions:
expr: _col2
type: string
sort order: +
Map-reduce partition columns:
expr: _col2
type: string
tag: 1
value expressions:
expr: _col0
type: string
expr: _col1
type: string
Select Operator
expressions:
expr: array(detected_content_labels[0],CASE WHEN ((user_agent is null or (user_agent = ''))) THEN ('device_Undefined') WHEN (((user_agent rlike 'iP(a|ro)d') or (user_agent rlike 'Tablet'))) THEN ('device_Tablet') WHEN ((((user_agent rlike 'Kindle') or (user_agent rlike 'Mac.OS')) and (user_agent rlike 'Silk'))) THEN ('device_Tablet') WHEN ((((user_agent rlike 'HTC(.Flyer|\_Flyer)|ViewPad7|Dell.Streak.7|Next2|nook') and (user_agent rlike 'RUTEM/')) and (user_agent rlike 'Silk'))) THEN ('device_Tablet') WHEN ((((user_agent rlike 'Opera') and (user_agent rlike 'Windows.NT.5')) and (user_agent rlike 'HTC|Xda|Mini|Vario|SAMSUNG-GT-i8000|SAMSUNG-SGH-i9'))) THEN ('device_Mobile') WHEN ((((user_agent rlike 'Windows') and (not (user_agent rlike 'Phone'))) or (user_agent rlike 'Win(9|.9|NT)'))) THEN ('device_Computer') WHEN (((user_agent rlike 'Macintosh|PowerPC') and (not (user_agent rlike 'Silk')))) THEN ('device_Computer') WHEN ((user_agent rlike 'Android|Phone|Mobile')) THEN ('device_Mobile') WHEN (((user_agent rlike 'Linux') and (user_agent rlike 'X11'))) THEN ('device_Computer') WHEN ((user_agent rlike 'Solaris|SunOS|OpenBSD|FreeBSD')) THEN ('device_Computer') ELSE ('device_Other') END,CASE WHEN (((user_agent = '') or user_agent is null)) THEN ('browser_Undefined') WHEN ((user_agent rlike 'Chrome/[0-9]{1,2}')) THEN ('browser_Chrome') WHEN ((user_agent rlike 'MSIE [0-9]{1,2}.[0-9]+')) THEN ('browser_Internet Explorer') WHEN ((user_agent rlike 'Firefox/[0-9]{1,2}')) THEN ('browser_Firefox') WHEN ((user_agent rlike 'Opera Mini/[0-9]{1,2}')) THEN ('browser_Opera Mini') WHEN ((user_agent rlike 'Opera/[0-9]{1,2}')) THEN ('browser_Opera') WHEN (((user_agent rlike 'Version/[0-9]{1,2}') and (user_agent rlike 'Safari/[0-9]+'))) THEN ('browser_Safari') ELSE ('browser_Other') END,CASE WHEN ((user_agent is null or (user_agent = ''))) THEN ('os_Undefined') WHEN (((user_agent rlike 'iPad') or (user_agent rlike 'iPhone'))) THEN ('os_iOS') WHEN ((user_agent rlike 'Android')) THEN ('os_Android') WHEN (((user_agent rlike 'Windows') or (user_agent rlike 'Win(9|.9|NT)'))) THEN ('os_Windows') WHEN (((user_agent rlike 'Macintosh|PowerPC') and (user_agent rlike 'Mac OS'))) THEN ('os_Mac OS') WHEN (((user_agent rlike 'Linux') and (user_agent rlike 'X11'))) THEN ('os_Linux') WHEN ((user_agent rlike 'Solaris|SunOS|OpenBSD|FreeBSD')) THEN ('os_Other') ELSE ('os_Other') END,CASE WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0000') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0059'))) THEN ('00_01') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0100') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0159'))) THEN ('01_02') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0200') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0259'))) THEN ('02_03') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0300') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0359'))) THEN ('03_04') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0400') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0459'))) THEN ('04_05') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0500') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0559'))) THEN ('05_06') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0600') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0659'))) THEN ('06_07') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0700') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0759'))) THEN ('07_08') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0800') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0859'))) THEN ('08_09') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '0900') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '0959'))) THEN ('09_10') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1000') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1059'))) THEN ('10_11') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1100') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1159'))) THEN ('11_12') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1200') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1259'))) THEN ('12_13') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1300') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1359'))) THEN ('13_14') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1400') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1459'))) THEN ('14_15') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1500') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1559'))) THEN ('15_16') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1600') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1659'))) THEN ('16_17') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1700') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1759'))) THEN ('17_18') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1800') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1859'))) THEN ('18_19') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '1900') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '1959'))) THEN ('19_20') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '2000') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '2059'))) THEN ('20_21') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '2100') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '2159'))) THEN ('21_22') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '2200') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '2259'))) THEN ('22_23') WHEN (((from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') >= '2300') and (from_unixtime(unix_timestamp(UDFToString(request_time)), 'HHmm') <= '2359'))) THEN ('23_00') END,CASE WHEN (((country = 'China') and (city is not null and (city <> '')))) THEN (city) WHEN (((country = 'China') and ((city = '') or city is null))) THEN ('China') ELSE (country) END)
type: array<string>
outputColumnNames: _col0
UDTF Operator
function name: explode
Lateral View Join Operator
outputColumnNames: _col12, _col16, _col40, _col42
Select Operator
expressions:
expr: _col12
type: string
expr: _col16
type: string
expr: _col42
type: string
outputColumnNames: _col0, _col1, _col2
Reduce Output Operator
key expressions:
expr: _col2
type: string
sort order: +
Map-reduce partition columns:
expr: _col2
type: string
tag: 1
value expressions:
expr: _col0
type: string
expr: _col1
type: string
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {VALUE._col1} {VALUE._col2}
1 {VALUE._col0} {VALUE._col1}
handleSkewJoin: false
outputColumnNames: _col1, _col2, _col5, _col6
Select Operator
expressions:
expr: _col5
type: string
expr: _col6
type: string
expr: _col1
type: int
expr: _col2
type: string
outputColumnNames: _col5, _col6, _col1, _col2
Group By Operator
aggregations:
expr: count(_col1)
bucketGroup: false
keys:
expr: _col5
type: string
expr: _col6
type: string
expr: _col1
type: int
expr: _col2
type: string
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3, _col4
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
maprfs:/user/hdbatch/tmp/hive/hive_2013-10-12_15-25-43_516_746550569391015319/-mr-10002
Reduce Output Operator
key expressions:
expr: _col0
type: string
expr: _col1
type: string
expr: _col2
type: int
expr: _col3
type: string
sort order: ++++
Map-reduce partition columns:
expr: _col0
type: string
expr: _col1
type: string
expr: _col2
type: int
expr: _col3
type: string
tag: -1
value expressions:
expr: _col4
type: bigint
Reduce Operator Tree:
Group By Operator
aggregations:
expr: count(VALUE._col0)
bucketGroup: false
keys:
expr: KEY._col0
type: string
expr: KEY._col1
type: string
expr: KEY._col2
type: int
expr: KEY._col3
type: string
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Select Operator
expressions:
expr: _col0
type: string
expr: _col2
type: int
expr: _col4
type: bigint
expr: _col3
type: string
expr: _col1
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Select Operator
expressions:
expr: _col0
type: string
expr: _col1
type: int
expr: UDFToInteger(_col2)
type: int
expr: _col3
type: string
expr: _col4
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4
File Output Operator
compressed: false
GlobalTableId: 1
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: tanx_rtb.cookie_labels
Stage: Stage-0
Move Operator
tables:
partition:
request_date_i 20131010
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: tanx_rtb.cookie_labels
Stage: Stage-3
Stats-Aggr Operator
2. 去hadoop监控页面查找任务。可以看出两者运行时间相差3倍以上
Stage2:
Stage6:
3. 分析了一下Stage6和stage2的任务:可以看出map input的数据根本就不一样,相差3倍之多,这就很奇怪了,因为我们用hive count 得的结果条数基本一致,为什么map input 输入就相差这么多呢?
任务1:
任务2:
4.接着分析去查找这两个任务的map input 输入路径:发现两者的数据路径任务一路径显然多,而任务二显然少,于是去HDFS查找发现 MR的input path 和HDFS的数据不一样(有延迟数据).