</pre> 因为日志急速增长,原来放在Mysql上的统计 越来越吃力,所以公司决定把统计业务迁移到Hadoop上。<p></p><p>在比对数据的时候,发现了Hive中的一个坑</p><p></p><pre name="code" class="sql">select a.* from default.t_softuser a
left join
t_softuser b on
a.hid=b.hid and a.corp=b.corp and a.softid=b.softid and a.statdate='2015-01-27' and b.statdate='2015-01-27'
where b.hid is null
Explain
STAGE DEPENDENCIES:
Stage-4 is a root stage , consists of Stage-1
Stage-1
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-4
Conditional Operator
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: a
Statistics: <span style="color:#FF6666;">Num rows: 878083830 Data size: 60095442437 Basic stats: PARTIAL Column stats: NONE</span>
Reduce Output Operator
key expressions: hid (type: string), corp (type: string), softid (type: int)
sort order: +++
Map-reduce partition columns: hid (type: string), corp (type: string), softid (type: int)
Statistics: Num rows: 878083830 Data size: 60095442437 Basic stats: COMPLETE Column stats: NONE
value expressions: hid (type: string), corp (type: string), softid (type: int), install_time (type: string), lastvisit_time (type: string), active_day (type: int), state (type: int), statdate (type: string)
TableSc