select
a.userkey,
a.idno,
a.phone,
a.name,
b.user_active_at,
c.intend_commodity,
c.intend_rank,
d.order_num,
d.order_amount
from user_info a
left join user_active b on a.userkey = b.userkey
left join user_intend c on a.phone = c.phone
left join user_order d on a.idno = d.idno;
After running this SQL, the task stays stuck at 99% in the reduce stage, as shown in the figure:
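When this happens, we need to consider data skew. There is also a second symptom: the task times out. When a reducer has to process a huge amount of data, a full GC triggers a stop-the-world pause and the task stops responding. Once it exceeds the default timeout of 600 seconds, the task is killed. The error message usually looks like this: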
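In this case, the fix is to filter out rows whose join key idno is NULL, because all NULL join keys are shuffled to the same reducer:
select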
a.userkey,
a.idno,
a.phone,
a.name,
b.user_active_at,
c.intend_commodity,
c.intend_rank,
d.order_num,
d.order_amount
from user_info a
left join user_active b on a.userkey = b.userkey
left join user_intend c on a.phone = c.phone
left join user_order d on a.idno = d.idno where a.idno is not null;
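The HiveQL below is a minimal sketch of two related steps. It is not the method used above. It assumes idno is a STRING column in both user_info and user_order (the schema is not shown here) and that no real idno starts with the hypothetical prefix 'null_'.

The first query checks whether a single key, such as NULL, dominates user_info. The second is a commonly used alternative to the WHERE filter: it salts NULL keys with a random suffix so they spread across reducers instead of being dropped. Note that `where a.idno is not null` also removes users with a NULL idno from the result. The salted join keeps those users, with NULL order columns.

```sql
-- 1) Check the join-key distribution: one dominant key (often NULL)
--    means a single reducer receives most of the rows for this join.
select idno, count(*) as cnt
from user_info
group by idno
order by cnt desc
limit 10;

-- 2) Salted alternative (assumes idno is a STRING column).
--    NULL keys become distinct random strings ('null_' is a hypothetical prefix),
--    so they hash to different reducers. They still match nothing in user_order,
--    so these users keep NULL order_num / order_amount, as in a plain LEFT JOIN.
select
  a.userkey,
  a.idno,
  a.phone,
  a.name,
  b.user_active_at,
  c.intend_commodity,
  c.intend_rank,
  d.order_num,
  d.order_amount
from user_info a
left join user_active b on a.userkey = b.userkey
left join user_intend c on a.phone = c.phone
left join user_order d
  on (case when a.idno is null
           then concat('null_', cast(rand() as string))
           else a.idno end) = d.idno;
```

Choose the filter when users without an idno can be discarded. Choose the salted join when every user_info row must appear in the output.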