1. Column pruning and partition pruning
Read only the columns and partitions the query actually needs. For example:
select * from shuidi_dwb.dwb_cf_case_info_full_d
should be rewritten as:
select case_id,ckr_id from shuidi_dwb.dwb_cf_case_info_full_d where dt='2019-08-28';
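One way to check that a query is pruned as intended is to look at which partitions Hive plans to read. The sketch below assumes the table is partitioned by dt (as the filter in the rewritten query suggests) and that the engine is Hive; the exact output format varies by Hive version.

```sql
-- List the partitions that exist for the table (assumes it is partitioned by dt).
SHOW PARTITIONS shuidi_dwb.dwb_cf_case_info_full_d;

-- Show the tables and partitions the query would read, without running it.
-- With the dt filter in place, only the dt='2019-08-28' partition should appear.
EXPLAIN DEPENDENCY
SELECT case_id, ckr_id
FROM shuidi_dwb.dwb_cf_case_info_full_d
WHERE dt = '2019-08-28';
```

If the plan lists every partition of the table, the partition filter is missing or is written in a form the optimizer cannot use for pruning.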
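2. Filter data as early and as much as possible to reduce the data volume at each stage
When a query joins several tables, add the filter (WHERE) conditions inside each subquery, before the join, so that less data reaches the next job stage.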
Before: SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)
WHERE a.ds='2009-07-07' AND b.ds='2009-07-07'
After: SELECT a.val, b.val FROM
(select key,val from a where a.ds='2009-07-07' ) x LEFT OUTER JOIN
(select key,val from b where b.ds='2009-07-07' ) y ON x.key=y.key
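Note that the two forms are not strictly equivalent. In the first, the WHERE condition on b.ds is evaluated after the join, so rows of a with no match in b (where b.ds is NULL) are dropped and the LEFT OUTER JOIN effectively behaves like an inner join. The form that filters inside the subqueries keeps those rows. As a sketch, using the same illustrative tables a and b, the subquery form can also be expressed by moving the filter on b into the ON clause:

```sql
-- The filter on b is part of the join condition, so unmatched rows of a
-- are kept with b.val = NULL; the filter on a stays in WHERE.
SELECT a.val, b.val
FROM a
LEFT OUTER JOIN b
  ON (a.key = b.key AND b.ds = '2009-07-07')
WHERE a.ds = '2009-07-07';
```

Which form is appropriate depends on whether unmatched rows of a are wanted in the result.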
3. Make good use of multi-insert
-- Table a is scanned twice
insert overwrite table tmp1
select ... from a where condition1;
insert overwrite table