Analysis of the Hive SQL execution flow
1) SQL runs very slowly
2) Interviews
select yyy, aggregate_function from xxx group by yyy;
select a.*,b.* from a join b on a.id = b.id;
Given a join, describe how it would be carried out in terms of MR (MapReduce).
Example 1:
select a.id,a.city,a.cate from access a where a.day = "20190414" and a.cate="大奔"
Example 2:
select city ,count(1) from access a where day = "20190414" and cate = "大奔"
group by city
MR count (word count):
1) map: split ==> (word, 1)
2) shuffle: (word, 1) is partitioned by key ==> reduce
3) reduce: (word, iterable(1, 1, ..., 1)) ==> (word, sum(iterable))
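The three steps above can be sketched in plain Python. This is only an in-memory simulation, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the crc32-based partitioning are assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def map_phase(lines):
    """Map: split each line into words and emit (word, 1)."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle_phase(pairs, num_reducers=2):
    """Shuffle: pick a partition per key, then group the values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Assumption: a stable hash modulo the reducer count stands in
        # for the partitioner.
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Reduce: (word, iterable(1, 1, ..., 1)) ==> (word, sum(iterable))."""
    result = {}
    for partition in partitions:
        for key, values in partition.items():
            result[key] = sum(values)
    return result


def word_count(lines, num_reducers=2):
    """Run map, shuffle and reduce in sequence."""
    return reduce_phase(shuffle_phase(map_phase(lines), num_reducers))
```

Each key lands in exactly one partition, so every reducer sees all the values for the keys it owns.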
Example 2 in MR:
map: split ==> (city, 1)
combiner: a local reduce, i.e. partial aggregation on the map side, which reduces the amount of data shuffled
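A minimal Python sketch of the map and combiner steps for Example 2, assuming each row of access is a dict with day, cate and city keys and each inner list is one input split. The function names and the record counter are hypothetical and only illustrate why the combiner shrinks the shuffle.

```python
from collections import Counter


def map_split(rows, day, cate):
    """Map over one split: apply the WHERE filter and emit (city, 1)."""
    return [(row["city"], 1) for row in rows
            if row["day"] == day and row["cate"] == cate]


def combine(pairs):
    """Combiner: sum the counts per city inside a single map task."""
    local = Counter()
    for city, n in pairs:
        local[city] += n
    return list(local.items())


def reduce_all(combined_outputs):
    """Reduce: add up the partial counts coming from every map task."""
    total = Counter()
    for pairs in combined_outputs:
        for city, n in pairs:
            total[city] += n
    return dict(total)


def count_by_city(splits, day, cate):
    """Return the final counts and the number of records that cross the shuffle."""
    combined = [combine(map_split(rows, day, cate)) for rows in splits]
    shuffled_records = sum(len(pairs) for pairs in combined)
    return reduce_all(combined), shuffled_records
```

Without the combiner, every matching row would cross the shuffle as its own (city, 1) pair. With it, each map task sends at most one pair per city.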
Use cases: