Hive map-side join

Latest recommended article published on 2023-06-03 16:20:50

Cccrab



If all but one of the tables being joined are small, the join can be performed as a map-only job. The query

 
    SELECT /*+ MAPJOIN(b) */ a.key, a.value
    FROM a JOIN b ON a.key = b.key

does not need a reducer. For every mapper of A, B is read completely. The restriction is that a FULL/RIGHT OUTER JOIN b cannot be performed.

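In other words, if all but one of the tables being joined are small, a map-side join can be used. The join is then optimized to run as a map job only, with no reduce job needed. The limitation is that a FULL/RIGHT OUTER JOIN b is not supported.

Conceptually, the small table is first cached in memory, and the cached small table is then joined against the big table, for example: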

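step 1:

Read the small table's data from HDFS into memory (for this query, reading only the small table's key column is enough).

step 2:

On the map side:

for (bigTable.row) {
    for (smallTable.row) {
        if (bigTable.key == smallTable.key) { out(bigTable.row) }
    }
}

// Because of this, a right outer join or full outer join cannot be done: there is only a map phase, and the output is driven by the big table's rows. Each mapper sees only its own split of the big table, so no single mapper can tell whether a small-table row is unmatched across the whole big table.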
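The two steps above can be sketched in Python as a small simulation. This is only an illustration under stated assumptions, not Hive's actual implementation: the function names `build_hash_table` and `map_side_join`, the sample rows, and the two "splits" are all hypothetical. It also swaps the nested loop for a hash lookup on the join key, which matches the same rows without scanning the whole small table for every big-table row.

```python
from collections import defaultdict


def build_hash_table(small_rows):
    """Step 1: load the small table into memory, indexed by join key."""
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return table


def map_side_join(big_split, small_table):
    """Step 2: run by one mapper over its own split of the big table.

    Emits the big-table row once per matching small-table row (inner join).
    """
    for key, value in big_split:
        for _ in small_table.get(key, ()):
            yield (key, value)


# Hypothetical data: small table b, and big table a divided into two splits.
small_b = [(1, "b1"), (3, "b3")]
split_1 = [(1, "a1"), (2, "a2")]
split_2 = [(3, "a3"), (4, "a4")]

hashed_b = build_hash_table(small_b)  # every mapper gets a full copy of b
mapper_1_out = list(map_side_join(split_1, hashed_b))
mapper_2_out = list(map_side_join(split_2, hashed_b))
joined = mapper_1_out + mapper_2_out  # no reducer: outputs are just concatenated

# Keys of b that each mapper fails to match within its own split.
unmatched_in_mapper_1 = set(hashed_b) - {k for k, _ in split_1}
unmatched_in_mapper_2 = set(hashed_b) - {k for k, _ in split_2}
```

In this sketch mapper 1 finds no match for key 3 and mapper 2 finds none for key 1, yet both keys are matched somewhere in the big table. A mapper that emitted its locally unmatched rows of b would therefore produce wrong RIGHT/FULL OUTER JOIN results, which is why a step that sees all splits would be needed.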
