Cloud Computing(6)_Processing Relational Data

夏大兔

Category: Cloud Computing

Article link: https://blog.csdn.net/xiayiqian71/article/details/63253526

Join Algorithms in MapReduce

Reduce-Side Join
Map-Side Join
Memory-Backed Join

Reduce-Side Join

We map over both datasets and emit the join key as the intermediate key and the tuple itself as the intermediate value. Since MapReduce guarantees that all values with the same key are brought together, all tuples will be grouped by the join key, which is exactly what we need to perform the join operation. Each tuple must also carry a tag recording which dataset it came from, so that the reducer can pair tuples from one side with tuples from the other.

The approach isn’t particularly efficient since it requires shuffling both datasets across the network.
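A minimal in-process sketch in Python, assuming two toy relations S and T held as lists of (join_key, payload) tuples. The names map_phase, shuffle, reduce_phase and reduce_side_join are hypothetical, and the shuffle is simulated with a dictionary rather than a real Hadoop job. Tagging each tuple with "S" or "T" is the assumption that lets the reducer tell the two sides apart.

```python
from collections import defaultdict

# Hypothetical toy relations: lists of (join_key, payload) tuples.

def map_phase(tag, relation):
    """Emit (join_key, (tag, payload)); the tag records which dataset a tuple came from."""
    for key, payload in relation:
        yield key, (tag, payload)

def shuffle(pairs):
    """Simulate MapReduce grouping: all values sharing a key end up in one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Pair every S-tuple with every T-tuple that shares this join key."""
    s_side = [payload for tag, payload in values if tag == "S"]
    t_side = [payload for tag, payload in values if tag == "T"]
    for s_payload in s_side:
        for t_payload in t_side:
            yield key, s_payload, t_payload

def reduce_side_join(s_relation, t_relation):
    # Both datasets pass through the map phase, so both are shuffled.
    intermediate = list(map_phase("S", s_relation)) + list(map_phase("T", t_relation))
    groups = shuffle(intermediate)
    result = []
    # A reducer sees its keys in sorted order; here one simulated reducer handles all keys.
    for key in sorted(groups):
        result.extend(reduce_phase(key, groups[key]))
    return result
```

For example, joining S = [(1, "s1"), (2, "s2")] with T = [(1, "t1"), (3, "t3")] yields [(1, "s1", "t1")]; keys 2 and 3 appear on only one side and produce nothing, which is an inner join. Buffering both sides in the reducer is the simplest choice, but it can run out of memory for join keys shared by many tuples.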

Map-Side Join

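If both datasets are sorted by the join key and partitioned in the same way, we map over one of the datasets (the larger one) and, inside the mapper, read the corresponding partition of the other dataset to perform a merge join. No reducer is needed, so nothing is shuffled across the network.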
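A sketch of the merge step in Python, assuming each dataset is already split into the same number of partitions, each partition is sorted by join key, and partition p of one dataset holds exactly the keys of partition p of the other. merge_join and map_side_join are hypothetical names; each iteration of the loop over partition pairs stands in for one independent map task.

```python
def merge_join(larger, smaller):
    """Merge-join two lists of (key, payload) tuples, both sorted by key."""
    result = []
    i = j = 0
    while i < len(larger) and j < len(smaller):
        k_large, k_small = larger[i][0], smaller[j][0]
        if k_large < k_small:
            i += 1
        elif k_large > k_small:
            j += 1
        else:
            # Find the run of equal keys on the smaller side.
            j_end = j
            while j_end < len(smaller) and smaller[j_end][0] == k_large:
                j_end += 1
            # Pair each larger-side tuple with this key against that run.
            while i < len(larger) and larger[i][0] == k_large:
                for _, s_payload in smaller[j:j_end]:
                    result.append((k_large, larger[i][1], s_payload))
                i += 1
            j = j_end
    return result

def map_side_join(larger_partitions, smaller_partitions):
    """Join co-partitioned inputs: partition p is merged only with partition p."""
    if len(larger_partitions) != len(smaller_partitions):
        raise ValueError("inputs must have the same number of partitions")
    output = []
    for big_part, small_part in zip(larger_partitions, smaller_partitions):
        # In a real job, each pair would be handled by one mapper.
        output.extend(merge_join(big_part, small_part))
    return output
```

Because both inputs are sorted, each mapper makes a single sequential pass over its two partitions and never needs to hold either one entirely in memory.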

Memory-Backed Join

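If the smaller dataset fits in memory, we can load it into memory in every mapper, populating an associative array to facilitate random access to tuples based on the join key. The mapper then iterates over the larger dataset and, for each tuple, looks up its join key in the array.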
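A minimal sketch in Python, assuming the smaller dataset is a list of (join_key, payload) tuples small enough to fit in memory; build_index and memory_backed_join are hypothetical names. A dictionary plays the role of the associative array, and a key may map to several payloads when the smaller dataset has duplicate keys.

```python
from collections import defaultdict

def build_index(smaller):
    """Associative array: join key -> every payload of the smaller dataset with that key."""
    index = defaultdict(list)
    for key, payload in smaller:
        index[key].append(payload)
    return index

def memory_backed_join(larger, smaller):
    # In a real job this index would be built once per mapper, before any input is read.
    index = build_index(smaller)
    result = []
    for key, payload in larger:
        # .get() does not insert missing keys into the defaultdict.
        for s_payload in index.get(key, ()):
            result.append((key, payload, s_payload))
    return result
```

For example, memory_backed_join([(1, "a"), (2, "b"), (1, "c")], [(1, "x")]) returns [(1, "a", "x"), (1, "c", "x")]: only the larger dataset is streamed, and no shuffle or reduce phase is involved.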

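Which Join to Use?

In order of preference, from most to least efficient:

Memory-Backed Join > Map-Side Join > Reduce-Side Join

Use the most efficient join whose precondition holds: the memory-backed join when the smaller dataset fits in memory, the map-side join when both datasets are sorted and partitioned the same way by the join key, and the reduce-side join as the general fallback, since it works on any input.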
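The preference order can be written as a small decision function. This is a hypothetical helper, not part of any framework: smaller_size_bytes, mapper_memory_bytes and sorted_and_copartitioned are illustrative parameters, and comparing raw input size against mapper memory is a simplification, since an in-memory associative array usually takes more space than the data on disk.

```python
def choose_join(smaller_size_bytes: int, mapper_memory_bytes: int,
                sorted_and_copartitioned: bool) -> str:
    """Return the cheapest join strategy whose precondition holds (hypothetical helper)."""
    # Memory-backed join: the smaller dataset must fit in each mapper's memory.
    if smaller_size_bytes <= mapper_memory_bytes:
        return "memory-backed"
    # Map-side join: both inputs sorted by join key and partitioned the same way.
    if sorted_and_copartitioned:
        return "map-side"
    # Reduce-side join: no precondition, but both datasets are shuffled.
    return "reduce-side"
```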
