Spark Learning Notes, Chapter 4: Working with Key/Value Pairs


银灯玉箫


Category: Java

Article link: https://blog.csdn.net/lilele12211104/article/details/97896270


RDDs containing key/value pairs are called pair RDDs.
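A minimal sketch of creating a pair RDD, assuming Spark's Scala API (spark-core on the classpath) and a local master; the object name PairRddBasics, the method keyByFirstWord, and the sample lines are made up for illustration. Mapping each element to a two-element tuple is enough: in Scala, Spark makes the key/value operations (reduceByKey(), join(), and so on) available on any RDD of tuples.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PairRddBasics {
  // Key each line by its first word; the result is an RDD[(String, String)], i.e. a pair RDD.
  def keyByFirstWord(lines: RDD[String]): RDD[(String, String)] =
    lines.map(line => (line.split(" ")(0), line))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("PairRddBasics"))
    try {
      val lines = sc.parallelize(Seq("spark is fast", "scala is concise", "spark is lazy"))
      keyByFirstWord(lines).collect().foreach(println)
    } finally sc.stop()
  }
}
```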

Aggregation

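Two small per-key aggregations, sketched under the same assumptions (Spark Scala API, local master; names and data are hypothetical). wordCount() is the classic flatMap/map/reduceByKey() pipeline; maxByKey() uses foldByKey(), whose zero value may be applied more than once per key, so it must not change the result (Int.MinValue is neutral for max).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AggregationExamples {
  // Word count: emit (word, 1) for every word, then add up the counts per key.
  def wordCount(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

  // Per-key maximum with foldByKey(); Int.MinValue is an identity for max.
  def maxByKey(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
    pairs.foldByKey(Int.MinValue)((a, b) => math.max(a, b))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Aggregation"))
    try {
      wordCount(sc.parallelize(Seq("to be or", "not to be"))).collect().foreach(println)
      maxByKey(sc.parallelize(Seq(("x", 1), ("x", 4), ("y", -5)))).collect().foreach(println)
    } finally sc.stop()
  }
}
```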

combineByKey()

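The procedure is as follows: combineByKey() goes through the elements of each partition one by one. When it meets a key for the first time in that partition, it calls createCombiner() to build the initial accumulator for that key; when the key has already been seen in the same partition, it calls mergeValue() with that accumulator and the new value. Because each partition is processed independently, the same key can end up with several accumulators, which mergeCombiners() finally merges together.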
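A minimal sketch of a per-key average with combineByKey(), assuming Spark's Scala API and a local master; the object and method names and the sample data are illustrative only. The accumulator is a (sum, count) pair, and each of the three functions below corresponds to one step of the procedure described above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CombineByKeyAverage {
  def averageByKey(pairs: RDD[(String, Double)]): RDD[(String, Double)] =
    pairs
      .combineByKey(
        // createCombiner: first value for a key within a partition starts a (sum, count) accumulator
        (v: Double) => (v, 1),
        // mergeValue: fold a further value for that key (same partition) into its accumulator
        (acc: (Double, Int), v: Double) => (acc._1 + v, acc._2 + 1),
        // mergeCombiners: merge the accumulators built for the same key in different partitions
        (a: (Double, Int), b: (Double, Int)) => (a._1 + b._1, a._2 + b._2)
      )
      .mapValues { case (sum, count) => sum / count }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CombineByKeyAverage"))
    try {
      val data = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 4.0)), 2)
      averageByKey(data).collect().foreach(println)
    } finally sc.stop()
  }
}
```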

Tuning the level of parallelism

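In spark-shell (where sc is predefined), operations that shuffle data, such as reduceByKey(), take an optional second argument giving the number of partitions of the output RDD:

 val data = Seq(("a", 3), ("b", 4), ("a", 1))
 sc.parallelize(data).reduceByKey((x, y) => x + y, 10).partitions.size  // 10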

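Besides passing a partition count to a shuffle operation, an existing RDD can be reshaped with repartition(), which always does a full shuffle, or coalesce(), which can reduce the number of partitions while avoiding a full shuffle. A small sketch, assuming Spark's Scala API and a local master; the object name and the counts 10, 2 and 20 are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismTuning {
  // Returns the partition counts of a reduceByKey() result and of two reshaped copies of it.
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    val data = Seq(("a", 3), ("b", 4), ("a", 1))
    val reduced = sc.parallelize(data).reduceByKey((x, y) => x + y, 10) // 10 output partitions
    val fewer = reduced.coalesce(2)    // merges existing partitions, no full shuffle
    val more = reduced.repartition(20) // full shuffle into 20 partitions
    (reduced.partitions.size, fewer.partitions.size, more.partitions.size)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ParallelismTuning"))
    try println(partitionCounts(sc)) finally sc.stop()
  }
}
```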

Grouping Data

In addition to grouping data from a single RDD, we can group data sharing the same key from multiple RDDs using a function called cogroup(). cogroup() over two RDDs sharing the same key type K, with value types V and W respectively, gives us back an RDD[(K, (Iterable[V], Iterable[W]))]. If one of the RDDs doesn't have elements for a given key that is present in the other RDD, the corresponding Iterable is simply empty. cogroup() thus lets us group data from multiple RDDs in one step.
cogroup() is also used as a building block for the joins.
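A sketch of groupByKey() on one RDD and cogroup() on two, assuming Spark's Scala API and a local master; the object name, the scores/cities data, and the sorting (added only to make the output deterministic) are illustrative. Note how "bob" gets an empty city list and "carol" an empty score list, as described above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupingExamples {
  // groupByKey(): all values for each key of a single RDD.
  def grouped(sc: SparkContext): Map[String, List[Int]] = {
    val scores = sc.parallelize(Seq(("alice", 90), ("bob", 75), ("alice", 85)))
    scores.groupByKey().mapValues(_.toList.sorted).collect().toMap
  }

  // cogroup(): for every key in either RDD, one Iterable per input, empty when that input lacks the key.
  def cogrouped(sc: SparkContext): Map[String, (List[Int], List[String])] = {
    val scores = sc.parallelize(Seq(("alice", 90), ("bob", 75), ("alice", 85)))
    val cities = sc.parallelize(Seq(("alice", "Paris"), ("carol", "Rome")))
    scores.cogroup(cities)
      .mapValues { case (s, c) => (s.toList.sorted, c.toList) }
      .collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Grouping"))
    try { println(grouped(sc)); println(cogrouped(sc)) } finally sc.stop()
  }
}
```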

Joins

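leftOuterJoin(): keeps every key of the source (left) RDD; the matching value from the other RDD is wrapped in an Option and is None when that RDD has no such key.
rightOuterJoin(): the mirror image; keeps every key of the other (right) RDD, and the value from the source RDD becomes the Option.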
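A sketch comparing join(), leftOuterJoin() and rightOuterJoin() on made-up store data, assuming Spark's Scala API and a local master; the object name and the data are hypothetical. Each key is unique on both sides here, so collecting into a Map loses nothing.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinExamples {
  def joins(sc: SparkContext): (Map[String, (String, Double)],
                                Map[String, (String, Option[Double])],
                                Map[String, (Option[String], Double)]) = {
    val addresses = sc.parallelize(Seq(("Ritual", "Valencia St"), ("Philz", "Van Ness Ave"), ("Starbucks", "Seattle")))
    val ratings = sc.parallelize(Seq(("Ritual", 4.9), ("Philz", 4.8), ("Sightglass", 4.6)))
    val inner = addresses.join(ratings).collect().toMap           // only keys present in both RDDs
    val left = addresses.leftOuterJoin(ratings).collect().toMap   // all address keys; rating is an Option
    val right = addresses.rightOuterJoin(ratings).collect().toMap // all rating keys; address is an Option
    (inner, left, right)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Joins"))
    try joins(sc).productIterator.foreach(println) finally sc.stop()
  }
}
```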

Sorting data

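A sketch of sortByKey(), assuming Spark's Scala API and a local master; names and data are illustrative. sortByKey() picks up an implicit Ordering for the key type, so a local implicit Ordering[Int] (here one that compares the numbers as strings) replaces the default numeric order. The two variants live in separate methods so the custom ordering is only in scope where it is meant to apply.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortingExample {
  private def data(sc: SparkContext): RDD[(Int, String)] =
    sc.parallelize(Seq((5, "e"), (10, "j"), (1, "a"), (2, "b")))

  // Default Ordering[Int]: ascending numeric order.
  def numericOrder(sc: SparkContext): Seq[Int] =
    data(sc).sortByKey().keys.collect().toSeq

  // A local implicit Ordering[Int] takes precedence, so keys are compared as strings.
  def stringOrder(sc: SparkContext): Seq[Int] = {
    implicit val sortIntegersByString: Ordering[Int] = Ordering.by[Int, String](_.toString)
    data(sc).sortByKey().keys.collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Sorting"))
    try { println(numericOrder(sc)); println(stringOrder(sc)) } finally sc.stop()
  }
}
```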

Actions Available on Pair RDDs

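A sketch of three pair-RDD actions (countByKey(), collectAsMap(), lookup()) on a small RDD where key 3 appears twice, assuming Spark's Scala API and a local master; the object name and data are illustrative. collectAsMap() can keep only one value per key, while lookup() returns all of them.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairActions {
  def run(sc: SparkContext): (Map[Int, Long], Map[Int, Int], Seq[Int]) = {
    val rdd = sc.parallelize(Seq((1, 2), (3, 4), (3, 6)))
    val counts = rdd.countByKey().toMap  // number of elements for each key
    val asMap = rdd.collectAsMap().toMap // one value per key survives for duplicated keys
    val forThree = rdd.lookup(3)         // every value associated with key 3
    (counts, asMap, forThree)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("PairActions"))
    try println(run(sc)) finally sc.stop()
  }
}
```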

Determining an RDD’s Partitioner

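In Scala, an RDD's partitioner property returns an Option[Partitioner]. A minimal sketch, assuming Spark's Scala API and a local master (object name, data and the choice of 2 partitions are illustrative): an RDD built with parallelize() has no partitioner, while partitionBy() records a HashPartitioner. The persist() call keeps the partitioned result so later operations do not repeat the shuffle.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

object PartitionerInspection {
  def inspect(sc: SparkContext): (Option[Partitioner], Option[Partitioner]) = {
    val pairs = sc.parallelize(Seq((1, 1), (2, 2), (3, 3)))
    // parallelize() does not look at keys, so no partitioner is recorded.
    val before = pairs.partitioner
    // partitionBy() hash-partitions by key; persist() avoids redoing that shuffle on reuse.
    val partitioned = pairs.partitionBy(new HashPartitioner(2)).persist()
    (before, partitioned.partitioner)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("PartitionerInspection"))
    try println(inspect(sc)) finally sc.stop()
  }
}
```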

Operations That Affect Partitioning

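A sketch that reports which partitioner a few operations leave on their output, assuming Spark's Scala API and a local master; the object name and the map keys used as labels are hypothetical. reduceByKey() (with no partitioner given and no partitioned input) produces a HashPartitioner, sortByKey() a RangePartitioner, and filter() keeps its parent's partitioner because it never changes keys.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

object PartitionerPropagation {
  private def name(p: Option[Partitioner]): String =
    p.map(_.getClass.getSimpleName).getOrElse("None")

  // Maps an operation label to the simple class name of its output's partitioner.
  def describe(sc: SparkContext): Map[String, String] = {
    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)), 2)
    val reduced = pairs.reduceByKey(_ + _)
    Map(
      "parallelize" -> name(pairs.partitioner),
      "reduceByKey" -> name(reduced.partitioner),
      "sortByKey" -> name(pairs.sortByKey().partitioner),
      "filter after reduceByKey" -> name(reduced.filter(_._2 > 1).partitioner)
    )
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("PartitionerPropagation"))
    try describe(sc).foreach(println) finally sc.stop()
  }
}
```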

Example: PageRank

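A minimal PageRank sketch over pair RDDs, assuming Spark's Scala API and a local master. The in-memory link table, the 4-partition HashPartitioner, the 10 iterations, and the object and method names are illustrative choices; 0.15/0.85 are the conventional PageRank constants. The static link table is partitioned once and persisted, ranks start from mapValues() (which keeps that partitioner), and reduceByKey() is given the same partitioner so every join() sees co-partitioned inputs. In this simple form, a page that no other page links to drops out of ranks after the first iteration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PageRankSketch {
  def pageRank(sc: SparkContext, edges: Seq[(String, Seq[String])], iterations: Int = 10): Map[String, Double] = {
    val partitioner = new HashPartitioner(4)
    // (page, outgoing links), partitioned once and cached because it is reused every iteration.
    val links = sc.parallelize(edges).partitionBy(partitioner).persist()
    // Every page starts with rank 1.0; mapValues() keeps links' partitioner.
    var ranks = links.mapValues(_ => 1.0)
    for (_ <- 0 until iterations) {
      // Each page sends rank / (number of links) to every page it links to.
      val contributions = links.join(ranks).flatMap {
        case (_, (dests, rank)) => dests.map(dest => (dest, rank / dests.size))
      }
      // Sum contributions per page (same partitioner as links), then apply the damping formula.
      ranks = contributions.reduceByKey(partitioner, _ + _).mapValues(sum => 0.15 + 0.85 * sum)
    }
    ranks.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("PageRankSketch"))
    try println(pageRank(sc, Seq(("a", Seq("b", "c")), ("b", Seq("c")), ("c", Seq("a")))))
    finally sc.stop()
  }
}
```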

Tip
To maximize the potential for partitioning-related optimizations, you should use mapValues() or flatMapValues() whenever you are not changing an element’s key.
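A small sketch of the tip, assuming Spark's Scala API and a local master (object name and data are illustrative): both transformations below leave the keys unchanged, but only mapValues() guarantees that to Spark, so only its output keeps the HashPartitioner.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object MapValuesVsMap {
  // Returns (mapValues output has a partitioner?, map output has a partitioner?).
  def keepsPartitioner(sc: SparkContext): (Boolean, Boolean) = {
    val partitioned = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
      .partitionBy(new HashPartitioner(2))
    // mapValues() cannot touch keys, so the existing partitioner is kept.
    val viaMapValues = partitioned.mapValues(_.toUpperCase)
    // map() could change keys, so the partitioner is dropped even though these keys are unchanged.
    val viaMap = partitioned.map { case (k, v) => (k, v.toUpperCase) }
    (viaMapValues.partitioner.isDefined, viaMap.partitioner.isDefined)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("MapValuesVsMap"))
    try println(keepsPartitioner(sc)) finally sc.stop()
  }
}
```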
