[Spark Java API] Transformation (13): zipWithIndex, zipWithUniqueId

小飞_侠

Category: spark. Tags: spark

Article link: https://blog.csdn.net/a6210575/article/details/52260364

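This article takes a close look at two Spark transformations: zipWithIndex and zipWithUniqueId. zipWithIndex pairs each RDD element with its index, while zipWithUniqueId generates a unique ID for each element based on its partition and its position within that partition. It covers the official documentation, the function prototypes, source code analysis, and worked examples.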

zipWithIndex

Official documentation:

Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index. This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a spark job when this RDD contains more than one partitions.

Function prototype:

def zipWithIndex(): JavaPairRDD[T, JLong]

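This function pairs each element of the RDD with that element's index in the RDD, producing an RDD of key/value pairs in which the element is the key and the index is the value.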
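The ordering rule from the documentation can be modeled without Spark. The sketch below is plain Java and is not Spark's implementation: it treats an RDD as a list of partitions (nested lists), and the class name `ZipWithIndexSketch` and its two methods are hypothetical names chosen for illustration. It first counts the elements of every partition to get each partition's starting index, then assigns `start + position` to each element. That counting pass is, roughly, why the real method has to run a job when there is more than one partition.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, Spark-free model of the zipWithIndex ordering rule.
public class ZipWithIndexSketch {

    // Pass 1: prefix sum over partition sizes gives each partition's first index.
    public static <T> long[] startIndices(List<List<T>> partitions) {
        long[] starts = new long[partitions.size()];
        long running = 0L;
        for (int p = 0; p < partitions.size(); p++) {
            starts[p] = running;
            running += partitions.get(p).size();
        }
        return starts;
    }

    // Pass 2: index = starting index of the partition + position inside it.
    public static <T> List<Map.Entry<T, Long>> zipWithIndex(List<List<T>> partitions) {
        long[] starts = startIndices(partitions);
        List<Map.Entry<T, Long>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<T> partition = partitions.get(p);
            for (int i = 0; i < partition.size(); i++) {
                result.add(new AbstractMap.SimpleImmutableEntry<>(partition.get(i), starts[p] + i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three "partitions" holding 2, 1 and 2 elements.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e"));
        for (Map.Entry<String, Long> e : zipWithIndex(partitions)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With the three sample partitions, the starting indices are 0, 2 and 3, so the elements a, b, c, d, e receive the indices 0 through 4 in that order. In the Spark Java API the element and index would arrive as the key and value of a `JavaPairRDD`, as in the prototype above.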

Source code analysis:

def zipWithIndex(): RDD[(T, Long)] = withSco
