persist() must be called before the first action; otherwise that action's result is not cached

weixin_43779531

Article link: https://blog.csdn.net/weixin_43779531/article/details/85226550

To illustrate RDD basics, consider the simple program below:

val lines = sc.textFile("data.txt")
val lineLengths = lines.map(s => s.length)
val totalLength = lineLengths.reduce((a, b) => a + b)
The first line defines a base RDD from an external file. This dataset is not loaded in memory or otherwise acted on: lines is merely a pointer to the file. The second line defines lineLengths as the result of a map transformation. Again, lineLengths is not immediately computed, due to laziness. Finally, we run reduce, which is an action. At this point Spark breaks the computation into tasks to run on separate machines, and each machine runs both its part of the map and a local reduction, returning only its answer to the driver program.
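To see the laziness directly, here is a minimal sketch that counts how often the map function actually runs. It assumes a local Spark setup with spark-core on the classpath. The `LazyDemo` object, the `mapCalls` accumulator, and the three in-memory lines standing in for data.txt are illustrative choices, not part of the original program.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyDemo {
  // Returns (map calls before the action, map calls after the action, total length).
  def run(): (Long, Long, Int) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("LazyDemo")
    val sc = new SparkContext(conf)
    try {
      // Assumption: in-memory lines replace sc.textFile("data.txt") so the sketch needs no file.
      val lines = sc.parallelize(Seq("a", "bb", "ccc"))
      // Hypothetical counter, incremented every time the map function is invoked.
      val mapCalls = sc.longAccumulator("mapCalls")

      // Transformation only: Spark records the lineage but runs nothing here.
      val lineLengths = lines.map { s => mapCalls.add(1); s.length }
      val before = mapCalls.sum

      // Action: this is the point where the map is actually executed.
      val totalLength = lineLengths.reduce((a, b) => a + b)
      val after = mapCalls.sum

      (before, after, totalLength)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after, total) = run()
    println(s"map calls before reduce: $before, after reduce: $after, total length: $total")
  }
}
```

In a local run without task retries, the counter should still be zero right after the map and only become non-zero once reduce has run. Accumulator updates made inside transformations can be applied more than once if a task is re-executed, so treat the count as a diagnostic rather than an exact guarantee.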

If we also wanted to use lineLengths again later, we could add:

lineLengths.persist()

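before the reduce, which would cause lineLengths to be saved in memory after the first time it is computed.

Note the wording: read it carefully. persist() goes before the reduce, that is, before the first action. persist() itself is lazy and only marks the RDD; if it is added after the first action, that action's result has already been discarded and the next action has to recompute lineLengths before it can be cached.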
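The effect of placing persist() before the first action can be checked with a small sketch like the one below. It assumes a local Spark setup with spark-core on the classpath. `PersistDemo`, the `mapCalls` accumulator, the in-memory input, and the second action (`max()`) are all illustrative additions, not part of the original program.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PersistDemo {
  // Returns (total length, max length, number of times the map function ran).
  def run(): (Int, Int, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("PersistDemo")
    val sc = new SparkContext(conf)
    try {
      // Assumption: in-memory lines replace sc.textFile("data.txt").
      val lines = sc.parallelize(Seq("a", "bb", "ccc"))
      // Hypothetical counter for invocations of the map function.
      val mapCalls = sc.longAccumulator("mapCalls")

      val lineLengths = lines.map { s => mapCalls.add(1); s.length }

      // Marks the RDD for caching (MEMORY_ONLY by default); nothing is computed yet.
      lineLengths.persist()

      // First action: computes lineLengths and stores its partitions in memory.
      val totalLength = lineLengths.reduce((a, b) => a + b)
      // Second action: expected to read the cached partitions instead of re-running the map.
      val maxLength = lineLengths.max()

      (totalLength, maxLength, mapCalls.sum)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (total, max, calls) = run()
    println(s"total = $total, max = $max, map calls = $calls")
  }
}
```

With persist() in place and enough memory for the cache, the map function should run once per line across both actions. Removing the persist() line, or moving it after the reduce and running only one further action, should make the counter show that the map was executed again for the second action.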
