Original: Spark Programming Guide (Part 5)
RDD Persistence: One of the most important capabilities in Spark is persisting (or caching) a dataset in memory across operations. When you persist an RDD, each node stores any partitions of it that it c
2017-09-26 09:39:02 503
Original: Spark Programming Guide (Part 4)
Shuffle operations: Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark’s mechanism for re-distributing data so that it’s grouped differently across partitions. Th
2017-09-23 12:03:46 551
Original: Spark Programming Guide (Part 3)
Working with Key-Value Pairs: While most Spark operations work on RDDs containing any type of objects, a few special operations are only available on RDDs of key-value pairs. The most common ones are dis
2017-09-22 10:36:56 566