Spark: The Core Dataset RDD (Part 6)

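This article takes a close look at RDD, Spark's core dataset, and explains its nature as a read-only, partitioned collection of records with fault tolerance and transformation operations. It focuses on the five key properties of an RDD (the list of partitions, the per-partition compute function, the dependencies, the partitioner, and the preferred compute locations), as well as on creating and operating on RDDs, including loading from memory and from external data sources.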

SparkCore: The Core Dataset RDD

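  What a lovely day, so let's get started. Today we'll talk about RDDs. Why talk about RDDs? First, a bit about me: as a typical science-and-engineering guy, if I jump into hands-on work without thoroughly understanding something, I'm bound to end up completely lost. Even if I stumble onto the right answer for one exercise by blind luck, the moment the file type in the exercise changes I'm lost all over again. I still remember high-school chemistry: 1 mol of water molecules = 2 mol of hydrogen atoms + 1 mol of oxygen atoms. I hadn't understood what a mol was, so how on earth could 2 + 1 = 1???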

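  OK, enough of that. To begin with, every dataset we work with will be an RDD (my understanding here is that RDD is a template, and many concrete RDDs get stamped out from it as the data comes in). Without understanding RDDs, though, it's hard to transform one RDD into another. In SparkCore, data processing is just the transformation of RDDs into one another.
The Spark source code describes RDD like this: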

A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. This class contains the basic operations available on all RDDs, such as map, filter, and persist. In addition, PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; DoubleRDDFunctions contains operations available only on RDDs of Doubles; and SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. All operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit.
Internally, each RDD is characterized by five main properties:
A list of partitions
A function for computing each split
A list of dependencies on other RDDs
Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
All of the scheduling and execution in Spark is done based on these methods, allowing each RDD to implement its own way of computing itself. Indeed, users can implement custom RDDs (e.g. for reading data from a new storage system) by overriding these functions. Please refer to 
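
To make the five properties above a bit more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `ToyRDD` and `parallelize` are hypothetical names made up for illustration, everything runs in a single process, and there is no scheduling or fault tolerance. It only shows how a dataset can be described by its partitions, a per-split compute function, its parent dependencies, an optional partitioner and optional location hints, and how "processing" then amounts to deriving one such dataset from another.

```python
from typing import Callable, Iterator, List, Optional, Sequence


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical class, not Spark's API)."""

    def __init__(
        self,
        partitions: Sequence[int],
        compute: Callable[[int], Iterator],
        dependencies: Sequence["ToyRDD"] = (),
        partitioner: Optional[Callable[[object], int]] = None,
        preferred_locations: Optional[Callable[[int], List[str]]] = None,
    ):
        self.partitions = list(partitions)      # 1. a list of partitions
        self.compute = compute                  # 2. a function for computing each split
        self.dependencies = list(dependencies)  # 3. dependencies on other RDDs
        self.partitioner = partitioner          # 4. optional partitioner (key-value data)
        # 5. optional locality hints; this toy has none, so default to "no preference"
        self.preferred_locations = preferred_locations or (lambda split: [])

    def map(self, f: Callable) -> "ToyRDD":
        # Returns a new dataset; the parent is left untouched (immutability).
        # Nothing is computed here: the work happens when collect() is called.
        return ToyRDD(
            partitions=self.partitions,
            compute=lambda split: (f(x) for x in self.compute(split)),
            dependencies=[self],
        )

    def filter(self, pred: Callable) -> "ToyRDD":
        # Filtering never changes keys, so the parent's partitioner stays valid.
        return ToyRDD(
            partitions=self.partitions,
            compute=lambda split: (x for x in self.compute(split) if pred(x)),
            dependencies=[self],
            partitioner=self.partitioner,
        )

    def collect(self) -> list:
        # Compute every split in order and gather the results in one list.
        return [x for split in self.partitions for x in self.compute(split)]


def parallelize(data, num_slices: int) -> ToyRDD:
    """Split an in-memory collection into num_slices contiguous partitions."""
    data = list(data)

    def compute(split: int) -> Iterator:
        start = split * len(data) // num_slices
        end = (split + 1) * len(data) // num_slices
        return iter(data[start:end])

    return ToyRDD(partitions=range(num_slices), compute=compute)


numbers = parallelize(range(1, 7), num_slices=3)
doubled_evens = numbers.map(lambda x: x * 2).filter(lambda x: x % 4 == 0)
print(doubled_evens.collect())
```

Each call to `map` or `filter` produces a new `ToyRDD` whose `dependencies` point back at its parent, which is a rough picture of why one dataset can be rebuilt from another if a partition is lost. In real Spark the same roles are played by methods on the `RDD` class, which a custom RDD overrides.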