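The five main properties of an RDD:
1. partitions_: an array of Partition objects
2. dependencies_: a sequence of Dependency objects
3. compute: the function that computes a partition
4. Partitioner: the partitioner
5. Preferred Locations: the preferred locations for accessing each Partition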
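To make the list above concrete, here is a minimal sketch that models the five properties as members of a trait. It does not use Spark itself: every name ending in "Sketch" is a hypothetical, simplified stand-in (the real RDD class uses Partition, Dependency[_], Partitioner and also passes a TaskContext to compute).

```scala
object RddPropertiesSketch {
  // Hypothetical stand-ins for Spark's Partition, Dependency and Partitioner.
  final case class PartitionSketch(index: Int)
  final case class DependencySketch(parentName: String)
  trait PartitionerSketch {
    def numPartitions: Int
    def getPartition(key: Any): Int
  }

  // The five properties, one member each.
  trait FivePropertiesSketch[T] {
    def partitions: Array[PartitionSketch]                       // 1. partitions
    def dependencies: Seq[DependencySketch]                      // 2. dependencies on parents
    def compute(split: PartitionSketch): Iterator[T]             // 3. compute function
    def partitioner: Option[PartitionerSketch]                   // 4. optional partitioner
    def preferredLocations(split: PartitionSketch): Seq[String]  // 5. preferred locations
  }

  // A toy "RDD" over the integer range [start, end), cut into `slices` partitions.
  class RangeRddSketch(start: Int, end: Int, slices: Int)
      extends FivePropertiesSketch[Int] {
    val partitions: Array[PartitionSketch] =
      Array.tabulate(slices)(i => PartitionSketch(i))

    // A source RDD has no parents.
    val dependencies: Seq[DependencySketch] = Nil

    // Produce only the elements that belong to the requested partition.
    def compute(split: PartitionSketch): Iterator[Int] = {
      val size = end - start
      val lo = start + split.index * size / slices
      val hi = start + (split.index + 1) * size / slices
      (lo until hi).iterator
    }

    // Not a key-value dataset, so there is no partitioner.
    val partitioner: Option[PartitionerSketch] = None

    // In-memory data has no locality preference.
    def preferredLocations(split: PartitionSketch): Seq[String] = Nil
  }
}
```

For example, new RddPropertiesSketch.RangeRddSketch(0, 10, 2) has two partitions, and calling compute on the second one yields the elements 5 to 9.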
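I. Dependency
Within an RDD, dependencies_ is the field that stores the sequence of dependencies on the current RDD's parents.
The dependencies method returns the sequence of all dependencies of the current RDD. Its source is: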
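package org.apache.spark.rdd

// Member of the abstract class RDD[T]
final def dependencies: Seq[Dependency[_]] = {
  // If a checkpointed RDD exists, it becomes the sole (one-to-one) parent
  checkpointRDD.map(r => List(new OneToOneDependency(r))).getOrElse {
    // Otherwise compute the dependencies once and cache them in dependencies_
    if (dependencies_ == null) {
      dependencies_ = getDependencies
    }
    dependencies_
  }
}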
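The lookup order in the method above (checkpoint first, then the lazily cached result of getDependencies) can be tried out without Spark. The following is a minimal sketch under stated assumptions: RddSketch and OneToOneDep are hypothetical stand-ins for RDD and OneToOneDependency, and getDependenciesCalls is a counter added here only to make the caching visible.

```scala
object DependenciesSketch {
  // Hypothetical stand-in for OneToOneDependency: it just points at a parent.
  final case class OneToOneDep(parent: RddSketch)

  class RddSketch(val name: String, parents: Seq[RddSketch]) {
    // Some(rdd) once this RDD has been checkpointed, None otherwise.
    var checkpointRDD: Option[RddSketch] = None

    // Cache for the computed dependencies, mirroring dependencies_.
    private var dependencies_ : Seq[OneToOneDep] = null

    // Illustration only: counts how often the lineage is actually computed.
    var getDependenciesCalls: Int = 0

    protected def getDependencies: Seq[OneToOneDep] = {
      getDependenciesCalls += 1
      parents.map(p => OneToOneDep(p))
    }

    // Same control flow as the excerpt above.
    final def dependencies: Seq[OneToOneDep] =
      checkpointRDD.map(r => List(OneToOneDep(r))).getOrElse {
        if (dependencies_ == null) {
          dependencies_ = getDependencies
        }
        dependencies_
      }
  }
}
```

With val a = new RddSketch("a", Nil) and val b = new RddSketch("b", Seq(a)), calling b.dependencies twice runs getDependencies only once and returns a dependency on a. After setting b.checkpointRDD = Some(cp) for some other RddSketch cp, b.dependencies returns a single dependency on cp, and the original lineage is no longer consulted.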
1. First obtain the RDD from the checkpoint, and take these