Spark Source Code Reading Notes, RDD (2): Basic Methods and Information of RDD Subclasses

legotime

Article link: https://blog.csdn.net/legotime/article/details/51223572

RDD subclasses

// =======================================================================
// Methods that should be implemented by subclasses of RDD
// =======================================================================

/**
 * :: DeveloperApi ::
 * Implemented by subclasses to compute a given partition.
 */
@DeveloperApi
def compute(split: Partition, context: TaskContext): Iterator[T]


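/**
  * compute: implemented by each RDD subclass to compute a single partition.
  *  It returns a Scala Iterator, which yields the partition's elements one at a time.
  *  An Iterator is a TraversableOnce: like a one-way street, it can only move forward
  *  and cannot go back.
  */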
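
The "one-way street" behaviour described above can be seen in plain Scala, without Spark. The following is a minimal sketch; `IteratorSinglePass` and `drain` are names made up for this illustration and are not part of Spark.

```scala
import scala.collection.mutable.ArrayBuffer

object IteratorSinglePass {
  // Hypothetical helper: drain an iterator using only hasNext/next,
  // collecting every element it yields.
  def drain[A](it: Iterator[A]): List[A] = {
    val seen = ArrayBuffer.empty[A]
    while (it.hasNext) seen += it.next()
    seen.toList
  }

  def main(args: Array[String]): Unit = {
    val it = Iterator(1, 2, 3)
    val first = drain(it)   // walks the iterator to its end
    val second = drain(it)  // nothing is left: the iterator cannot be rewound
    println(first)          // List(1, 2, 3)
    println(second)         // List()
  }
}
```

If the elements of a partition are needed more than once inside `compute`, one option is to materialise them first (for example with `toList`), at the cost of holding the whole partition in memory.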


/**
 * Implemented by subclasses to return the set of partitions in this RDD. This method will only
 * be called once, so it is safe to implement a time-consuming computation in it.
 * Note: this returns the subclass's partitions, each of which carries its own index.
 * The partitions in this array must satisfy the following property:
 *   `rdd.partitions.zipWithIndex.forall { case (partition, index) => partition.index == index }`
 */
protected def getPartitions: Array[Partition]
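
The index property quoted in the comment above can be checked with a few lines of plain Scala. This is only a sketch: `SimplePartition` is a hypothetical stand-in for `org.apache.spark.Partition` (which exposes `index: Int`), and `Slice`, `split` and `indicesMatch` are names invented for the example.

```scala
// Hypothetical stand-in for org.apache.spark.Partition.
trait SimplePartition { def index: Int }

// Hypothetical partition covering the half-open range [start, end).
final case class Slice(index: Int, start: Int, end: Int) extends SimplePartition

object PartitionInvariant {
  // The same predicate as in the Spark doc comment:
  // the partition at array position i must report index i.
  def indicesMatch(parts: Array[SimplePartition]): Boolean =
    parts.zipWithIndex.forall { case (partition, index) => partition.index == index }

  // Split 0 until n into numSlices contiguous slices, numbered in array order.
  def split(n: Int, numSlices: Int): Array[SimplePartition] =
    Array.tabulate[SimplePartition](numSlices) { i =>
      Slice(i, i * n / numSlices, (i + 1) * n / numSlices)
    }

  def main(args: Array[String]): Unit = {
    val parts = split(10, 3)
    println(indicesMatch(parts))          // true
    println(indicesMatch(parts.reverse))  // false: positions and indices disagree
  }
}
```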

/**
 * Implemented by subclasses to return how this RDD depends on parent RDDs. This method will only
 * be called once, so it is safe to implement a time-consuming computation in it.
 * Note: this is how an RDD exposes information about its parent RDDs.
 */
protected def getDependencies: Seq[Dependency[_]] = deps

/**
 * Optionally overridden by subclasses to specify placement preferences.
 */
protected def getPreferredLocations(split: Partition): Seq[String] = Nil

/** Optionally overridden by subclasses to specify how they are partitioned. */
@transient val partitioner: Option[Partitioner] = None
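
To show how the methods above fit together, here is a minimal sketch of a custom RDD with no parent RDDs. It assumes Spark is on the classpath and runs in local mode; `RangePartition`, `RangeRDD` and `RangeRDDDemo` are hypothetical names for this example, not classes from Spark. Only the two abstract members, `getPartitions` and `compute`, are implemented; `getDependencies`, `getPreferredLocations` and `partitioner` keep their defaults.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type covering the half-open range [start, end).
class RangePartition(val index: Int, val start: Int, val end: Int) extends Partition

// Hypothetical RDD that yields the integers 0 until n.
// Passing Nil as the dependency list means it has no parent RDDs.
class RangeRDD(sc: SparkContext, n: Int, numSlices: Int) extends RDD[Int](sc, Nil) {

  // Describe the partitions; position i in the array carries index i.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      new RangePartition(i, i * n / numSlices, (i + 1) * n / numSlices)
    }

  // Produce the elements of one partition as a single-pass iterator.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val p = split.asInstanceOf[RangePartition]
    (p.start until p.end).iterator
  }
}

object RangeRDDDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("RangeRDDDemo")
    val sc = new SparkContext(conf)
    try {
      val rdd = new RangeRDD(sc, 10, 3)
      println(rdd.partitions.length)
      println(rdd.collect().mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

A real source RDD would typically also override `getPreferredLocations` so that the scheduler can place tasks close to the data.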

// =======================================================================
// Methods and fields available on all RDDs
// =======================================================================

/** The SparkContext that created this RDD. Note: every RDD keeps a reference to the environment it was created in. */
def sparkContext: SparkContext = sc

/** A unique ID for this RDD (within its SparkContext). */
val id: Int = sc.newRddId()

/** A friendly name for this RDD */
@transient var name: String = null

/** Assign a name to this RDD */
def setName(_name: String): this.type = {
  name = _name
  this
}
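
`setName` returns `this.type` rather than a fixed class, which lets a call on a subclass instance keep the subclass's static type so further subclass methods can be chained. The sketch below illustrates that in plain Scala; `Named`, `Counter` and `ThisTypeDemo` are hypothetical classes for this example only.

```scala
// Hypothetical base class mirroring the name/setName pair shown above.
class Named {
  var name: String = null

  // Returning this.type keeps the caller's static type in a chain of calls.
  def setName(_name: String): this.type = {
    name = _name
    this
  }
}

// Hypothetical subclass with a method of its own.
class Counter extends Named {
  var count: Int = 0

  def increment(): this.type = {
    count += 1
    this
  }
}

object ThisTypeDemo {
  // setName is defined on Named, yet the result is still typed as Counter,
  // so increment() can follow it directly.
  def build(): Counter = new Counter().setName("events").increment()

  def main(args: Array[String]): Unit = {
    val c = build()
    println(s"${c.name}: ${c.count}")  // events: 1
  }
}
```

Had `setName` been declared to return `Named`, the chained `increment()` call would not compile without a cast.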