The Spark RDD programming interface includes:
1. Partition information: the smallest splits of the dataset
(1) Usage of partitions:
scala> val part=sc.textFile("/user/README.md",6)
part: org.apache.spark.rdd.RDD[String] = /user/README.md MapPartitionsRDD[9] at textFile at <console>:24
scala> part.partitions.size
res3: Int = 6  # partitions.size returns the number of partitions
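To make the partition count concrete, the sketch below (plain Scala, no cluster needed) mimics how Spark's HashPartitioner decides which of the N partitions a key lands in. The function name `hashPartition` is a hypothetical helper, not a Spark API; the modulo logic mirrors what Spark's built-in hash partitioning does.

```scala
// Sketch: assign a key to one of numPartitions buckets by hash code,
// the same non-negative-modulo scheme Spark's HashPartitioner uses.
// (hashPartition is an illustrative name, not part of the Spark API.)
def hashPartition(key: Any, numPartitions: Int): Int = {
  val mod = key.hashCode % numPartitions
  if (mod < 0) mod + numPartitions else mod  // keep the result in [0, numPartitions)
}

println(hashPartition("spark", 6))  // always a value between 0 and 5
```

Because the bucket depends only on the key's hash, all records with the same key end up in the same partition, which is what lets per-key operations run locally after a shuffle.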
2. Dependency information: references to the parent RDD(s)
(1) Usage of dependencies:
scala> val part=sc.textFile("/user/README.md")
scala> val wordmap=part.flatMap(_.split(" ")).map(x=>(x,1))
scala> wordmap.dependencies.foreach{dep=> println(dep.getClass)} # calling getClass reveals the dependency type
class org.apache.spark.OneToOneDependency
scala> val wordreduce=wordmap.reduceByKey(_+_)
scala> wordreduce.dependencies.foreach{dep=> println(dep.getClass)}
class org.apache.spark.ShuffleDependency
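The transcript above shows the two kinds of dependency: flatMap/map produce a OneToOneDependency (narrow: each child partition reads exactly one parent partition), while reduceByKey produces a ShuffleDependency (wide: each child partition may read from every parent partition). The plain-Scala sketch below, which runs without a cluster, models why: the sample data and two-partition split are assumptions for illustration only.

```scala
// Sketch of narrow vs. wide dependencies using ordinary collections.
// Two "partitions" of input lines (assumed sample data):
val partitions = Seq(Seq("a b", "a c"), Seq("b c"))

// Narrow (OneToOne-style): flatMap + map run inside each partition
// independently; no data crosses partition boundaries.
val mapped = partitions.map(_.flatMap(_.split(" ")).map(w => (w, 1)))

// Wide (Shuffle-style): reducing by key must gather matching keys
// from ALL parent partitions before summing.
val reduced = mapped.flatten
  .groupBy(_._1)
  .map { case (k, vs) => (k, vs.map(_._2).sum) }

println(reduced)  // word counts merged across both partitions
```

The groupBy step is the part that forces a shuffle in real Spark: a key like "b" appears in both input partitions, so its counts cannot be combined without moving data between them.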