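Main steps:
- Create the RDD
- Set the checkpoint directory with sc.setCheckpointDir (this must happen before checkpoint is called)
- Call the checkpoint method on the RDD
- Run an action (for example count) so that the checkpoint is actually written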
scala> sc.setCheckpointDir("hdfs://hadoop129:9000/spark_check_point_20191014_data")
scala> val data = sc.parallelize(1 to 10, 4)
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> data.checkpoint
scala> data.count
res3: Long = 10
scala> data.isCheckpointed
res4: Boolean = true
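
The same steps can also be sketched as a standalone program rather than a spark-shell session. This is only a minimal sketch under stated assumptions: the object name CheckpointSketch, the helper run, the local[2] master and the temporary local checkpoint directory are all illustrative choices that do not come from the session above, which writes to HDFS instead. It also records isCheckpointed before and after the action, to show that checkpoint only marks the RDD and that the data is written once a job has run.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this illustration.
object CheckpointSketch {

  // Returns (isCheckpointed before the action, isCheckpointed after it, element count).
  def run(): (Boolean, Boolean, Long) = {
    // Assumption: local mode with 2 threads; the original session runs against a cluster.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: a temporary local directory stands in for the HDFS path.
      val dir = Files.createTempDirectory("spark-checkpoint").toString
      sc.setCheckpointDir(dir)

      val data = sc.parallelize(1 to 10, 4)

      // Marks the RDD for checkpointing; nothing is written yet.
      data.checkpoint()
      val before = data.isCheckpointed

      // The action runs a job, after which the checkpoint files are written.
      val n = data.count()
      val after = data.isCheckpointed

      (before, after, n)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after, n) = run()
    println(s"before action: $before, after action: $after, count: $n")
  }
}
```

The Spark API documentation recommends persisting the RDD (for example with data.cache()) before checkpointing it, because otherwise writing the checkpoint requires recomputing the RDD. The sketch leaves that out to stay close to the session above.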