DataFrame
There is no corresponding API for DataFrames. A DataFrame must first be converted to an RDD, and even then only a pair RDD can be saved as a SequenceFile.
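As a minimal sketch of that conversion (the column names "key"/"value", the two-column schema, and the output path are assumptions, not from the original):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

object DfToSequenceFile {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().master("local[1]").appName("test").getOrCreate()
    import spark.implicits._

    // Hypothetical two-column DataFrame; a real one might come from spark.read.json(...)
    val df = List(("a", 1), ("b", 2)).toDF("key", "value")

    // SequenceFile needs (K, V) pairs, so map each Row into a tuple first.
    val pairs: RDD[(String, Int)] =
      df.rdd.map { case Row(k: String, v: Int) => (k, v) }

    pairs.saveAsSequenceFile("data/dir2") // output path is an assumption
    spark.stop()
  }
}
```

This only works when every row can be reduced to a single key/value pair; wider schemas would need to be serialized into the value first.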
RDD
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

def save3(): Unit = {
  val spark: SparkSession = SparkSession.builder().master("local[1]").appName("test").getOrCreate()
  // Only a pair RDD can be saved as a SequenceFile.
  // With an arbitrary JSON file, for example, many fields simply cannot be stored in this format.
  val rdd: RDD[(String, Int)] = spark.sparkContext.parallelize(List(("a", 1), ("b", 2)))
  rdd.saveAsSequenceFile("data/dir1")
}
Reading
def read3(): Unit = {
  val spark: SparkSession = SparkSession.builder().master("local[1]").appName("test").getOrCreate()
  // The key/value types must be specified when reading a SequenceFile, otherwise it throws an error.
  val rdd: RDD[(String, Int)] = spark.sparkContext.sequenceFile[String, Int]("data/dir1")
  println(rdd.collect().mkString(",")) // (a,1),(b,2)
}
Summary
SequenceFile can only be used with pair RDDs, which makes it quite limited.
References
How to save Spark Data Frames to Sequence File - Big Data / Apache Spark - itversity