Converting between RDD and DataFrame
Convert an RDD to a DataFrame with the toDF method (requires import spark.implicits._ to be in scope):
import spark.implicits._
val rddData1 = spark.sparkContext.parallelize(Array(("Alice", "18", "Female"), ("Mathew", "20", "Male")))
val df1 = rddData1.toDF("name", "age", "sex")
df1.show
Convert a DataFrame to an RDD by calling its rdd method; the result is an RDD[Row]:
df1.rdd.collect
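Because the RDD recovered from a DataFrame contains generic Row objects rather than tuples, individual fields have to be read positionally or by column name. A minimal self-contained sketch (the object name RowAccessExample is an illustrative choice; the data and column names follow the df1 example above):

```scala
import org.apache.spark.sql.SparkSession

object RowAccessExample {
  def extractNames(): Array[String] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-access")
      .getOrCreate()
    import spark.implicits._
    val df1 = spark.sparkContext
      .parallelize(Array(("Alice", "18", "Female"), ("Mathew", "20", "Male")))
      .toDF("name", "age", "sex")
    // df1.rdd is an RDD[Row]; getAs[T] reads a field by column name
    val names = df1.rdd.map(row => row.getAs[String]("name")).collect()
    spark.stop()
    names
  }

  def main(args: Array[String]): Unit =
    println(extractNames().mkString(", "))   // Alice, Mathew
}
```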
Converting between RDD and Dataset
Convert an RDD to a Dataset with the toDS method. Since a Dataset is strongly typed, first map the RDD elements to a case class:
import org.apache.spark.sql.SparkSession

object TestSQL2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("test")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._
    val rddData2 = spark.sparkContext.parallelize(Array(("Alice", "18", "Female"), ("Mathew", "20", "Male")))
    // Map each tuple to the User case class so toDS can derive a typed encoder
    val rddData3 = rddData2.map(t => User(t._1, t._2.toInt, t._3))
    val ds1 = rddData3.toDS()
    ds1.show
    spark.stop()
  }
}

case class User(name: String, age: Int, sex: String)
Convert a Dataset back to an RDD by calling its rdd method; unlike a DataFrame, the resulting RDD keeps the element type (here, RDD[User]):
ds1.rdd.count()
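Because the Dataset keeps its element type, ds1.rdd yields an RDD[User] whose fields can be accessed directly on the case class, with no Row casting. A minimal sketch reusing the User case class from above (the object name TypedRddExample and the average-age computation are illustrative):

```scala
import org.apache.spark.sql.SparkSession

case class User(name: String, age: Int, sex: String)

object TypedRddExample {
  def averageAge(): Double = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-rdd")
      .getOrCreate()
    import spark.implicits._
    val ds = spark.sparkContext
      .parallelize(Seq(User("Alice", 18, "Female"), User("Mathew", 20, "Male")))
      .toDS()
    // ds.rdd is RDD[User]: fields like .age are read directly, no Row API needed
    val ages = ds.rdd.map(_.age)
    val avg = ages.sum() / ages.count()
    spark.stop()
    avg
  }

  def main(args: Array[String]): Unit =
    println(averageAge())   // 19.0
}
```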
Converting between DataFrame and Dataset
Convert a DataFrame to a Dataset with as[T]; the case class must be defined before use, and as[T] matches columns by name:
case class Person(name: String, age: String, sex: String)

val df2 = spark.createDataFrame(List(
  ("Alice", "Female", "20"),
  ("Tom", "Male", "25"),
  ("Boris", "Male", "18"))).toDF("name", "sex", "age")
// as[Person] resolves fields by name, so the column order need not match the case class
val ds2 = df2.as[Person]
ds2.show
Convert a Dataset back to a DataFrame with toDF:
ds2.toDF().show
Because a Dataset is strongly typed, when converting a DataFrame to a Dataset the DataFrame's columns must match the case class fields in name, count, and type; a missing column or an incompatible type causes the conversion to fail.
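A quick sketch of this strict matching (the WrongPerson case class and its height field are hypothetical, invented to not match df2's columns): converting to a case class with a field the DataFrame lacks throws an AnalysisException at the point of as[T].

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

case class Person(name: String, age: String, sex: String)
// Hypothetical case class with a field ("height") that df2 does not have
case class WrongPerson(name: String, height: Int)

object StrictMatchExample {
  def conversionFails(): Boolean = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("strict-match")
      .getOrCreate()
    import spark.implicits._
    val df2 = spark.createDataFrame(List(
      ("Alice", "Female", "20"),
      ("Tom", "Male", "25"))).toDF("name", "sex", "age")
    df2.as[Person]  // succeeds: every case class field resolves to a column
    val failed =
      try {
        df2.as[WrongPerson]  // fails: column "height" cannot be resolved
        false
      } catch {
        case _: AnalysisException => true
      }
    spark.stop()
    failed
  }

  def main(args: Array[String]): Unit =
    println(conversionFails())   // true
}
```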