1. Creating a DataFrame from a txt file that has a header line:
Use mapPartitionsWithIndex to drop the header (the first line of partition 0):
// Assumes a case class matching the file's three columns, e.g.:
// case class teacher(id: Int, name: String, course: String)
import spark.implicits._  // required for toDF()

val teacherRdd = sc.textFile("src/test/teacher.txt")
val teacherRddSchema = teacherRdd
  .mapPartitionsWithIndex((idx, iter) => if (idx == 0) iter.drop(1) else iter) // skip header in partition 0
  .map(_.split(" "))
  .map(field => teacher(field(0).toInt, field(1), field(2)))
val teacherDF = teacherRddSchema.toDF()
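The trick above relies only on the fact that the file's header sits at the start of partition 0. The same drop-the-first-line-of-partition-0 logic can be sketched in plain Scala, with no Spark, using hypothetical sample data:

```scala
// Simulate two partitions of a text file; partition 0 starts with the header.
val partitions = Seq(
  Iterator("id name course", "1 tom math"), // partition 0: header + data
  Iterator("2 amy physics")                 // partition 1: data only
)
// Same logic as the mapPartitionsWithIndex call: drop line 1 of partition 0 only.
val rows = partitions.zipWithIndex.flatMap { case (iter, idx) =>
  if (idx == 0) iter.drop(1) else iter
}
println(rows.mkString(", "))  // 1 tom math, 2 amy physics
```

Note this removes only the first line of the first partition, so data lines elsewhere in the file are untouched.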
2. When reading a CSV into a DataFrame, use a case class to define the schema:
import org.apache.spark.sql.Encoders
case class student (id:Int, name:String, course:String,score:Int)
val schema = Encoders.product[student].schema
val studentDf = spark.read
  .format("csv")
  .option("header", true)
  .schema(schema)
  .load("src/test/student.csv")
  .as[student]  // yields a Dataset[student]
studentDf.printSchema()
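Encoders.product derives the schema automatically from the case class fields. For comparison, a sketch of the equivalent schema written out by hand with an explicit StructType (same four columns; primitive Int fields come out non-nullable, String fields nullable):

```scala
import org.apache.spark.sql.types._

// Manual equivalent of Encoders.product[student].schema
val manualSchema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true),
  StructField("course", StringType, nullable = true),
  StructField("score", IntegerType, nullable = false)
))
```

The case-class route is preferable here because the schema and the `.as[student]` conversion stay in sync from a single definition.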
Complete program:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache