First, define a case class
case class people(id:Int, name:String, age:Int)
Read the text file and convert it into an RDD
val rddpeople = sc.textFile("source path")
Use the case class to give the RDD a schema
val peopleSchema = rddpeople.map(row => row.split(" ")).map(field => people(field(0).toInt, field(1), field(2).toInt))
Convert the RDD to a DataFrame. Note that `toDF()` only compiles when the `SparkSession` implicits (`import spark.implicits._`) are in scope.
val peopleDf = peopleSchema.toDF()
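The steps above can be put together with a `SparkSession` (Spark 2.x+), which provides both the `SparkContext` and the implicits needed by `toDF()`. A minimal sketch; the object name `RddToDf` is illustrative and `"source path"` is the same placeholder as above:

```scala
import org.apache.spark.sql.SparkSession

// The case class must live at the top level, outside main();
// defining it inside the method breaks toDF()'s schema inference.
case class people(id: Int, name: String, age: Int)

object RddToDf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddToDf")
      .master("local[*]")
      .getOrCreate()
    // Enables rdd.toDF() for RDDs of case-class instances
    import spark.implicits._

    // Each line is expected to look like: "1 Alice 30"
    val rddpeople = spark.sparkContext.textFile("source path")
    val peopleDf = rddpeople
      .map(_.split(" "))
      .map(f => people(f(0).toInt, f(1), f(2).toInt))
      .toDF()

    peopleDf.printSchema()
    spark.stop()
  }
}
```

Keeping the case class at the top level matters: `toDF()` infers the schema by reflection, and a case class nested inside a method cannot be resolved that way.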
Complete code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j._
import org.apache.spark.sql.SparkSession
case class people(id: Int, name: String, age: Int)
object test {
def main(args: Array[String]): Unit = {
Logger.getLogger("org").setLevel(Level.ERROR)
val conf = new SparkConf().setAppName("Interview").setMaster("local")
val sc = new SparkContext(conf)