There are two ways I create a DataFrame here:
Method 1: create it by associating an RDD with a case class
1. Create a SparkConf and a SparkContext
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("old_sparkSql")
val sc = new SparkContext(conf)
2. Create the RDD
import org.apache.spark.rdd.RDD

val lines: RDD[String] = sc.textFile("C:\\Demo_data\\people.txt")
3. Define the case class outside the object
case class people(id: Long, name: String, age: Int, fv: Int)
4. Process the RDD's data and map each line to the case class
val personRDD: RDD[people] = lines.map(line => {
  val fields: Array[String] = line.split(",")
  val id = fields(0).toLong
  val name = fields(1)
  val age = fields(2).toInt
  val fv = fields(3).toInt
  people(id, name, age, fv)
})
5. Create the SQLContext
import org.apache.spark.sql.{DataFrame, SQLContext}

val sqlContext = new SQLContext(sc)
6. Import the implicit conversions and create the DataFrame (the point of importing implicits is that the `toDF` method we need is not defined on RDD itself; it lives in another class, and the implicit import brings it into scope)
import sqlContext.implicits._
val personDF: DataFrame = personRDD.toDF()
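What `import sqlContext.implicits._` does can be illustrated with a plain-Scala sketch (a minimal example with made-up names, not Spark's actual internals): an implicit class adds a new method to a type that does not define it, and importing the implicit brings that method into scope, just as the import above makes `toDF` available on an RDD.

```scala
object Implicits {
  // Hypothetical example: an implicit class "pimps" a new method onto Seq,
  // the same way sqlContext.implicits._ adds toDF to RDDs.
  implicit class RichSeq(val xs: Seq[Int]) extends AnyVal {
    def total: Int = xs.sum
  }
}

object Demo extends App {
  import Implicits._          // without this import, .total does not compile
  println(Seq(1, 2, 3).total) // prints 6
}
```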
7. Register the created DataFrame as a temporary table
personDF.registerTempTable("T_person")
8. Query the temporary table through the sqlContext
val result: DataFrame = sqlContext.sql("SELECT name,age,fv FROM T_person ORDER BY fv desc")
9. Trigger an action (the write) and release resources:
result.write.json("C:\\people.json")
sc.stop()
Method 2: create it by building the schema information explicitly
1. Create a SparkConf and a SparkContext
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("old_sparkSql_schema")
val sc = new SparkContext(conf)
2. Create the RDD
val lines: RDD[String] = sc.textFile("C:\\Demo_data\\people.txt")
3. Process the RDD's data and turn it into an RDD[Row]
import org.apache.spark.sql.Row

val personRow: RDD[Row] = lines.map(line => {
  val fields: Array[String] = line.split(",")
  val id = fields(0).toLong
  val name = fields(1)
  val age = fields(2).toInt
  val fv = fields(3).toInt
  Row(id, name, age, fv)
})
4. Create the SQLContext
import org.apache.spark.sql.{DataFrame, SQLContext}

val sqlContext = new SQLContext(sc)
5. Build the schema metadata describing each field's name, type, and nullability
import org.apache.spark.sql.types._

val schema = StructType(
  List(
    StructField("id", LongType, true),
    StructField("name", StringType, true),
    StructField("age", IntegerType, true),
    StructField("fv", IntegerType, true)
  )
)
6. Create the DataFrame by passing in both the RDD[Row] and the schema
val pdf: DataFrame = sqlContext.createDataFrame(personRow, schema)
7. Register the temporary table:
pdf.registerTempTable("t_people")
8. Query the temporary table through the sqlContext (note the table name must match the one registered above)
val result: DataFrame = sqlContext.sql("SELECT name,age,fv FROM t_people ORDER BY fv desc")
9. Trigger an action (the write) and release resources:
result.write.json("C:\\people.json")
sc.stop()
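To sanity-check the written file, the JSON can be read back before calling `sc.stop()` (a sketch assuming Spark 1.4+, where `sqlContext.read.json` is available):

```scala
// Run this before sc.stop(): read the written result back and inspect it.
val check: DataFrame = sqlContext.read.json("C:\\people.json")
check.printSchema() // shows the schema inferred from the JSON: age, fv, name
check.show()        // prints a sample of the rows
```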