Reading JSON with Spark in IDEA

Basic read
import org.apache.spark.sql.SparkSession

object r11_26_2 {
  def main(args: Array[String]): Unit = {
    // Local SparkSession; SparkSession.builder() is the idiomatic entry point
    val session = SparkSession.builder().appName("readJson").master("local").getOrCreate()
    val df = session.read.json("./out")
    df.show()
  }
}
This plain read is not enough on its own for nested JSON.

Reading nested JSON
/**
 * Format:
 * {"name":"zhangsan","score":100,"infos":{"age":20,"gender":"man"}}
 */
import org.apache.spark.sql.SparkSession

object r11_26_2 {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("readNestedJson").master("local").getOrCreate()
    val df = session.read.json("./out")
    df.printSchema()
    // Register a temp view, then reach nested fields with dot notation
    df.createOrReplaceTempView("dfs")
    session.sql("select name, score, infos.age, infos.gender from dfs").show(100)
  }
}
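The temp view is not required: the same nested fields can be selected directly through the DataFrame API. A minimal sketch, assuming the same `./out` data with the format shown above:

```scala
import org.apache.spark.sql.SparkSession

object ReadNestedDsl {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("readNestedDsl").master("local").getOrCreate()
    val df = session.read.json("./out")
    // Dot notation also works in select(), without createOrReplaceTempView
    df.select("name", "score", "infos.age", "infos.gender").show()
    session.stop()
  }
}
```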
A file whose top level is one JSON array will fail to read this way (the default reader expects one JSON object per line); other layouts read fine.
Pattern: create a temp view -> query.
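For a file that is a single top-level JSON array, Spark's `multiLine` read option parses the whole file instead of treating each line as a record. A sketch, assuming a hypothetical file `./arr.json` shaped like `[{...},{...}]`:

```scala
import org.apache.spark.sql.SparkSession

object ReadJsonArrayFile {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("readJsonArrayFile").master("local").getOrCreate()
    // multiLine = true: parse the file as one JSON document (e.g. a top-level array),
    // rather than line-delimited JSON; each array element becomes a row
    val df = session.read.option("multiLine", value = true).json("./arr.json")
    df.show()
    session.stop()
  }
}
```

Without `multiLine`, such a file typically yields a single `_corrupt_record` column.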
Reading nested JSON arrays
/**
 * Reading a nested JSON array, format like:
 * {"name":"lisi","age":19,"scores":[{"yuwen":58,"shuxue":50,"yingyu":78},{"dili":56,"shengwu":76,"huaxue":13}]}
 *
 * explode: expands the array so that each JSON object inside it becomes its own row
 */
import org.apache.spark.sql.SparkSession

object r11_26_2 {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("explodeJsonArray").master("local[*]").getOrCreate()
    val df = session.read.json("./jdn.json")
    import org.apache.spark.sql.functions._
    import session.implicits._
    // explode turns each element of the parameter array into its own row
    val transDF = df.select($"phone_name", $"comments.buy_color", explode($"parameter"))
      .toDF("phone_name", "buy_color", "parameter")
    val transDF1 = transDF.select("phone_name", "buy_color", "parameter.CPU品牌")
    transDF1.show(100)
    transDF.printSchema()
  }
}
This array approach cannot be used when the elements' keys are the same.
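The explode technique also applies to the lisi-style data in the comment above. A sketch, assuming that example line is stored one-object-per-line in a hypothetical file `./scores.json`; note that Spark merges the differing keys of the array elements into one struct schema, filling missing keys with null:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ExplodeScores {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("explodeScores").master("local").getOrCreate()
    import session.implicits._
    val df = session.read.json("./scores.json")
    // One row per element of scores; elements with different keys share a merged
    // struct schema, so a key absent from an element comes back as null
    val exploded = df.select($"name", $"age", explode($"scores").as("score"))
    exploded.select("name", "score.yuwen", "score.dili").show()
    session.stop()
  }
}
```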