spark sql操作复杂嵌套数组 json数据

spark sql 处理 对象嵌套数组json

元数据;

{"username":"king","actionInfo":{"id":1,"age":"22","partList":[{"code":"123","uname":"king"},{"code":"0012","uname":"king"}]}}

//方法一:用sql风格处理嵌套json 
val session = SparkSession.builder() .master("local[*]").appName(this.getClass.toString).getOrCreate() 
val frame = session.read.format("json").load("E:\\a.json") 
frame.printSchema() frame.createTempView("test") 
session.sql("select * from test").show(false) 
//to_json将字符串json数据转成json对象,json_tuple解析json数据 
session.sql( """ 
|select username,ai.id,ai.age,p.uname,p.code from test 
|lateral view json_tuple(to_json(actionInfo),'id','age','partList') ai as id,age,partlist 
|lateral view explode(split(regexp_replace(regexp_extract(partlist,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|')) partlist as p 
|lateral view json_tuple(p,'code','uname') p as code,uname 
""".stripMargin).show(false)

//方法二:DSL风格处理嵌套json 
//导入spark sql方法 用DSL风格处理嵌套json 
import org.apache.spark.sql.functions._ 
val frame1 = frame.select(col("username"), 
get_json_object(to_json(col("actionInfo")), "$.id") as "id", 
get_json_object(to_json(col("actionInfo")), "$.age") as "age", 
explode(split(regexp_replace(regexp_extract(get_json_object(to_json(col("actionInfo")), "$.partList"), "^\\[(.+)\\]$", 1), "\\}\\,\\{", "\\}\\|\\|\\{"), "\\|\\|")) as "partlist") 

frame1.select(col("username"), col("id"), col("age"),   
get_json_object(col("partlist"),"$.code") as "code", 
get_json_object(col("partlist"),"$.uname") as "uname").show(false)

 

//方法三:用dataFrame处理嵌套数组json
import session.implicits._
frame1.map(row => {
//获取表信息
val names = row.schema.fieldNames
val username = row.getAs[String](names.apply(1))
val row1 = row.getStruct(0)
val age = row1.getAs[String]("age")
val id = row1.getAs[Long]("id")
val names1 = row1.schema.fieldNames
val partlist = row1.getAs[Seq[Row]](names1.apply(2)).map(t => {
(t.getString(0), t.getString(1))
}).toList
(username, id, age, partlist.toString())
}).foreach(println(_))

结果展示:

schedule 信息:
root
 |-- actionInfo: struct (nullable = true)
 |    |-- age: string (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- partList: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- code: string (nullable = true)
 |    |    |    |-- uname: string (nullable = true)
 |-- username: string (nullable = true)
方法一 二 数据结果展示:
+--------+---+---+-----+----+
|username|id |age|uname|code|
+--------+---+---+-----+----+
|king    |1  |22 |king |123 |
|king    |1  |22 |king |0012|
+--------+---+---+-----+----+

方法三结果展示:

(king,1,22,List((123,king), (0012,king)))

 

 

 

参考:参考博客

 

  • 2
    点赞
  • 17
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值