Spark SQL Working with JSON


Working with JSON

  • JSON data is most easily read in as line-delimited JSON
  • The schema is inferred upon load
  • If you want to flatten your JSON data, use the explode method
  • Access nested objects with dot syntax
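The contents of test.json are not shown in the post; based on the inferred schema (`name: string, nums: array<bigint>`) and the output below, it is presumably a line-delimited JSON file along these lines:

```json
{"name":"zhangsan","nums":[1,2,3,4,5]}
{"name":"lisi","nums":[6,7,8,9,10]}
```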
scala> val json = spark.read.format("json").load("file:///home/hadoop/data/test.json")
json: org.apache.spark.sql.DataFrame = [name: string, nums: array<bigint>]

scala> json.show
+--------+----------------+
|    name|            nums|
+--------+----------------+
|zhangsan| [1, 2, 3, 4, 5]|
|    lisi|[6, 7, 8, 9, 10]|
+--------+----------------+
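The inferred schema is visible in the DataFrame's type above; it can also be inspected explicitly with printSchema in the same spark-shell session:

```scala
// Print the schema Spark inferred from the JSON file
json.printSchema()
// root
//  |-- name: string (nullable = true)
//  |-- nums: array (nullable = true)
//  |    |-- element: long (containsNull = true)
```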
scala> json.createOrReplaceTempView("json_table")


scala>  spark.sql("select * from json_table").show
+--------+----------------+
|    name|       nums|
+--------+----------------+
|zhangsan| [1, 2, 3, 4, 5]|
|    lisi|[6, 7, 8, 9, 10]|
+--------+----------------+
scala>  spark.sql("select name,nums[1] from json_table").show
+--------+----------+
|  name | nums[1]  |
+--------+-------+
|zhangsan|      2|
|   lisi|      7|
+--------+-------+
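Note that Spark SQL array indexing is zero-based, which is why `nums[1]` returns the second element (2 and 7). The same query can be sketched with the DataFrame API, assuming the `json` DataFrame from the transcript above is still in scope in spark-shell (where `spark.implicits._` is pre-imported):

```scala
// Equivalent of: select name, nums[1] from json_table
// getItem(1) selects the array element at zero-based index 1
json.select($"name", $"nums".getItem(1)).show()
```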
# If you want to flatten your JSON data, use the explode method
scala> spark.sql("select name,explode(nums) from json_table").show
+--------+---+
|    name|col|
+--------+---+
|zhangsan|  1|
|zhangsan|  2|
|zhangsan|  3|
|zhangsan|  4|
|zhangsan|  5|
|    lisi|  6|
|    lisi|  7|
|    lisi|  8|
|    lisi|  9|
|    lisi| 10|
+--------+---+
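The same flattening is also available through the DataFrame API via the `explode` function; a sketch, again assuming the `json` DataFrame from above in a spark-shell session:

```scala
import org.apache.spark.sql.functions.explode

// One output row per array element; alias the generated column
json.select($"name", explode($"nums").as("num")).show()
```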
Contents of test2.json:
cat test2.json 
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
scala> val json = spark.read.format("json").load("file:///home/hadoop/data/test2.json")
json: org.apache.spark.sql.DataFrame = [address: struct<city: string, state: string>, name: string]


scala> json.show
+-----------------+-------+
|          address|   name|
+-----------------+-------+
|  [Columbus,Ohio]|    Yin|
|[null,California]|Michael|
+-----------------+-------+


scala> json.createOrReplaceTempView("json_table")


scala> spark.sql("select * from json_table").show
+-----------------+-------+
|          address|   name|
+-----------------+-------+
|  [Columbus,Ohio]|    Yin|
|[null,California]|Michael|
+-----------------+-------+
# Access nested objects with dot syntax

scala> spark.sql("select name,address.city,address.state from json_table").show
+-------+--------+----------+
|   name|    city|     state|
+-------+--------+----------+
|    Yin|Columbus|      Ohio|
|Michael|    null|California|
+-------+--------+----------+
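The same dot syntax works on columns in the DataFrame API; a sketch assuming the `json` DataFrame loaded from test2.json above:

```scala
// Equivalent of: select name, address.city, address.state from json_table
// Dot syntax in a column name drills into the address struct's fields
json.select($"name", $"address.city", $"address.state").show()
```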






