提出问题
在Athena
数据库中遇到如下json
格式的字段:
{"name": "Susan Smith",
"org": "engineering",
"projects":
[
{"name":"project1", "completed":false},
{"name":"project2", "completed":true}
]
}
希望解析该字段,提取关键信息
初步解析
首先任务是提取name
的值和projects
的值。我们来试一下:
WITH dataset AS (
SELECT '{"name": "Susan Smith",
"org": "engineering",
"projects": [{"name":"project1", "completed":false},
{"name":"project2", "completed":true}]}'
AS blob
)
SELECT
json_extract(blob, '$.name') AS name,
json_extract(blob, '$.projects') AS projects
FROM dataset
结果如下:
name projects
"Susan Smith" [{"name":"project1","completed":false},{"name":"project2","completed":true}]
其中name和projects均为json格式。如果想要得到varchar格式的结果,只需要将json_extract
替换为json_extract_scalar
即可。但需要注意json_extract_scalar
不能作用于array
, maps
,structs
格式。
深度解析
下面展示获取projects
下的第一个name
的值:
WITH dataset AS (
SELECT '{"name": "Susan Smith",
"org": "engineering",
"projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}'
AS blob
)
SELECT
json_extract_scalar(blob, '$.name') AS name,
json_extract(blob, '$.projects[0].name') AS projects
FROM dataset
需注意数组中的元素计数从 0 开始。
本文是在阅读athena
官方文档时的读书笔记,原文见:Extracting Data from JSON