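jq is an excellent tool for working with JSON. This note records a pitfall I ran into when using jq to remove duplicates.
Example 1
- The JSON file contents are as follows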
admin@pc-1:~$ cat raw_0.json
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
}
{
"cid": 200,
"info": {
"desc": "this is 200",
"color": "red"
}
}
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
}
admin@pc-1:~$
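- Note:
- The content is not wrapped in "[" and "]" at the start and end
- Items are not separated by commas, so the file is a stream of separate JSON objects rather than one array
- Deduplicating by "cid" with jq works as expected; "-s" collects the separate objects into one array, which is what unique_by operates on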
admin@pc-1:~$ jq -s 'unique_by(.cid)' raw_0.json
[
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
},
{
"cid": 200,
"info": {
"desc": "this is 200",
"color": "red"
}
}
]
admin@pc-1:~$
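To see what "-s" does to a stream like raw_0.json, here is a minimal sketch. It assumes jq is installed and uses a hypothetical scratch file, stream.json, holding trimmed-down objects (only "cid") so the output stays short.

```shell
# Hypothetical scratch file: three separate objects, no brackets, no commas.
tmp=$(mktemp -d)
printf '%s\n' '{"cid":100}' '{"cid":200}' '{"cid":100}' > "$tmp/stream.json"

# Without -s the filter runs once per input value, giving three separate results.
per_value=$(jq -c '.cid' "$tmp/stream.json")

# With -s the three values are first collected into a single array.
slurped=$(jq -c -s '.' "$tmp/stream.json")

# unique_by then sees that one array and drops the repeated cid.
deduped=$(jq -c -s 'unique_by(.cid)' "$tmp/stream.json")

echo "$per_value"   # 100, 200, 100 on separate lines
echo "$slurped"     # [{"cid":100},{"cid":200},{"cid":100}]
echo "$deduped"     # [{"cid":100},{"cid":200}]
```

The "-c" flag is only there to print compact one-line output; it does not affect the deduplication.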
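Example 2
- The JSON file contents are as follows. This time the file is a single JSON array: it is wrapped in "[]" and the elements are separated by commas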
admin@pc-1:~$ cat raw_1.json
[
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
},
{
"cid": 200,
"info": {
"desc": "this is 200",
"color": "red"
}
},
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
}
]
admin@pc-1:~$
- Reusing the previous command fails
admin@pc-1:~$ jq -s 'unique_by(.cid)' raw_1.json
jq: error (at raw_1.json:23): Cannot index array with string "cid"
admin@pc-1:~$
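- Dropping "-s" makes deduplication work. The file is already one array; with "-s" jq wraps it in another array, so unique_by ends up applying .cid to the inner array instead of to each object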
admin@pc-1:~$ jq 'unique_by(.cid)' raw_1.json
[
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
},
{
"cid": 200,
"info": {
"desc": "this is 200",
"color": "red"
}
}
]
admin@pc-1:~$
- Take a look at the command-line help
admin@pc-1:~$ jq -h | grep "\-s "
-s read (slurp) all inputs into an array; apply filter to it;
admin@pc-1:~$
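The failure in Example 2 comes from the extra array layer that "-s" adds. The sketch below makes that visible and then shows one possible way to use the same command for both file shapes, by flattening one level after slurping. It assumes jq 1.5 or later (for flatten with a depth argument); array.json and stream.json are hypothetical scratch files, not the files from this note.

```shell
tmp=$(mktemp -d)
printf '%s\n' '[{"cid":100},{"cid":200},{"cid":100}]' > "$tmp/array.json"
printf '%s\n' '{"cid":100}' '{"cid":200}' '{"cid":100}' > "$tmp/stream.json"

# -s wraps the existing top-level array in another array.
jq -c -s '.' "$tmp/array.json"                   # [[{"cid":100},{"cid":200},{"cid":100}]]
outer_len=$(jq -s 'length' "$tmp/array.json")    # 1: the outer array holds one element
inner_len=$(jq 'length' "$tmp/array.json")       # 3: the real number of items

# flatten(1) removes one level of array nesting and leaves objects untouched,
# so after slurping, both shapes end up as a flat array of objects.
from_array=$(jq -c -s 'flatten(1) | unique_by(.cid)' "$tmp/array.json")
from_stream=$(jq -c -s 'flatten(1) | unique_by(.cid)' "$tmp/stream.json")

echo "$from_array"    # [{"cid":100},{"cid":200}]
echo "$from_stream"   # [{"cid":100},{"cid":200}]
```

This is a convenience for scripts that may receive either shape. It would misbehave if the items themselves were arrays, since flatten(1) would unwrap those too.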
- Interestingly, raw_1.json can be converted into the raw_0.json format
admin@pc-1:~$ cat raw_1.json | jq '.[]'
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
}
{
"cid": 200,
"info": {
"desc": "this is 200",
"color": "red"
}
}
{
"cid": 100,
"info": {
"desc": "this is 100",
"color": "green"
}
}
admin@pc-1:~$
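The conversion also works in the other direction: '.[]' unpacks an array into a stream, and "-s" packs a stream back into an array. A small round-trip sketch, assuming jq is installed and using hypothetical scratch files:

```shell
tmp=$(mktemp -d)
printf '%s\n' '[{"cid":100},{"cid":200},{"cid":100}]' > "$tmp/array.json"

# Array -> stream: with -c each element is printed on its own line.
jq -c '.[]' "$tmp/array.json" > "$tmp/stream.json"
lines=$(wc -l < "$tmp/stream.json" | tr -d ' ')

# Stream -> array: -s collects the separate values again.
roundtrip=$(jq -c -s '.' "$tmp/stream.json")
original=$(jq -c '.' "$tmp/array.json")

echo "$lines"       # 3
echo "$roundtrip"   # [{"cid":100},{"cid":200},{"cid":100}]
```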