Extracting data in Python with jsonpath, JMESPath, and XPath

风华浪浪

Last modified 2023-07-06 18:08:46

First published 2020-03-23 13:54:59

Article link: https://blog.csdn.net/a6864657/article/details/82801951

1. JMESPath

JMESPath and jsonpath both exist to extract elements from a JSON document.

Looking up the value for a key

{'a': 'foo', 'b': 'bar', 'c': 'baz'}

jmespath.search('b', source)   # returns 'bar'

Using the . operator

{"a": {"b": {"c": {"d": "value"}, "e": "value"}}}

jmespath.search('a.b.c', source)   # returns {'d': 'value'}

Index expressions (arrays only)

["a", "b", "c", "d", "e", "f"]

jmespath.search("[1]", source)   # returns 'b'

Combining indexes with the . operator

{'a': {'b': {'c': [{'d': [0, [1, 2]]}, {'d': [3, 4]}]}}}

jmespath.search('a.b.c[0].d[1][0]', source)   # returns 1

Slicing

["a", "b", "c", "d", "e", "f"]

jmespath.search("[1:3]", source)   # returns ['b', 'c']

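Projections

A projection fixes the shape of the result up front and then pulls values out of every element according to that shape.
JMESPath has five kinds of projection: list, slice, object, flatten, and filter.
Note: use [] to reach into a list and . to reach into a dict.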

{
	'people': [
		{'first': 'James', 'last': 'd'}, 
		{'first': 'Jacob', 'last': 'e'}, 
		{'first': 'Jayden', 'last': 'f'}, 
		{'missing': 'different'}
	], 
	'foo': {'bar': 'baz'}
}

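jmespath.search('people[*].first', source)   # list projection (a slice such as people[:2] projects the same way); returns ['James', 'Jacob', 'Jayden']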

{
	'ops': {
		'functionA': {'numArgs': 2}, 
		'functionB': {'numArgs': 3}, 
		'functionC': {'variadic': True}
	}
}

jmespath.search('ops.*.numArgs', source)   # object projection: the * syntax works on dict values as well as lists; returns [2, 3]

{
	'people': [
		{
			'general': {'id': 100, 'age': 20, 'other': 'foo', 'name': 'Bob'}, 
			'history': {'first_login': '2014-01-01', 'last_login': '2014-01-02'}
		}, 
		{
			'general': {'id': 101, 'age': 30, 'other': 'bar', 'name': 'Bill'}, 
			'history': {'first_login': '2014-05-01', 'last_login': '2014-05-02'}
		}
	]
}

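jmespath.search("people[?general.age > `20`].general | [0]", source)
Filter projections support the comparison operators ==, !=, <, <=, >, >=.
This query returns the general value of the first person whose general.age is greater than 20 (here, Bill's record).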

2. Simple JMESPath applications

Select every element whose age is greater than 20, using a filter expression ([?<expr>]).

{
	'people': [
		{'age': 20, 'other': 'foo', 'name': 'Bob'}, 
		{'age': 25, 'other': 'bar', 'name': 'Fred'}, 
		{'age': 30, 'other': 'baz', 'name': 'George'}
	]
}

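people[?age > `20`].[name, age]
people[?age > `20`].{age: age, tags: tags[0], name: name}

Number literals are written in backticks. In JMESPath, "20" is a quoted identifier and '20' is a raw string, so neither compares numerically with age.
The sample data above has no tags key, so tags evaluates to null here.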

Working with nested data: combine operators, multiselect lists, filters, and pipes.

{
    "reservations": [
        {
            "instances": [
                {
                    "type": "small",
                    "state": {"name": "running"},
                    "tags": [
                        {"Key": "Name","Values": ["Web"]},
                        {"Key": "version","Values": ["1"]
                        }
                    ]
                },
                {
                    "type": "large",
                    "state": {"name": "stopped"},
                    "tags": [
                        {"Key": "Name","Values": ["Web"]},
                        {"Key": "version","Values": ["1"]}
                    ]
                }
            ]
        },
        {
            "instances": [
                {
                    "type": "medium",
                    "state": {"name": "terminated"},
                    "tags": [
                        {"Key": "Name","Values": ["Web"]},
                        {"Key": "version","Values": ["1"]}
                    ]
                },
                {
                    "type": "xlarge",
                    "state": {"name": "running"},
                    "tags": [
                        {"Key": "Name","Values": ["DB"]},
                        {"Key": "version","Values": ["1"]
                        }
                    ]
                }
            ]
        }
    ]
}

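Basic extraction:  reservations[].instances[].[tags[?Key=='Name'].Values[] | [0], type, state.name]
Filtering nested data (this query targets the people document with general and history keys shown in section 1):  people[?general.id==`100`].general | [0]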

sort_by sorts a list by a given key; max_by and min_by return the element with the largest or smallest value for that key.

{
	'Contents': [
		{'Date': '2014-12-21T05:18:08.000Z', 'Key': 'logs/bb', 'Size': 303}, 
		{'Date': '2014-12-20T05:19:10.000Z', 'Key': 'logs/aa', 'Size': 308}, 
		{'Date': '2014-12-20T05:19:12.000Z', 'Key': 'logs/qux', 'Size': 297}, 
		{'Date': '2014-11-20T05:22:23.000Z', 'Key': 'logs/baz', 'Size': 329}, 
		{'Date': '2014-12-20T05:25:24.000Z', 'Key': 'logs/bar', 'Size': 604}, 
		{'Date': '2014-12-20T05:27:12.000Z', 'Key': 'logs/foo', 'Size': 647}
		]
}

Sort by date:  sort_by(Contents, &Date)[*].{Key: Key, Size: Size}

A slightly more complex example

{
	'locations': [
		{'name': 'Seattle', 'state': 'WA'}, 
		{'name': 'New York', 'state': 'NY'}, 
		{'name': 'Bellevue', 'state': 'WA'}, 
		{'name': 'Olympia', 'state': 'WA'}
	]
}

Filter, sort, and join:  locations[?state == 'WA'].name | sort(@)[-2:] | {WashingtonCities: join(', ', @)}

3. Differences between jsonpath and XPath

The following basic examples give a sense of what it can do:
[image]
