1. Installation
jsonpath is a third-party module and has to be installed separately:
pip install jsonpath
2. Syntax
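- For JSON scraped from a web page, first import the json module:
import json
- Convert the JSON text into Python objects (json_text stands for the JSON string taken from the page):
json_dict = json.loads(json_text)
- Import the function from the module:
from jsonpath import jsonpath
- Usage (returns a list of the matching values):
jsonpath(json_dict, 'jsonpath expression string')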
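In the jsonpath package, `jsonpath()` returns `False` rather than an empty list when nothing matches, so code that loops over the result should guard against that. A minimal sketch with a hypothetical helper `as_list` and a made-up JSON string. It only uses `json.loads`, so the jsonpath calls are simulated with literal results:

```python
import json

def as_list(matches):
    """Hypothetical helper: turn a jsonpath() result into a list.
    jsonpath() returns False (not []) when nothing matches."""
    return [] if matches is False else matches

# json.loads parses JSON text into nested dicts and lists
json_text = '{"store": {"bicycle": {"color": "red", "price": 19.95}}}'
json_dict = json.loads(json_text)
print(json_dict["store"]["bicycle"]["price"])  # 19.95

# Literal values standing in for jsonpath(json_dict, ...) results
print(as_list([19.95]))  # [19.95]
print(as_list(False))    # []
```

With the library installed, the call would read `prices = as_list(jsonpath(json_dict, '$..price'))`.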
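- Syntax rules
Symbol | Description |
---|---|
$ | Root node |
. | Child node |
.. | Any descendant node (recursive descent, at any depth) |
* | All child elements (wildcard) |
[] | Array index or slice, counting from 0 |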
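To make `$`, `.`, `..` and `*` concrete, here is a hand-written sketch of how such paths can be resolved over the parsed dicts and lists. It is not the jsonpath package's implementation: `tiny_jsonpath` is a hypothetical name, it supports only these four operators (no `[]` or filters), and it returns an empty list instead of `False` when nothing matches.

```python
import re

def _children(node, key):
    """Direct children of node selected by key ('*' selects every child)."""
    if isinstance(node, dict):
        if key == "*":
            return list(node.values())
        return [node[key]] if key in node else []
    if isinstance(node, list) and key == "*":
        return list(node)
    return []  # scalars, and lists addressed by name, have no matching children

def _self_and_descendants(node):
    """The node itself followed by every nested value, in pre-order."""
    out = [node]
    children = node.values() if isinstance(node, dict) else node if isinstance(node, list) else []
    for child in children:
        out.extend(_self_and_descendants(child))
    return out

def tiny_jsonpath(data, path):
    """Hypothetical mini-evaluator for '$', '.key', '..key', '.*' and '..*' only."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath expression starts with '$'")
    steps = re.findall(r"(\.\.|\.)(\*|[^.\[\]]+)", path[1:])
    if "".join(sep + key for sep, key in steps) != path[1:]:
        raise ValueError("unsupported syntax: " + path)
    nodes = [data]
    for sep, key in steps:
        if sep == "..":  # recursive descent: search this node and everything below it
            nodes = [d for n in nodes for d in _self_and_descendants(n)]
        nodes = [c for n in nodes for c in _children(n, key)]
    return nodes

doc = {"store": {"book": [{"title": "A", "price": 8.95},
                          {"title": "B", "price": 12.99}],
                 "bicycle": {"color": "red", "price": 19.95}}}
print(tiny_jsonpath(doc, "$..price"))           # [8.95, 12.99, 19.95]
print(tiny_jsonpath(doc, "$.store.bicycle.*"))  # ['red', 19.95]
```

`..` is handled by first expanding each node into itself plus all its descendants and then applying the next child step, which is why `$..price` finds prices at any depth.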
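
Expression | Description |
---|---|
$..price | List of all prices |
$.store.* | Everything in the store |
$.store..price | All prices under the store |
$..book[2] | The third book (index 2, counting from 0) |
$..book[-1:] | The last book |
$..* | All content |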
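Array indices in these expressions count from 0 and slices behave like Python slices, which is why `[2]` is the third book and `[-1:]` (not `[:-1]`) selects the last one. The same selections on a plain Python list of the four titles from the example data below (jsonpath additionally wraps each result in a list):

```python
titles = ["Sayings of the Century", "Sword of Honour",
          "Moby Dick", "The Lord of the Rings"]

print(titles[2])    # like $..book[2]: index 2 is the third book -> Moby Dick
print(titles[-1:])  # like $..book[-1:]: a slice holding only the last book
print(titles[:-1])  # like $..book[:-1]: every book except the last
```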
- Code example
from jsonpath import jsonpath
import json
book_dict = '''{
"store": {
"book": [
{ "category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{ "category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{ "category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{ "category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
}
}'''
json_dict = json.loads(book_dict)
print(jsonpath(json_dict, '$..price'))  # list of all prices
print(jsonpath(json_dict, '$.store.*'))  # everything in the store
print(jsonpath(json_dict, '$.store..price'))  # all prices under the store
print(jsonpath(json_dict, '$..book[2]'))  # the third book (indices start at 0)
print(jsonpath(json_dict, '$..book[-1:]'))  # the last book
print(jsonpath(json_dict, '$..*'))  # all content
- Exercise: collect the names of all cities from the JSON endpoint
http://www.lagou.com/lbs/getAllCitySearchLabels.json
import requests
import json
from jsonpath import jsonpath
url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
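# Send a browser-like User-Agent so the request looks like an ordinary page visit
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
json_dict = json.loads(response.content)  # response.json() does the same job
print(jsonpath(json_dict, '$..name'))  # every "name" value at any depth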
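`json.loads` accepts the raw bytes in `response.content` (Python 3.6+). Because `$..name` matches a `name` key at any depth, the result may contain repeats if a city is listed under more than one label. The sketch below reproduces that lookup with the standard library and drops repeats while keeping first-seen order. `city_names` is a hypothetical helper, and the sample payload is invented because the real response layout is not shown here.

```python
import json

def city_names(raw):
    """Hypothetical helper: decode a JSON body (bytes or str) and collect every
    "name" value at any depth, similar to jsonpath's '$..name', without repeats."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            if "name" in node:           # match on this level first ...
                found.append(node["name"])
            for value in node.values():  # ... then search the nested values
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(raw))              # json.loads accepts bytes as well as str
    return list(dict.fromkeys(found))  # drop repeats, keep first-seen order

# Invented payload shape, for illustration only
sample = b'{"groups": {"A": [{"name": "Anshan"}], "B": [{"name": "Beijing"}, {"name": "Anshan"}]}}'
print(city_names(sample))  # ['Anshan', 'Beijing']
```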