Web scraping: basic usage of jsonpath

JSON file data:

{
  "store": {
    "book": [
      {
        "category": "修真",
        "author": "六道",
        "title": "坏蛋是怎样练成的",
        "price": 8.95
      },
      {
        "category": "玄幻",
        "author": "天蚕土豆",
        "title": "斗破苍穹",
        "price": 12.99
      },
      {
        "category": "科幻",
        "author": "刘慈欣",
        "title": "三体",
        "price": 29.99
      },
      {
        "category": "历史",
        "author": "当年明月",
        "title": "明朝那些事儿",
        "price": 39.95
      },
      {
        "category": "文学",
        "author": "村上春树",
        "title": "挪威的森林",
        "price": 19.99
      },
      {
        "category": "悬疑",
        "author": "东野圭吾",
        "title": "白夜行",
        "price": 25.99,
        "isbn": 11002
      },
      {
        "category": "经济",
        "author": "吴晓波",
        "title": "大败局",
        "price": 36.99,
        "isbn": 11001
      }
    ]
  }
}

Code:

import json
import jsonpath

# Load the JSON document shown above
with open('爬虫_018_jsonpath.json', 'r', encoding='utf-8') as f:
    obj = json.load(f)

# Authors of all books
# author_list = jsonpath.jsonpath(obj, '$.store.book[*].author')
# author_list = jsonpath.jsonpath(obj, '$..author')
# print(author_list)

# All elements directly under store
# tag_list = jsonpath.jsonpath(obj, '$.store.*')
# print(tag_list)

# Prices of all books under store
# price_list = jsonpath.jsonpath(obj, '$..book[*].price')
# print(price_list)

# The third book (indices start at 0)
# num_list = jsonpath.jsonpath(obj, '$..book[2]')
# print(num_list)

# The last book
# book = jsonpath.jsonpath(obj, '$..book[(@.length-1)]')
# print(book)

# The first two books
# book = jsonpath.jsonpath(obj, '$..book[:2]')
# book = jsonpath.jsonpath(obj, '$..book[0,1]')
# print(book)

# Books that have an isbn field
# isbn_list = jsonpath.jsonpath(obj, '$..book[?(@.isbn)]')
# print(isbn_list)

# Books with a price above 20
price_list = jsonpath.jsonpath(obj, '$..book[?(@.price>20)]')
print(price_list)
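
To make it easier to check what each query above should select, here is a minimal sketch that performs the same selections with plain Python on the same data. It does not use the jsonpath package and is not how the package works internally; the variable names are chosen for illustration only. Note that jsonpath.jsonpath returns its matches wrapped in a list, so a single-element query such as '$..book[2]' yields a one-item list rather than the bare dict shown here.

```python
# Same data as the JSON file above, written inline so the sketch runs on its own.
obj = {"store": {"book": [
    {"category": "修真", "author": "六道", "title": "坏蛋是怎样练成的", "price": 8.95},
    {"category": "玄幻", "author": "天蚕土豆", "title": "斗破苍穹", "price": 12.99},
    {"category": "科幻", "author": "刘慈欣", "title": "三体", "price": 29.99},
    {"category": "历史", "author": "当年明月", "title": "明朝那些事儿", "price": 39.95},
    {"category": "文学", "author": "村上春树", "title": "挪威的森林", "price": 19.99},
    {"category": "悬疑", "author": "东野圭吾", "title": "白夜行", "price": 25.99, "isbn": 11002},
    {"category": "经济", "author": "吴晓波", "title": "大败局", "price": 36.99, "isbn": 11001},
]}}
books = obj["store"]["book"]

authors = [b["author"] for b in books]           # $.store.book[*].author
prices = [b["price"] for b in books]             # $..book[*].price
third = books[2]                                 # $..book[2]
last = books[-1]                                 # $..book[(@.length-1)]
first_two = books[:2]                            # $..book[:2] or $..book[0,1]
with_isbn = [b for b in books if "isbn" in b]    # $..book[?(@.isbn)]
over_20 = [b for b in books if b["price"] > 20]  # $..book[?(@.price>20)]

print([b["title"] for b in over_20])
```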

jsonpath syntax

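JSONPATH                    Description
$                           Root object, e.g. $.name
. or []                     Child node, e.g. $.name
..                          Descendant node access, e.g. $..name
@                           The current object itself
*                           Wildcard matching all members, e.g. $.leader.*
['key0','key1']             Access to multiple nodes, e.g. $['id','name']
[num]                       Array index access; the index may be negative, e.g. $[0].leader.departments[-1].name
[num0,num1,num2...]         Access to several array elements; indices may be negative; returns multiple elements, e.g. $[0,3,-2,5]
[start:end:step]            Array range access; values may be negative; step is the stride; returns multiple elements, e.g. $[0:5:2]
[?(key)]                    Non-null property filter, e.g. $.departs[?(name)] selects the departs entries that have the given property
[key > 123]                 Numeric property comparison filter, e.g. $[id >= 123]; supports =, !=, >, >=, <, <=
[key = '123']               String property comparison filter, e.g. $[name = '123']; supports =, !=, >, >=, <, <=
[key like 'aa%']            String LIKE filter, e.g. $[name like 'sz%']; the only wildcard supported is %; not like is also supported
[key rlike 'regexpr']       String regular-expression filter with the given pattern, e.g. departs[name rlike 'aa(.)*']; uses JDK regex syntax; not rlike is also supported
[key in ('v0', 'v1')]       IN filter for string and numeric types, e.g.
                            $.departs[name in ('wenshao','Yako')]
                            $.departs[id not in (101,102)]
[key between 234 and 456]   BETWEEN filter for numeric types, selecting a numeric range; not between is also supported, e.g.
                            $.departs[id between 101 and 201]
                            $.departs[id not between 101 and 201]
length() or size()          Array length, e.g. $.values.size(); supported on java.util.Map, java.util.Collection, and arrays
keySet()                    The keySet of a Map, or the names of an object's non-null properties, e.g. $.val.keySet(); supported on Map and plain objects, not on Collection or arrays (returns null)

Several rows (like, rlike, in, between, size(), keySet()) refer to JDK regular expressions and java.util types, so they describe a Java JSONPath implementation and may not be available in the Python jsonpath package used above.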
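
As a rough illustration of two entries in the syntax list, the sketch below mimics descendant access (..) with a small recursive generator and shows how [num0,num1,...] and [start:end:step] line up with ordinary Python indexing and slicing. The helper name descendants and the sample document doc are hypothetical, and this is not the implementation used by any jsonpath library.

```python
def descendants(node, key):
    """Yield every value stored under `key` at any depth below `node`,
    roughly what $..key selects."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, so nested matches are found as well.
            yield from descendants(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from descendants(item, key)


# Hypothetical sample document, loosely modelled on the examples in the list.
doc = {
    "name": "root",
    "departs": [
        {"id": 101, "name": "a"},
        {"id": 102, "name": "b", "leader": {"name": "c"}},
    ],
}
names = list(descendants(doc, "name"))           # like $..name

values = [0, 1, 2, 3, 4, 5, 6]
picked = [values[i] for i in (0, 3, -2, 5)]      # like $[0,3,-2,5]
stepped = values[0:5:2]                          # like $[0:5:2]

print(names, picked, stepped)
```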

Comparison of jsonpath and XPath

XPath                  JSONPath                   Result
/store/book/author     $.store.book[*].author     The authors of all books in the store
//author               $..author                  All authors
/store/*               $.store.*                  All things in the store (in the data above, only the book array)
/store//price          $.store..price             The price of everything in the store
//book[3]              $..book[2]                 The third book
//book[last()]         $..book[(@.length-1)]      The last book in order
                       $..book[-1:]
//book[position()<3]   $..book[0,1]               The first two books
                       $..book[:2]
//book[isbn]           $..book[?(@.isbn)]         All books that have an isbn number
//book[price<10]       $..book[?(@.price<10)]     All books cheaper than 10
//*                    $..*                       All elements in the XML document; all members of the JSON structure
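
One difference worth noting in the comparison is that XPath positions start at 1 while JSONPath indices start at 0, which is why //book[3] pairs with $..book[2]. The sketch below checks this on a small, hypothetical three-book store, using the standard-library xml.etree.ElementTree for the XPath side and plain Python indexing as a stand-in for the JSONPath side. ElementTree supports only a subset of XPath, so the paths are written relative to the store element and the price comparison row is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-book store, expressed once as XML and once as Python data.
xml_text = """
<store>
  <book><title>first</title><price>8.95</price></book>
  <book><title>second</title><price>12.99</price></book>
  <book><title>third</title><price>29.99</price><isbn>11002</isbn></book>
</store>
"""
store = ET.fromstring(xml_text)
books = [
    {"title": "first", "price": 8.95},
    {"title": "second", "price": 12.99},
    {"title": "third", "price": 29.99, "isbn": 11002},
]

# XPath counts from 1; JSONPath, like Python, counts from 0.
third_xml = store.find("book[3]").findtext("title")        # //book[3]
third_json = books[2]["title"]                             # $..book[2]

last_xml = store.find("book[last()]").findtext("title")    # //book[last()]
last_json = books[-1]["title"]                             # $..book[-1:]

isbn_xml = [b.findtext("title") for b in store.findall("book[isbn]")]  # //book[isbn]
isbn_json = [b["title"] for b in books if "isbn" in b]                 # $..book[?(@.isbn)]

print(third_xml, third_json, last_xml, last_json, isbn_xml, isbn_json)
```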
