Scrapy: selecting HTML with selectors (Selector)

Latest recommended article published on 2022-09-09 09:33:32

weixin_39830917


Article tags: scrapy, selecting HTML

Scrapy selectors are built on top of the lxml library and support three ways of querying a document:

- xpath

- css

- regular expressions

Constructing selectors
----------------------

- Selector(text=xxx)

- Selector(response=xxx)

A selector can be constructed from either `text` (a string of markup) or a `response` object.

~~~

$ scrapy shell

>>> from scrapy.selector import Selector

>>> from scrapy.http import HtmlResponse

~~~

Constructing from text:

~~~

>>> Selector(text='hello world')

>>> body = '<html><body><span>good</span></body></html>'

>>> Selector(text=body).xpath('//span/text()').extract()

['good']

~~~

Constructing from a response:

~~~

>>> # On Python 3 a str body needs an explicit encoding
>>> response = HtmlResponse(url='http://example.com', body=body, encoding='utf-8')

>>> Selector(response=response).xpath('//span/text()').extract()

['good']

~~~

Shortcuts
---------

- `response.selector`

- `response.selector.xpath()` | `response.selector.css()`

- `response.xpath()` | `response.css()`

For convenience, a response object exposes a ready-made selector through its `.selector` attribute.

`response.selector.xpath` has the shortcut `response.xpath`.

`response.selector.css` has the shortcut `response.css`.

Regular expressions
-------------------

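Selectors also offer regular expressions as a complement to XPath and CSS.

For example, when XPath's `contains()` and `starts-with()` functions cannot express what you need, you can use the regular-expression function `re:test()` inside an XPath expression: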

~~~

# A raw string keeps Python from interpreting the backslash in \d
sel.xpath(r'//li[re:test(@class, "item-\d$")]//@href').extract()

~~~
