Scrapy name variable: commonly used Scrapy XPath methods

Latest recommended article published on 2022-07-11 07:38:00

Fetch_ai

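Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_35182625/article/details/113998917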

[TOC]

# 1. Start the Scrapy shell from the command line

```

scrapy shell "http://www.baidu.com/"

```

Then use the `response` variable, which holds the response returned for the requested URL.

```

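divs = response.xpath('//div')  # search the whole document for div elements

p = divs.xpath('.//p')  # search all descendants of each div for p elements

p2 = divs.xpath('p')  # select only the direct child p elements of each div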

```

# 2. Global search

```

response.xpath('//div[@class="row header-box"]/text()').extract_first()

```

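`extract_first()` returns the first result when there are several matches and returns `None` when there are none, which avoids the `IndexError` you can get from indexing into a list.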

# 3. Three ways to extract text content

```

.extract()

.extract_first()

.extract_first(default='not-found')

```

# 4. Get the text inside a tag

Use the XPath `text()` node test together with `extract_first()`.

```

response.xpath('//title/text()').extract_first()

```

# 5. Get content by tag attribute

- Get the `href` value of an `a` tag

```

response.xpath('//a/@href').extract_first()

```

- Search the whole document for `a` tags whose `href` attribute contains the string `"image"`, and get the `href` value

```

response.xpath('//a[contains(@href, "image")]/@href').extract_first()

```

- Search the whole document for `a` tags whose `href` attribute contains the string `"image"`, and get the `src` value of the `img` tag directly under each of them

```

response.xpath('//a[contains(@href, "image")]/img/@src').extract_first()

```

# 6. Match content with regular expressions

- Regex match with `re()`, which returns a list of the captured groups as strings

```

response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')

```

- Regex match with `re_first()`, which returns only the first captured group as a string

```

response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')

```

# 7. Using regular expressions inside XPath

```

# Search the whole document for li tags whose class attribute matches the regex "item-\d$"

response.xpath(r'//li[re:test(@class, "item-\d$")]//@href').extract()

```

# 8. Using variables

- The value of `$val` is supplied by the keyword argument `val='images'` that follows the expression

```

response.xpath('//div[@id=$val]/a/text()', val='images').extract_first()

```

- Find the `id` of the `div` that has exactly 5 direct child `a` tags

```

response.xpath('//div[count(a)=$cnt]/@id', cnt=5).extract_first()

```
