[Crawler] (Scrapy) Notes on concepts and problems from first learning Scrapy

FeatureOverload

Article link: https://blog.csdn.net/qq_29757283/article/details/101269812

Overview

XPath

Using XPath in the browser

F12 -> Console:

> $x("<xpath>")

Example:

<div class="grade-box clearfix">
    <dl>...omitted...</dl>
    <dl>
        <dd title="60852">
                 6万+         </dd>
    </dl>
    <dl>...omitted...</dl>
</div>

Get the element's content (for example, "6万+"), whitespace included:
> $x('//div[@class="grade-box clearfix"]//dl[2]//dd')[0]["innerHTML"]
```
"
            6万+            "
```

Get the element's text (for example, "6万+") without the surrounding whitespace:

> $x('//div[@class="grade-box clearfix"]//dl[2]//dd')[0]["innerText"]
"6万+"

Get an attribute of the element:

> $x('//div[@class="grade-box clearfix"]//dl[2]//dd/@title')[0]["textContent"]
"60852"
> $x('//div[@class="grade-box clearfix"]//dl[2]//dd/@title')[0]["value"]
"60852"

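Summary:

When a step in the XPath matches several elements, pick one with an index such as [1], [2], [3], ... (indices start at 1!).
To get an attribute, append /@ followed by the attribute name to the element's XPath; written out in full: //*//<xpath>/@<attribute>.
For an XPath that selects nodes, a successful $x() call always returns an array.
If the XPath is precise, the array usually holds exactly one item, the one we want, at index [0].
The item at $x()[0] behaves like a dictionary: read its properties with $x()[0]["key"].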

Using XPath on Scrapy's `response` - n/a

n/a

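Common problems

AttributeError: 'str' object has no attribute 'iter'

I ran into this error when using rules + Rule(LinkExtractor(...), ...).

Cause 1:

The restrict_xpaths argument of LinkExtractor expects an XPath that points to elements, so it must not contain an attribute step such as ***/@<attr>.
To take links from a particular attribute, name that attribute in LinkExtractor's attrs parameter instead.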

Deploying on Scrapyd

$ pip install scrapyd
$ pip install scrapyd-client
$ pwd
<path>/<to>/<project>
$ vim ./scrapy.cfg
#### uncomment url
[deploy]
url = http://localhost:6800/
project = posts
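
Once the project has been uploaded (for example with the `scrapyd-deploy` command from scrapyd-client), a spider run can be requested through Scrapyd's `schedule.json` endpoint. The snippet below is a rough sketch using only the standard library; the spider name `example_spider` is a placeholder, and the URL and project name are taken from the scrapy.cfg above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider):
    """Build the POST request for Scrapyd's schedule.json endpoint."""
    data = urlencode({"project": project, "spider": spider}).encode("utf-8")
    url = base_url.rstrip("/") + "/schedule.json"
    return Request(url, data=data, method="POST")


def schedule(base_url, project, spider):
    """Send the request and return Scrapyd's JSON reply as a dict."""
    request = build_schedule_request(base_url, project, spider)
    with urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no running server.
# "example_spider" is a hypothetical spider name.
request = build_schedule_request("http://localhost:6800/", "posts", "example_spider")
print(request.get_method(), request.full_url, request.data)

# With scrapyd running locally, the job could then be submitted with:
# reply = schedule("http://localhost:6800/", "posts", "example_spider")
```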

Problems

builtins.AttributeError: 'int' object has no attribute 'splitlines'

See: https://blog.csdn.net/qq_29719097/article/details/89431234

Install versions that scrapyd (scrapyd-client) supports:

Scrapy==1.6.0
Twisted==18.9.0

$ pip uninstall twisted
$ pip uninstall scrapy
$ pip install twisted==18.9.0
$ pip install scrapy==1.6.0

scrapydweb - enhancing Scrapyd

n/a

Unit testing - TODO

n/a

Distributed crawling - TODO

N/A

Reference
