Scrapy Source Code Analysis 12: Requests and Responses

及锋而试

Category: 2021SC@SDUSC  Tags: python

Article link: https://blog.csdn.net/No_oneelse/article/details/121744105

2021SC@SDUSC

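For the final phase of this project, I plan to focus on the Requests and Responses part of Scrapy, with the main goal of understanding how the source code handles Request and Response objects.

The project documentation defines them as follows: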

Scrapy uses Request and Response objects for crawling web sites.

Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.

Both Request and Response classes have subclasses which add functionality not required in the base classes. These are described below in Request subclasses and Response subclasses.

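The round trip described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: `MiniRequest`, `MiniResponse`, `fake_downloader`, `DemoSpider` and `crawl` are hypothetical names invented for illustration, nothing touches the network, and Scrapy's real engine is asynchronous and far more involved.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MiniRequest:
    # Hypothetical stand-in for a request: a URL plus the callback to invoke.
    url: str
    callback: Optional[Callable] = None


@dataclass
class MiniResponse:
    # Hypothetical stand-in for a response; it remembers its originating request.
    url: str
    body: bytes
    request: MiniRequest


def fake_downloader(request):
    # Plays the Downloader's role: "executes" the request and returns a response.
    # No network access, the body is canned.
    return MiniResponse(url=request.url, body=b"<html>stub</html>", request=request)


class DemoSpider:
    start_urls = ["http://www.example.com"]

    def start_requests(self):
        # Requests are generated in the spider.
        for url in self.start_urls:
            yield MiniRequest(url, callback=self.parse)

    def parse(self, response):
        # The response travels back to the spider that issued the request.
        return {"url": response.url, "size": len(response.body)}


def crawl(spider):
    # Toy "engine": pass each request to the downloader, then hand the
    # response to the callback stored on the request.
    results = []
    for request in spider.start_requests():
        response = fake_downloader(request)
        results.append(request.callback(response))
    return results


items = crawl(DemoSpider())
```

The point of the model is only the direction of travel: spider, then downloader, then back to the spider's callback.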

1.1 Request objects:

class scrapy.http.Request(*args, **kwargs)

A Request object represents an HTTP request, which is usually generated in the Spider and executed by the Downloader, and thus generating a Response.

Part 1: Parameters:

url (str) – the URL of this request

If the URL is invalid, a ValueError exception is raised.

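To make the ValueError rule concrete, here is a minimal sketch of such a check. It assumes that "invalid" means "has no scheme", which is a simplification chosen for illustration; `check_url` is a hypothetical helper, not Scrapy's own validation code.

```python
from urllib.parse import urlparse


def check_url(url):
    # Hypothetical validator: accept only strings that carry a scheme
    # such as "http" or "https".
    if not isinstance(url, str):
        raise TypeError(f"url must be str, got {type(url).__name__}")
    if not urlparse(url).scheme:
        raise ValueError(f"Missing scheme in request url: {url}")
    return url


valid = check_url("http://www.example.com")

try:
    check_url("www.example.com")  # no scheme, so this is rejected
    rejected = False
except ValueError:
    rejected = True
```

Failing early at construction time means a malformed URL is reported where the request is created, rather than later in the download stage.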

callback (collections.abc.Callable) – the function that will be called with the response of this request (once it’s downloaded) as its first parameter. For more information see Passing additional data to callback functions below. If a Request doesn’t specify a callback, the spider’s parse() method will be used. Note that if exceptions are raised during processing, errback is called instead.

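The fallback to parse() and the role of errback can be sketched as follows. This is a simplified synchronous model, not Scrapy's implementation: `handle`, `DemoSpider` and `failing_download` are hypothetical, and "processing" is assumed here to mean the download step that runs before the callback.

```python
from types import SimpleNamespace


class DemoSpider:
    def parse(self, response):
        return f"parse got {response}"


def handle(request, download, spider):
    # Fall back to the spider's parse() when the request has no callback.
    callback = request.callback or spider.parse
    try:
        response = download(request)
    except Exception as exc:
        # Processing failed: the errback is called instead of the callback.
        if request.errback is None:
            raise
        return request.errback(exc)
    return callback(response)


def failing_download(request):
    raise ConnectionError("host unreachable")


spider = DemoSpider()
ok_request = SimpleNamespace(url="http://www.example.com", callback=None, errback=None)
bad_request = SimpleNamespace(
    url="http://www.example.com",
    callback=None,
    errback=lambda exc: f"errback got {type(exc).__name__}",
)

ok_result = handle(ok_request, lambda req: "response-body", spider)
bad_result = handle(bad_request, failing_download, spider)
```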

method (str) – the HTTP method of this request. Defaults to 'GET'.

meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied.
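What "shallow copied" implies in practice is easy to miss, so here is a small sketch. `MiniRequest` is a hypothetical stand-in that only mimics the copy step with `dict(meta)`; it is not the Scrapy class.

```python
class MiniRequest:
    # Hypothetical stand-in: only models the shallow copy of meta.
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = dict(meta) if meta else {}


shared = {"depth": 1, "tags": ["a"]}
request = MiniRequest("http://www.example.com", meta=shared)

# Rebinding a top-level key affects only the request's copy.
request.meta["depth"] = 2

# Nested objects are still shared between the copy and the original.
request.meta["tags"].append("b")
```

After this runs, `shared["depth"]` is still 1, but `shared["tags"]` has grown to `["a", "b"]`. If nested values must stay isolated, copying them explicitly (for example with `copy.deepcopy`) before passing them in is one option.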

body (bytes or str) – the request body. If a string is passed, then it’s encoded as bytes using the encoding passed (which defaults to utf-8). If body is not given, an empty bytes object is stored. Regardless of the type of this argument, the final value stored will be a bytes object (never a string or None).

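The body rule (always bytes, never a string or None) can be sketched as a small normalizing function. `to_body` is a hypothetical helper written for illustration and only mirrors the behaviour described above.

```python
def to_body(body=None, encoding="utf-8"):
    # Hypothetical helper mirroring the described rule.
    if body is None:
        return b""                      # nothing given: empty bytes object
    if isinstance(body, str):
        return body.encode(encoding)    # text is encoded with the given encoding
    if isinstance(body, bytes):
        return body                     # bytes pass through unchanged
    raise TypeError(f"body must be str or bytes, got {type(body).__name__}")


empty = to_body()
utf8_body = to_body("café")
latin1_body = to_body("café", encoding="latin-1")
raw_body = to_body(b"raw")
```

The same text produces different bytes under different encodings (`utf8_body` is one byte longer than `latin1_body`), which is why the encoding is part of the request.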

headers (dict) – the headers of this request. The dict values can be strings (for single valued headers) or lists (for multi-valued headers). If None is passed as value, the HTTP header will not be sent at all.

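The three kinds of header value (string, list, None) can be illustrated with a small normalizer. This is a sketch under assumptions: `normalize_headers` is hypothetical, it keeps header names as title-cased strings and simply drops None-valued headers, which may differ from how Scrapy's own Headers class stores them internally.

```python
def _to_bytes(value, encoding):
    return value if isinstance(value, bytes) else str(value).encode(encoding)


def normalize_headers(headers, encoding="utf-8"):
    # Hypothetical helper: every header ends up as a list of byte values.
    normalized = {}
    for name, value in headers.items():
        if value is None:
            continue  # None means: do not send this header at all
        values = value if isinstance(value, (list, tuple)) else [value]
        normalized[name.title()] = [_to_bytes(v, encoding) for v in values]
    return normalized


sent = normalize_headers({
    "user-agent": "demo",                           # single-valued header
    "accept": ["text/html", "application/json"],    # multi-valued header
    "X-Debug": None,                                # suppressed header
})
```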

One more point to note: CookiesMiddleware does not take cookies set through the Cookie header into account. To set cookies for a request, use the Request.cookies parameter instead.

cookies (dict or list) – the request cookies. These can be sent in two forms.

Using a dict:

request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'})

Using a list of dicts:

request_with_cookies = Request(url="http://www.example.com",
                               cookies=[{'name': 'currency',
                                        'value': 'USD',
                                        'domain': 'example.com',
                                        'path': '/currency'}])

The latter form allows customizing the domain and path attributes of the cookie. This is only useful if the cookies are saved for later requests.
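One way to see how the two forms relate is to convert both into a common list-of-dicts shape. `cookie_entries` below is a hypothetical helper for illustration; it is not claimed to be the code Scrapy runs.

```python
def cookie_entries(cookies):
    # Hypothetical helper: bring both accepted forms into one shape.
    if isinstance(cookies, dict):
        # Dict form: only name and value are known.
        return [{"name": name, "value": value} for name, value in cookies.items()]
    # List form: each entry may also carry attributes such as domain and path.
    return [dict(entry) for entry in cookies]


from_dict = cookie_entries({"currency": "USD", "country": "UY"})
from_list = cookie_entries([
    {"name": "currency", "value": "USD", "domain": "example.com", "path": "/currency"},
])
```

The dict form is shorter, while the list form is the only one of the two that has room for per-cookie attributes.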

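When a site returns cookies (in a response), they are stored in the cookie jar for that domain and sent again in later requests. This is the typical behaviour of any regular web browser.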

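To create a request that neither sends stored cookies nor stores received cookies, set the dont_merge_cookies key to True in request.meta.

Example of a request that sends manually defined cookies and ignores the cookie storage: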

Request(
    url="http://www.example.com",
    cookies={'currency': 'USD', 'country': 'UY'},
    meta={'dont_merge_cookies': True},
)
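The effect of dont_merge_cookies, as described above, can be modelled with a toy cookie jar. This is a sketch of the described behaviour only: `MiniCookieJar` and its method signatures are hypothetical and are not a transcription of Scrapy's CookiesMiddleware, whose actual handling should be checked against the source.

```python
class MiniCookieJar:
    # Hypothetical model: one cookie dict per domain.
    def __init__(self):
        self.jar = {}

    def process_request(self, domain, cookies, meta):
        # Returns the cookies that would be sent with the request.
        if meta.get("dont_merge_cookies"):
            return dict(cookies)  # stored cookies are not merged in
        return {**self.jar.get(domain, {}), **cookies}

    def process_response(self, domain, received, meta):
        # Stores cookies returned by the site, unless storage is bypassed.
        if meta.get("dont_merge_cookies"):
            return
        self.jar.setdefault(domain, {}).update(received)


jar = MiniCookieJar()
jar.process_response("example.com", {"session": "abc"}, meta={})

normal = jar.process_request("example.com", {"currency": "USD"}, meta={})

isolated_meta = {"dont_merge_cookies": True}
isolated = jar.process_request("example.com", {"currency": "USD"}, meta=isolated_meta)
jar.process_response("example.com", {"tracker": "x"}, meta=isolated_meta)
```

In this model the normal request carries both the stored `session` cookie and its own `currency` cookie, while the isolated request carries only `currency` and leaves the jar untouched.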
