Scrapy Source Code Analysis 16: Requests and Responses Ⅴ

及锋而试

Category: 2021SC@SDUSC  Tags: python

Article link: https://blog.csdn.net/No_oneelse/article/details/121744579

2021SC@SDUSC

The class method from_response(response[, formname=None, formid=None, formnumber=0, formdata=None, formxpath=None, formcss=None, clickdata=None, dont_click=False, ...]) takes the following main parameters:

Parameters

response (Response object) – the response containing an HTML form which will be used to pre-populate the form fields

formname (str) – if given, the form with name attribute set to this value will be used.

formid (str) – if given, the form with id attribute set to this value will be used.

formxpath (str) – if given, the first form that matches the xpath will be used.

formcss (str) – if given, the first form that matches the css selector will be used.

formnumber (int) – the zero-based index of the form to use when the response contains multiple forms. The first one (and also the default) is 0.

formdata (dict) – fields to override in the form data. If a field was already present in the response <form> element, its value is overridden by the one passed in this parameter. If a value passed in this parameter is None, the field will not be included in the request, even if it was present in the response <form> element.

clickdata (dict) – attributes used to look up the clicked control. If it is not given, the form data will be submitted simulating a click on the first clickable element. In addition to HTML attributes, the control can be identified by its zero-based index relative to other submittable inputs inside the form, via the nr attribute.

dont_click (bool) – If True, the form data will be submitted without clicking in any element.

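    @classmethod
    def from_response(cls, response, formname=None, formid=None, formnumber=0, formdata=None,
                      clickdata=None, dont_click=False, formxpath=None, formcss=None, **kwargs):

        # Default the request encoding to the encoding of the response.
        kwargs.setdefault('encoding', response.encoding)

        # A CSS selector is translated to XPath (replacing any formxpath given),
        # so the form lookup below only has to deal with XPath.
        if formcss is not None:
            from parsel.csstranslator import HTMLTranslator
            formxpath = HTMLTranslator().css_to_xpath(formcss)

        # Locate the form, collect its field values merged with formdata,
        # and resolve the URL the form submits to.
        form = _get_form(response, formname, formid, formnumber, formxpath)
        formdata = _get_inputs(form, formdata, dont_click, clickdata, response)
        url = _get_form_url(form, kwargs.pop('url', None))

        # Use the form's own method unless the caller overrides it;
        # anything outside valid_form_methods falls back to GET.
        method = kwargs.pop('method', form.method)
        if method is not None:
            method = method.upper()
            if method not in cls.valid_form_methods:
                method = 'GET'

        return cls(url=url, method=method, formdata=formdata, **kwargs)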

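class scrapy.http.FormRequest(url[, formdata, ...])

The FormRequest class adds a new keyword argument to the __init__ method. The remaining arguments are the same as for the Request class and are not documented here.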

Parameters
formdata (dict or collections.abc.Iterable) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

Some Request usage examples:

Using FormRequest to send data via HTTP POST:
To simulate an HTML form POST in a spider and send a few key-value fields, return a FormRequest object from the spider like this:

return [FormRequest(url="http://www.example.com/post/action",
                    formdata={'name': 'John Doe', 'age': '27'},
                    callback=self.after_post)]

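Using FormRequest.from_response() to simulate a user login

Websites usually provide pre-populated form fields through <input type="hidden"> elements, such as session-related data or authentication tokens (for login pages). When scraping, you want these fields to be pre-populated automatically and to override only a few of them, such as the username and password. FormRequest.from_response() does this. An example: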

import scrapy

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass

class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return

        # continue scraping with authenticated session...
