Scrapy TypeError: Cannot convert unicode body - HtmlResponse has no encoding

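Problem description:

While writing a downloader middleware, the plan was to intercept each request that Scrapy sends, build a fake response with HtmlResponse, and return it from process_request so that Scrapy skips its own download and hands the response back to the engine. However, instantiating HtmlResponse raised the following error: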

Traceback (most recent call last):
  File "e:\anaconda3\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "e:\anaconda3\lib\site-packages\scrapy\core\downloader\middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "E:\Scrapy\Jianshu\Jianshu\middlewares.py", line 128, in process_request
    response = HtmlResponse(url=self.browser.current_url, body=url_src, request=request)
  File "e:\anaconda3\lib\site-packages\scrapy\http\response\text.py", line 31, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
  File "e:\anaconda3\lib\site-packages\scrapy\http\response\__init__.py", line 22, in __init__
    self._set_body(body)
  File "e:\anaconda3\lib\site-packages\scrapy\http\response\text.py", line 47, in _set_body
    type(self).__name__)
TypeError: Cannot convert unicode body - HtmlResponse has no encoding

The code is as follows:

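import time

from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.common.exceptions import WebDriverException


class JianshuSeleniumDownloaderMiddleware(object):
    def __init__(self):
        # Raw string so the backslashes in the Windows path are kept literally
        self.browser = webdriver.Chrome(r'E:\Scrapy\chromedriver.exe')

    def process_request(self, request, spider):
        self.browser.get(request.url)
        try:
            # Keep clicking "show more"; the lookup raises once the button is gone
            while True:
                show_more = self.browser.find_element_by_class_name("show-more")
                time.sleep(1)
                if show_more:
                    show_more.click()
                else:
                    break
        except WebDriverException:
            # No more "show-more" element (or it is not clickable): stop expanding
            pass

        url_src = self.browser.page_source
        print(url_src)
        # url_src is a str and no encoding is passed, so this line raises the TypeError
        response = HtmlResponse(url=self.browser.current_url, body=url_src, request=request)
        return response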

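Analysis:

HtmlResponse has no encoding.

The traceback points to the source code that raised the error, so I read through that code to follow its logic. The source is as follows: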

    def _set_body(self, body):
        self._body = b''  # used by encoding detection
        if isinstance(body, six.text_type):
            if self._encoding is None:
                raise TypeError('Cannot convert unicode body - %s has no encoding' %
                    type(self).__name__)
            self._body = body.encode(self._encoding)
        else:
            super(TextResponse, self)._set_body(body)

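Error message: Cannot convert unicode body - HtmlResponse has no encoding

browser.page_source is a str (six.text_type on Python 3), so the first branch is taken. Because no encoding was passed to HtmlResponse, self._encoding is None in the following check: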

if self._encoding is None:
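
The check can be reproduced without Scrapy. The sketch below is a simplified stand-in, not Scrapy's actual class: MiniTextResponse and try_build are hypothetical names used only for illustration, and the body copies just the branch logic of _set_body shown above.

```python
# Simplified stand-in for the check in TextResponse._set_body.
# MiniTextResponse is a hypothetical class for illustration, not part of Scrapy.
class MiniTextResponse:
    def __init__(self, body=b"", encoding=None):
        self._encoding = encoding
        self._set_body(body)

    def _set_body(self, body):
        if isinstance(body, str):
            # A text body can only be stored after encoding it to bytes
            if self._encoding is None:
                raise TypeError(
                    "Cannot convert unicode body - %s has no encoding"
                    % type(self).__name__
                )
            self._body = body.encode(self._encoding)
        else:
            # A bytes body is stored as-is
            self._body = body


def try_build(body, encoding=None):
    """Return the stored bytes, or the error message if construction fails."""
    try:
        return MiniTextResponse(body, encoding=encoding)._body
    except TypeError as exc:
        return str(exc)
```

Calling try_build with a str and no encoding returns the error message, while a str with encoding="utf-8" or a bytes body is stored without complaint. That is the same distinction the real HtmlResponse makes.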

So the fix is to pass encoding when creating the HtmlResponse object.

The modified line is as follows:

response = HtmlResponse(url=self.browser.current_url, body=url_src, request=request, encoding="utf-8")

Run the spider again and the error no longer appears.
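
An equivalent option is to encode the page source to bytes yourself and still declare the encoding, so the response does not have to guess it later when decoding the text. A minimal sketch, assuming a hypothetical helper named html_response_kwargs and a placeholder URL (neither comes from the original code):

```python
def html_response_kwargs(url, page_source, encoding="utf-8"):
    """Build keyword arguments for HtmlResponse (hypothetical helper)."""
    if isinstance(page_source, str):
        # Encode text explicitly so the body is always bytes
        body = page_source.encode(encoding)
    else:
        body = page_source
    return {"url": url, "body": body, "encoding": encoding}


# Placeholder values for illustration only
kwargs = html_response_kwargs("https://example.com/", "<html>ok</html>")
# In the middleware this would be used as:
#     response = HtmlResponse(request=request, **kwargs)
```

Either way, HtmlResponse receives an encoding, which is what the check in _set_body requires for a str body.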
