The error output is as follows:
2019-09-27 13:32:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/robots.txt> (referer: None)
2019-09-27 13:32:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/top250> (referer: None)
2019-09-27 13:32:18 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://movie.douban.com/top250>: HTTP status code is not handled or not allowed
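HTTP 403 means the server refused the request. The problem lies in our USER_AGENT: unless it is overridden, Scrapy sends its own default User-Agent, and the site rejects it.
Solution:
Open the site we want to crawl in a browser, open the developer console, and inspect one of the requests: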
Copy the User-Agent value from that request's headers.
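As a minimal sketch of where the copied value might go, assuming the project uses the standard settings.py generated by Scrapy: the USER_AGENT setting overrides the default header. The string below is only a placeholder for illustration; use the one copied from your own browser.

```python
# settings.py (Scrapy project settings)

# Placeholder value for illustration only: replace it with the User-Agent
# string copied from your own browser's request headers.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/77.0.3865.90 Safari/537.36"
)
```

With this in place, Scrapy should send the browser-style User-Agent on each request instead of its default one. Whether that alone is enough to clear the 403 depends on the site's checks.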