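Requirements Analysis
Requirement: scrape Douyu streamer images and download them to local disk.
Approach:
- Use the Fiddler packet-capture tool to capture the API endpoints called by the Douyu mobile app
- Use the Scrapy framework's ImagesPipeline to download the images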
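Once Fiddler shows which JSON endpoint the app calls, the spider's main job is to turn that response into image URLs. A minimal sketch using only the standard library, assuming a hypothetical response shape {"data": [{"nickname": ..., "image_url": ...}]}; the real key names have to be read from the captured traffic. Each URL is wrapped in a list because ImagesPipeline iterates over image_urls.

```python
import json
from typing import Dict, List


def extract_streamers(json_text: str) -> List[Dict[str, object]]:
    """Turn a captured API response into records ready to become Items.

    Assumption: the payload looks like
    {"data": [{"nickname": "...", "image_url": "https://..."}]}.
    Both key names are hypothetical placeholders for whatever Fiddler shows.
    """
    payload = json.loads(json_text)
    records = []
    for room in payload.get("data", []):
        url = room.get("image_url")
        if not url:
            continue  # skip entries that carry no picture
        records.append({
            "nickname": room.get("nickname", ""),
            # A list, not a bare string: ImagesPipeline iterates image_urls.
            "image_urls": [url],
        })
    return records
```

For example, a response with one complete entry and one entry missing its picture yields a single record whose image_urls is a one-element list.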
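How to use ImagesPipeline to download images:
- In items.py, define the image_urls and images fields on XxxItem
- In the spider, store the extracted image links in the Item's image_urls field (note: this field must hold a list or other iterable of URLs, not a single URL string; otherwise an error is raised)
- Enable and configure the pipeline in the settings file; see settings.py for the exact configuration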
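Error encountered: ValueError: Missing scheme in request url: h
Cause: image_urls was given a single URL string instead of a list. get_media_requests iterates over the field (see the last frames below), so it walks the string character by character and the first "URL" it tries to request is just "h".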
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/pipelines/media.py", line 79, in process_item
    requests = arg_to_iter(self.get_media_requests(item, info))
  File "/usr/local/lib/python3.5/dist-packages/scrapy/pipelines/images.py", line 155, in get_media_requests
    return [Request(x) for x in item.get(self.images_urls_field, [])]
  File "/usr/local/lib/python3.5/dist-packages/scrapy/pipelines/images.py", line 155, in <listcomp>
    return [Request(x) for x in item.get(self.images_urls_field, [])]
  File "/usr/local/lib/python3.5/dist
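A minimal plain-Python illustration of this failure and one way to guard against it. The list comprehension mirrors the get_media_requests line shown in the traceback; `as_url_list` is a hypothetical helper, not part of Scrapy.

```python
from typing import Iterable, List, Union


def as_url_list(value: Union[str, Iterable[str]]) -> List[str]:
    """Normalise a single URL or an iterable of URLs into a list for image_urls."""
    if isinstance(value, str):
        return [value]  # a bare string would otherwise be iterated char by char
    return list(value)


url = "https://example.com/a.jpg"

# ImagesPipeline builds one Request per element: [Request(x) for x in image_urls].
# With a bare string, the first "URL" it sees is just the letter "h":
print([x for x in url][:1])           # ['h']
print([x for x in as_url_list(url)])  # ['https://example.com/a.jpg']
```

Assigning `item["image_urls"] = as_url_list(link)` in the spider makes the field safe whether the extraction step returns one link or several.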