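This follows the official Scrapy documentation on the images pipeline: https://scrapy-chs.readthedocs.io/zh_CN/latest/topics/images.html
The pipeline that handles the text fields is based on this blog post: https://blog.csdn.net/killeri/article/details/80228089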
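Below is a minimal sketch of a text-handling pipeline. It assumes the goal is to save each item's photographer field to a JSON Lines file. The class name PhotographerJsonPipeline and the output filename are hypothetical, and the linked post's exact approach is not reproduced here. Scrapy calls open_spider, process_item and close_spider on any class registered in ITEM_PIPELINES, so no base class is required.

```python
import json


class PhotographerJsonPipeline:
    """Hypothetical pipeline: writes each item's photographer name to a JSON Lines file."""

    def __init__(self, path="photographers.jl"):
        # Scrapy builds pipelines with no arguments by default, so the path has a default.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for both scrapy.Item instances and plain dicts.
        photographer = dict(item).get("photographer")
        if photographer:
            # Strip surrounding whitespace left over from the scraped HTML text.
            record = {"photographer": photographer.strip()}
            self.file.write(json.dumps(record, ensure_ascii=False) + "\n")
        # Always return the item so later pipelines (e.g. the images pipeline) still see it.
        return item
```

Returning the item unchanged keeps this pipeline independent of the image download step, so the two can run in either order.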
items.py:
import scrapy


class XicidailispiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
    image_paths = scrapy.Field()
    photographer = scrapy.Field()
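For the item above to produce downloaded files, the images pipeline has to be enabled in settings.py. Scrapy's ImagesPipeline reads URLs from the image_urls field and stores the download results in images, which is why those two field names appear in the item. A minimal settings fragment is sketched below; the IMAGES_STORE directory name is a hypothetical choice, and the pipeline also needs Pillow installed. The stock pipeline does not fill image_paths: the official docs populate it by subclassing ImagesPipeline and overriding item_completed.

```python
# settings.py (fragment)

# Enable Scrapy's built-in images pipeline. It downloads every URL in
# item["image_urls"] and writes the results into item["images"].
# The number is the pipeline's order: lower values run earlier (range 0-1000).
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are saved (hypothetical path).
IMAGES_STORE = "images"
```

A custom pipeline is enabled the same way, by adding its dotted import path to ITEM_PIPELINES with its own order value.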
Spider file:
import scrapy


class PexelcrawlSpider(scrapy.Spider):
    name = 'pexelCrawl'
    allowed_domains = ['pexels.com']
    start_urls = ['https://www.pexels.com/']

    def parse(self, response):
        selectors = res