Problem description:
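While crawling with Scrapy, everything ran fine before the item pipeline was enabled. As soon as I enabled it, the crawl failed.
The error was: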
AttributeError: type object 'ImagesPipeline' has no attribute 'startswith'
Another message encountered: cannot create weak reference to 'str' object (Scrapy, Python)
Spider file code:
# -*- coding: utf-8 -*-
import scrapy
from Bmw.items import BmwItem


class BmySpider(scrapy.Spider):
    name = 'bmy'
    allowed_domains = ['XXXXXXXXXX']
    start_urls = ['https://XXXXXXXXX/pic/series/65.html']

    def parse(self, response):
        # Skip the first uibox block
        uibox_list = response.xpath("//div[@class='uibox']")[1:]
        for uibox in uibox_list:
            # Image category title
            uibox_title = uibox.xpath(".//div[@class='uibox-title']/a/text()").get()
            # print(uibox_title)
            # Image links
            # uibox_img_list = uibox.xpath(".//div[@class='uibox-con carpic-list03']//img/@src").getall()
            uibox_img_list = uibox.xpath(".//div[contains(@class,'uibox-con carpic-list03')]//img/@src").getall()
            # Option 1: build absolute URLs with a list comprehension
            # uibox_img_list = ["http:" + i for i in uibox_img_list]
            # Option 2: build absolute URLs with map()
            uibox_img_list = list(map(lambda x: response.urljoin(x) if "http" not in x else x, uibox_img_list))
            item = BmwItem(uibox_title=uibox_title, image_urls=uibox_img_list)
            yield item
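For a page without a <base> tag, response.urljoin(x) behaves essentially like urllib.parse.urljoin(response.url, x). The sketch below reproduces the spider's map()/lambda step outside Scrapy so the URL joining can be checked on its own. PAGE_URL and the sample image paths are hypothetical, because the real host is masked in this post.

```python
from urllib.parse import urljoin

# Hypothetical page URL standing in for response.url (the real host is masked in the post)
PAGE_URL = "https://example.com/pic/series/65.html"


def absolutize(srcs, base=PAGE_URL):
    """Mirror the spider's lambda: join srcs without 'http', keep the others unchanged."""
    return [urljoin(base, s) if "http" not in s else s for s in srcs]


srcs = [
    "//img.example.com/a.jpg",        # protocol-relative: inherits the page's scheme
    "/upload/b.jpg",                  # root-relative: joined onto the page's host
    "https://img.example.com/c.jpg",  # already absolute: kept as-is
]
print(absolutize(srcs))
```

Unlike the commented-out `"http:" + i` version, urljoin also handles root-relative paths. The `"http" not in x` test is only a rough heuristic, since a relative path that happens to contain "http" would be skipped.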
Pipeline configuration in settings.py (the broken version):
ITEM_PIPELINES = {
Bmw.pipelines.ImagePipeline: 300,
}
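Cause analysis:
The crawl worked before the pipeline was enabled and failed afterwards. However, a print() I added inside the pipeline never produced any output, which means execution never reached the pipeline at all. So the first thing to check is whether the pipeline configuration in settings.py is correct.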
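The print() check can be as small as the sketch below. DebugPipeline is a hypothetical class name, and the process_item(item, spider) signature follows the usual Scrapy pipeline convention. If this message never appears, the pipeline was never loaded and the problem is in the configuration rather than in the pipeline logic.

```python
class DebugPipeline:
    """Hypothetical minimal pipeline, used only to confirm that items reach the pipeline stage."""

    def process_item(self, item, spider):
        # If this never prints, Scrapy never got as far as calling the pipeline.
        print("process_item called with:", dict(item))
        # Return the item so any later pipelines still receive it.
        return item
```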
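Sure enough, the pipeline path in the settings file was not in quotes! Without quotes, the dictionary key is an object rather than the dotted import-path string that Scrapy expects.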
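Why does a missing pair of quotes produce an error about 'startswith'? In the Scrapy version used here, the keys of ITEM_PIPELINES are treated as import-path strings, so string methods are called on them. The sketch below is a simplified stand-in, not Scrapy's actual code. normalize_path, the legacy prefix and the ImagesPipeline class here are all hypothetical, chosen only to reproduce the same message.

```python
class ImagesPipeline:
    """Hypothetical stand-in for the class object that ended up as the dict key."""


def normalize_path(path):
    # Simplified sketch: the key is assumed to be a dotted path string,
    # so str methods such as startswith() are called on it directly.
    old_prefix = "myproject.old."  # hypothetical legacy prefix
    if path.startswith(old_prefix):
        return "myproject." + path[len(old_prefix):]
    return path


print(normalize_path("Bmw.pipelines.ImagePipeline"))  # a string key works
try:
    normalize_path(ImagesPipeline)  # a class object as the key
except AttributeError as exc:
    print(exc)  # type object 'ImagesPipeline' has no attribute 'startswith'
```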
Corrected code:
ITEM_PIPELINES = {
'Bmw.pipelines.ImagePipeline': 300,
}
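To catch this kind of slip earlier, one option is a small guard placed after ITEM_PIPELINES in settings.py. This guard is my own addition and is not something Scrapy requires, and check_pipeline_keys is a hypothetical helper name. It fails immediately with a readable message if any key is not a dotted path string.

```python
ITEM_PIPELINES = {
    'Bmw.pipelines.ImagePipeline': 300,
}


def check_pipeline_keys(pipelines):
    """Hypothetical helper: raise if any ITEM_PIPELINES key is not an import-path string."""
    bad = [key for key in pipelines if not isinstance(key, str)]
    if bad:
        raise TypeError(
            f"ITEM_PIPELINES keys must be quoted dotted paths such as "
            f"'Bmw.pipelines.ImagePipeline', got: {bad!r}"
        )


check_pipeline_keys(ITEM_PIPELINES)
```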
After this change, the crawl ran successfully!