Downloading Images with Scrapy ImagesPipeline

Project source download: http://download.csdn.net/download/adam_zs/10166641

1. Project structure and screenshot of the downloaded images



2. Project overview

settings.py

ITEM_PIPELINES = {
    # The built-in pipeline is replaced here by the custom subclass from pipelines.py:
    # 'scrapy.pipelines.images.ImagesPipeline': 1,
    'ImagesPipelineTest.pipelines.MyImagesPipeline': 1,
}
IMAGES_STORE = 'E:\\shetuwang2017'
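Beyond IMAGES_STORE, ImagesPipeline reads a few optional settings. The fragment below is a sketch of how they could be added to settings.py; the values are illustrative and are not part of the original project. Note that ImagesPipeline needs the Pillow package to be installed.

```python
# Optional additions to settings.py; all values here are illustrative assumptions.
IMAGES_EXPIRES = 90       # days before an already-downloaded image is fetched again
IMAGES_MIN_WIDTH = 110    # skip images narrower than this many pixels
IMAGES_MIN_HEIGHT = 110   # skip images shorter than this many pixels
IMAGES_THUMBS = {         # also generate thumbnails in these sizes
    'small': (50, 50),
    'big': (270, 270),
}
```

With IMAGES_THUMBS set, the thumbnails are stored under IMAGES_STORE in a thumbs/<size name>/ subfolder, next to the full-size images in full/.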

items.py

import scrapy


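class ImageItem(scrapy.Item):
    # image_urls and images are the fixed field names that ImagesPipeline reads and fills in.
    image_urls = scrapy.Field()
    images = scrapy.Field()
    # Written by MyImagesPipeline.item_completed in pipelines.py;
    # without this field, item['image_paths'] = ... raises KeyError.
    image_paths = scrapy.Field()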

she_tu_wang.py

# -*- coding: utf-8 -*-
import scrapy
from ImagesPipelineTest.items import ImageItem


class XiaohuaSpider(scrapy.Spider):
    name = "shetuwang"
    allowed_domains = ["699pic.com"]
    start_urls = ['http://699pic.com/people.html']
    download_delay = 2

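    def parse(self, response):
        item = ImageItem()
        # The image URLs are read from the data-original attribute of each thumbnail.
        srcs = response.xpath('//div[@class="swipeboxEx"]/div[@class="list"]/a/img/@data-original').extract()
        # ImagesPipeline downloads every URL in this list.
        item['image_urls'] = srcs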
        yield item
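ImagesPipeline can only download absolute URLs. Whether the data-original values on this page are absolute is not shown here, so the following is a minimal sketch, assuming some of them could be relative or protocol-relative. The helper name absolutize and the sample values are hypothetical; inside a spider, response.urljoin(src) does the same resolution against the page URL.

```python
from urllib.parse import urljoin

PAGE_URL = "http://699pic.com/people.html"  # start URL used by the spider


def absolutize(page_url, srcs):
    """Resolve each extracted src against the page URL, skipping blank values."""
    return [urljoin(page_url, src.strip()) for src in srcs if src and src.strip()]


# Hypothetical attribute values, for illustration only.
sample_srcs = [
    "//img.example.com/photo/a.jpg",       # protocol-relative
    "/photo/b.jpg",                        # relative to the site root
    "http://img.example.com/photo/c.jpg",  # already absolute
    "  ",                                  # blank, dropped
]

absolute_urls = absolutize(PAGE_URL, sample_srcs)
print(absolute_urls)
```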

pipelines.py

from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request


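class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Schedule one download request per image URL collected by the spider.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # results is a list of (success, info) tuples, one per request;
        # keep the storage path (relative to IMAGES_STORE) of each successful download.
        image_path = [x['path'] for ok, x in results if ok]
        if not image_path:
            raise DropItem('Item contains no images')
        # The item class must declare an image_paths field, otherwise this raises KeyError.
        item['image_paths'] = image_path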
        return item
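The list comprehension in item_completed depends on the shape of results. The sketch below imitates that shape with plain Python data so the filtering can be tried without running a crawl. The sample URLs, paths and checksums are made up, and a plain Exception stands in for the Twisted Failure object that Scrapy passes for a failed download.

```python
# Hypothetical stand-in for the `results` argument of item_completed().
# Each entry is (success, info): on success, info is a dict describing the
# stored file; on failure, it is an error object (a plain Exception here).
sample_results = [
    (True, {"url": "http://example.com/a.jpg", "path": "full/aaa.jpg", "checksum": "0123"}),
    (False, Exception("download failed")),
    (True, {"url": "http://example.com/b.jpg", "path": "full/bbb.jpg", "checksum": "4567"}),
]


def successful_paths(results):
    """Return the storage paths of the downloads that succeeded."""
    return [info["path"] for ok, info in results if ok]


image_paths = successful_paths(sample_results)
print(image_paths)
```

If every download fails, the list is empty, which is the case where the pipeline above raises DropItem.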

3. Running the project

Run begin.py in PyCharm:

from scrapy import cmdline

# Equivalent to running "scrapy crawl shetuwang" from the project root.

cmdline.execute("scrapy crawl shetuwang".split())

