Scraping GIF Images with Scrapy


Lee007008


Categories: python, scrapy, crawlers. Tags: Python, scrapy, crawlers, GIF images

Article link: https://blog.csdn.net/qaz2170/article/details/61417514


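This post is largely the same as the previous one; the main difference is pipelines.py. ImagesPipeline does not handle the GIF format properly (it re-encodes downloaded images as JPEG, so the animation is lost), so we need to rewrite the method that saves the images.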
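Since the goal is to keep the original GIF bytes untouched, a quick way to confirm that a saved file is still a GIF is to look at its leading signature. The following is a minimal sketch, not part of the original project; `looks_like_gif` is a hypothetical helper name. It relies only on the fact that GIF files begin with the ASCII bytes `GIF87a` or `GIF89a`.

```python
# Hypothetical helper (not from the original post): detect a GIF by its signature.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(data: bytes) -> bool:
    """Return True if the byte string starts with a GIF signature."""
    return data[:6] in GIF_SIGNATURES
```

It could be applied to the first few bytes of a downloaded file to check that it was stored without re-encoding.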

1. items.py

import scrapy


class HupuGifItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    hupu_image_url = scrapy.Field()
    images = scrapy.Field()

2. pipelines.py

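# -*- coding: utf-8 -*-

import os

import requests
from scrapy.pipelines.images import ImagesPipeline

from hupu_gif import settings


class HupuGifPipeline(ImagesPipeline):
    def process_item(self, item, spider):
        # Skip ImagesPipeline's own download logic and save the raw bytes instead.
        if 'hupu_image_url' in item:
            images = []

            # Files are stored under IMAGES_STORE/<spider name>.
            dir_path = '%s/%s' % (settings.IMAGES_STORE, spider.name)
            if not os.path.exists(dir_path):
                os.makedirs(dir_path)

            for image_url in item['hupu_image_url']:
                # Use the last URL segment as the file name.
                us = image_url.split('/')[-1]
                file_path = '%s/%s' % (dir_path, us)
                images.append(file_path)
                if os.path.exists(file_path):
                    continue  # already downloaded
                # Prepend the scheme (the scraped URLs are expected to start with '//').
                response = requests.get('http:' + image_url, stream=True)
                # Fail before creating the file so that no empty or bogus .gif is left behind.
                response.raise_for_status()
                with open(file_path, 'wb') as handle:
                    for block in response.iter_content(1024):
                        if not block:
                            break
                        handle.write(block)

            item['images'] = images
        return item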
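
The pipeline above builds the download URL and the target file path with plain string operations. As a rough sketch of the same two steps in isolation (the helper names `normalize_url` and `target_path`, the example host, and the choice to drop the query string are assumptions for illustration, not part of the original project), they could be written with the standard library like this:

```python
import os
from urllib.parse import urlparse


def normalize_url(image_url, scheme="http"):
    """Add a scheme to a protocol-relative URL such as '//host/a.gif'."""
    if image_url.startswith("//"):
        return "%s:%s" % (scheme, image_url)
    return image_url


def target_path(store_dir, spider_name, image_url):
    """Build store_dir/spider_name/<file name>, ignoring any query string."""
    path = urlparse(normalize_url(image_url)).path
    return os.path.join(store_dir, spider_name, os.path.basename(path))
```

Unlike `image_url.split('/')[-1]`, going through `urlparse` keeps a trailing `?query` out of the file name, which may or may not matter for the URLs being scraped.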