Notes: scraping images with Scrapy

Today we'll learn how to use Scrapy to scrape images.

1. Create the project

scrapy startproject xiaohuascrapy

Create a spider file named xiaohua.py in the project's spiders directory.

2. Define the Item

import scrapy
class XiaohuascrapyItem(scrapy.Item):
    # FilesPipeline downloads every URL listed in file_urls and
    # records the result of each download (url, path, checksum) in files.
    file_urls = scrapy.Field()
    files = scrapy.Field()
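
To make the two fields concrete, here is a minimal sketch in plain Python (no Scrapy needed) of how the item gets filled in. It assumes the default behaviour of FilesPipeline, which keeps only the results of successful downloads. The URLs, the path, and the checksum are made-up placeholders, and fill_files is a hypothetical helper, not a Scrapy function.

```python
def fill_files(item, results):
    """Copy the item and store the info dicts of successful downloads in 'files'.

    `results` is a list of (success, info_or_error) pairs, one per URL.
    """
    filled = dict(item)
    filled["files"] = [info for ok, info in results if ok]
    return filled


# What the spider yields: only file_urls is set.
item_before = {"file_urls": ["http://example.com/a.jpg", "http://example.com/b.jpg"]}

# Placeholder download results: the first URL succeeded, the second failed.
results = [
    (True, {"url": "http://example.com/a.jpg",
            "path": "full/<sha1-of-url>.jpg",
            "checksum": "<md5-of-content>"}),
    (False, RuntimeError("download failed")),
]

item_after = fill_files(item_before, results)
print(item_after["files"])
```

Failed downloads simply do not appear in files, so comparing its length with file_urls is a quick way to see how many images were actually saved.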

3. Write the spider

# -*- coding: utf-8 -*-
import scrapy
from xiaohuascrapy.items import XiaohuascrapyItem

words = '张馨予'

class XiaohuaSpider(scrapy.Spider):
    name = "xiaohua"
    allowed_domains = ["baidu.com"]
    custom_settings = {  # override the storage path for this spider
        'FILES_STORE': '/图片/baidu/%s' % words
    }
    pn = 0

    def __init__(self , keywords = '' , *args , **kwargs):
        super(XiaohuaSpider , self).__init__(*args , **kwargs)
        self.url = 'http://image.baidu.com/search/flip?tn=baiduimage&word=' + words
        self.start_urls = [
            self.url
        ]

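    def parse(self, response):
        # The page source embeds each original image URL as "objURL":"..."
        urls = response.selector.re(r'''"objURL":"(http://[^"]+?)"''')
        if not urls:
            # An empty page means no more results; stop paginating
            return
        item = XiaohuascrapyItem()
        item['file_urls'] = urls
        yield item
        self.pn += 20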
        yield scrapy.Request('%s%s%d' % (self.url , '&pn=' , self.pn) , self.parse)
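
The two ideas in the spider, building the paged search URL and pulling the objURL values out of the page source, can be tried outside Scrapy. The following is only a sketch using the standard library. build_page_url and extract_image_urls are hypothetical helper names, the sample string is invented, and unlike the spider above the pattern also accepts https links and the keyword is percent-encoded with urlencode.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://image.baidu.com/search/flip"
# Same idea as the spider's pattern, but also matching https (an assumption).
OBJ_URL_RE = re.compile(r'"objURL":"(https?://[^"]+?)"')


def build_page_url(word, pn=0):
    """Return the search URL for `word`, starting at result offset `pn`."""
    query = {"tn": "baiduimage", "word": word, "pn": pn}
    return BASE_URL + "?" + urlencode(query)


def extract_image_urls(page_source):
    """Return every objURL value found in the page source, in order."""
    return OBJ_URL_RE.findall(page_source)


# Invented fragment shaped like the JSON embedded in a result page.
sample = ('{"objURL":"http://example.com/a.jpg","w":1},'
          '{"objURL":"https://example.com/b.png","w":2}')

print(extract_image_urls(sample))
print(build_page_url("张馨予", pn=20))
```

Percent-encoding the keyword keeps the URL pure ASCII, which avoids depending on how the HTTP client treats raw Chinese characters in a query string.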

4. Configure settings.py

BOT_NAME = 'xiaohuascrapy'

SPIDER_MODULES = ['xiaohuascrapy.spiders']
NEWSPIDER_MODULE = 'xiaohuascrapy.spiders'
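# Scrapy itself reads USER_AGENT (a single string). A USER_AGENTS list only
# takes effect with a custom rotating middleware, which this project lacks.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"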

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'xiaohuascrapy (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False
COOKIES_ENABLED = False
ITEM_PIPELINES = {
    'scrapy.pipelines.files.FilesPipeline': 100,
}
LOG_LEVEL = 'DEBUG'

That is basically all there is to it. Run the project:

scrapy crawl xiaohua
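This creates a 图片 folder in the root directory of the disk. The downloaded images are inside it, in the full subfolder that FilesPipeline creates under FILES_STORE.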
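
The saved files do not keep their original names. As a rough sketch, assuming the default FilesPipeline naming scheme (the SHA-1 hash of the URL plus the original extension, inside full), the location of a given image can be predicted as below. expected_file_path is a hypothetical helper and the URL is a placeholder. Scrapy applies extra checks to the extension, so treat this as an approximation.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def expected_file_path(files_store, url):
    """Approximate where the default FilesPipeline would save `url`."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Take the extension from the URL path only, ignoring any query string.
    ext = posixpath.splitext(urlparse(url).path)[1]
    return posixpath.join(files_store, "full", digest + ext)


path = expected_file_path("/图片/baidu/张馨予", "http://example.com/a.jpg")
print(path)
```

Because the name is derived from the URL, the same image URL always maps to the same file on disk.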