I. Creating the project, spider, and item, and configuring settings
Create the project: scrapy startproject nitu
Create the spider: scrapy genspider -t basic nituwang nipic.com
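The nipic.com argument to genspider becomes the spider's allowed_domains list. Scrapy's offsite filtering then drops requests the spider yields for hosts outside that domain, while still allowing subdomains such as www.nipic.com. Below is a minimal stand-alone sketch of that host check using only the standard library. It approximates the behavior and is not Scrapy's own code. The helper names host_regex and is_allowed are hypothetical.

```python
import re
from urllib.parse import urlparse

def host_regex(allowed_domains):
    # Hypothetical helper: match the domain itself or any subdomain of it
    # (e.g. nipic.com and www.nipic.com), but not look-alikes such as fakenipic.com
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(pattern)

def is_allowed(url, allowed_domains):
    # Hypothetical helper: compare only the host part of the URL against the pattern
    host = urlparse(url).hostname or ""
    return bool(host_regex(allowed_domains).search(host))

allowed = ["nipic.com"]
print(is_allowed("http://www.nipic.com/show/1.html", allowed))  # True
print(is_allowed("http://nipic.com/", allowed))                 # True
print(is_allowed("http://example.com/nipic.com", allowed))      # False
print(is_allowed("http://fakenipic.com/", allowed))             # False
```

Matching only the hostname means a domain that appears in the path or query string does not count as on-site.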
Write the item (in items.py):
# -*- coding: utf-8 -*-
import scrapy

class NituItem(scrapy.Item):
    # A single field that holds the scraped URL
    url = scrapy.Field()
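Configure settings.py (important!):
1. First, uncomment USER_AGENT and set it to a browser string (to get past the site's anti-scraping checks):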
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
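2. Set ROBOTSTXT_OBEY to False (mainly so that the images can be fetched without being blocked by robots.txt rules):
ROBOTSTXT_OBEY = False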
3. Enable the item pipeline (uncomment ITEM_PIPELINES):