Using Scrapy to crawl all of a film star's personal photos on Douban
Monica Bellucci is used as the example
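1: First, on the command line, change into the directory where the project should be created and run scrapy startproject banciyuan
to create the Scrapy project.
The command generates the standard Scrapy project layout.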
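For reference, this is the default layout that `scrapy startproject banciyuan` produces in current Scrapy releases. Older versions may omit `middlewares.py`, so treat the exact file list as an assumption about the installed version.

```text
banciyuan/
    scrapy.cfg            # deploy/configuration file, marks the project root
    banciyuan/            # the project's Python package
        __init__.py
        items.py          # item definitions (BanciyuanItem lives here)
        middlewares.py
        pipelines.py
        settings.py
        spiders/          # spider modules go in this folder
            __init__.py
```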
2: To make it easy to run the Scrapy project from PyCharm, create a new main.py in the project root (next to scrapy.cfg):
from scrapy import cmdline
cmdline.execute("scrapy crawl banciyuan".split())
Next, open Edit Configurations in PyCharm
and set up a run configuration that points at main.py. Once that is done, running main.py runs the Scrapy project.
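A run configuration mainly matters because Scrapy looks for scrapy.cfg starting from the current working directory. The sketch below is one possible way to make main.py independent of that setting; the helper names `crawl_command` and `run_spider` are hypothetical and not part of the original project.

```python
import os


def crawl_command(spider_name):
    """Build the argv list that `"scrapy crawl <name>".split()` would give."""
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name, project_dir):
    """Run `scrapy crawl <spider_name>` from project_dir.

    project_dir is assumed to be the folder that contains scrapy.cfg.
    """
    # Imported lazily so crawl_command() can be used without Scrapy installed.
    from scrapy import cmdline

    os.chdir(project_dir)
    cmdline.execute(crawl_command(spider_name))


# Possible use inside main.py (left commented out here):
# run_spider("banciyuan", os.path.dirname(os.path.abspath(__file__)))
```

With this variant the working directory chosen in PyCharm no longer decides whether the project settings are found, since the script changes into the project root itself.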
3: Analyze the HTML of the photo page and create the corresponding spider:
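The page analysis comes down to two things: reading the page count from the paginator and building one listing URL per page from a `start` offset. A minimal standalone sketch of that logic follows. The page size of 30 and the helper names are assumptions made for illustration, so check the real offsets in the browser before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/celebrity/1025156/photos/"
PAGE_SIZE = 30  # assumption: number of photos per listing page


def parse_page_count(text, default=1):
    """Turn the paginator text into an int, falling back to `default`."""
    try:
        return int((text or "").strip())
    except ValueError:
        # Empty or non-numeric text, e.g. when the page has no paginator.
        return default


def page_urls(base_url, page_count, page_size=PAGE_SIZE):
    """Return one listing URL per page, using type=C and a start offset."""
    urls = []
    for page in range(page_count):
        query = urlencode({"type": "C", "start": page * page_size})
        urls.append(f"{base_url}?{query}")
    return urls


if __name__ == "__main__":
    for url in page_urls(BASE_URL, parse_page_count("3")):
        print(url)
```

Falling back to a single page when the paginator text is empty avoids the `ValueError` that a bare `int('')` would raise.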
from scrapy import Spider
import scrapy
from banciyuan.items import BanciyuanItem


class BanciyuanSpider(Spider):
    name = 'banciyuan'
    allowed_domains = ['movie.douban.com']
    start_urls = ["https://movie.douban.com/celebrity/1025156/photos/"]
    url = "https://movie.douban.com/celebrity/1025156/photos/"

    def parse(self, response):
        # Text of the last paginator link, used below as the page count.
        # Note: if no paginator is found this is '' and int(num) raises ValueError.
        num = response.xpath('//div[@class="paginator"]/a[last()]/text()').extract_first('')
        print(num)
        for i in range(int(num)):
            suffix = '?type=C&start=' + str(