Step 1: Create a Scrapy project
D:\tmp>scrapy startproject pythondemo
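The startproject command generates a project skeleton along the following lines (the exact file list can vary slightly between Scrapy versions):

```
pythondemo/
    scrapy.cfg            # deployment configuration file
    pythondemo/           # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory where spiders are placed
            __init__.py
```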
Step 2: Generate a spider inside the project
D:\tmp\pythondemo>scrapy genspider demo python123.io
Step 3: Configure the generated spider (pythondemo\spiders\demo.py)
# -*- coding: utf-8 -*-
import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo'
    # allowed_domains = ['python123.io']
    start_urls = ['http://python123.io/ws/demo.html']

    def parse(self, response):
        # Use the last segment of the response URL as the local filename
        fname = response.url.split('/')[-1]
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s.' % fname)
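The filename logic in parse simply takes the last path segment of the URL. A minimal standalone sketch of that logic (plain Python, no Scrapy required; the helper name is made up for illustration):

```python
# Derive a local filename from a URL, mirroring the spider's
# response.url.split('/')[-1] expression.
def filename_from_url(url: str) -> str:
    """Return the last '/'-separated segment of the URL."""
    return url.split('/')[-1]

print(filename_from_url('http://python123.io/ws/demo.html'))  # demo.html
```

Note that this naive split keeps any query string attached (e.g. `page.html?x=1`), which is fine for the simple demo URL above.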
Step 4: Run the spider
D:\tmp\pythondemo>scrapy crawl demo