1. Conceptual diagram
2. Creating the project
scrapy startproject mycrawl
cd mycrawl
scrapy genspider -t crawl mycrawlspider sohu.com
The generated spider, after filling in the rule (note the escaped dot in the regex, `r'\.shtml'`; an unescaped `.` would match any character):

# coding: utf-8
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class WeisuenSpider(CrawlSpider):
    name = 'mycrawlspider'
    allowed_domains = ['sohu.com']
    start_urls = ['http://sohu.com/']
    rules = (
        # Follow only .shtml links on sohu.com and hand each page to parse_item
        Rule(LinkExtractor(allow=(r'\.shtml',), allow_domains=('sohu.com',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.url)

Run it with `scrapy crawl mycrawlspider`. Note that `follow=True` means links extracted from each crawled page are themselves followed, so the crawl keeps going; with `follow=False` the spider stops after one round of extraction from the start pages.
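To make the filtering behavior of the Rule concrete, here is a small self-contained sketch (plain Python, not Scrapy's actual internals) of how an `allow` regex combined with `allow_domains` decides which extracted links get followed. The URLs and the `should_follow` helper are illustrative assumptions:

```python
import re
from urllib.parse import urlparse

# Mirrors the Rule above: allow=(r'\.shtml',), allow_domains=('sohu.com',)
ALLOW = re.compile(r'\.shtml')
ALLOW_DOMAINS = ('sohu.com',)

def should_follow(url):
    """Illustrative filter: the URL must match the allow regex AND
    its host must be an allowed domain (or a subdomain of one)."""
    host = urlparse(url).hostname or ''
    domain_ok = any(host == d or host.endswith('.' + d) for d in ALLOW_DOMAINS)
    return domain_ok and bool(ALLOW.search(url))

links = [
    'http://www.sohu.com/a/123.shtml',   # kept: domain and extension match
    'http://news.sohu.com/index.html',   # dropped: not .shtml
    'http://example.com/a/123.shtml',    # dropped: wrong domain
]
print([u for u in links if should_follow(u)])
```

This is why the single rule above is enough to keep the crawl inside sohu.com article pages: every candidate link must pass both filters before it is scheduled.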