Creating a project
Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you’d like to store your code and run:
scrapy startproject URLCrawler
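This creates a URLCrawler directory. With a recent Scrapy release, the generated layout looks roughly like the following (exact files may vary slightly by version):

```
URLCrawler/
    scrapy.cfg            # deploy configuration file
    URLCrawler/           # the project's Python module
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # directory where you'll put your spiders
            __init__.py
```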
Our first Spider
This is the code for our first Spider. Save it in a file named my_spider.py under the URLCrawler/spiders directory in your project:
import scrapy


class MySpider(scrapy.Spider):
    name = "my_spider"

    def start_requests(self):
        urls = [
            'http://www.4g.haval.com.cn/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        domain = response.url.split("/")[-2]
        filename = '%s.html' % domain
        # When calling open() to write a file, pass mode 'w' (text) or 'wb' (binary).
        # The response body is bytes, so open the file in binary mode.
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
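The parse() callback derives an output filename from the response URL. Because the start URL ends with a trailing slash, splitting on "/" and taking the second-to-last piece yields the domain name. A minimal sketch of that string manipulation, using the same URL as above:

```python
# Derive a filename from a URL the same way the parse() callback does:
# split on "/" and take the second-to-last piece, which is the domain
# when the URL ends with a trailing slash.
url = 'http://www.4g.haval.com.cn/'
domain = url.split("/")[-2]
filename = '%s.html' % domain
print(filename)  # -> www.4g.haval.com.cn.html
```

To actually run the spider, go to the project's top-level directory and run `scrapy crawl my_spider`; Scrapy will schedule the requests yielded by start_requests() and pass each response to parse().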