Scrapy Crawler Basics
Level 1: Scrapy Installation and Project Creation
# Install Scrapy, create a project, and generate a spider skeleton
pip install scrapy
cd /root
scrapy startproject HelloWorld
cd HelloWorld
scrapy genspider world www.baidu.com
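For orientation, `scrapy startproject HelloWorld` produces the standard Scrapy project layout, roughly:

```
HelloWorld/
    scrapy.cfg            # deployment configuration
    HelloWorld/           # the project's Python package
        items.py          # item definitions (structured scraped data)
        middlewares.py    # spider/downloader middlewares
        pipelines.py      # item pipelines (post-processing, storage)
        settings.py       # project settings
        spiders/          # spider modules; genspider writes world.py here
```

The `genspider` command above creates `spiders/world.py` with the class shown in Level 2.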
Level 2: Scrapy Core Principles
# -*- coding: utf-8 -*-
import scrapy


class WorldSpider(scrapy.Spider):
    name = 'world'
    allowed_domains = ['www.baidu.com']
    start_urls = ['http://www.baidu.com/']

    def parse(self, response):
        # ********** Begin *********#
        # Persist the fetched page source to a local file
        filename = "./baidu.html"
        with open(filename, 'wb') as f:
            f.write(response.body)
        # ********** End *********#