Download Python 3.7: https://www.python.org/downloads/release/python-371/
Install: keep clicking Next until the installer finishes.
Run:
pip install Scrapy
This fails: the error message shows that Microsoft Visual C++ is required before installing.
Go to https://www.lfd.uci.edu/~gohlke/pythonlibs/ and download the file Twisted-18.9.0-cp37-cp37m-win_amd64.whl
Run:
pip install Twisted-18.9.0-cp37-cp37m-win_amd64.whl
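The wheel file name above encodes the Python version (cp37) and the platform (win_amd64), so it has to match the local interpreter. A minimal sketch for checking this, assuming a CPython interpreter; the helper name interpreter_wheel_tags is hypothetical and not part of pip or Scrapy:

```python
import struct
import sys


def interpreter_wheel_tags():
    """Return (python_tag, bits) for the running interpreter, e.g. ("cp37", 64)."""
    # "cp" assumes CPython; other implementations use a different prefix.
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bytes times 8 gives the interpreter's bitness.
    bits = struct.calcsize("P") * 8
    return python_tag, bits


if __name__ == "__main__":
    tag, bits = interpreter_wheel_tags()
    print("Python tag:", tag)
    print("Interpreter bitness:", bits)
```

If the tag printed is cp37 and the bitness is 64, the cp37 / win_amd64 wheel should be the matching one; a 32-bit interpreter would need the win32 wheel instead.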
Upgrade pip:
python -m pip install --upgrade pip
Install Scrapy:
pip install Scrapy
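Install PyCharm
Create a new project:
File --> Settings
Set the Project Interpreter to the local Python installation path.
Click the plus sign on the right and add the Scrapy package.
Generate the project: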
scrapy startproject tutorial
If you get No module named 'win32api', install pypiwin32 to fix it:
pip install pypiwin32
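To see which of the modules mentioned so far are actually importable, a small check can be run with the same interpreter that PyCharm uses. This is only a sketch using the standard library; check_modules is a hypothetical helper name, and the list of module names is taken from the steps above:

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # twisted and scrapy come from the pip steps; win32api is provided by pypiwin32.
    for name, found in check_modules(["twisted", "scrapy", "win32api"]).items():
        print(name, "OK" if found else "MISSING")
```

A MISSING entry suggests the corresponding pip install step did not succeed for this interpreter.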
Next, create a ProItem class to define the item model.
import scrapy


class ProItem(scrapy.Item):
    # Fields that make up the item model
    name = scrapy.Field()
    title = scrapy.Field()
    info = scrapy.Field()
###In the project's spiders directory (tutorial/spiders), create a spider named itcast
scrapy genspider itcast "itcast.cn"
This generates the spider. Modify it to:
import scrapy


class ItcastSpider(scrapy.Spider):
    name = 'itcast'
    allowed_domains = ['itcast.cn']
    # Page to download; replaces the default start_urls that genspider generates
    start_urls = ['http://www.itcast.cn/channel/teacher.shtml']

    def parse(self, response):
        # Save the raw response body to a local HTML file
        filename = "teacher.html"
        with open(filename, 'wb') as f:
            f.write(response.body)
Run:
scrapy crawl itcast
This generates an HTML file (teacher.html).
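To confirm the crawl saved a real page, the file can be inspected with the standard library. A rough sketch, assuming the file is named teacher.html as in the spider and is UTF-8 encoded (the encoding is an assumption); TitleParser and page_title are hypothetical names:

```python
import os
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html_bytes, encoding="utf-8"):
    """Return the stripped <title> text of an HTML document given as bytes."""
    parser = TitleParser()
    # Undecodable bytes are replaced rather than raising an error.
    parser.feed(html_bytes.decode(encoding, errors="replace"))
    return parser.title.strip()


if __name__ == "__main__":
    if os.path.exists("teacher.html"):
        with open("teacher.html", "rb") as f:
            data = f.read()
        print(len(data), "bytes; title:", page_title(data))
    else:
        print("teacher.html not found; run the crawl first")
```

An empty file or a missing title would suggest the request failed or was redirected.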