The Scrapy Crawler Framework
1. Introduction to the Scrapy Framework
Installing Scrapy (on Windows):
1. Install wheel
pip install wheel
2. Install pywin32
pip install pywin32
3. Install Twisted
From "Python Extension Packages for Windows", download the Twisted wheel matching your Python version, open cmd in the download directory, and install it with pip:
pip install Twisted-18.9.0-cp36-cp36m-win_amd64.whl
4. Install Scrapy
pip install scrapy
5. Test the installation
(py3MachineLearning) D:\Python网络爬虫与信息提取>scrapy version
Scrapy 1.6.0
(py3MachineLearning) D:\Python网络爬虫与信息提取>scrapy -h
Scrapy 1.6.0 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
2. Anatomy of the Scrapy Crawler Framework
3. The Requests Library vs. the Scrapy Framework
Similarities: both can request and scrape pages; they are the two main technical routes for Python crawlers.
Both are very usable, well documented, and easy to get started with.
Neither handles JavaScript, form submission, or CAPTCHAs out of the box (both can be extended to).
So which route should you pick for a crawler?
It depends: for a very small need, use the Requests library.
For a not-so-small need, use the Scrapy framework.
For a highly customized need (regardless of scale), build your own framework; there Requests beats Scrapy as a foundation.
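To make the "very small need" end of that spectrum concrete, here is a sketch of the Requests route; the helper names and the deliberately naive title extraction are illustrative assumptions, not from the notes:

```python
# Requests route for a tiny, one-off need: fetch one page, pull one
# field. Helper names and the naive parsing are illustrative.
import requests

def extract_title(html):
    """Naively pull the <title> text out of an HTML string."""
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return None
    return html[start + len("<title>"):end]

def fetch_title(url):
    """Fetch a page and return its title; raises on HTTP errors."""
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding  # guess encoding from the body
    return extract_title(r.text)
```

A job this small needs no scheduler, queue, or pipeline, which is why the notes steer it toward Requests; once you need concurrency, retries, and persistence across many pages, Scrapy's machinery starts to pay off.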
4. Common Scrapy Commands
Scrapy's command-line format:
scrapy <command> [options] [args]
Scrapy's common commands:
startproject    create a new project
genspider       generate a spider from a template
settings        get settings values
crawl           run a spider (from inside a project)
list            list the spiders defined in the project
shell           start the interactive scraping console
Why does Scrapy use the command line to create and run crawlers? A command line (unlike a graphical interface) is easier to automate and suits scripted control.
At heart, Scrapy is a tool for programmers, and functionality (not the interface) is what matters.
5. Summary