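After writing a crawler locally, I wanted to upload it to an Alibaba Cloud (Aliyun) server, but installing scrapy there ran into a number of problems. This post walks through the problems I hit and the fixes that led to a working scrapy installation.
scrapy is hosted on PyPI, but it depends on the lxml and twisted libraries, so a plain pip install often fails. A more reliable approach is to install the dependencies first and then install scrapy with pip.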
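To see which dependencies are already importable before running pip, a small check along the following lines may help. This is only a sketch: the helper name check_modules is made up for illustration, and the module list simply repeats the libraries named in this post.

```python
from __future__ import print_function

import importlib


def check_modules(names):
    """Return {module name: True/False} depending on whether each one imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Any failure at import time (missing package, broken build)
            # is treated as "not usable".
            status[name] = False
    return status


if __name__ == "__main__":
    # Libraries named in this post; adjust the list as needed.
    wanted = ["lxml", "twisted", "cffi", "cryptography"]
    for name, ok in sorted(check_modules(wanted).items()):
        print("%-14s %s" % (name, "ok" if ok else "MISSING"))
```

Anything reported as MISSING is a candidate for installing by hand before pip install scrapy is attempted.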
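Installing twisted
The twisted release used here targets Python 2.7; on Python 3.x you need to pick a twisted version that supports Python 3.x. Ubuntu 14.04 ships Python 2.7.x by default, so pip can be used directly.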
pip install twisted
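Installing lxml
On Alibaba Cloud servers with an overseas IP, lxml installs fine through pip. On servers in mainland China the download sometimes fails with connection errors, and the exact cause varies between Alibaba Cloud environments. To be safe, lxml is installed from source here. lxml sometimes depends on zlib1g, so install zlib1g-dev first. A source build also needs the libxml2 and libxslt development headers (libxml2-dev and libxslt1-dev on Ubuntu) if they are not already installed.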
apt-get install zlib1g-dev
wget http://pypi.python.org/packages/source/l/lxml/lxml-3.4.2.tar.gz
tar xzf lxml-3.4.2.tar.gz
cd lxml-3.4.2/
python setup.py install
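After the source build finishes, it may be worth confirming that the compiled extension actually works rather than just imports. The sketch below parses a tiny XML string and reports the installed version; the function name lxml_smoke_test is made up for illustration.

```python
from __future__ import print_function


def lxml_smoke_test():
    """Return lxml's version tuple if it imports and parses, else None."""
    try:
        from lxml import etree
    except ImportError:
        return None
    # Parsing exercises the compiled extension, not just the import.
    root = etree.fromstring("<root><item>ok</item></root>")
    if root.findtext("item") != "ok":
        return None
    return etree.LXML_VERSION


if __name__ == "__main__":
    version = lxml_smoke_test()
    if version is None:
        print("lxml is not usable")
    else:
        print("lxml version:", ".".join(str(part) for part in version))
```

Run it from a directory other than lxml-3.4.2/, so that Python picks up the installed package rather than the unpacked source tree.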
Installing scrapy
To make the scrapy installation go smoothly, install the libraries it depends on first.
1. Install libffi
apt-get install libffi-dev
2. Install cffi
pip install cffi
3. Install cryptography
apt-get install build-essential libssl-dev libffi-dev python-dev
pip install cryptography --force-reinstall
With the dependencies in place, installing scrapy usually goes through without problems.
pip install scrapy
Testing the installation
1. Run scrapy in a shell; the output should look like this:
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
commands
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
2. Test from Python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from scrapy import Spider, Request
If no error is raised, scrapy is installed correctly.
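The two manual checks above can also be scripted, which is handy when setting up more than one server. The following is a rough sketch, not part of scrapy itself: the helper name run_quietly is made up for illustration, and it assumes the scrapy command is on the PATH of the user running the script.

```python
from __future__ import print_function

import subprocess
import sys


def run_quietly(cmd):
    """Run cmd (a list of strings); return its output as text, or None on failure."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError: command not found; CalledProcessError: non-zero exit status.
        return None
    return out.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    checks = [
        # "version" is one of the commands listed in the scrapy usage output.
        ("scrapy command", ["scrapy", "version"]),
        # Same import as the interactive test, run in a fresh interpreter.
        ("scrapy import", [sys.executable, "-c", "from scrapy import Spider, Request"]),
    ]
    for label, cmd in checks:
        result = run_quietly(cmd)
        print("%-15s %s" % (label, "FAILED" if result is None else "ok " + result))
```

Both lines should report ok on a working installation; a FAILED line points at either the command-line entry point or the Python package.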