- Install Scrapy
pip install scrapy
[root@localhost wuxiaobing]# pip install scrapy
Collecting scrapy
Downloading Scrapy-1.4.0-py2.py3-none-any.whl (248kB)
100% |████████████████████████████████| 256kB 716kB/s
Collecting service-identity (from scrapy)
Downloading service_identity-17.0.0-py2.py3-none-any.whl
Collecting cssselect>=0.9 (from scrapy)
Downloading cssselect-1.0.1-py2.py3-none-any.whl
Collecting parsel>=1.1 (from scrapy)
Downloading parsel-1.2.0-py2.py3-none-any.whl
Collecting Twisted>=13.1.0 (from scrapy)
Downloading Twisted-17.5.0.tar.bz2 (3.0MB)
100% |████████████████████████████████| 3.0MB 150kB/s
Collecting PyDispatcher>=2.0.5 (from scrapy)
Downloading PyDispatcher-2.0.5.tar.gz
Requirement already satisfied: pyOpenSSL in /root/anaconda3/lib/python3.6/site-packages (from scrapy)
Requirement already satisfied: lxml in /root/anaconda3/lib/python3.6/site-packages (from scrapy)
Collecting queuelib (from scrapy)
Downloading queuelib-1.4.2-py2.py3-none-any.whl
Collecting w3lib>=1.17.0 (from scrapy)
Downloading w3lib-1.17.0-py2.py3-none-any.whl
Requirement already satisfied: six>=1.5.2 in /root/anaconda3/lib/python3.6/site-packages (from scrapy)
Collecting pyasn1 (from service-identity->scrapy)
Downloading pyasn1-0.3.1-py2.py3-none-any.whl (61kB)
100% |████████████████████████████████| 71kB 1.6MB/s
Collecting pyasn1-modules (from service-identity->scrapy)
Downloading pyasn1_modules-0.0.10-py2.py3-none-any.whl (60kB)
100% |████████████████████████████████| 61kB 1.2MB/s
Collecting attrs (from service-identity->scrapy)
Downloading attrs-17.2.0-py2.py3-none-any.whl
Collecting zope.interface>=4.0.2 (from Twisted>=13.1.0->scrapy)
Downloading zope.interface-4.4.2-cp36-cp36m-manylinux1_x86_64.whl (172kB)
100% |████████████████████████████████| 174kB 1.0MB/s
Collecting constantly>=15.1 (from Twisted>=13.1.0->scrapy)
Downloading constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from Twisted>=13.1.0->scrapy)
Downloading incremental-17.5.0-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from Twisted>=13.1.0->scrapy)
Downloading Automat-0.6.0-py2.py3-none-any.whl
Collecting hyperlink>=17.1.1 (from Twisted>=13.1.0->scrapy)
Downloading hyperlink-17.3.0-py2.py3-none-any.whl
Requirement already satisfied: cryptography>=1.7 in /root/anaconda3/lib/python3.6/site-packages (from pyOpenSSL->scrapy)
Requirement already satisfied: setuptools in /root/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg (from zope.interface>=4.0.2->Twisted>=13.1.0->scrapy)
Requirement already satisfied: idna>=2.1 in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: asn1crypto>=0.21.0 in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: packaging in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: cffi>=1.4.1 in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: pycparser in /root/anaconda3/lib/python3.6/site-packages (from cffi>=1.4.1->cryptography>=1.7->pyOpenSSL->scrapy)
Building wheels for collected packages: Twisted, PyDispatcher
Running setup.py bdist_wheel for Twisted ... done
Stored in directory: /root/.cache/pip/wheels/57/08/00/28a9a86f0ee9f54260fb5949aed2e69b0425e8a878757aa7ce
Running setup.py bdist_wheel for PyDispatcher ... done
Stored in directory: /root/.cache/pip/wheels/86/02/a1/5857c77600a28813aaf0f66d4e4568f50c9f133277a4122411
Successfully built Twisted PyDispatcher
Installing collected packages: pyasn1, pyasn1-modules, attrs, service-identity, cssselect, w3lib, parsel, zope.interface, constantly, incremental, Automat, hyperlink, Twisted, PyDispatcher, queuelib, scrapy
Successfully installed Automat-0.6.0 PyDispatcher-2.0.5 Twisted-17.5.0 attrs-17.2.0 constantly-15.1.0 cssselect-1.0.1 hyperlink-17.3.0 incremental-17.5.0 parsel-1.2.0 pyasn1-0.3.1 pyasn1-modules-0.0.10 queuelib-1.4.2 scrapy-1.4.0 service-identity-17.0.0 w3lib-1.17.0 zope.interface-4.4.2
[root@localhost wuxiaobing]# python
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scrapy
>>> exit()
- Create a project
scrapy startproject risk_one
[root@localhost big_data]# scrapy startproject risk_one
New Scrapy project 'risk_one', using template directory '/root/anaconda3/lib/python3.6/site-packages/scrapy/templates/project', created in:
/home/wuxiaobing/Documents/big_data/risk_one
You can start your first spider with:
cd risk_one
scrapy genspider example example.com
[root@localhost big_data]# tree risk_one/
risk_one/
|-- risk_one
| |-- __init__.py
| |-- items.py
| |-- middlewares.py
| |-- pipelines.py
| |-- __pycache__
| |-- settings.py
| `-- spiders
| |-- __init__.py
| `-- __pycache__
`-- scrapy.cfg
4 directories, 7 files
scrapy.cfg: the project's configuration file
risk_one/: the project's Python module; you will add your code here.
risk_one/items.py: the project's item definitions.
risk_one/pipelines.py: the project's pipelines.
risk_one/settings.py: the project's settings.
risk_one/spiders/: the directory that holds the spider code.
Create your own spider in the spiders/ directory to crawl the data.
scrapy.cfg is the project's configuration file.
settings.py is where you configure request parameters, proxy usage, and how data is saved after crawling.
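To make that concrete, a few commonly tweaked entries in settings.py are sketched below. The values are illustrative, and the pipeline class name assumes the default one Scrapy's project template generates for risk_one:

```python
# Sketch of common settings.py tweaks (values are examples, not requirements).
BOT_NAME = "risk_one"

ROBOTSTXT_OBEY = True       # respect robots.txt when crawling
DOWNLOAD_DELAY = 1          # wait 1 second between requests to the same site
CONCURRENT_REQUESTS = 8     # lower the default concurrency to be polite

# Enable the pipeline defined in pipelines.py; lower numbers run first.
ITEM_PIPELINES = {
    "risk_one.pipelines.RiskOnePipeline": 300,
}
```

Proxy support is typically added the same way, by enabling a downloader middleware in this file.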