scrapy

  1. Install Scrapy
    pip install scrapy
[root@localhost wuxiaobing]# pip install scrapy
Collecting scrapy
  Downloading Scrapy-1.4.0-py2.py3-none-any.whl (248kB)
    100% |████████████████████████████████| 256kB 716kB/s 
Collecting service-identity (from scrapy)
  Downloading service_identity-17.0.0-py2.py3-none-any.whl
Collecting cssselect>=0.9 (from scrapy)
  Downloading cssselect-1.0.1-py2.py3-none-any.whl
Collecting parsel>=1.1 (from scrapy)
  Downloading parsel-1.2.0-py2.py3-none-any.whl
Collecting Twisted>=13.1.0 (from scrapy)
  Downloading Twisted-17.5.0.tar.bz2 (3.0MB)
    100% |████████████████████████████████| 3.0MB 150kB/s 
Collecting PyDispatcher>=2.0.5 (from scrapy)
  Downloading PyDispatcher-2.0.5.tar.gz
Requirement already satisfied: pyOpenSSL in /root/anaconda3/lib/python3.6/site-packages (from scrapy)
Requirement already satisfied: lxml in /root/anaconda3/lib/python3.6/site-packages (from scrapy)
Collecting queuelib (from scrapy)
  Downloading queuelib-1.4.2-py2.py3-none-any.whl
Collecting w3lib>=1.17.0 (from scrapy)
  Downloading w3lib-1.17.0-py2.py3-none-any.whl
Requirement already satisfied: six>=1.5.2 in /root/anaconda3/lib/python3.6/site-packages (from scrapy)
Collecting pyasn1 (from service-identity->scrapy)
  Downloading pyasn1-0.3.1-py2.py3-none-any.whl (61kB)
    100% |████████████████████████████████| 71kB 1.6MB/s 
Collecting pyasn1-modules (from service-identity->scrapy)
  Downloading pyasn1_modules-0.0.10-py2.py3-none-any.whl (60kB)
    100% |████████████████████████████████| 61kB 1.2MB/s 
Collecting attrs (from service-identity->scrapy)
  Downloading attrs-17.2.0-py2.py3-none-any.whl
Collecting zope.interface>=4.0.2 (from Twisted>=13.1.0->scrapy)
  Downloading zope.interface-4.4.2-cp36-cp36m-manylinux1_x86_64.whl (172kB)
    100% |████████████████████████████████| 174kB 1.0MB/s 
Collecting constantly>=15.1 (from Twisted>=13.1.0->scrapy)
  Downloading constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from Twisted>=13.1.0->scrapy)
  Downloading incremental-17.5.0-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from Twisted>=13.1.0->scrapy)
  Downloading Automat-0.6.0-py2.py3-none-any.whl
Collecting hyperlink>=17.1.1 (from Twisted>=13.1.0->scrapy)
  Downloading hyperlink-17.3.0-py2.py3-none-any.whl
Requirement already satisfied: cryptography>=1.7 in /root/anaconda3/lib/python3.6/site-packages (from pyOpenSSL->scrapy)
Requirement already satisfied: setuptools in /root/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg (from zope.interface>=4.0.2->Twisted>=13.1.0->scrapy)
Requirement already satisfied: idna>=2.1 in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: asn1crypto>=0.21.0 in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: packaging in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: cffi>=1.4.1 in /root/anaconda3/lib/python3.6/site-packages (from cryptography>=1.7->pyOpenSSL->scrapy)
Requirement already satisfied: pycparser in /root/anaconda3/lib/python3.6/site-packages (from cffi>=1.4.1->cryptography>=1.7->pyOpenSSL->scrapy)
Building wheels for collected packages: Twisted, PyDispatcher
  Running setup.py bdist_wheel for Twisted ... done
  Stored in directory: /root/.cache/pip/wheels/57/08/00/28a9a86f0ee9f54260fb5949aed2e69b0425e8a878757aa7ce
  Running setup.py bdist_wheel for PyDispatcher ... done
  Stored in directory: /root/.cache/pip/wheels/86/02/a1/5857c77600a28813aaf0f66d4e4568f50c9f133277a4122411
Successfully built Twisted PyDispatcher
Installing collected packages: pyasn1, pyasn1-modules, attrs, service-identity, cssselect, w3lib, parsel, zope.interface, constantly, incremental, Automat, hyperlink, Twisted, PyDispatcher, queuelib, scrapy
Successfully installed Automat-0.6.0 PyDispatcher-2.0.5 Twisted-17.5.0 attrs-17.2.0 constantly-15.1.0 cssselect-1.0.1 hyperlink-17.3.0 incremental-17.5.0 parsel-1.2.0 pyasn1-0.3.1 pyasn1-modules-0.0.10 queuelib-1.4.2 scrapy-1.4.0 service-identity-17.0.0 w3lib-1.17.0 zope.interface-4.4.2
[root@localhost wuxiaobing]# python
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:09:58) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scrapy
>>> exit()
  2. Create a project
    scrapy startproject risk_one
[root@localhost big_data]# scrapy startproject risk_one
New Scrapy project 'risk_one', using template directory '/root/anaconda3/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /home/wuxiaobing/Documents/big_data/risk_one

You can start your first spider with:
    cd risk_one
    scrapy genspider example example.com
[root@localhost big_data]# tree risk_one/
risk_one/
|-- risk_one
|   |-- __init__.py
|   |-- items.py
|   |-- middlewares.py
|   |-- pipelines.py
|   |-- __pycache__
|   |-- settings.py
|   `-- spiders
|       |-- __init__.py
|       `-- __pycache__
`-- scrapy.cfg
4 directories, 7 files
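After generating a spider with `scrapy genspider example example.com`, a file like the following appears under `risk_one/spiders/`. This is a sketch of what the Scrapy 1.4 template produces; the body of `parse()` shown here is an illustrative assumption added for demonstration, not generated code:

```python
# risk_one/spiders/example.py -- genspider skeleton with an
# illustrative parse() body added (assumed, not generated).
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'                      # run with: scrapy crawl example
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    def parse(self, response):
        # Example extraction (assumed): yield the page title as an item.
        yield {'title': response.css('title::text').extract_first()}
```

From the project root, `scrapy crawl example -o result.json` would run this spider and write the yielded items to `result.json`.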

  3. Project structure
scrapy.cfg: the project's configuration file.
risk_one/: the project's Python module; you will add your code here later.
risk_one/items.py: the project's item definitions.
risk_one/middlewares.py: the project's middlewares.
risk_one/pipelines.py: the project's pipelines.
risk_one/settings.py: the project's settings file.
risk_one/spiders/: the directory that holds the spider code.
Create your own spider under the spiders directory to scrape data.
settings.py configures request parameters, proxy usage, how scraped data is saved to files, and so on.
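A few commonly adjusted options in `settings.py` might look like this (an excerpt sketch; the user agent, delay, and proxy values are illustrative assumptions, not defaults from the generated file):

```python
# risk_one/settings.py (excerpt) -- illustrative values, adjust as needed.
BOT_NAME = 'risk_one'

SPIDER_MODULES = ['risk_one.spiders']
NEWSPIDER_MODULE = 'risk_one.spiders'

# Identify the crawler and be polite to the target site.
USER_AGENT = 'risk_one (+http://example.com)'   # assumed contact URL
ROBOTSTXT_OBEY = True
DOWNLOAD_DELAY = 1          # seconds between requests to the same site

# To route requests through a proxy, enable Scrapy's built-in
# HttpProxyMiddleware (it reads the http_proxy environment variable):
# DOWNLOADER_MIDDLEWARES = {
#     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
# }
```

Saving scraped data to a file can also be done without any pipeline code via the feed exports, e.g. `scrapy crawl example -o items.json`.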
