Using the Scrapy shell

Note: 403 errors occur easily in the shell, but do not occur during actual crawling.
response - a Response object containing the last fetched page

>>> response.xpath('//title/text()').extract()

response.xpath() returns a list of selectors; calling .extract() on it returns the matched text as a list of strings.
>>> for index, link in enumerate(links):
...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
...     print 'Link number %d points to url %s and image %s' % args
...
Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg']
Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']
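The print line above is plain %-formatting applied to a tuple; nothing Scrapy-specific is involved. A minimal standalone sketch (Python 3 syntax, with made-up data standing in for the extracted lists):

```python
# Standalone sketch of the string formatting used above.
# The data here is made up; no Scrapy objects are involved.
args = (0, ['image1.html'], ['image1_thumb.jpg'])
line = 'Link number %d points to url %s and image %s' % args
print(line)  # Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
```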
The enumerate() function is typically used in a for loop.

A plain for loop:
>>> i = 0
>>> seq = ['one', 'two', 'three']
>>> for element in seq:
...     print i, seq[i]
...     i += 1
...
0 one
1 two
2 three
The same loop with enumerate:
>>> seq = ['one', 'two', 'three']
>>> for i, element in enumerate(seq):
...     print i, element
...
0 one
1 two
2 three
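The same idea in Python 3 syntax. enumerate() also accepts an optional start argument, useful when the displayed numbering should begin at 1 (a minimal sketch, independent of Scrapy):

```python
# enumerate() with a custom start index (Python 3: print is a function).
seq = ['one', 'two', 'three']
pairs = []
for i, element in enumerate(seq, start=1):
    pairs.append((i, element))
    print(i, element)
# prints:
# 1 one
# 2 two
# 3 three
```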
Suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
>>> divs = response.xpath('//div')
Then you can extract all <p> elements relative to those divs (note the dot prefixing the .//p XPath):
>>> for p in divs.xpath('.//p'):  # extracts all <p> inside
...     print p.extract()
Another common case would be to extract all direct  <p>  children:
>>> for p in divs.xpath('p'):
...     print p.extract()
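The distinction between `.//p` (any depth) and `p` (direct children only) is not Scrapy-specific. A minimal sketch using the standard library's xml.etree.ElementTree, whose limited XPath support honors the same relative-path rules, so it runs without Scrapy installed:

```python
# Relative ('.//p', any depth) vs direct-child ('p') XPath, demonstrated
# with the stdlib ElementTree instead of Scrapy selectors.
import xml.etree.ElementTree as ET

html = """<html><body>
  <div><p>inner one</p><span><p>nested</p></span></div>
  <div><p>inner two</p></div>
</body></html>"""

root = ET.fromstring(html)
divs = root.findall('.//div')

# './/p' matches <p> at any depth below each <div>
all_p = [p.text for d in divs for p in d.findall('.//p')]
# 'p' matches only direct <p> children of each <div>
direct_p = [p.text for d in divs for p in d.findall('p')]

print(all_p)     # ['inner one', 'nested', 'inner two']
print(direct_p)  # ['inner one', 'inner two']
```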
Invoking the shell from within a spider:
from scrapy.shell import inspect_response

inspect_response(response, self)
Press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume the crawl.
It is best to use single quotes for the outermost XPath string!
You can also open a local HTML file in the shell, which is convenient for debugging (but don't name the file index.html):

scrapy shell ./path/to/file.html
scrapy shell ../other/path/to/file.html
scrapy shell /absolute/path/to/file.html

Even when the file is in the current directory, the ./ prefix is required; `scrapy shell file.html` will not work.