Taken from:
Scrapy shell
# Check whether this is the URL we want to inspect
def parse(self, response):
    if ".org" in response.url:
        from scrapy.shell import inspect_response  # debugging statement
        inspect_response(response, self)
>>> response.url
'http://example.org'
Test the extraction code:
>>> sel.xpath('//h1[@class="fn"]')
[]
Open the response in a browser:
>>> view(response)
True
Finally, press Ctrl-D (Ctrl-Z on Windows) to exit the shell and resume crawling:
>>> ^D2014-01-23 17:50:03-0400 [myspider] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
Open the URL in a browser
from scrapy.utils.response import open_in_browser

def parse(self, response):
    # response.body is bytes, so compare against the decoded response.text
    if "item name" not in response.text:
        open_in_browser(response)