The response that Scrapy fetched is a JSON document:
{
"code": "1",
"data": "<div> <p class='one'> <a href='/people/yang' class='zg-link'>杨</a></p> <p class='two'> <a href='/people/wang' class='zg-link'>王</a></p> </div>"
}
1: Get the corresponding data out of the JSON
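import json

class MySpider(BaseSpider):
    ...
    def parse(self, response):
        # Decode the body to text and parse it as JSON into a dict.
        # (Recent Scrapy releases expose the decoded body as response.text.)
        jsonresponse = json.loads(response.body_as_unicode())
        item = MyItem()
        # Copy the wanted field; for the sample response above the key would be "data".
        item["firstName"] = jsonresponse["firstName"]
        return item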
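The parsing step can be tried without running a spider. The following is a minimal sketch that assumes the sample response is available as a plain string; the names raw, payload and html_fragment are illustrative. The string is written as strict JSON, with no comma after the last member, because json.loads rejects a trailing comma.

```python
import json

# The sample response body, written as strict JSON text.
raw = (
    '{"code": "1", '
    '"data": "<div> <p class=\'one\'> <a href=\'/people/yang\' class=\'zg-link\'>杨</a></p> '
    '<p class=\'two\'> <a href=\'/people/wang\' class=\'zg-link\'>王</a></p> </div>"}'
)

payload = json.loads(raw)          # a dict with the keys "code" and "data"
html_fragment = payload["data"]    # the embedded HTML, still an ordinary string

print(payload["code"])
print(html_fragment)
```

The value under "data" is only text at this point; it still has to be parsed as HTML before any link can be read from it.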
2: Parse the HTML held in the JSON to get the URLs
from scrapy.selector import Selector
Constructing a selector from text:
body = '<html><body><span>good</span></body></html>'
Selector(text=body).xpath('//span/text()').extract()
[u'good']
References
http://stackoverflow.com/questions/18171835/scraping-a-json-response-with-scrapy
http://scrapy-chs.readthedocs.io/zh_CN/latest/topics/selectors.html