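In addition to the first page, crawl the following pages (page 2, page 3, and so on).
Reference: Python + Scrapy 抓取豆瓣电影 top 250 (Scraping the Douban Movie Top 250 with Python + Scrapy)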
http://www.jianshu.com/p/62e0a588ee0d
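def parse(self, response):
    # ... extract the items on the current page ...

    # Pagination: follow the "next page" link if there is one
    next_page = response.xpath('//span[@class="next"]/a/@href').get()
    if next_page:
        url = response.urljoin(next_page)
        yield scrapy.Request(url, callback=self.parse)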
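If the next-page link is generated by JavaScript, you can combine Scrapy with Selenium. This works but is slow, because every page is rendered in a real browser.
References: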
selenium with scrapy for dynamic page
http://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page
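If the next-page link is generated by JavaScript, you can also use ScrapyJS (since renamed scrapy-splash), which renders pages through the Splash service.
References: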
Scraping dynamic content using python-Scrapy
http://stackoverflow.com/questions/30345623/scraping-dynamic-content-using-python-scrapy
Scrapy爬虫中使用Splash处理页面JS (Using Splash to handle page JavaScript in a Scrapy crawler)
http://ae.yyuap.com/pages/viewpage.action?pageId=919763