Python auto-crawling: automatically crawling the next page with Python Scrapy

Latest recommended article published 2021-02-10 13:30:30

weixin_39714307

Tags: python auto-crawling

I've recently been learning Scrapy and ran into a problem when crawling the next page:

```python
import scrapy
from crawlAll.items import CrawlallItem


class ToutiaoEssayJokeSpider(scrapy.Spider):
    name = "duanzi"
    allowed_domains = ["http://duanziwang.com"]
    start_urls = ['http://duanziwang.com/category/duanzi/page/1']

    def parse(self, response):
        for sel in response.xpath("//article[@class='excerpt excerpt-nothumbnail']"):
            item = CrawlallItem()
            item['Title'] = sel.xpath("//header/h2/a/text()").extract_first()
            item['Text'] = sel.xpath("//p[@class='note']/text()").extract_first()
            item['Views'] = sel.xpath("//p[1]/span[@class='muted'][2]/text()").extract_first()
            item['Time'] = sel.xpath("//p[1]/span[@class='muted'][1]/text()").extract_first()
            yield item

        next_page = response.xpath("//ul/li[@class='next-page']/a/@href").extract_first()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
```

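That's my code. It only scrapes the 12 entries on the first page. When I print the link to the second page, it prints correctly, so the link is being extracted. The problem is that these two lines:

```python
next_page = response.urljoin(next_page)
yield scrapy.Request(next_page, callback=self.parse)
```

never go back and call parse, and I don't know why. Could someone please take a look? Thanks.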
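
One likely cause, not confirmed in the post, is that `allowed_domains` holds a full URL (`http://duanziwang.com`) instead of a bare domain (`duanziwang.com`). Scrapy's offsite filtering compares the hostname of each follow-up request against `allowed_domains`. In the usual setup the start URLs are not subject to that check. As a result, page 1 gets parsed, but the page-2 request can be dropped silently; at DEBUG log level it shows up as a filtered offsite request. Recent Scrapy versions also warn when an `allowed_domains` entry looks like a URL. The sketch below is a simplified, stdlib-only approximation of that hostname check and not Scrapy's actual code. The names `build_host_regex` and `is_allowed` are hypothetical.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    # Hypothetical, simplified approximation of an offsite filter (not
    # Scrapy's source): a host passes if it equals an entry or is a
    # subdomain of one.
    domains = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({domains})$")


def is_allowed(url, allowed_domains):
    # An empty or missing allowed_domains lets every request through.
    if not allowed_domains:
        return True
    host = urlparse(url).hostname or ""
    return build_host_regex(allowed_domains).match(host) is not None


next_page = "http://duanziwang.com/category/duanzi/page/2"

# Entry written as a URL: the hostname "duanziwang.com" can never match
# "http://duanziwang.com", so the page-2 request would be dropped.
print(is_allowed(next_page, ["http://duanziwang.com"]))  # False

# Entry written as a bare domain: the hostname matches.
print(is_allowed(next_page, ["duanziwang.com"]))  # True
```

If this is the cause, changing the spider to `allowed_domains = ["duanziwang.com"]` should let the next-page requests through.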
