Learn now: https://edu.csdn.net/course/play/24756/283289?utm_source=blogtoedu
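In parse, when a page contains several entries that share the same layout, use XPath to select each entry's enclosing block first, then loop over the blocks and extract the fields you need relative to each one. If you instead extract each field directly across the whole page, the item pipeline receives one list per field (every title in one list, every author in another), so no complete record is formed for any single entry. To turn the page, build a scrapy.Request(url) and yield it; because no callback is given, Scrapy passes the new response back to parse, which reads the next page the same way. This works for ordinary "next page" pagination.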
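The difference between page-wide and per-block extraction can be seen on a small inline document. This is only a sketch: it uses the standard library's xml.etree.ElementTree in place of Scrapy's selectors so it runs without a crawl, and the markup, the sample values, and the function names extract_page_wide and extract_per_block are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a listing page.
PAGE = """
<html><body>
  <div class="sons">
    <b>Poem A</b>
    <p class="source"><a>Tang</a><a>Li Bai</a></p>
    <div class="contson">First line.<br/>Second line.</div>
  </div>
  <div class="sons">
    <b>Poem B</b>
    <p class="source"><a>Song</a><a>Su Shi</a></p>
    <div class="contson">Only line.</div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def extract_page_wide(root):
    """Query each field across the whole page: one list per field."""
    titles = [b.text for b in root.findall(".//b")]
    sources = [a.text for a in root.findall(".//p[@class='source']/a")]
    return {"title": titles, "source": sources}


def extract_per_block(root):
    """Select each entry's block first, then query fields relative to it."""
    records = []
    for block in root.findall(".//div[@class='sons']"):
        source = [a.text for a in block.findall(".//p[@class='source']/a")]
        body = block.find(".//div[@class='contson']")
        records.append({
            "title": block.find(".//b").text,
            "dynasty": source[0],
            "autor": source[1],
            # itertext() walks the text on both sides of nested tags like <br/>.
            "neirong": "".join(body.itertext()).strip(),
        })
    return records


print(extract_page_wide(root))
for record in extract_per_block(root):
    print(record)
```

The page-wide version returns the two titles in one list and all four source links in another, with nothing tying a title to its author. The per-block version returns one dictionary per entry, which is the shape an item pipeline usually expects.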
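# Method of the spider class; assumes `import scrapy` and the project's ExmpleItem at module level.
def parse(self, response):
    # Select every entry block first, then query fields relative to each block.
    gushiwens = response.xpath('//*[@class="sons"]')
    for gushiwen in gushiwens:
        title = gushiwen.xpath('.//b/text()').get()
        source = gushiwen.xpath('.//p[@class="source"]/a/text()').getall()
        if len(source) < 2:
            continue  # skip blocks that lack the two source links
        dynasty = source[0]
        autor = source[1]
        neirong = gushiwen.xpath('.//div[@class="contson"]//text()').getall()
        neirong = ''.join(neirong).strip()
        item = ExmpleItem(title=title, dynasty=dynasty, autor=autor, neirong=neirong)
        yield item
    # Assumption: the next-page link carries id="amore"; adjust the selector to the real page.
    next_href = response.xpath('//a[@id="amore"]/@href').get()
    if next_href:
        # urljoin needs a string, so the href is taken with .get() above.
        next_url = response.urljoin(next_href)
        request = scrapy.Request(next_url)
        yield request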
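The page-turning step depends on converting the link's href into an absolute URL and on stopping when the last page has no such link. A minimal sketch of that logic follows, using the standard library's urllib.parse.urljoin so it runs without a spider; Scrapy's response.urljoin performs the same kind of resolution against the response's own URL. The helper name next_page_url and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin


def next_page_url(current_url, next_href):
    """Return the absolute URL of the next page, or None if there is no link."""
    if not next_href:
        # A selector's .get() returns None when nothing matches,
        # which is typically the case on the last page.
        return None
    return urljoin(current_url, next_href)


# Hypothetical URLs, for illustration only.
current = "https://example.com/list/page1.html"
print(next_page_url(current, "/list/page2.html"))  # root-relative href
print(next_page_url(current, "page2.html"))        # href relative to the current page
print(next_page_url(current, None))                # no next-page link found
```

Both relative forms resolve to the same absolute address here, and the None case shows why the request should only be yielded when a link was actually found.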