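Scraping the men's fiction category listing on 17K小说网 (17k.com): crawl the novel list, follow the pagination, and fetch the title of each novel's latest chapter.
Every novel on each listing page is collected, and the latest chapter titles are fetched for the first 3 pages.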
app.py
from typing import Iterable

import scrapy
from scrapy import Request


class AppSpider(scrapy.Spider):
    name = "app"
    allowed_domains = ["www.17k.com"]
    # Not used while start_requests() is overridden; kept for reference.
    start_urls = ["https://www.17k.com/all/book/2_0_0_0_0_0_0_0_1.html"]

    def start_requests(self) -> Iterable[Request]:
        # range(1, max_page) covers listing pages 1 to 3.
        max_page = 4
        for i in range(1, max_page):
            url = "https://www.17k.com/all/book/2_0_0_0_0_0_0_0_" + str(i) + ".html"
            yield Request(url)

    def parse(self, response):
        # The 4th column of each table row links to the novel's latest chapter.
        links = response.xpath('//table//tr/td[4]/a/@href').extract()
        for link in links:
            # The hrefs come without a scheme, so resolve them against the page URL.
            yield scrapy.Request(url=response.urljoin(link), callback=self.parse_chapter)

    def parse_chapter(self, response):
        chapter = response.xpath('//*[@id="readArea"]/div[1]/h1/text()').get()
        if chapter is None:
            # Fall back to a positional XPath for chapter pages with another layout.
            chapter = response.xpath('/html/body/div[3]/div/h1/text()').get()
        print(chapter)
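The start_requests method replaces the default handling of start_urls and generates the paginated listing URLs itself. In parse, each yield scrapy.Request(..., callback=self.parse_chapter) schedules a request for a sub-page, and parse_chapter then extracts that page's content, namely the title of the latest chapter.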
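The two URL-handling steps can be tried outside Scrapy. The following is a minimal sketch using only the standard library: the helper name page_urls and the sample href are made up for illustration, and it assumes the chapter hrefs are protocol-relative (starting with "//"), which is why the spider has to turn them into absolute URLs.

```python
from typing import List
from urllib.parse import urljoin

# Listing URL pattern taken from the spider; only the trailing page number changes.
BASE = "https://www.17k.com/all/book/2_0_0_0_0_0_0_0_{page}.html"


def page_urls(first: int, last: int) -> List[str]:
    """Return the listing URLs for pages first..last (inclusive). Hypothetical helper."""
    return [BASE.format(page=i) for i in range(first, last + 1)]


urls = page_urls(1, 3)

# Hypothetical protocol-relative href, shaped like the ones the listing page is assumed to contain.
href = "//www.17k.com/chapter/123/456.html"

# urljoin takes the scheme from the base URL, so the result is an absolute https URL.
absolute = urljoin(urls[0], href)
print(urls)
print(absolute)
```

Inside a spider, response.urljoin(href) does the same resolution against the URL of the current response.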
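One finding along the way: extract() is equivalent to getall(), and get() is equivalent to extract_first(). When there is a match, get() returns the same value as extract()[0]; when nothing matches, get() returns None, whereas extract()[0] raises IndexError.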