Scrapy Crawler in Practice (Partial Source Code)

Latest recommended article published on 2024-08-26 18:04:13

小陈同学_666

Tags: scrapy, crawler

Article link: https://blog.csdn.net/QAZWCC/article/details/138009641

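This article shows how to use the Scrapy framework to build a spider named SpiderTitleSpider that scrapes chapter titles from zongheng.com and writes them to the file 我有一剑.txt. It mainly covers the `Sss1Item` definition and the use of XPath queries.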

Summary generated by CSDN using AI.

items.py

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class Sss1Item(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()

spider_title.py

import scrapy

from sss1.items import Sss1Item


class SpiderTitleSpider(scrapy.Spider):
    name = "spider_title"
    # Use the registered domain so requests to read.zongheng.com are not filtered as offsite.
    allowed_domains = ["zongheng.com"]
    start_urls = ["https://read.zongheng.com/chapter/1215341/68208370.html"]

    def parse(self, response):
        # Append the chapter title text to the output file; `with` closes the file afterwards.
        titles = response.xpath('//*[@id="Jcontent"]/div/div[1]/div[2]/text()').getall()
        with open('我有一剑.txt', 'a', encoding='utf8') as f:
            for title in titles:
                f.write(title + "\n")

        # Store the text matched by the second XPath in the item's `name` field.
        item = Sss1Item()
        item['name'] = response.xpath('//*[@id="Jcontent"]/div/div[4]/p[3]/span[1]/text()').getall()
        yield item

        # Follow the next-chapter link; .get() returns None when nothing matches,
        # so check it before calling any string method.
        next_url = response.xpath('//*[@id="page_reader"]/div[3]/div[1]/div[3]/div[1]/a[3]/@href').get()
        if next_url:
            next_url = next_url.replace("?", "")
            self.logger.info("Next page: %s", next_url)
            yield scrapy.Request(url=response.urljoin(next_url), callback=self.parse)
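
The next-page step depends on `.get()`, which returns `None` when the XPath matches nothing, so the link needs a guard before any string method is called on it. Below is a minimal sketch of that handling as a standalone helper that can be tested without a network connection. The function name `build_next_url` is hypothetical, and resolving relative links with `urljoin` is an added assumption rather than something the original spider does.

```python
from typing import Optional
from urllib.parse import urljoin


def build_next_url(base_url: str, href: Optional[str]) -> Optional[str]:
    """Return an absolute next-page URL, or None when there is no link."""
    # No match from the XPath (None) or an empty href means there is no next page.
    if not href:
        return None
    # Mirror the spider: drop any "?" characters from the link.
    cleaned = href.replace("?", "")
    # Resolve relative links against the current page URL (assumption).
    return urljoin(base_url, cleaned)
```

Inside `parse`, the helper could be called as `build_next_url(response.url, href)`, and a request yielded only when the result is not `None`.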