Beginner Web Scraping: Crawling a Single Novel from 笔趣阁 (Biquge) with selenium

Latest recommended article published on 2024-05-08 21:32:09

LongYuTianTang


Category: Small Web-Scraping Examples

Article link: https://blog.csdn.net/LongYuTianTang/article/details/107963394


# Imports
from selenium import webdriver
from selenium.webdriver.common.by import By

dr = webdriver.Chrome()


# Return the title and body text of the chapter page currently open in the browser
def get_page():
    novel = {}
    novel['page_title'] = dr.find_element(By.XPATH, '//*[@id="BookCon"]/h1').text
    novel['page_content'] = dr.find_element(By.XPATH, '//*[@id="BookText"]').text
    return novel


url = 'https://www.2wxs.com/xstxt/312/118706.html'  # link to the first chapter of the novel
dr.get(url)  # open the first chapter
# Get the URL of the novel's chapter list page
list_url = dr.find_element(By.XPATH, '//*[@id="BookCon"]/div[1]/a[2]').get_attribute("href")
print(list_url)

with open('斗罗大陆1.txt', 'a+', encoding='utf-8') as f:
    # Keep looping until the "next" link leads back to the chapter list page
    while url != list_url:
        content = get_page()
        next_url = dr.find_element(By.XPATH, '//*[@id="BookCon"]/div[1]/a[3]')
        url = next_url.get_attribute("href")
        next_url.click()
        # Write the chapter to the file
        f.write(content['page_title'] + '\n')
        f.write(content['page_content'] + '\n')
        print(content)

dr.quit()  # close the browser and end the driver session
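
The loop above stops once the "next chapter" link leads back to the chapter list page. The same idea can be sketched without a browser, which makes the stopping rule easy to check on its own. This is only a minimal sketch under assumptions: `Page`, `crawl`, `fetch`, `max_pages`, and the `FAKE_SITE` URLs are hypothetical names made up for illustration and are not part of the original script or of selenium. The `seen` set is an extra guard (not in the original) against a site whose "next" links form a cycle.

```python
from typing import Callable, Iterator, NamedTuple


class Page(NamedTuple):
    title: str
    content: str
    next_url: str  # href of the "next chapter" link on this page


def crawl(first_url: str, list_url: str,
          fetch: Callable[[str], Page],
          max_pages: int = 10000) -> Iterator[Page]:
    """Yield chapters starting at first_url until the next link is list_url."""
    url = first_url
    seen = set()  # URLs already visited, to avoid looping forever
    while url != list_url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        yield page
        url = page.next_url  # follow the "next chapter" link


# An in-memory stand-in for the site (hypothetical URLs), used instead of a browser.
FAKE_SITE = {
    "/1.html": Page("Chapter 1", "text one", "/2.html"),
    "/2.html": Page("Chapter 2", "text two", "/3.html"),
    "/3.html": Page("Chapter 3", "text three", "/index.html"),
}

chapters = list(crawl("/1.html", "/index.html", FAKE_SITE.__getitem__))
titles = [p.title for p in chapters]
print(titles)
```

With selenium, `fetch` would presumably open the URL with `dr.get(url)` and read the title, body text, and next link through the three XPath expressions used in the script. Keeping the loop separate from the browser calls means the loop can be tried against fake data before running it on the real site.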
