Python Web Scraping in Practice: Scraping a Novel

debugBiubiubiu2000

Last modified: 2023-09-20 22:03:24

Category: python web scraping in practice. Tags: python, web scraping, programming language

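First published: 2023-09-20 22:02:38

Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/2301_77659011/article/details/133103416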


import requests
from bs4 import BeautifulSoup


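def get_chapters():
    """
    Fetch the chapter links of the novel.
    :return: list of (chapter URL, chapter title) tuples
    """
    root_url = "http://www.89wx.cc/17/17277/"  # root (index) page of the novel on the site
    r = requests.get(root_url, timeout=10)  # time out instead of waiting indefinitely
    r.encoding = 'gbk'  # the site serves its pages in GBK
    soup = BeautifulSoup(r.text, 'html.parser')

    links = []
    # Inspecting the page shows that each chapter is an <a> tag inside a <dd> tag
    for dd in soup.find_all("dd"):
        link = dd.find("a")
        if not link:
            continue
        links.append(('http://www.89wx.cc' + link["href"], link.get_text()))
    return links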
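
The chapter URLs above are built by concatenating the site root with each `href`, which works as long as every `href` is a root-relative path. As a hedged alternative, the standard library's `urllib.parse.urljoin` also resolves relative and absolute links. The `build_chapter_url` helper and the sample `href` values below are assumptions made for illustration; they are not taken from the site or from the original script.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.89wx.cc/17/17277/"  # index page used in get_chapters()


def build_chapter_url(href, base_url=BASE_URL):
    """Resolve a chapter href against the index page URL (hypothetical helper)."""
    return urljoin(base_url, href)


# Hypothetical href values, for illustration only
print(build_chapter_url("/17/17277/1.html"))  # root-relative path
print(build_chapter_url("2.html"))  # path relative to the index page
print(build_chapter_url("http://www.89wx.cc/17/17277/3.html"))  # already absolute
```

All three calls resolve to a full URL under `http://www.89wx.cc/17/17277/`, so the scraper would not depend on the exact form of the links in the page.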


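def get_chapter_content(url):
    """
    Fetch the text of one chapter.
    :param url: URL of the chapter page
    :return: chapter text
    """
    r = requests.get(url, timeout=10)  # time out instead of waiting indefinitely
    r.encoding = 'gbk'
    soup = BeautifulSoup(r.text, "html.parser")
    text = soup.find("div", id="content").get_text()
    return text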
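
Both functions set `r.encoding = 'gbk'` before reading `r.text`, because the response bytes have to be decoded with the same encoding the site used to produce them. The sketch below illustrates the effect on plain bytes, without any network access; the sample string is made up for illustration.

```python
sample = "第一章 开始"  # made-up chapter title, for illustration only
raw = sample.encode("gbk")  # bytes as a GBK-encoded page would send them

decoded_ok = raw.decode("gbk")  # matching codec: text is recovered
decoded_wrong = raw.decode("utf-8", errors="replace")  # wrong codec: garbled text

print(decoded_ok == sample)
print(decoded_wrong == sample)
```

Decoding with the matching codec recovers the original text, while decoding the same bytes as UTF-8 does not, which is the garbling the explicit `gbk` setting avoids.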


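novel_chapters = get_chapters()
for chapter in novel_chapters:
    url, title = chapter
    # Save each chapter to its own UTF-8 text file named after the chapter title
    with open(title + ".txt", "w", encoding="utf-8") as f:
        f.write(get_chapter_content(url))
    # break  # uncomment to stop after the first chapter while testing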
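
The loop above uses the chapter title directly as a file name, which can fail when a title contains characters that are not allowed in file names, such as `?` or `/`. Below is a minimal sketch of one way to guard against that, assuming a hypothetical `safe_filename` helper that is not part of the original script; the sample titles are also made up.

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that are invalid in common file systems (hypothetical helper)."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title).strip()
    # Fall back to a fixed name if nothing usable is left
    return cleaned or "untitled"


# Made-up titles, for illustration only
print(safe_filename("第1章 他是谁?"))
print(safe_filename("a/b:c"))
```

In the loop, `open(safe_filename(title) + ".txt", ...)` could then stand in for `open(title + ".txt", ...)`.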
