Python Web Scraping for Beginners: Scraping a Novel

Latest recommended article published 2024-06-24 18:45:00

九黎AJ

Category: python

Article link: https://blog.csdn.net/qq_30931547/article/details/118016499

First, a look at the result.
[Screenshot of the output; image not preserved]
Environment: Python 3.9.5
Debugging tool: Visual Studio Code
pip version: 21.2.2

If pip reports that it needs upgrading, `python -m pip install --upgrade pip` is the usual way to do it.

Implementation

import os
import requests
from bs4 import BeautifulSoup
# Install the dependencies first:
#   pip install requests
#   pip install beautifulsoup4
# Request headers: send a browser-like User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
}
 
# Create the folder for the fantasy ('玄幻') novel text if it does not exist yet.
os.makedirs('./玄幻', exist_ok=True)

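# Request the site and fetch the chapter index page.
# response = requests.get(address).text
# print(response)  # decoded this way, the Chinese text comes out garbled
address = "http://www.biquw.com/book/951/"

# The commented-out code above produced garbled text, so the request is rewritten:
# keep the Response object and set its encoding before reading .text.
response = requests.get(address, headers=headers)
# http://www.biquw.com/book/17991/
response.encoding = response.apparent_encoding
print(response.text)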
 
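# Decoded this way, the Chinese text comes back readable.
#
# Inspecting the page source with F12 (browser developer tools) shows that
# the chapter data sits in <a> tags.
# Each <a> is inside an <li>, each <li> inside a <ul>, and the <ul> inside a <div>.
# To get every chapter on the page, locate that <div> first.
# The <div> has a class attribute, so it can be selected by class; see the code below.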
soup = BeautifulSoup(response.text, 'html.parser')
book_list = soup.find('div', class_='book_list').find_all('a')
# find_all returns a list-like result, so we can iterate over it.
for book in book_list:
    book_name = book.text
    # The link to each chapter page is in the <a> tag's href attribute.
    book_url = book['href']

    # Assumes href is relative to the index URL, so plain concatenation works.
    book_info_html = requests.get(address + book_url, headers=headers)
    book_info_html.encoding = book_info_html.apparent_encoding

    # Use a separate name so the index page's soup is not overwritten.
    chapter_soup = BeautifulSoup(book_info_html.text, 'html.parser')
    info = chapter_soup.find('div', id='htmlContent')
    if info is None:
        # Skip pages where the content div is missing.
        continue
    print(info.text)

    with open('./玄幻/' + book_name + '.txt', 'a', encoding='utf-8') as f:
        f.write(info.text)
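
The loop above builds each chapter URL by concatenating `address` with the `href`, and uses the link text directly as the file name. Below is a minimal sketch of a more defensive variant using only the standard library. It assumes hrefs may be either relative or root-relative, and that chapter titles may contain characters that are not allowed in file names. `chapter_url` and `safe_filename` are hypothetical helper names, and the sample href `12345.html` is made up for illustration; neither comes from the original script.

```python
import re
from urllib.parse import urljoin

BASE = "http://www.biquw.com/book/951/"  # index URL used in the script above


def chapter_url(base: str, href: str) -> str:
    """Resolve an href against the index URL (relative or root-relative)."""
    return urljoin(base, href)


def safe_filename(title: str) -> str:
    """Replace characters that Windows rejects in file names with '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned or "untitled"


# Hypothetical hrefs, for illustration only.
print(chapter_url(BASE, "12345.html"))
print(chapter_url(BASE, "/book/951/12345.html"))
print(safe_filename("第一章: 开始?"))
```

In the loop, `requests.get(chapter_url(address, book_url), headers=headers)` and `safe_filename(book_name) + '.txt'` would take the place of the plain string concatenations.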

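The garbled output mentioned in the script's comments typically appears when a page's bytes are decoded with the wrong charset. Here is a small standard-library sketch of that effect. It assumes the site serves GBK-encoded pages and that the wrong guess was ISO-8859-1; the original post confirms neither.

```python
# Assumption: the page is GBK-encoded; the original post does not state the charset.
raw = "玄幻小说".encode("gbk")      # bytes as they might arrive over the network

wrong = raw.decode("iso-8859-1")    # wrong charset: every byte becomes one garbled character
right = raw.decode("gbk")           # correct charset: the original text comes back

print(repr(wrong))
print(right)
```

Setting `response.encoding = response.apparent_encoding`, as the script does, asks requests to detect the charset from the response body instead of relying on the default derived from the HTTP headers.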