Scraping the classic novel Romance of the Three Kingdoms (三国演义)

Latest recommended article published on 2025-05-11 08:42:46

大繁至简937

Category: Web scraping. Tags: python

Article link: https://blog.csdn.net/qq_62592344/article/details/124007485

# Imports
import requests
from bs4 import BeautifulSoup
url='http://www.gushicimingju.com/novel/sanguoyanyi/'
headers={"User-Agent":
             "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
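         }  # send a browser-like User-Agent so the request looks like a normal visit
response=requests.get(url=url,headers=headers).text  # HTML source of the table-of-contents page
#print(response)
# Create a txt file to hold the scraped novel text; it is opened once, outside the loop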
with open('./doupo.txt','w',encoding='utf-8') as fp:
    # Instantiate a BeautifulSoup object to get the chapter list and the chapter URLs
    soup=BeautifulSoup(response,'lxml')
    # Use select() with a hierarchical CSS selector to get the list of <li> tags
    page_li=soup.select('.main-content > ul > li')
    #print(page_li)
    for li in page_li:  # iterate over the <li> tags
        title=li.a.string  # text of the <a> inside the <li>, i.e. the chapter title
        page_url='http://www.gushicimingju.com/'+li.a['href']  # prepend the site root to the href to get the chapter page URL
        page_data=requests.get(url=page_url,headers=headers).text  # fetch the chapter page
        soup_data=BeautifulSoup(page_data,'lxml')  # parse the chapter page with BeautifulSoup
        b_data=soup_data.find('div',class_='shici-content check-more')
        data=b_data.text
        #print(data)
        fp.write(title+':'+data+'\n')  # write the chapter, ending with a real newline
        print(title,'download complete!')
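
The script builds each chapter URL by string concatenation and writes the title and text straight to the file. Below is a minimal sketch of two small helpers that make those steps a little more forgiving. It assumes the href may or may not start with a slash, and that li.a.string can be None when the <a> tag contains nested tags. The helper names build_chapter_url and format_chapter, and the sample hrefs, are hypothetical and not part of the original script.

```python
from urllib.parse import urljoin

# Site root taken from the script above
BASE_URL = 'http://www.gushicimingju.com/'


def build_chapter_url(href, base=BASE_URL):
    """Resolve a chapter href against the site root.

    urljoin handles hrefs with or without a leading slash and leaves
    absolute URLs unchanged, so no '//' ends up in the path.
    """
    return urljoin(base, href)


def format_chapter(title, text):
    """Return one chapter as 'title:text' terminated by a newline."""
    # Tag.string is None when the tag has more than one child,
    # so fall back to an empty title instead of raising TypeError.
    return (title or '') + ':' + text.strip() + '\n'


if __name__ == '__main__':
    # Hypothetical hrefs; the real values come from li.a['href'].
    print(build_chapter_url('/novel/sanguoyanyi/1.html'))
    print(build_chapter_url('novel/sanguoyanyi/1.html'))
    print(repr(format_chapter('Chapter 1', '  some text  ')))
```

Inside the loop, these would be used as page_url=build_chapter_url(li.a['href']) and fp.write(format_chapter(title,data)).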