Scraping a Web Novel with Python and Writing It to a txt File

This walkthrough uses the free chapters on Qidian (起点中文网) as the example, specifically the novel 主神黑店.

First, get the URL of the first chapter of the novel to scrape:

url = 'https://read.qidian.com/chapter/CTxPsgzdPBfu4xLcYRGW6w2/7t9v4ciILvngn4SMoDUcDQ2'

Disguise the request as a browser so the server does not refuse to return data:

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'}
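To check that the disguise works before parsing anything, a quick sanity test (my addition, not in the original) is to look at the HTTP status code:

import requests

url = 'https://read.qidian.com/chapter/CTxPsgzdPBfu4xLcYRGW6w2/7t9v4ciILvngn4SMoDUcDQ2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'}
response = requests.get(url, headers=headers)
print(response.status_code)  # 200 means the server accepted the browser-like request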

Now scrape the first chapter.

[Screenshot: the chapter title element in the page source]

The chapter title is extracted from here.

[Screenshot: the chapter content element in the page source]

The chapter content is extracted from here.

The code:

import requests
from bs4 import BeautifulSoup

url = 'https://read.qidian.com/chapter/CTxPsgzdPBfu4xLcYRGW6w2/7t9v4ciILvngn4SMoDUcDQ2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'}
response = requests.get(url=url, headers=headers)
html = response.text

soup = BeautifulSoup(html, 'lxml')
# the chapter title lives in <h3 class="j_chapterName">
novel_name = soup.find('h3', {'class': 'j_chapterName'}).getText()
# the chapter body lives in <div class="read-content j_readContent">
novel_txt = soup.find('div', {'class': 'read-content j_readContent'}).getText()
novel = novel_name + '\n' + novel_txt

save_path = 'C:\\Users\\23219\\Desktop\\pycharm_project\\小说\\主神黑店'
save_name = '\\主神黑店' + '.txt'
full_path = save_path + save_name
fp = open(full_path, 'w', encoding='utf-8')
fp.write(novel)
fp.close()
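A slightly safer variant of the saving step, not in the original: create the target folder if it is missing and use a with block so the file is closed even if writing fails.

import os

save_path = 'C:\\Users\\23219\\Desktop\\pycharm_project\\小说\\主神黑店'
os.makedirs(save_path, exist_ok=True)  # create the folder if it does not exist yet
with open(save_path + '\\主神黑店.txt', 'w', encoding='utf-8') as fp:
    fp.write(novel)  # 'novel' is the title + content string built above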

With that, the first chapter has been downloaded.

Next, use a while loop to scrape the entire novel.

[Screenshot: the "next chapter" link element in the page source]

The URL of the next chapter is taken from here.

The code:

next_url = soup.find('a', {'id': 'j_chapterNext'})

url = 'https:' + next_url['href']

This yields the URL of the next chapter.
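One subtlety: on the last available free chapter the "next chapter" link can be missing (or lead into the paid section), in which case soup.find() returns None and indexing ['href'] would crash. A small defensive variant, where the None guard is my addition rather than anything the page guarantees; the consolidated script below uses the same idea:

def get_next_url(html):
    soup = BeautifulSoup(html, 'lxml')
    next_link = soup.find('a', {'id': 'j_chapterNext'})
    # find() returns None when the link is absent; tell the caller to stop
    if next_link is None or not next_link.get('href'):
        return None
    return next_link['href']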

A while loop then crawls the chapters one after another.

The full program, cleaned up:

import requests
from bs4 import BeautifulSoup

# Request function: fetch a page with the spoofed headers
def get_html(url, headers):
    response = requests.get(url, headers=headers)
    return response.text

# Parsing function: pull out the chapter title and body
def get_novel(html):
    soup = BeautifulSoup(html, 'lxml')
    novel_name = soup.find('h3', {'class': 'j_chapterName'}).getText()
    # the double spaces mark paragraph starts, so turn them into newlines
    novel_txt = soup.find('div', {'class': 'read-content j_readContent'}).getText().replace("  ", "\n")
    return novel_name + '\n' + novel_txt

# Get the URL of the next chapter, or None when there is no next link
def get_next_url(html):
    soup = BeautifulSoup(html, 'lxml')
    next_link = soup.find('a', {'id': 'j_chapterNext'})
    if next_link is None or not next_link.get('href'):
        return None
    return next_link['href']

# Save one chapter as its own numbered txt file
def save(novel, i):
    save_path = 'C:\\Users\\23219\\Desktop\\pycharm_project\\小说\\主神黑店'
    save_name = '\\主神黑店' + str(i) + '.txt'
    full_path = save_path + save_name
    fp = open(full_path, 'w', encoding='utf-8')
    fp.write(novel)
    fp.close()

# Main function
def main():
    url = 'https://read.qidian.com/chapter/CTxPsgzdPBfu4xLcYRGW6w2/7t9v4ciILvngn4SMoDUcDQ2'
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'}
    i = 0
    while True:
        i += 1
        # fetch and parse the current chapter
        html = get_html(url=url, headers=headers)
        novel = get_novel(html=html)
        # save before following the link, so the last chapter is not lost
        save(novel=novel, i=i)
        # follow the "next chapter" link; stop when there is none
        next_href = get_next_url(html=html)
        if next_href is None:
            break
        url = 'https:' + next_href

main()
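Two easy refinements, neither in the original: pause between requests so the crawler is polite to the server, and append every chapter to a single txt file (matching the title of this post) instead of writing one numbered file per chapter. A sketch, reusing the helper functions above:

import time

def main():
    url = 'https://read.qidian.com/chapter/CTxPsgzdPBfu4xLcYRGW6w2/7t9v4ciILvngn4SMoDUcDQ2'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'}
    full_path = 'C:\\Users\\23219\\Desktop\\pycharm_project\\小说\\主神黑店\\主神黑店.txt'
    with open(full_path, 'w', encoding='utf-8') as fp:
        while True:
            html = get_html(url=url, headers=headers)
            fp.write(get_novel(html=html) + '\n\n')  # append each chapter to one file
            next_href = get_next_url(html=html)
            if next_href is None:
                break
            url = 'https:' + next_href
            time.sleep(1)  # be polite: wait a second between requests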
