I've been on a novel kick lately, so the recent posts are all about novels.
Today we'll use Python to download and save books from the app called 西红柿小说 (Tomato Novel).
What you need
Environment
- Python 3.8
- PyCharm 2023
Modules
- requests
- parsel
- prettytable
- tqdm
- re
requests, parsel, prettytable and tqdm are third-party modules: press Win + R, type cmd, then run pip install requests parsel prettytable tqdm to install them. re is part of the standard library and needs no installation.
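If you are not sure which of these are already installed, the optional helper below (not part of the downloader itself, just a quick sketch) tries to import each third-party module and prints the matching pip command for any that are missing:

```python
# Optional helper: check that every third-party module used below is importable.
# Purely illustrative; the module names match the imports in the script that follows.
import importlib

for module in ('requests', 'parsel', 'prettytable', 'tqdm'):
    try:
        importlib.import_module(module)
        print(f'{module} is already installed')
    except ImportError:
        print(f'{module} is missing - run: pip install {module}')
```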
Source code
```python
import requests                       # send HTTP requests (third-party)
import re                             # regular expressions (built-in)
import parsel                         # CSS-selector parsing of HTML (third-party)
from prettytable import PrettyTable   # print search results as a table (third-party)
from tqdm import tqdm                 # progress bars (third-party)

while True:
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    }
    key = input('Enter the novel you want to download (enter 00 to quit): ')
    if key == '00':
        break

    # Table that will hold the search results
    tb = PrettyTable()
    tb.field_names = ['No.', 'Title', 'Author', 'Category', 'Latest chapter', 'ID']
    num = 0
    info = []
    print('Searching, please wait...')

    # Query the search API page by page (replace the placeholder domain with the site's actual domain)
    for page in tqdm(range(30)):
        search_url = 'https://大家自己替换一下地址.com/api/author/search/search_book/v1'
        search_params = {
            'filter': '127,127,127,127',
            'page_count': '10',
            'page_index': page,
            'query_type': '0',
            'query_word': key,
        }
        search_data = requests.get(url=search_url, params=search_params, headers=headers).json()
        for i in search_data['data']['search_book_data_list']:
            book_name = i['book_name']
            author = i['author']
            book_id = i['book_id']
            category = i['category']
            last_chapter_title = i['last_chapter_title']
            dit = {
                'book_name': book_name,
                'author': author,
                'category': category,
                'last_chapter_title': last_chapter_title,
                'book_id': book_id,
            }
            info.append(dit)
            tb.add_row([num, book_name, author, category, last_chapter_title, book_id])
            num += 1

    print(tb)
    book = input('Enter the number of the novel to download: ')

    # Request the book's detail page to get its title and chapter links (replace the placeholder domain)
    url = f'https://大家自己替换一下.com/page/{info[int(book)]["book_id"]}'
    response = requests.get(url=url, headers=headers)
    html_data = response.text
    name = re.findall('<div class="info-name"><h1>(.*?)</h1', html_data)[0]
    selector = parsel.Selector(html_data)
    css_name = selector.css('.info-name h1::text').get()          # same title, extracted via CSS selector
    href = selector.css('.chapter-item a::attr(href)').getall()   # chapter links

    print(f'{name} is downloading, please wait...')
    for index in tqdm(href):
        chapter_id = index.split('/')[-1]
        # Chapter content API (again, replace the placeholder domain)
        link = f'https://替换掉了.com/api/novel/book/reader/full/v1/?device_platform=android&parent_enterfrom=novel_channel_search.tab.&aid=2329&platform_id=1&group_id={chapter_id}&item_id={chapter_id}'
        json_data = requests.get(url=link, headers=headers).json()['data']['content']
        title = re.findall('<div class="tt-title">(.*?)</div>', json_data)[0]
        content = '\n'.join(re.findall('<p>(.*?)</p>', json_data))
        # Append each chapter to one txt file named after the book
        with open(f'{name}.txt', mode='a', encoding='utf-8') as f:
            f.write(title)
            f.write('\n')
            f.write(content)
            f.write('\n')
```
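The chapter API returns the chapter body as an HTML fragment inside data['content'], which is why the script pulls the title and paragraphs out with re.findall. The sketch below runs those same two patterns against a made-up fragment (the markup shape follows the patterns used above; the text itself is just sample data) so you can see what they return:

```python
import re

# A made-up fragment shaped like the chapter HTML the script parses:
# a <div class="tt-title"> for the chapter title and one <p> per paragraph.
sample = (
    '<div class="tt-title">Chapter 1 Sample Title</div>'
    '<p>First paragraph of the chapter.</p>'
    '<p>Second paragraph of the chapter.</p>'
)

title = re.findall('<div class="tt-title">(.*?)</div>', sample)[0]
content = '\n'.join(re.findall('<p>(.*?)</p>', sample))

print(title)    # Chapter 1 Sample Title
print(content)  # the two paragraphs joined with a newline
```

If the real fragment ever nests extra tags inside the paragraphs, swapping the regex for a parsel selector such as parsel.Selector(text=sample).css('p::text').getall() would be the safer choice.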
Result
Search and download
All in all it's pretty simple.
Alright, that's it for this share. See you next time~