Python Web Scraper

上山看海

Category: Python

Article link: https://blog.csdn.net/f826241061/article/details/104114012

import re
from urllib.parse import urljoin

import requests


# Home page of the novel to scrape
url = 'http://'


# Send an HTTP GET request, as a browser would
response = requests.get(url)
# Decode the response body as UTF-8
response.encoding = 'utf-8'
# HTML source of the novel's home page
html = response.text
# Novel title, taken from the og:novel:book_name meta tag
title = re.findall(r'<meta property="og:novel:book_name" content="(.*?)" />', html, re.S)[0]
# Collect every chapter as a (url, title) tuple.
# With re.S, "." also matches newlines, so a match may span several lines.
chapter_info_list = re.findall(r'<a rel="nofollow" href="(.*?)">(.*?)</a>', html, re.S)

# Create a text file named after the novel; the with block closes it when done
with open('%s.txt' % title, 'w', encoding='utf-8') as fb:
    # Loop over the chapters and download each one
    for chapter_url, chapter_title in chapter_info_list:
        # Resolve relative links against the home page URL
        chapter_response = requests.get(urljoin(url, chapter_url))
        chapter_response.encoding = 'utf-8'
        chapter_html = chapter_response.text
        # Chapter body: the text between the content div and the next con_l div
        chapter_content = re.findall(r'<div class="book_content" id="content">(.*?)<div class="con_l">', chapter_html, re.S)
        fb.write(chapter_title)
        fb.write('\n')
        # findall returns a list of strings; join it rather than writing the list's repr
        fb.write('\n'.join(chapter_content))
        fb.write('\n')
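
The chapter regular expression captures raw HTML, so the saved text can still contain tags such as <br /> and entities such as &nbsp;. Below is a minimal sketch of one way to clean that up before writing it to the file. It assumes the chapter markup uses <br> line breaks and standard HTML entities; the helper name clean_chapter_html and the sample string are hypothetical and not taken from the original site.

```python
import re
from html import unescape


def clean_chapter_html(fragment: str) -> str:
    """Turn a captured HTML fragment into plain text (hypothetical helper)."""
    # Treat <br>, <br/> and <br /> as line breaks.
    text = re.sub(r'<br\s*/?>', '\n', fragment, flags=re.I)
    # Drop any remaining tags.
    text = re.sub(r'<[^>]+>', '', text)
    # Decode entities such as &nbsp; and &amp;.
    text = unescape(text)
    # &nbsp; decodes to U+00A0; normalise it to a regular space.
    text = text.replace('\xa0', ' ')
    # Strip each line and drop the empty ones.
    lines = [line.strip() for line in text.splitlines()]
    return '\n'.join(line for line in lines if line)


# Made-up sample, not taken from the real site.
sample = '&nbsp;&nbsp;First paragraph.<br /><br />&nbsp;&nbsp;Second &amp; last.<p></p>'
print(clean_chapter_html(sample))
```

In the loop, each string returned by re.findall could be passed through such a helper before fb.write. A regex-based tag stripper is only adequate for simple, predictable markup like this; an HTML parser is the safer choice for anything more complex.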
