My First Web Crawler: The Blog Post I Had to Write About Scraping

Latest recommended article published 2024-05-07 15:53:56

weixin_42562146


Tags: scrapy, re, regex crawler, findall

Article link: https://blog.csdn.net/weixin_42562146/article/details/80960950


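Goal: scrape the full text of one novel from a novel website.

Tools: Python 3.5, PyCharm

Time spent: 12 hours (much of it spent going back and forth)

Result: it worked, of course.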

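# -*- coding: utf-8 -*-
import requests
import re
import string
# Download a web page
url = 'http://www.jingcaiyuedu.com/book/15401/list.html'
# Simulate a browser: send an HTTP GET request for the URL with requests;
# the server returns a response (status, headers, data, etc.)
response = requests.get(url)
# Set the encoding used to decode the page
response.encoding = 'utf-8'
# HTML source of the novel's index page
html = response.text
# Novel title
# (re.findall returns a list of all matches, not a single string)
title = re.findall(r'<title>(.*?)</title>', html)
# Create a new file to store the novel's content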
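The listing breaks off at the file-saving step. What follows is only a rough sketch of how that step might look, not the original code. It assumes the chapter links on the index page look like `<a href="/book/15401/1.html">Chapter 1</a>`; that markup, along with the helper names `extract_title`, `extract_chapters`, and `save_index`, is hypothetical. The sketch works on an HTML string, so it can be tried without touching the network.

```python
import re
from pathlib import Path


def extract_title(html):
    """Return the text of the first <title> tag, or a fallback name."""
    matches = re.findall(r'<title>(.*?)</title>', html, re.S)
    return matches[0].strip() if matches else 'untitled'


def extract_chapters(html):
    """Return (href, chapter_name) pairs.

    Assumption: chapter links look like
    <a href="/book/15401/1.html">Chapter 1</a>; the real site may differ.
    """
    return re.findall(r'<a href="(/book/\d+/\d+\.html)"[^>]*>(.*?)</a>', html)


def save_index(html, out_dir='.'):
    """Write the title and the chapter list to '<title>.txt' in out_dir."""
    title = extract_title(html)
    # Replace characters that are not allowed in file names on common systems.
    safe_name = re.sub(r'[\\/:*?"<>|]', '_', title)
    path = Path(out_dir) / (safe_name + '.txt')
    with open(str(path), 'w', encoding='utf-8') as f:
        f.write(title + '\n')
        for href, name in extract_chapters(html):
            f.write(name + '\t' + href + '\n')
    return path
```

With the variables from the listing above, this would be called as `save_index(html)`. Fetching each chapter page and appending its text to the file would follow the same pattern of `requests.get` plus `re.findall`, with a pattern matched to the real chapter markup.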
