Scraping text and saving it to a text file

qq_40447533

Category: python

Article link: https://blog.csdn.net/qq_40447533/article/details/79076399

Site scraped: http://quotes.toscrape.com/

Content scraped: famous quotes

The code is below.

from urllib import request

Import the regular-expression module:

import re

req = request.Request('http://quotes.toscrape.com/')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4549.400 QQBrowser/9.7.12900.40000')
response = request.urlopen(req).read().decode('utf-8')

Parentheses form a capture group, so findall returns only the text inside them; [] defines a set of characters; ^ at the start of [] negates the set; + means the preceding item occurs at least once.

pattern = r'<span class="text" itemprop="text">“([^"]+)”</span>'
text = re.findall(pattern, response)
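To see the pattern work without any network access, here is a minimal sketch that runs the same regular expression on a hard-coded HTML fragment. The fragment and the name `sample_html` are made up for illustration; they only imitate the structure of the real page.

```python
import re

# Hypothetical HTML fragment imitating the page structure (not fetched from the site).
sample_html = (
    '<span class="text" itemprop="text">“First sample quote.”</span>\n'
    '<span class="text" itemprop="text">“Second sample quote.”</span>'
)

# Same pattern as in the article: the parentheses capture the quote text,
# and [^"]+ matches one or more characters that are not an ASCII double quote.
pattern = r'<span class="text" itemprop="text">“([^"]+)”</span>'

quotes = re.findall(pattern, sample_html)
print(quotes)
```

Because the pattern has exactly one capture group, `re.findall` returns a list of strings holding only the captured text, not the surrounding tags.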

Use the open function to create a txt file and write to it in append mode:

with open('3.txt', 'a', encoding='utf-8') as f:
    for t in text:
        print(t)
        f.write(t + '\n')
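The sketch below shows how append mode behaves across repeated writes. The helper `append_lines` and the file name `quotes_demo.txt` are hypothetical and not part of the original script; the file goes into a temporary directory so the demo leaves nothing in the working directory.

```python
import os
import tempfile


def append_lines(path, lines):
    # Mode 'a' creates the file if it does not exist and otherwise
    # adds to the end without touching the existing content.
    with open(path, 'a', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')


# Hypothetical demo file inside a fresh temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'quotes_demo.txt')

append_lines(demo_path, ['first quote'])
append_lines(demo_path, ['second quote'])  # appended, not overwritten

with open(demo_path, encoding='utf-8') as f:
    saved = f.read().splitlines()
print(saved)
```

One consequence of append mode is that running the scraper twice stores every quote twice. If the file should hold only the latest run, mode 'w' would overwrite it instead.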
