Basic Web Crawler Steps for Beginners

Latest recommended article published 2023-03-21 11:40:24

Ting_0517

Article link: https://blog.csdn.net/Ting_0517/article/details/104803229

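Basic steps of a crawler:
1. Send a request with urlopen(URL) and get back a response.
2. Call response.read() to get the body as bytes. Binary content such as images needs no conversion; otherwise call decode() to turn the bytes into a string.
3. Extract the information (text patterns) with regular expressions or BeautifulSoup (bs).
4. Store the scraped data.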
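The distinction in step 2 between binary and text bodies can be sketched as follows. This is only an illustration and not part of the original post: the helper name `body_as_text_or_bytes` and the rule of treating only `text/*` content types as text are assumptions made for the example.

```python
from typing import Union


def body_as_text_or_bytes(raw: bytes, content_type: str,
                          encoding: str = "utf-8") -> Union[str, bytes]:
    """Decode text bodies; hand binary bodies (e.g. images) back unchanged.

    Hypothetical helper: `content_type` is the value of the response's
    Content-Type header, e.g. "text/html; charset=utf-8" or "image/png".
    """
    main_type = content_type.split(";")[0].strip().lower()
    if main_type.startswith("text/"):
        # Text such as HTML: convert bytes to str before pattern matching.
        return raw.decode(encoding)
    # Anything else (images, archives, ...) stays as raw bytes.
    return raw


page = body_as_text_or_bytes("<p>héllo</p>".encode("utf-8"),
                             "text/html; charset=utf-8")
image = body_as_text_or_bytes(b"\x89PNG\r\n\x1a\n", "image/png")
print(type(page).__name__, type(image).__name__)
```

Binary bodies returned this way would normally be written to disk with a file opened in "wb" mode.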
Example: a crawler for a quotes website
from urllib.request import urlopen
import re
import csv
Step 1:
url1="http://quotes.toscrape.com/"
response=urlopen(url1)
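A slightly more defensive version of the request step might look like the sketch below. The User-Agent string, the 10-second timeout, and the helper names `build_request` and `fetch` are assumptions for illustration; the original post calls urlopen directly on the URL.

```python
from urllib.request import Request, urlopen

URL = "http://quotes.toscrape.com/"


def build_request(url: str) -> Request:
    """Build a GET request with an explicit User-Agent (hypothetical value)."""
    return Request(url, headers={"User-Agent": "beginner-crawler-example/0.1"})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw body bytes."""
    # The with-block closes the connection once the body has been read.
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network; fetch(URL) would.
req = build_request(URL)
print(req.full_url, req.get_method())
```

Passing a timeout keeps the script from hanging indefinitely if the server does not answer.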

Step 2:
html_text=response.read().decode()

print(html_text)
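Calling decode() with no arguments assumes UTF-8, which may fail or produce garbage on pages served in another encoding. One possible refinement, sketched here under the assumption that the server declares its charset in the Content-Type header, is to read that charset first. The helper name `decode_html` is hypothetical; with urlopen, `response.headers` is a `Message` subclass and could be passed in directly.

```python
from email.message import Message


def decode_html(raw: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode `raw` using the charset declared in the headers, if any."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(charset, errors="replace")


# Simulated headers so the example runs without a network connection.
headers = Message()
headers["Content-Type"] = "text/html; charset=gbk"
text = decode_html("名言".encode("gbk"), headers)
print(text)
```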

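Step 3:
# The source lost the * quantifiers and the closing tags (</div>, </span>, </small>, </a>);
# they are restored here as the most likely original patterns.
res_div=r'<div class="quote"(.*?)</div>'
res_quote=r'<span class="text" itemprop="text">(.*?)</span>'
res_author=r'by <small class="author" itemprop="author">(.*?)</small>'
res_tag=r'<a class="tag" .*?>(.*?)</a>'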
li=re.findall(res_div,html_text,re.S|re.M|re.I)
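Once the page is split into quote blocks, each block can be searched for its quote text, author, and tags. The sketch below shows one way to do that. The sample HTML is made up to resemble the site's markup, the patterns assume the closing tags `</div>`, `</span>`, `</small>`, and `</a>`, and `parse_quotes` is a hypothetical helper name.

```python
import re

# Made-up sample that imitates one quote block on the page.
SAMPLE_HTML = '''
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
    <span class="text" itemprop="text">"A sample quote."</span>
    <span>by <small class="author" itemprop="author">Jane Doe</small></span>
    <div class="tags">
        <a class="tag" href="/tag/sample/page/1/">sample</a>
        <a class="tag" href="/tag/demo/page/1/">demo</a>
    </div>
</div>
'''

RES_DIV = r'<div class="quote"(.*?)</div>'
RES_QUOTE = r'<span class="text" itemprop="text">(.*?)</span>'
RES_AUTHOR = r'by <small class="author" itemprop="author">(.*?)</small>'
RES_TAG = r'<a class="tag" .*?>(.*?)</a>'


def parse_quotes(html_text):
    """Return a list of (quote, author, tags) tuples found in html_text."""
    rows = []
    # re.S lets "." match newlines, so a block may span several lines.
    for block in re.findall(RES_DIV, html_text, re.S):
        quote = re.search(RES_QUOTE, block, re.S)
        author = re.search(RES_AUTHOR, block, re.S)
        tags = re.findall(RES_TAG, block, re.S)
        if quote and author:
            rows.append((quote.group(1).strip(), author.group(1).strip(), tags))
    return rows


print(parse_quotes(SAMPLE_HTML))
```

The non-greedy block pattern stops at the first `</div>`, which in this markup closes the tags container, so the block still contains the quote, the author, and every tag.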

Step 4:
with open("c:/aa.csv","wt",newline=...
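The storage step is cut off in the source, so the following is only a sketch of how the rows could be written with the csv module, not the author's code. The column layout, the comma-joined tags, the temporary output path, and the helper name `save_rows` are all assumptions; `newline=""` is the setting the csv module documentation recommends when opening files for a writer.

```python
import csv
import os
import tempfile


def save_rows(path, rows):
    """Write (quote, author, tags) rows to a CSV file with a header line."""
    with open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["quote", "author", "tags"])
        for quote, author, tags in rows:
            # Tags are joined into one cell; csv quotes the cell as needed.
            writer.writerow([quote, author, ",".join(tags)])


rows = [("A sample quote.", "Jane Doe", ["sample", "demo"])]
out_path = os.path.join(tempfile.gettempdir(), "quotes_example.csv")
save_rows(out_path, rows)

# Read the file back to check what was stored.
with open(out_path, newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
print(saved)
```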
