Python web scraping: fetching a web page and saving it (simple basics)

黎明之道

Column: Python web scraping | Tags: python, http, https, web scraping, transcoding

Article link: https://blog.csdn.net/sjjsaaaa/article/details/111144872

Screenshot of the scraped page (the code is at the end):

Basics

First, import the required libraries:

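from fake_useragent import UserAgent  # random User-Agent strings (third-party: pip install fake-useragent)
from urllib.request import Request, urlopen  # build a request / send it and open the URL
from urllib.parse import quote  # percent-encode a single string for use in a URL
from urllib.parse import urlencode  # encode a dict of parameters as a query string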
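The two encoding helpers imported above can be tried offline. A minimal sketch: quote() percent-encodes a single string (non-ASCII text is converted to UTF-8 bytes first), while urlencode() builds a whole query string from a dict. The keyword, the parameter names wd and pn, and the search URL are hypothetical values chosen only for illustration:

```python
from urllib.parse import quote, urlencode

# quote() percent-encodes one string; non-ASCII text is encoded as UTF-8 first
keyword = "爬虫"
encoded_keyword = quote(keyword)
print(encoded_keyword)  # %E7%88%AC%E8%99%AB

# urlencode() turns a dict into "key=value&key=value", encoding every value
params = {"wd": keyword, "pn": 10}  # hypothetical parameter names
query_string = urlencode(params)
print(query_string)  # wd=%E7%88%AC%E8%99%AB&pn=10

# Hypothetical search URL assembled from the query string
search_url = "https://www.baidu.com/s?" + query_string
print(search_url)
```

Note that quote() leaves "/" unescaped by default (its safe parameter defaults to "/"), which suits path segments; pass safe="" when encoding a value that must not contain a literal slash.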

Start by fetching a simple web page:

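url = "https://www.baidu.com/?tn=02003390_43_hao_pg"  # the page to fetch
response = urlopen(url)  # send a GET request and open the URL
info = response.read()  # read the page body as raw bytes
info.decode()  # decode the bytes into a str (UTF-8 by default)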
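The four lines above never close the connection and always decode as UTF-8. A slightly more defensive sketch, assuming Python 3: it closes the response with a with block, sets a timeout, and uses the charset declared in the Content-Type header when there is one, falling back to UTF-8. The helper name fetch_html and the 10-second timeout are illustrative choices, not part of the original post:

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download url and return its body as text (hypothetical helper)."""
    # The with block closes the connection even if reading fails
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # raw bytes of the body
        # Use the charset declared in Content-Type, otherwise assume UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes do not match the charset
    return raw.decode(charset, errors="replace")
```

For the page above this would be called as html = fetch_html("https://www.baidu.com/?tn=02003390_43_hao_pg"). Many pages declare their encoding only in a <meta charset> tag inside the HTML, which this sketch does not inspect.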

Tips

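response.getcode()  # the HTTP status code, e.g. 200
response.geturl()  # the URL actually retrieved (after any redirects)
response.info()  # the HTTP response headers sent by the server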
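Two details are worth knowing when using these methods: urlopen() follows HTTP redirects on its own, so geturl() can differ from the URL you passed in, and for 4xx/5xx responses it raises urllib.error.HTTPError instead of returning a response, so getcode() on a normal return value is usually a 2xx code. A sketch that reports the status either way; the helper name inspect_url is hypothetical:

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def inspect_url(url, timeout=10):
    """Return (status code, final URL, Content-Type) for url (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            status = response.getcode()  # e.g. 200
            final_url = response.geturl()  # differs from url after a redirect
            content_type = response.info().get("Content-Type")  # one header
            return status, final_url, content_type
    except HTTPError as err:
        # 4xx/5xx: urlopen raises instead of returning; the error carries the code
        return err.code, url, None
    # Other failures (DNS errors, refused connections) raise URLError unchanged
```

For example, inspect_url("https://www.baidu.com/?tn=02003390_43_hao_pg") returns a tuple whose first item is the status code. Python 3.9 and later also document the attributes status, url and headers for the same information.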

Getting a random User-Agent header

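Import the dedicated library, create a UserAgent object, and ask it for a User-Agent string:
from fake_useragent import UserAgent  # third-party User-Agent library
ua = UserAgent()
ua.random  # a random User-Agent string from any browser
ua.chrome  # a random Chrome User-Agent string; either one works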

Each access to ua.random returns a randomly chosen User-Agent string.
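fake_useragent is a third-party package (installed with pip install fake-useragent). If you would rather avoid the dependency, the same idea can be approximated with the standard library by picking from a small hand-maintained list. This swaps in a different technique; the User-Agent strings below are examples assumed for illustration and will go out of date:

```python
import random

# Example User-Agent strings (illustrative only; refresh them periodically)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_user_agent(rng=random):
    """Return one User-Agent string chosen at random (hypothetical helper)."""
    # rng can be a seeded random.Random instance for reproducible picks
    return rng.choice(USER_AGENTS)


print(random_user_agent())
```

Passing a seeded random.Random as rng makes the choice repeatable, which can help when debugging a scraper.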

Adding the header to headers

First, store the randomly chosen User-Agent in a headers dict:
headers = {
    "User-Agent": UserAgent().random  # the header value must be a string, so take .random
}
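Once headers holds a User-Agent string, it can be passed to Request, and the response written to disk, which is the "fetch and save" step in the title. The following is a minimal sketch under assumptions, not the author's code: the helper name save_page, the output file name, and re-encoding the saved text as UTF-8 are illustrative choices:

```python
from pathlib import Path
from urllib.request import Request, urlopen


def save_page(url, path, headers=None, timeout=10):
    """Fetch url with optional headers and save it as UTF-8 text (hypothetical helper)."""
    # Request carries the headers (e.g. User-Agent); urlopen sends it
    request = Request(url, headers=headers or {})
    with urlopen(request, timeout=timeout) as response:
        raw = response.read()
        charset = response.headers.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace")
    # Re-encode as UTF-8 so the saved file has a predictable encoding
    Path(path).write_text(text, encoding="utf-8")
    return len(text)  # number of characters written
```

For example, save_page("https://www.baidu.com/?tn=02003390_43_hao_pg", "baidu.html", headers) writes the page to baidu.html. Writing the raw bytes with Path(path).write_bytes(raw) instead would keep the page's original encoding, which matters if the HTML declares a non-UTF-8 <meta charset>.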
