Web Scraping: Requests Library Basics

爱喝胡辣汤c

Article link: https://blog.csdn.net/qq_44744724/article/details/108116434

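Introduction to the Requests Library

1. Installing the Requests Library
2. The 6 Main Methods of the Requests Library
3. The Response Object in Requests
- 3.1 Attributes of the Response Object
4. Exceptions in the Requests Library
5. A General Code Framework for Crawlers
6. URL Format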

1. Installing the Requests Library

Official site: Requests: HTTP for Humans (http://www.python-requests.org)
Windows: Win+R > cmd > pip install requests > Enter
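
A quick way to confirm that the installation worked is to import the library and print its version. This is only a sanity check; the version string you see depends on what pip installed.

```python
import requests

# If this import succeeds, the library is installed for this interpreter.
print(requests.__version__)
```

If the import fails with ModuleNotFoundError, pip most likely installed the package for a different Python interpreter than the one running the script.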

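2. The 6 Main Methods of the Requests Library

Method	Description
requests.get()	Fetches an HTML page (HTTP GET)
requests.head()	Fetches only the response headers of an HTML page (HTTP HEAD)
requests.post()	Submits a POST request to an HTML page
requests.put()	Submits a PUT request to an HTML page
requests.patch()	Submits a partial-modification (PATCH) request to an HTML page
requests.delete()	Submits a DELETE request to an HTML page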
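
Each method in the table corresponds to one HTTP verb. The sketch below builds the requests without sending them, so it can be run offline. The URL and the `preview` helper are placeholders made up for illustration, and session defaults such as the User-Agent header are not applied to requests prepared this way.

```python
import requests

URL = "http://example.com/items"  # placeholder URL, never actually contacted


def preview(method, **kwargs):
    """Prepare (but do not send) a request, to inspect method, URL and body."""
    return requests.Request(method, URL, **kwargs).prepare()


# GET: parameters are appended to the URL as a query string.
get_req = preview("GET", params={"page": 1})
print(get_req.method, get_req.url)

# POST: a dict passed as data is form-encoded into the request body.
post_req = preview("POST", data={"name": "demo"})
print(post_req.method, post_req.body, post_req.headers["Content-Type"])

# The remaining verbs from the table.
for verb in ("HEAD", "PUT", "PATCH", "DELETE"):
    print(preview(verb).method)
```

In everyday code you would call requests.get(URL, params=...) or requests.post(URL, data=...) directly; preparing the request by hand is just a convenient way to see what would go over the wire.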

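3. The Response Object in Requests

3.1 Attributes of the Response Object

import requests
r = requests.get("http://www.baidu.com")

Attribute	Description
r.status_code	HTTP status code of the response; 200 means success.
r.text	Response body as a string, decoded using r.encoding
r.encoding	Encoding guessed from the HTTP headers (the charset in Content-Type)
r.apparent_encoding	Encoding detected by analyzing the response body itself
r.content	Response body as raw bytes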
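
The difference between r.encoding and r.apparent_encoding is easiest to see on a page whose headers do not declare a charset. The sketch below is an assumption-laden demo rather than part of the original article: instead of contacting a real site, it starts a throwaway local server (the `DemoHandler` class and `BODY` constant are made up for illustration) that serves a UTF-8 page with a bare Content-Type of text/html.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

BODY = "<html><body>你好, Requests</body></html>".encode("utf-8")


class DemoHandler(BaseHTTPRequestHandler):
    """Serves one UTF-8 page but omits the charset from Content-Type."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore any system proxy for the local demo server
r = session.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
server.shutdown()
server.server_close()

print(r.status_code)        # 200
print(r.encoding)           # ISO-8859-1, the fallback for text/* without a charset
print(r.apparent_encoding)  # whatever the detector infers from the bytes
print(r.content[:12])       # raw bytes, independent of any encoding

# r.text is decoded with r.encoding, so fixing the encoding fixes the text.
r.encoding = "utf-8"
print(r.text)
```

Against a real site the same attributes are available on the object returned by requests.get(); the session is used here only so that the request to the local server bypasses proxy settings.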

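4. Exceptions in the Requests Library

Exception	Description
requests.ConnectionError	Network connection error (e.g. DNS failure, connection refused)
requests.HTTPError	HTTP error (an error status code was returned)
requests.URLRequired	A valid URL is required but missing
requests.TooManyRedirects	The maximum number of redirects was exceeded
requests.ConnectTimeout	Timed out while connecting to the remote server
requests.Timeout	The request timed out (covers both connecting and reading)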
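
All the exceptions in the table share a common base class, requests.RequestException, which is what makes a single except clause practical. The short check below only inspects the class hierarchy and needs no network; the `TABLE` tuple is just a name chosen for this sketch.

```python
import requests

TABLE = (
    requests.ConnectionError,
    requests.HTTPError,
    requests.URLRequired,
    requests.TooManyRedirects,
    requests.ConnectTimeout,
    requests.Timeout,
)

# Every exception listed above derives from RequestException.
for exc in TABLE:
    print(exc.__name__, issubclass(exc, requests.RequestException))

# ConnectTimeout is both a ConnectionError and a Timeout, so the order of
# except clauses decides which handler sees it first.
print(issubclass(requests.ConnectTimeout, requests.ConnectionError))
print(issubclass(requests.ConnectTimeout, requests.Timeout))
```

Every line printed ends in True. In practice this means `except requests.RequestException` catches any of these errors, while more specific clauses placed before it can treat individual cases differently.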

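Exceptions raised from the Response object

Method	Description
r.raise_for_status()	Raises requests.HTTPError if the status code indicates an error (4xx or 5xx); does nothing otherwise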
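
To see which status codes actually trigger the exception, the sketch below calls raise_for_status() on hand-built Response objects. Building a Response by hand is only a trick to keep the example offline (real responses come from requests.get() and friends), and `check` is a hypothetical helper name.

```python
import requests


def check(status_code):
    """Return 'ok' if raise_for_status() passes, else the error message."""
    r = requests.Response()  # hand-built response, for illustration only
    r.status_code = status_code
    try:
        r.raise_for_status()
        return "ok"
    except requests.HTTPError as exc:
        return f"HTTPError: {exc}"


for code in (200, 204, 301, 404, 500):
    print(code, check(code))
```

Only 404 and 500 produce an HTTPError here. Other 2xx codes and redirects such as 301 pass silently, so the method is a check for client and server errors rather than a check for "exactly 200".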

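5. A General Code Framework for Crawlers

import requests

def GetHTMLText(url):
	"""Return the page text, or an error message if the request fails."""
	try:
		r = requests.get(url, timeout=30)
		r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
		r.encoding = r.apparent_encoding  # decode using the encoding detected from the body
		return r.text
	except requests.RequestException:
		return "An exception occurred"

if __name__ == "__main__":
	url = "http://www.baidu.com"
	print(GetHTMLText(url))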
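
Returning one fixed string for every failure makes it hard to tell a timeout from a bad URL. One possible variation, sketched below, reports what went wrong by catching the specific exception classes from section 4. The name `fetch_text` and the (text, error) return convention are assumptions of this sketch, not part of the framework above.

```python
import requests


def fetch_text(url, timeout=10):
    """Return (text, error); exactly one of the two is None."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text, None
    except requests.Timeout:
        # Listed before ConnectionError so that ConnectTimeout lands here.
        return None, "timeout"
    except requests.ConnectionError:
        return None, "connection error"
    except requests.HTTPError as exc:
        return None, f"HTTP {exc.response.status_code}"
    except requests.RequestException as exc:
        # Anything else from the library, e.g. a malformed URL.
        return None, f"request failed: {exc}"


if __name__ == "__main__":
    # A URL without a scheme is rejected before any network traffic happens.
    text, error = fetch_text("www.baidu.com")
    print(text, error)
```

The scheme-less URL in the usage lines is deliberate: it is a common beginner mistake, and it shows the final except clause at work without needing a network connection.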