Crawler--02: Request Modules for Crawlers

十束多多良^_^

Column: Crawler request modules. Tags: python http ssl cookie

Article link: https://blog.csdn.net/Rhymeplot__JDQS/article/details/113806703

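Summary (generated automatically by CSDN): This article introduces Python's urllib.request module, covering its versions, common methods, and the response object. It then explains the urllib.parse module, which encodes Chinese text in URLs. Next it covers the requests module in detail: installation, common methods, the attributes of the response object, and handling POST requests, cookies, and sessions. Finally it discusses how to handle untrusted SSL certificates and provides a source-code analysis.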

1. The urllib.request module

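Versions

Python 2: urllib2 and urllib
Python 3: urllib and urllib2 were merged into a single urllib package (urllib.request, urllib.parse, urllib.error, ...)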
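The module names changed between versions, so code that must run on both is sometimes written with a guarded import. Here is a minimal sketch using only the standard library. The Python 2 branch is there only to show where the names used to live.

```python
# Sketch: import the Python 3 names first, fall back to the Python 2 locations
try:
    # Python 3: request helpers in urllib.request, encoders in urllib.parse
    from urllib.request import urlopen, Request
    from urllib.parse import urlencode, quote
except ImportError:
    # Python 2: request helpers lived in urllib2, encoders in urllib
    from urllib2 import urlopen, Request
    from urllib import urlencode, quote

print(urlopen.__module__)  # prints urllib.request on Python 3
```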

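Common methods

response = urllib.request.urlopen('URL'): sends a request to a website and returns the response
response.read(): reads the response body as bytes
response.read().decode('utf-8'): reads the body and decodes it into a string
request = urllib.request.Request('URL', headers=dict): builds a request object that carries custom headers. urlopen() has no headers parameter, so whenever you need to send headers (such as a User-Agent), build a Request first and pass it to urlopen().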
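This offline sketch illustrates the Request-versus-urlopen point above. It inspects urlopen()'s signature and builds a Request without sending anything. The shortened User-Agent string is a placeholder and does not come from the original article.

```python
import inspect
import urllib.request

url = 'https://www.baidu.com/'
# Placeholder User-Agent; any realistic browser string is used the same way
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# urlopen() has no headers parameter, which is why headers={} raises TypeError
print('headers' in inspect.signature(urllib.request.urlopen).parameters)  # False

# The Request object carries the headers; nothing is sent over the network yet
req = urllib.request.Request(url, headers=headers)
print(req.full_url)      # https://www.baidu.com/
print(req.get_method())  # GET, because no data is attached
# Request stores header names via str.capitalize(), so the key becomes 'User-agent'
print(req.get_header('User-agent'))
# Sending it would be: res = urllib.request.urlopen(req)
```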

import urllib.request

headers = {}  # cannot be used here: urlopen() does not accept a headers argument
response = urllib.request.urlopen('https://www.baidu.com/')
# TypeError: urlopen() got an unexpected keyword argument 'headers'
# response = urllib.request.urlopen('https://www.baidu.com/', headers={})
# print(response.read())
# read() returns the content of the response object as bytes
html = response.read().decode('utf-8')
# print(response)
# <http.client.HTTPResponse object at 0x000002952356D9D0>
print(type(html), html)

Output:

C:\python\python.exe D:/PycharmProjects/Python大神班/day-03/02-urllib.request.py
<class 'str'> <html>
<head>
	<script>
		location.replace(location.href.replace("https://","http://"));
	</script>
</head>
<body>
	<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>

Process finished with exit code 0
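The output above is a small script-redirect page rather than Baidu's home page. A likely explanation is the User-Agent header, though this is not confirmed. The sketch below shows the header urllib sends when none is set and how to replace it once on an opener instead of on every Request. The shortened browser string is a placeholder.

```python
import urllib.request

# build_opener() creates the same kind of opener urlopen() uses by default
opener = urllib.request.build_opener()
default_headers = list(opener.addheaders)
print(default_headers)  # e.g. [('User-agent', 'Python-urllib/3.12')]; version varies

# Alternative to one Request per call: set the header once on the opener
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)')]
# html = opener.open('https://www.baidu.com/').read().decode('utf-8')
```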

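Response object

read(): reads the content of the server's response
getcode(): returns the HTTP status code
geturl(): returns the URL the data actually came from (useful for detecting redirects)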

import urllib.request


# Summary: the steps for making a request with urllib
url = 'https://www.baidu.com/'
headers = {
   'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36 Edg/87.0.664.75'}

# 1. Create a request object with urllib.request.Request() so we can set the User-Agent
req = urllib.request.Request(url, headers=headers)
# 2. Get the response object by passing the request to urllib.request.urlopen()
res = urllib.request.urlopen(req)
# 3. Read the content of the response: res.read().decode('utf-8') converts bytes --> str
html = res.read().decode('utf-8')
# print(html)
print(res.getcode()) # returns the status code -- 200
print(res.geturl()) # returns the URL that was actually fetched -- https://www.baidu.com/

Output:

C:\python\python.exe D:/PycharmProjects/Python大神班/day-03/02-urllib.request.py
200
https://www.baidu.com/

Process finished with exit code 0
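The redirect case that geturl() guards against does not appear in the Baidu output above. This self-contained sketch starts a throwaway local server and requests a URL that redirects. The handler and the /old and /new paths are hypothetical. The sketch also reads the charset from the response headers instead of hard-coding 'utf-8'.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    # Hypothetical local site: /old redirects to /new, /new returns UTF-8 text
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.end_headers()
        else:
            body = '海贼王'.encode('utf-8')
            self.send_response(200)
            self.send_header('Content-Type', 'text/html; charset=utf-8')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick a free port; the server runs in a background thread
server = HTTPServer(('127.0.0.1', 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

# ProxyHandler({}) ignores any system proxy so the request stays on this machine
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
res = opener.open(base + '/old')
code = res.getcode()      # status of the final response, after the redirect
final_url = res.geturl()  # ends in /new, not the /old URL that was requested
# Prefer the charset the server declares; fall back to UTF-8 if none is sent
charset = res.headers.get_content_charset() or 'utf-8'
html = res.read().decode(charset)
res.close()

server.shutdown()
server.server_close()
print(code, final_url, html)
```

Recent Python documentation lists getcode() and geturl() as older aliases of the .status and .url attributes. Both spellings work on the response object shown here.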

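2. The urllib.parse module

Purpose of urllib.parse: it converts Chinese characters in a request URL into percent-encoded form, i.e. each byte of the character's UTF-8 encoding written as % followed by two hexadecimal digits.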
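To make "% + hexadecimal" concrete, this sketch encodes the same search term by hand and compares the result with quote(). The hand-rolled version matches quote() here only because every byte is non-ASCII. quote() leaves letters and digits unchanged.

```python
import urllib.parse

word = '海贼王'
# Each of these Chinese characters takes 3 bytes in UTF-8
raw = word.encode('utf-8')
# Write every byte as % plus two uppercase hex digits
manual = ''.join('%{:02X}'.format(b) for b in raw)

print(raw)     # b'\xe6\xb5\xb7\xe8\xb4\xbc\xe7\x8e\x8b'
print(manual)  # %E6%B5%B7%E8%B4%BC%E7%8E%8B
print(manual == urllib.parse.quote(word))  # True
```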

import urllib.request
import urllib.parse
url1 = 'https://www.baidu.com/s?wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B'
url2 = 'https://www.baidu.com/s?wd=海贼王'
# Goal: turn url2 into url1
# Method 1: urllib.parse.urlencode(dict) -- pass a dict
R = {
   'wd': '海贼王'}
# The dict maps the query parameter to the Chinese text you want to search for
# urllib.parse.urlencode(dict) converts the strings in the dict into % + hex form
result1 = urllib.parse.urlencode(R)
print(result1)
url3 = 'https://www.baidu.com/s?' + result1
print(url3)

# Method 2: urllib.parse.quote(str) -- pass a string
R = '海贼王'
result2 = urllib.parse.quote(R)
print(result2)
url4 = 'https://www.baidu.com/s?wd=' + result2
print(url4)

Output:

C:\python\pyth
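This sketch extends the two methods above. It shows how urlencode() joins several parameters, how it treats spaces differently from quote(), and how unquote() and parse_qs() reverse the encoding. The 'pn' parameter and the 'one piece' string are made-up inputs.

```python
import urllib.parse

# urlencode() joins key=value pairs with '&' and encodes each value;
# 'pn' is a hypothetical second parameter used only to show the joining
query = urllib.parse.urlencode({'wd': '海贼王', 'pn': 10})
print(query)  # wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B&pn=10

# Spaces differ: urlencode() uses quote_plus() by default, quote() gives %20
plus_form = urllib.parse.urlencode({'wd': 'one piece'})
pct_form = urllib.parse.quote('one piece')
via_quote = urllib.parse.urlencode({'wd': 'one piece'}, quote_via=urllib.parse.quote)
print(plus_form)  # wd=one+piece
print(pct_form)   # one%20piece
print(via_quote)  # wd=one%20piece

# Decoding goes the other way: unquote() for one value, parse_qs() for a query
decoded = urllib.parse.unquote('%E6%B5%B7%E8%B4%BC%E7%8E%8B')
parsed = urllib.parse.parse_qs(query)
print(decoded)  # 海贼王
print(parsed)   # {'wd': ['海贼王'], 'pn': ['10']}
```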
