Python web scraping: a GET request example with the urllib.request module

侯小啾

Category: Python web scraping. Tags: python, web scraping, GET request, urllib.request

Article link: https://blog.csdn.net/weixin_48964486/article/details/122394601

urllib.request usage example

Example

Goal: send a request to Baidu, receive the response, and obtain the page's HTML.

import urllib.request

# Pass the URL to urlopen() to get a response object
response = urllib.request.urlopen('https://www.baidu.com')
print(response)
# print(type(response))  # Unlike the object returned by requests.get(), this is <class 'http.client.HTTPResponse'>
# read() returns the response body; it consumes the stream, so call it only once
# print(response.read())  # raw bytes

print(response.read().decode('utf-8'))

# print(response.getcode())  # HTTP status code; 200 means the request succeeded
# print(response.geturl())  # URL actually retrieved (differs from the requested URL after a redirect)

Output:

<http.client.HTTPResponse object at 0x000002028753E588>
<html>
<head>
	<script>
		location.replace(location.href.replace("https://","http://"));
	</script>
</head>
<body>
	<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>

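Note: this example only demonstrates the basic flow. It does not build a Request object and does not set a User-Agent (UA), so the HTML returned is not the real page source, which is considerably more complex. For how to set a UA, see the companion article "urllib.request: building a Request object (setting the UA)".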
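To make the note above concrete, here is a minimal sketch of the same GET request sent through a Request object that carries a User-Agent header. The names build_request, fetch_html and EXAMPLE_UA are illustrative and not from the original article, and the UA string is only a placeholder; any realistic browser UA should behave similarly.

```python
import urllib.request

# Assumption: a placeholder browser UA string, not a value required by any site.
EXAMPLE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=EXAMPLE_UA):
    """Wrap the URL in a Request object that carries a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Send the GET request and return the page source decoded as UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_html('https://www.baidu.com') would then send the request with the UA header attached. What the server returns still depends on the site, so treat this as a starting point rather than a guarantee of getting the full page.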

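Method walkthrough

The urlopen() function in urllib.request

urllib.request.urlopen() sends the request and returns a response object. Its most commonly used parameter is url.
By default the request is a GET. To send a POST, pass the data parameter. It must be a bytes object, typically a dict that has been URL-encoded with urllib.parse.urlencode() and then encoded to bytes. POST is not covered further here.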
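The difference between the two request types can be sketched without touching the network by building Request objects and inspecting them. The helper names and the example.com URLs below are placeholders chosen for illustration, not part of the original article.

```python
import urllib.parse
import urllib.request


def build_get(url, params):
    """GET: parameters are URL-encoded and appended to the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{url}?{query}")


def build_post(url, form):
    """POST: the dict is URL-encoded, then encoded to bytes and passed as data."""
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data)


get_req = build_get("https://example.com/s", {"wd": "python"})
post_req = build_post("https://example.com/submit", {"kw": "python"})

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Either object can be passed to urllib.request.urlopen() in place of a plain URL string. The presence of data is what switches the method from GET to POST.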

Methods of the response object

The read() method

Calling read() on the response object returns the response body as bytes.

print(response.read())
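One detail worth seeing in action: read() consumes the stream, so a second call returns empty bytes. The sketch below uses a data: URL as an offline stand-in for a real site (an assumption made so the example runs without network access). The object returned for a data: URL is not an http.client.HTTPResponse, but it offers the same read() interface.

```python
import urllib.request

# Assumption: a data: URL replaces a real website so the sketch runs offline.
DEMO_URL = "data:text/plain;charset=utf-8,hello%20world"


def read_twice(url):
    """Read the body two times to show that the first read() exhausts it."""
    with urllib.request.urlopen(url) as response:
        first = response.read()
        second = response.read()
    return first, second


first, second = read_twice(DEMO_URL)
print(first)   # the full body as bytes
print(second)  # empty bytes, because the stream was already consumed
```

If the body is needed more than once, store the result of the first read() in a variable and reuse it.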

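The decode() method

decode() is a method of the bytes object returned by read(), not of the response itself. It decodes the bytes into a string; the most common argument is 'utf-8'.

print(response.read().decode('utf-8'))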
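Not every page is served as UTF-8, so a slightly more defensive sketch reads the charset the server declares in its Content-Type header and falls back to UTF-8 when none is given. make_data_url and read_text are hypothetical helpers introduced here; the data: URLs are only an offline substitute for real pages.

```python
import urllib.parse
import urllib.request


def make_data_url(text, charset):
    """Hypothetical helper: build an offline data: URL that declares its charset."""
    payload = urllib.parse.quote(text.encode(charset))
    return f"data:text/plain;charset={charset},{payload}"


def read_text(url, fallback="utf-8"):
    """Decode the body with the declared charset, or the fallback if none is declared."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or fallback
        return response.read().decode(charset, errors="replace")


utf8_text = read_text(make_data_url("百度一下", "utf-8"))
gbk_text = read_text(make_data_url("百度一下", "gbk"))
print(utf8_text, gbk_text)
```

With a real HTTP response the same response.headers.get_content_charset() call applies. Pages that declare their encoding only inside an HTML meta tag would still need separate handling.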

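The getcode() method

Returns the HTTP status code; 200 means the request succeeded.

The geturl() method

Returns the URL of the resource actually retrieved, so you can tell whether the request was redirected.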

print(response.geturl())
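In practice urlopen() raises urllib.error.HTTPError for 4xx and 5xx responses instead of returning a response object, so the status code often has to be read from the exception. The following is a rough sketch of combining getcode(), geturl() and error handling. The function name probe and its return shape are illustrative choices, not something the original article defines.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return (status_code, final_url, error_message) for a URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # final_url differs from url when the request was redirected
            return response.getcode(), response.geturl(), None
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500
        return err.code, None, str(err.reason)
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, bad scheme
        return None, None, str(err.reason)
```

A call such as probe('https://www.baidu.com') would return the status code together with the final URL, and comparing that URL with the one requested shows whether a redirect happened. Since Python 3.9 the documentation marks getcode() and geturl() as deprecated in favor of the status and url attributes, which carry the same information.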
