The urllib Library (Part 1)

|晴天|

Article link: https://blog.csdn.net/qq_40357974/article/details/101076226

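urllib is the HTTP request library built into Python 3.
It consists of four main modules:

urllib.request: the request module
urllib.error: the exception-handling module
urllib.parse: the URL-parsing module
urllib.robotparser: the robots.txt parsing module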
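
To make the division of labor concrete, here is a minimal sketch that touches each of the four modules without sending anything over the network. The query string `wd=python`, the `example.com` URLs, and the robots.txt rules are made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser

# urllib.parse: split a URL into its components.
parts = urllib.parse.urlparse("http://www.baidu.com/s?wd=python")  # hypothetical query
print(parts.scheme, parts.netloc, parts.path, parts.query)

# urllib.request: build a request object (nothing is sent yet).
request = urllib.request.Request("http://www.baidu.com")
print(request.get_method(), request.host)

# urllib.error: HTTPError is a subclass of URLError,
# so catching URLError also catches HTTP status errors.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))

# urllib.robotparser: parse robots.txt rules supplied as lines of text.
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])  # made-up rules
blocked = robots.can_fetch("*", "http://example.com/private/page")
allowed = robots.can_fetch("*", "http://example.com/public/page")
print(blocked, allowed)
```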

1. Fetching a web page quickly

import urllib.request

# Send a request to the URL and get the response
response = urllib.request.urlopen("http://www.baidu.com")
# Read the data from the response object and decode it
content = response.read().decode("UTF-8")
# Print the content
print(content)
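
The same fetch can be wrapped in a small helper that closes the response when done and picks the charset from the response headers instead of hard-coding UTF-8. This is only a sketch: `fetch_text` and its parameters are hypothetical names, and the demonstration uses a `data:` URL so that it runs without network access.

```python
import urllib.request


def fetch_text(url, timeout=10.0, default_charset="utf-8"):
    """Fetch a URL and return its body decoded to str (hypothetical helper)."""
    # The with-statement closes the response even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)


# Offline demonstration: urlopen also understands data: URLs.
text = fetch_text("data:text/plain;charset=utf-8,hello")
print(text)  # hello
```

Calling `fetch_text("http://www.baidu.com")` would perform the same steps as the listing above, but against the live site.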

2. urllib methods

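import urllib.request
import urllib.parse

'''
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False, context=None)
(signature as in older Python 3 releases; cafile, capath, and cadefault were removed in Python 3.13)
url: the URL to send the request to
data: the request body. If data is given, the request is sent as POST; otherwise it is sent as GET. Only HTTP requests use this parameter.
    1. It must be a bytes object
    2. It must follow the application/x-www-form-urlencoded format,
       which you can produce by passing a dict of parameters to urllib.parse.urlencode()
timeout: timeout in seconds; an error is raised if no response arrives within that time
'''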
# 1. Demonstrate sending a POST request with the data parameter
# Prepare the parameter dict
data = {'name': 'pythonSpider'}
# URL-encode the parameters
data = urllib.parse.urlencode(data)
# Convert the string to bytes
data = data.encode('utf-8')
# Send the request and read the response
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode())


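# 2. Demonstrate the timeout parameter
# With a timeout this short, the call is expected to raise a timeout error
response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.01)
print(response.read().decode())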
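
Because a timeout raises an exception, real code usually wraps the call. The following is a minimal sketch, assuming you want `None` back on failure; `fetch_or_none` is a hypothetical helper name and the 5-second default is arbitrary. Failures while connecting are typically wrapped in `urllib.error.URLError`, while a timeout that occurs while waiting for the response can surface as a bare `socket.timeout`, so both are handled.

```python
import socket
import urllib.error
import urllib.request


def fetch_or_none(url, timeout=5.0):
    """Return the response body as bytes, or None on failure (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # Connection problems, including connect-phase timeouts, arrive as URLError.
        print("request failed:", exc.reason)
    except socket.timeout:
        # A timeout while waiting for the response can surface unwrapped.
        print("request timed out")
    return None


# Offline demonstration: a data: URL succeeds, an unknown scheme raises URLError.
ok = fetch_or_none("data:text/plain,ok")
failed = fetch_or_none("nosuchscheme://example")  # made-up scheme
print(ok, failed)
```

With this helper, `fetch_or_none('http://httpbin.org/get', timeout=0.01)` would most likely print a failure message and return `None` instead of ending the script with a traceback.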

3. Using the HTTPResponse object

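import urllib.request

# Send the request
response = urllib.request.urlopen('http://www.baidu.com')
# Type of the response object
print(type(response))
# List its attributes and methods
print(dir(response))

# Get the status code
print(response.getcode())
# Get the URL of the response
print(response.geturl())
# Get the response metadata (the headers object)
print(response.info())
print(type(response.info()))

# Get all response headers as a list of (name, value) tuples
print(response.getheaders())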
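
The accessors above can be gathered into one summary dictionary, which is handy for logging. This is a sketch only: `describe_response` is a hypothetical helper, and the demonstration uses a `data:` URL so it runs offline (such a response carries no HTTP status, so the status is `None`).

```python
import urllib.request


def describe_response(response):
    """Collect status, URL, and headers of a urlopen() response (hypothetical helper)."""
    headers = response.info()  # behaves like an email.message.Message
    return {
        "status": response.getcode(),
        "url": response.geturl(),
        "content_type": headers.get_content_type(),
        "headers": dict(headers.items()),
    }


# Offline demonstration with a data: URL instead of a live site.
with urllib.request.urlopen("data:text/plain;charset=utf-8,hi") as response:
    summary = describe_response(response)
print(summary["status"], summary["url"], summary["content_type"])
```

Recent Python documentation marks `getcode()`, `geturl()`, and `info()` as deprecated in favor of the `status`, `url`, and `headers` attributes, so new code may prefer those where the Python version allows.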

4. The Request object

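import urllib.request
import urllib.parse

# Construct a Request object
request = urllib.request.Request('http://www.baidu.com')
# Send the request
response = urllib.request.urlopen(request)
# Print the response body
print(response.read().decode())

# Send a POST request to http://httpbin.org/post
# 1. Construct the Request object
# 1.1 Build the data (the data parameter)
data = {'name': 'pythonSpider'}
# 1.2 URL-encode it
data = urllib.parse.urlencode(data)
# 1.3 Convert it to a bytes object
data = data.encode('utf-8')

# Specify the headers (the User-Agent value is left elided here)
headers = {'User-Agent': '.......'}

# Create the Request object and send it
request = urllib.request.Request('http://httpbin.org/post', data=data, headers=headers)
response = urllib.request.urlopen(request)
# Print the response body
print(response.read().decode())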
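
A Request object can be inspected before it is sent, which makes it easy to confirm that supplying `data` switches the method to POST. The sketch below builds the request without sending it; `build_post_request` is a hypothetical helper and `example-client/0.1` is a made-up User-Agent value.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields, user_agent="example-client/0.1"):
    """Build (but do not send) a form-encoded POST request (hypothetical helper)."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, headers={"User-Agent": user_agent})


request = build_post_request("http://httpbin.org/post", {"name": "pythonSpider"})
print(request.get_method())  # POST, because data is not None
print(request.data)          # b'name=pythonSpider'
# Request stores header names capitalized, so the lookup key is "User-agent".
print(request.get_header("User-agent"))
```

Passing the resulting object to `urllib.request.urlopen(request)` would then send it, as in the listing above.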

5. URL encoding and decoding

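import urllib.parse

# Prepare the data
data = {
    'host': 'www.baidu.com',
    'name': 'baidu百度'
}

# URL-encode the dict into a query string
data = urllib.parse.urlencode(data)
# URL-decode: turn the percent-encoded string back into a plain string
data = urllib.parse.unquote(data)
# URL-encode a string (by default this also escapes '=' and '&')
data = urllib.parse.quote(data)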
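
The listing above reassigns `data` at every step, so the intermediate values are never visible. Here is a small sketch that keeps each stage in its own variable and prints it; the variable names are chosen for illustration, and `parse_qs` is added as one way to get back to a dictionary.

```python
import urllib.parse

params = {"host": "www.baidu.com", "name": "baidu百度"}

# Dict -> query string; non-ASCII characters become percent-encoded UTF-8.
encoded = urllib.parse.urlencode(params)
print(encoded)  # host=www.baidu.com&name=baidu%E7%99%BE%E5%BA%A6

# Percent-encoded string -> plain string.
decoded = urllib.parse.unquote(encoded)
print(decoded)  # host=www.baidu.com&name=baidu百度

# quote() escapes '=' and '&' unless they are listed in safe.
quoted_all = urllib.parse.quote(decoded)
quoted_safe = urllib.parse.quote(decoded, safe="=&")
print(quoted_all)   # host%3Dwww.baidu.com%26name%3Dbaidu%E7%99%BE%E5%BA%A6
print(quoted_safe)  # same text as encoded

# Query string -> dict of lists.
parsed = urllib.parse.parse_qs(encoded)
print(parsed)
```

The practical takeaway is that `quote` is meant for a single path segment or value, while `urlencode` is the tool for a whole set of key/value pairs.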
