Web Scraping 02: Request Modules
1. The urllib.request module
Versions
- Python 2: urllib2 and urllib
- Python 3: urllib and urllib2 were merged into a single urllib package
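Common methods
- response = urllib.request.urlopen("url"): sends a request to a website and returns the response
- response.read(): gets the response body as a byte string
- response.read().decode('utf-8'): gets the response body as a str
- req = urllib.request.Request('url', headers=dict): builds a request object that carries custom headers. You need it whenever headers must be sent, because urlopen() has no headers parameter. The Request object is then passed to urlopen().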
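Building a Request does not touch the network, so you can inspect its headers offline. Here is a minimal sketch. The User-Agent value is a placeholder and does not come from the original. Request stores header names through str.capitalize(), so 'User-Agent' is kept as 'User-agent'.

```python
import urllib.request

# Placeholder UA string (assumption): any browser-like value works for this demo
headers = {"User-Agent": "Mozilla/5.0 (demo)"}

# Creating a Request only builds an object; nothing is sent yet
req = urllib.request.Request("https://www.baidu.com/", headers=headers)

print(req.full_url)                  # https://www.baidu.com/
print(req.get_method())              # GET (no request body attached)
# Header names are normalized with str.capitalize(), so look up "User-agent"
print(req.get_header("User-agent"))  # Mozilla/5.0 (demo)
print(req.header_items())            # [('User-agent', 'Mozilla/5.0 (demo)')]
```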
import urllib.request

# A headers dict cannot be passed to urlopen():
headers = {}
# response = urllib.request.urlopen('https://www.baidu.com/', headers=headers)
# TypeError: urlopen() got an unexpected keyword argument 'headers'
response = urllib.request.urlopen('https://www.baidu.com/')
# print(response)
# <http.client.HTTPResponse object at 0x000002952356D9D0>
# print(response.read())
# read() reads the body out of the response object as bytes; decode() turns it into a str
html = response.read().decode('utf-8')
print(type(html), html)
Output:
C:\python\python.exe D:/PycharmProjects/Python大神班/day-03/02-urllib.request.py
<class 'str'> <html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
Process finished with exit code 0
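The page printed above is only a short JavaScript/meta-refresh stub, not Baidu's full homepage. One plausible cause is the default User-Agent that a plain urlopen() call sends, which names Python's urllib instead of a browser. This is an assumption; the original does not test it. You can inspect that default offline:

```python
import sys
import urllib.request

# build_opener() with no arguments sets up the same default handlers urlopen() uses
opener = urllib.request.build_opener()
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.12')]

# The version part follows the running interpreter's major.minor version
expected = "Python-urllib/%d.%d" % sys.version_info[:2]
print(dict(opener.addheaders)["User-agent"] == expected)  # True
```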
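Response object methods
- read(): reads the content of the server's response (as bytes)
- getcode(): returns the HTTP status code
- geturl(): returns the URL the data was actually retrieved from (useful for detecting redirects)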
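You can see getcode() and geturl() at work without depending on an external site. The sketch below starts a throwaway local HTTP server in which /old redirects to /new. The paths and the response text are hypothetical. urllib follows the redirect automatically, so geturl() reports /new, not the URL that was requested. The opener is built with an empty ProxyHandler so that proxy environment variables cannot reroute the localhost request; otherwise it behaves like urlopen(). Newer Python versions also expose the same values as the res.status and res.url attributes.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old redirects to /new; any other path returns text."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)                  # temporary redirect
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = "你好, urllib".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Port 0 asks the OS for any free port
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Empty ProxyHandler: never route the localhost request through a proxy
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{port}/old") as res:
    code = res.getcode()               # status of the final response
    final_url = res.geturl()           # URL after following the redirect
    text = res.read().decode("utf-8")  # bytes -> str

server.shutdown()
server.server_close()
print(code, final_url, text)  # 200 http://127.0.0.1:<port>/new 你好, urllib
```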
import urllib.request

# Summary: the steps for making a request with urllib
url = 'https://www.baidu.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36 Edg/87.0.664.75'}
# 1. Create a request object with urllib.request.Request() so a custom User-Agent can be set
req = urllib.request.Request(url, headers=headers)
# 2. Get the response object by passing the request to urllib.request.urlopen()
res = urllib.request.urlopen(req)
# 3. Read the response content: res.read().decode('utf-8') converts bytes --> str
html = res.read().decode('utf-8')
# print(html)
print(res.getcode())  # HTTP status code -- 200
print(res.geturl())   # final URL after any redirects -- https://www.baidu.com/
Output:
C:\python\python.exe D:/PycharmProjects/Python大神班/day-03/02-urllib.request.py
200
https://www.baidu.com/
Process finished with exit code 0
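The three steps above can be wrapped in a small reusable function. The sketch below is hypothetical: get_html and its timeout default are not from the original. It reads the charset from the response's Content-Type header instead of always assuming 'utf-8', and falls back to utf-8 when no charset is declared. urllib also understands data: URLs, so you can try the function offline:

```python
import urllib.request


def get_html(url, headers=None, timeout=10):
    """Hypothetical helper: build a Request, open it, and decode the body to str."""
    # 1. Request object, so custom headers such as User-Agent can be attached
    req = urllib.request.Request(url, headers=headers or {})
    # 2. Response object; the with-block closes it afterwards
    with urllib.request.urlopen(req, timeout=timeout) as res:
        raw = res.read()  # 3. body as bytes
        # Use the declared charset if there is one, otherwise assume utf-8
        charset = res.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# Offline checks with data: URLs (the bytes after the comma are percent-encoded)
print(get_html("data:text/html;charset=utf-8,%E6%B5%B7%E8%B4%BC%E7%8E%8B"))  # 海贼王
print(get_html("data:text/html;charset=gbk,%C4%E3%BA%C3"))                  # 你好
print(get_html("data:text/plain,hello"))                                     # hello

# Against a real site (needs network access):
# html = get_html("https://www.baidu.com/", headers={"User-Agent": "Mozilla/5.0 ..."})
```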
2. The urllib.parse module
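- What urllib.parse is used for here: converting Chinese characters in a request URL into percent-encoded form (% followed by two hex digits for each UTF-8 byte)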
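The "% + hex" form is each UTF-8 byte of the text written as %XX. The sketch below rebuilds the query value of url1 by hand and compares it with quote() and urlencode(). The extra 'pn' parameter and the 'one piece' string are illustrative only.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode

word = "海贼王"
utf8_bytes = word.encode("utf-8")  # 9 bytes: 3 per Chinese character

# One %XX per byte, hex digits in upper case (the form quote() produces)
manual = "".join(f"%{b:02X}" for b in utf8_bytes)
print(manual)                 # %E6%B5%B7%E8%B4%BC%E7%8E%8B
print(quote(word) == manual)  # True
print(unquote(manual))        # 海贼王

# urlencode() encodes every value and joins the pairs as key=value&key=value
print(urlencode({"wd": word, "pn": 10}))  # wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B&pn=10

# Spaces differ: urlencode() uses quote_plus() by default ("+"), quote() gives "%20"
print(urlencode({"q": "one piece"}))  # q=one+piece
print(quote("one piece"))             # one%20piece
print(quote_plus("one piece"))        # one+piece
```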
import urllib.request
import urllib.parse

url1 = 'https://www.baidu.com/s?wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B'
url2 = 'https://www.baidu.com/s?wd=海贼王'
# Goal: turn url2 into url1

# Method 1: urllib.parse.urlencode(dict) -- pass a dict
# The dict holds the query parameters, including the Chinese text to encode
R = {
    'wd': '海贼王'}
# urlencode() converts the strings in the dict to the % + hex form and joins them as key=value
result1 = urllib.parse.urlencode(R)
print(result1)
url3 = 'https://www.baidu.com/s?' + result1
print(url3)

# Method 2: urllib.parse.quote(str) -- pass a string
R = '海贼王'
result2 = urllib.parse.quote(R)
print(result2)
url4 = 'https://www.baidu.com/s?wd=' + result2
print(url4)
Output:
C:\python\pyth