Python Crawler Study Notes (7): Requests Library Notes, Part 2

Latest recommended article published 2022-07-06 13:13:32

zhaocen_1230

Tags: crawler, Requests library

Article link: https://blog.csdn.net/zhaocen_1230/article/details/78508402

1.2. Request methods
# Send a POST request
requests.post('http://www.baidu.com/post')
-----------------------------------
<Response [200]>
-----------------------------------

# Send a PUT request
requests.put('http://www.baidu.com/put')
-----------------------------------
<Response [405]>
-----------------------------------
OR
requests.put('http://httpbin.org/put')
-----------------------------------
<Response [200]>
-----------------------------------
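Note:
405 Method Not Allowed: the method given in the request line is not permitted for the requested resource. The response must include an Allow header listing the methods the resource currently accepts.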
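
To make the note above concrete, here is a minimal sketch that parses an Allow header value and checks whether a method is permitted. The header value is a made-up sample, not one captured from baidu.com, and `allowed_methods` is a hypothetical helper name.

```python
def allowed_methods(allow_header):
    """Split an Allow header value such as 'GET, HEAD' into a set of method names."""
    return {m.strip().upper() for m in allow_header.split(',') if m.strip()}


# Hypothetical header value, for illustration only.
sample_allow = 'GET, HEAD, OPTIONS'
methods = allowed_methods(sample_allow)

print('PUT' in methods)   # PUT is not listed, so a server would answer it with 405
print('GET' in methods)
```

With a real response object, the value would typically be read with `response.headers.get('Allow', '')`.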

# Send a DELETE request
requests.delete('http://www.baidu.com/delete')
-----------------------------------
<Response [405]>
-----------------------------------

# Send a HEAD request
requests.head('http://www.baidu.com/get')
-----------------------------------
<Response [302]>
-----------------------------------
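Note:
301 and 302 are both HTTP status codes indicating that a URL has moved. The difference is:
301 redirect: the resource has moved permanently (Moved Permanently).
302 redirect: the resource has moved temporarily (Moved Temporarily; named "Found" in the current HTTP specification).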
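
The status codes seen so far can be looked up with Python's standard library. The sketch below uses only `http.HTTPStatus`; `is_redirect` is a hypothetical helper added for illustration.

```python
from http import HTTPStatus

# Status codes that appear in the outputs of this section.
for code in (200, 301, 302, 405):
    status = HTTPStatus(code)
    print(code, status.phrase)


def is_redirect(code):
    """Return True for the 3xx family of status codes."""
    return 300 <= code < 400


print(is_redirect(302), is_redirect(405))
```

One likely reason the HEAD request above reports 302 rather than the final page's status is that `requests.head` does not follow redirects by default; passing `allow_redirects=True` should make it follow them.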

# Send an OPTIONS request
requests.options('http://www.baidu.com/get')
-----------------------------------
<Response [200]>
-----------------------------------

2. Requests
2.1. Basic GET requests
2.1.1. Basic usage
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
-----------------------------------
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.17.3"
},
"origin": "117.149.2.38",
"url": "http://httpbin.org/get"
}
-----------------------------------

2.1.2. GET request with parameters
import requests
response = requests.get('http://httpbin.org/get?name=zhaoc&age=33')
print(response.text)
-----------------------------------
{
"args": {
"age": "33",
"name": "zhaoc"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.17.3"
},
"origin": "101.71.37.71",
"url": "http://httpbin.org/get?name=zhaoc&age=33"
}
-----------------------------------

2.1.3. GET request with parameters (2)
import requests

# Collect the GET parameters in a dict
param = {'name': 'zhaoc', 'age': 33}
# Pass them through the params argument
response = requests.get('http://httpbin.org/get', params=param)
print(response.text)
-----------------------------------
{
"args": {
"age": "33",
"name": "zhaoc"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.17.3"
},
"origin": "117.149.2.38",
"url": "http://httpbin.org/get?age=33&name=zhaoc"
}
-----------------------------------
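
To see how the `params` dict ends up in the URL without sending anything over the network, a request can be prepared and inspected. This is a minimal sketch; the `tricky` example and its `q` parameter are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

import requests

params = {'name': 'zhaoc', 'age': 33}

# Build the request without sending it, so the final URL can be inspected offline.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()
print(prepared.url)

# Values containing spaces or '&' are escaped automatically.
tricky = requests.Request('GET', 'http://httpbin.org/get', params={'q': 'a b&c'}).prepare()
print(tricky.url)

# Parsing the query string back recovers the original value.
print(parse_qs(urlsplit(tricky.url).query))
```

Letting the library do the escaping is the main advantage over writing the query string by hand as in 2.1.2.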

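2.1.4. Parsing JSON
import requests

r = requests.get('http://httpbin.org/get')
# r.text holds the response body as a str
print(type(r.text))
# If the response body is JSON, parse it into a Python object
print(r.json())
# The parsed result is a dict
print(type(r.json()))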
-----------------------------------
<class 'str'>
{'url': 'http://httpbin.org/get', 'headers': {'Connection': 'close', 'Accept': '*/*', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.17.3', 'Accept-Encoding': 'gzip, deflate'}, 'args': {}, 'origin': '117.149.2.38'}
<class 'dict'>
-----------------------------------
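Note 1: JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is simpler than XML, easy for people to read and write, and easy for machines to parse and generate. Its syntax is based on a subset of JavaScript.
Note 2: In Python's json module, serialization and deserialization are called encoding and decoding respectively:
encoding: converts a Python object into a JSON string
decoding: converts a JSON string into a Python object
Simple data types (str, and unicode on Python 2; int, float, list, tuple, dict) can be handled directly.
Note 3: json.dumps encodes simple data types, and json.loads decodes them.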
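
The encoding and decoding steps described in the notes can be tried with the standard json module alone. This is a small sketch; the `record` dict is sample data invented for illustration.

```python
import json

record = {'name': 'zhaoc', 'age': 33, 'tags': ('crawler', 'requests')}

# Encoding: Python object -> JSON string.
encoded = json.dumps(record)
print(type(encoded), encoded)

# Decoding: JSON string -> Python object.
decoded = json.loads(encoded)
print(type(decoded), decoded)

# JSON has no tuple type, so the tuple comes back as a list.
print(decoded['tags'])
```

Calling `r.json()` on a response gives a similar result to `json.loads(r.text)`.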

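2.1.5. Fetching binary data
import requests

r = requests.get('http://github.com/favicon.ico')

# r.text is a str, r.content is bytes
print(type(r.text), type(r.content))
# Print the response body as text
print(r.text)
# Print the response body as raw bytes
print(r.content)
# Save the binary data to a local file; the with-block closes the file on exit
with open('favicon.ico', 'wb') as f:
    f.write(r.content)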

-----------------------------------
<class 'str'> <class 'bytes'>
<!DOCTYPE html>
<html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always
........................
feedback>???è§????é|?</a> ?o?ICPèˉ?030173??·  <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
b'<!DOCTYPE html>\r\n<html> <head><meta http-equiv=content-type content=text/html;charset=utf-
.............................
\x81030173\xe5\x8f\xb7  <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
-----------------------------------
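
The garbled characters in the text output above are typical of UTF-8 bytes decoded with the wrong charset. The sketch below reproduces that effect with the standard library only; the sample string and the file name `demo.bin` are made up for illustration.

```python
import os
import tempfile

# Made-up sample text standing in for a response body.
original = '意见反馈'
raw = original.encode('utf-8')        # bytes, like r.content

wrong = raw.decode('iso-8859-1')      # wrong charset: garbled text, but no error
right = raw.decode('utf-8')           # correct charset: the original text

print(len(raw), len(wrong), len(right))
print(right == original, wrong == original)

# Writing bytes to disk; the with-block closes the file automatically.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as f:
    f.write(raw)

with open(path, 'rb') as f:
    print(f.read() == raw)
```

With requests, the usual remedy is to set `r.encoding = 'utf-8'` before reading `r.text`, or to decode `r.content` yourself.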

2.1.6. Adding headers
import requests

# Set the User-Agent (browser identification) string
header = {'User-Agent': 'zhaocen'}
# Send the request with the custom headers
r = requests.get('https://www.zhihu.com/explore', headers=header)
print(r.text)
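
To check which User-Agent would actually be sent, a request can be prepared through a Session without touching the network. This is a minimal sketch; the httpbin.org URL is reused from the earlier examples only as a placeholder.

```python
import requests

session = requests.Session()

# Without custom headers, the session supplies its own default User-Agent.
default_req = session.prepare_request(requests.Request('GET', 'http://httpbin.org/get'))
print(default_req.headers['User-Agent'])   # a string of the form python-requests/<version>

# A headers dict overrides the default for this request.
custom_req = session.prepare_request(
    requests.Request('GET', 'http://httpbin.org/get', headers={'User-Agent': 'zhaocen'})
)
print(custom_req.headers['User-Agent'])
```

Some sites may reject the default python-requests User-Agent, which is presumably why the example above sets one explicitly.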
