Python: the requests module

qq_25500415

Last modified: 2023-09-22 23:29:35


First published: 2021-05-12 23:15:27

Article link: https://blog.csdn.net/qq_25500415/article/details/116723607


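requests is a third-party module. Install it with pip install requests, then import it with import requests before running the examples below.

1, The get method

requests.get requests a web page and returns a Response object.

1-1, Without parameters

Call requests.get(url) to request the specified page.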

In [70]: url = r'https://www.baidu.com'

# Request the given URL; returns a Response object
In [71]: r = requests.get(url)

In [151]: r.url
Out[151]: 'https://www.baidu.com/'

1-2, With parameters (params)

params can be a dict or a tuple of (key, value) pairs.

1-2-1, params as a dict

# params as a dict
In [146]: params1 = {'key1': 'value1', 'key2': 'value2'}
In [147]: r1 = requests.get('http://httpbin.org/get', params=params1)

# r1.url shows the final URL
In [148]: r1.url
Out[148]: 'http://httpbin.org/get?key1=value1&key2=value2'

# A parameter whose value is None is not sent
In [184]: params1 = {'key1': 'value1', 'key2': None}
In [185]: r1 = requests.get('http://httpbin.org/get', params=params1)

# key2 is None, so it does not appear in the URL
In [186]: r1.url
Out[186]: 'http://httpbin.org/get?key1=value1'

1-2-2, params as a tuple

# params as a tuple of (key, value) pairs
In [153]: params1 = (('key1', 'value1'), ('key2', 'value2'))
In [154]: r1 = requests.get('http://httpbin.org/get', params=params1)

# r1.url shows the final URL
In [155]: r1.url
Out[155]: 'http://httpbin.org/get?key1=value1&key2=value2'

# A parameter whose value is None is not sent
In [156]: params1 = (('key1', 'value1'), ('key2', None))
In [157]: r1 = requests.get('http://httpbin.org/get', params=params1)

In [158]: r1.url
Out[158]: 'http://httpbin.org/get?key1=value1'
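As a rough offline check of how params end up in the URL, the sketch below builds the request with requests.Request(...).prepare() instead of sending it. The helper name build_url is made up for this illustration, and the list-valued parameter is an extra case not shown in the sessions above.

```python
import requests


def build_url(base_url, params):
    """Return the URL requests would request, without sending anything."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


# A None value is dropped, as in the sessions above
url_without_none = build_url("http://httpbin.org/get", {"key1": "value1", "key2": None})
# A list value repeats the key once per item (extra case, for illustration)
url_with_list = build_url("http://httpbin.org/get", {"key1": ["a", "b"]})

print(url_without_none)
print(url_with_list)
```

Preparing a request this way is handy for debugging query strings, because it needs no network access.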

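1-3, Setting headers

Pass headers={'user-agent': 'xxxx'} to send custom request headers. The User-Agent header identifies the client as a browser; without it, a site with anti-crawler measures may refuse the request.

# This user-agent overrides the default User-Agent value; the other default headers are unchanged
In [56]: header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36'}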

In [57]: r = requests.get('https://www.baidu.com', headers=header)

To find your browser's User-Agent: F12 -> Network -> Request Headers -> User-Agent
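To see that a custom user-agent replaces only the default User-Agent and leaves the other defaults alone, one option is to prepare the request through a Session without sending it. This is a minimal sketch; the user-agent string my-crawler/0.1 is a made-up value.

```python
import requests

# Hypothetical user-agent string, used only for this illustration
custom_headers = {"user-agent": "my-crawler/0.1"}

session = requests.Session()
request = requests.Request("GET", "https://www.baidu.com", headers=custom_headers)
# prepare_request merges the session's default headers with ours; nothing is sent
prepared = session.prepare_request(request)

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Header names are matched case-insensitively, so 'user-agent' and 'User-Agent' refer to the same header.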

1-4, Setting cookies

Cookies make it possible to access pages that require a login. The cookies value can be a dict or a RequestsCookieJar object.

1-4-1, cookies as a dict

# Request the URL with cookies={'cookies_are': 'working'}
In [178]: r = requests.get('http://httpbin.org/cookies', cookies={'cookies_are': 'working'})

1-4-2, cookies as a RequestsCookieJar object

# Create a RequestsCookieJar object
In [179]: jar = requests.cookies.RequestsCookieJar()

# Add a cookie to the jar, with its domain and path
In [180]: jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
Out[180]: Cookie(version=0, name='tasty_cookie', value='yum', port=None, port_specified=False, domain='httpbin.org', domain_specified=True, domain_initial_dot=False, path='/cookies', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)

# Request the URL with cookies=jar
In [181]: r = requests.get('http://httpbin.org/cookies', cookies=jar)

In [182]: r.cookies
Out[182]: <RequestsCookieJar[]>

In [183]: r.json()['cookies']
Out[183]: {'tasty_cookie': 'yum'}
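The advantage of a jar over a dict is that each cookie carries a domain and path, so only matching cookies are sent. The sketch below checks this offline by preparing the request and reading the Cookie header; the second cookie (gross_cookie) is added here purely for illustration.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
# Extra cookie for illustration: its path does not match the URL requested below
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

prepared = requests.Request("GET", "http://httpbin.org/cookies", cookies=jar).prepare()
# Only cookies whose domain and path match the URL go into the Cookie header
cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```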

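1-5, Setting a timeout

Pass timeout=xxx to set a timeout in seconds. It limits how long requests waits for the server to respond; it is not a limit on the total download time.

# Time out after 2.5 s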
In [190]: r = requests.get(r'https://www.baidu.com', timeout=2.5)
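When the timeout expires, requests raises an exception rather than returning a Response. A minimal sketch of handling it is shown below; the helper name fetch_or_none and the default values are assumptions for this example. The tuple form (connect timeout, read timeout) is the documented way to set the two limits separately.

```python
import requests


def fetch_or_none(url, timeout=(3.05, 10)):
    """Return the Response, or None if connecting or reading timed out.

    timeout is (connect timeout, read timeout) in seconds; a single
    number such as 2.5 applies to both.
    """
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout
        return None
```

A call such as fetch_or_none('https://www.baidu.com', timeout=2.5) then either returns the response or None, so the caller can decide whether to retry.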

1-6, The response (Response)

1-6-1, The text attribute

r.text returns the content of the Response as a string.

# r.text returns the Response content as a str
In [86]: r.text
Out[86]: '<!DOCTYPE html>\r\n<!--STATUS OK--><html> ... </div> </div> </body> </html>\r\n'

1-6-2, The content attribute

r.content returns the content of the Response as bytes.

# r.content returns the Response content as bytes
In [85]: r.content
Out[85]: b'<!DOCTYPE html>\r\n<!--STATUS OK--><html>...</div> </body> </html>\r\n'

1-6-3, The json() method

r.json() parses a JSON response body and returns the result. If the body is not valid JSON, it raises an error.

In [18]: r = requests.get('https://api.github.com/events')
# r.json() returns the parsed JSON result
In [19]: r.json()
Out[19]:
[{'id': '31993150847',
  'type': 'PushEvent',
  'actor': {'id': 111329684,
....
}]
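Since r.json() raises on an invalid body, it can be worth wrapping the call. The helper below is a small sketch (the name parse_json_or_none is made up); it catches ValueError, which to my knowledge is a base class of the JSON decode errors that requests raises.

```python
import requests


def parse_json_or_none(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # requests' JSON decode errors derive from ValueError
        return None
```

For example, parse_json_or_none(requests.get('https://api.github.com/events')) returns the event list, while an HTML error page would give None instead of a traceback.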

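1-6-4, The iter_content() method

r.iter_content(chunk_size) iterates over the Response content in chunks, which can be written to a file piece by piece.

# Write the content of r to a file; chunk_size can be chosen freely
In [31]: with open(private_key_file, 'wb') as f: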
    ...:     for chunk in r.iter_content(chunk_size=128):
    ...:         f.write(chunk)
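For large files, iter_content is usually combined with stream=True so that the body is not loaded into memory all at once. The function below is one possible sketch; the name download, its defaults, and the raise_for_status() check are choices made for this example.

```python
import requests


def download(url, dest_path, chunk_size=8192, timeout=10):
    """Stream url to dest_path in chunks and return the number of bytes written."""
    written = 0
    # stream=True defers downloading the body until iter_content is consumed
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
        with open(dest_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Using the response as a context manager releases the connection even if writing the file fails partway.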

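1-6-5, The raw attribute

r.raw gives the underlying raw response object (a urllib3 HTTPResponse). To read the raw byte stream from it, make the request with stream=True.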

In [33]: r.raw
Out[33]: <urllib3.response.HTTPResponse at 0x21896d1aa40>

1-6-6, Response headers (headers)

r.headers returns the headers of the response.

In [129]: url = r'https://www.baidu.com'

In [130]: r = requests.get(url)

In [131]: r.headers
Out[131]: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Fri, 22 Sep 2023 13:18:05 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:24:13 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
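r.headers is a case-insensitive dict, so a header can be looked up without worrying about capitalization. The sketch below shows that behaviour offline by building a requests.structures.CaseInsensitiveDict by hand, with two of the values from the output above.

```python
from requests.structures import CaseInsensitiveDict

# Same type as r.headers, filled by hand so that no request is needed
headers = CaseInsensitiveDict({"Content-Type": "text/html", "Server": "bfe/1.0.8.18"})

content_type = headers["content-type"]      # lookup ignores case
has_server = "SERVER" in headers            # membership test ignores case too
missing = headers.get("X-Not-There", "n/a") # .get works as on a normal dict

print(content_type, has_server, missing)
```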

1-6-7, Getting the response cookies

In [129]: url = r'https://www.baidu.com'

In [130]: r = requests.get(url)


In [132]: r.cookies
Out[132]: <RequestsCookieJar[Cookie(version=0, name='BDORZ', value='27315', port=None, port_specified=False, domain='.baidu.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1695475084, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>
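The jar returned by r.cookies can be read like a dict. The following sketch rebuilds a jar by hand with the BDORZ cookie from the output above, so it runs without a network connection; on a real response the same calls would be made on r.cookies.

```python
import requests

# Stand-in for r.cookies, filled with the cookie shown above
jar = requests.cookies.RequestsCookieJar()
jar.set("BDORZ", "27315", domain=".baidu.com", path="/")

cookie_value = jar["BDORZ"]          # look up a cookie value by name
missing = jar.get("no_such_cookie")  # returns None instead of raising
as_dict = jar.get_dict()             # plain {name: value} dict

print(cookie_value, missing, as_dict)
```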

1-6-8, Request headers

In [136]: url = 'https://www.baidu.com'

In [137]: r = requests.get(url)

In [138]: r.request.headers
Out[138]: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

1-6-9, The encoding attribute

r.encoding shows the encoding used to decode r.text. Assign r.encoding = <encoding> to change it.

# r.encoding shows the encoding of the Response
In [88]: r.encoding
Out[88]: 'ISO-8859-1'

# Fetch the Response, set r.encoding = 'utf-8', then look at the content again
In [94]: r = requests.get(url)
# Change the encoding of the Response to 'utf-8'
In [95]: r.encoding = 'utf-8'
# After switching the encoding, the Chinese characters display correctly
In [96]: r.text
Out[96]: '<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type ... 使用百度前必读</a>&nbsp; </body> </html>\r\n'
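The reason the fix works is that r.text is just r.content decoded with r.encoding. The plain-Python sketch below imitates that step on a short UTF-8 byte string, with no request involved; the variable names are illustrative only.

```python
# Bytes as r.content would hold them for a UTF-8 page
content = "使用百度前必读".encode("utf-8")

# What r.text gives while r.encoding is 'ISO-8859-1': every byte becomes one character
garbled = content.decode("ISO-8859-1")
# What r.text gives after r.encoding = 'utf-8'
readable = content.decode("utf-8")

print(repr(garbled))
print(readable)
```

requests also offers r.apparent_encoding, which guesses the encoding from the content; assigning it to r.encoding is an alternative when the right encoding is not known in advance.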

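2, The POST method

requests.post(url, data=... or json=...) sends data to the server in the request body, for example to fill in a form.

2-1, The data parameter

2-1-1, data as a dict

requests.post(url, data={k: v}) submits the data as form fields.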

In [25]: url
Out[25]: 'http://httpbin.org/post'

# requests.post submits the data as form fields to the given URL
In [26]: r = requests.post(url, data={'k1': 'v1', 'k2': 'v2'})

# json.loads converts r.text to a dict (requires import json; r.json() gives the same result)
In [27]: r = json.loads(r.text)
In [28]: r
Out[28]:
{'args': {},
 'data': '',
 'files': {},
 'form': {'k1': 'v1', 'k2': 'v2'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '11',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.31.0',
  'X-Amzn-Trace-Id': 'Root=1-6506c949-49d62f6f16049bf477544334'},
 'json': None,
 'origin': '110.184.215.54',
 'url': 'http://httpbin.org/post'}

# Look at the submitted form data
In [29]: r['form']
Out[29]: {'k1': 'v1', 'k2': 'v2'}

2-1-2, data as a tuple

requests.post(url, data=((k1, v1), ..., (kn, vn))) submits the data as form fields.

# data as a tuple whose elements are (key, value) pairs
In [31]: r = requests.post(url, data=(('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')))

In [32]: r = json.loads(r.text)

In [33]: r
Out[33]:
{'args': {},
 'data': '',
 'files': {},
 'form': {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '17',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.31.0',
  'X-Amzn-Trace-Id': 'Root=1-6506ca81-55d3fd0d5a4636046d7e481e'},
 'json': None,
 'origin': '110.184.215.54',
 'url': 'http://httpbin.org/post'}

In [34]: r['form']
Out[34]: {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
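To see what is actually sent for data, the request can be prepared without posting it. This is a small sketch; the repeated-key case is an addition that illustrates one reason to use pairs instead of a dict.

```python
import requests

url = "http://httpbin.org/post"

# A dict becomes a URL-encoded form body
from_dict = requests.Request("POST", url, data={"k1": "v1", "k2": "v2"}).prepare()
# A sequence of pairs allows the same key to appear more than once
from_pairs = requests.Request("POST", url, data=[("k1", "a"), ("k1", "b")]).prepare()

print(from_dict.body)
print(from_dict.headers["Content-Type"])
print(from_dict.headers["Content-Length"])
print(from_pairs.body)
```

The Content-Length printed for the dict case matches the '11' echoed back by httpbin in the session above.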

2-2, The json parameter

payload = {'some': 'data'}
# json=payload serializes payload to JSON automatically
r = requests.post('https://api.github.com/some/endpoint', json=payload)
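The difference between json=payload and passing a pre-serialized string through data can be inspected offline by preparing both requests. This is only a sketch, and nothing is sent to the example endpoint.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}

# json= serializes the payload and sets the Content-Type header
with_json = requests.Request("POST", url, json=payload).prepare()
# data= with a string sends the string as-is and sets no Content-Type
with_data = requests.Request("POST", url, data=json.dumps(payload)).prepare()

print(with_json.headers.get("Content-Type"))
print(with_data.headers.get("Content-Type"))
print(with_json.body)
```

So with data=json.dumps(payload), the Content-Type header has to be supplied by hand if the server expects it.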

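2-3, Uploading files (files)

Pass files={'file': open(path, 'rb')} to upload a file to the given URL. The file should be opened in binary mode ('rb').

2-3-1, Uploading a file by path

requests.post(url, files={'file': open(path, 'rb')}) uploads the file to the specified url.

# Upload the file at path_1 to the given URL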
In [123]: r = requests.post(r'http://httpbin.org/post', files={'file': open(path_1, 'rb')})

In [124]: r.text
Out[124]: '{\n  "args": {}, \n  "data": "", \n  "files": {\n    "file": "111dfas\\r\\n"\n  }, \n  "form": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "151", \n    "Content-Type": "multipart/form-data; boundary=500c09c42ab8abcc80b49a8888a6f6d8", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.31.0", \n    "X-Amzn-Trace-Id": "Root=1-650d57bd-0527a16423c69e35786507ac"\n  }, \n  "json": null, \n  "origin": "118.112.139.28", \n  "url": "http://httpbin.org/post"\n}\n'
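The session above leaves the file open after the upload. A with-block avoids that; the sketch below uses one and only prepares the multipart request, so it runs offline. The helper name prepare_upload and the temporary demo file are assumptions for this example.

```python
import os
import tempfile

import requests


def prepare_upload(url, path):
    """Build (but do not send) a multipart upload of the file at path."""
    # The with-block closes the file once its content has been read into the body
    with open(path, "rb") as f:
        return requests.Request("POST", url, files={"file": f}).prepare()


# Demo file with the same content as in the session above
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as f:
    f.write(b"111dfas\r\n")

prepared = prepare_upload("http://httpbin.org/post", demo_path)
print(prepared.headers["Content-Type"])
print(len(prepared.body), "bytes in the multipart body")
```

For a real upload the same pattern applies: call requests.post(url, files={'file': f}) inside the with-block.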

2-3-2, Uploading a string as a file

requests.post(url, files={'file': (filename, string)}) uploads the given string as the content of a file named filename.

# Upload the string 'aaaahbbfdsa' as a file, using path_1 as the file name
In [127]: r = requests.post(r'http://httpbin.org/post', files={'file': (path_1, 'aaaahbbfdsa')})

In [128]: r.text
Out[128]: '{\n  "args": {}, \n  "data": "", \n  "files": {\n    "file": "aaaahbbfdsa"\n  }, \n  "form": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "184", \n    "Content-Type": "multipart/form-data; boundary=4ef067fdd0df95bd4d69cac4589fec10", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.31.0", \n    "X-Amzn-Trace-Id": "Root=1-650d59d9-45a773183cfcbcd9573aa3e9"\n  }, \n  "json": null, \n  "origin": "118.112.139.28", \n  "url": "http://httpbin.org/post"\n}\n'
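The tuple can also carry a third element, the content type of the file. The sketch below prepares such a request offline and prints the multipart body; the file name notes.txt and the content type text/plain are made-up values.

```python
import requests

# (file name, file content, content type); name and type are made up for illustration
files = {"file": ("notes.txt", "aaaahbbfdsa", "text/plain")}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

# The multipart body carries the file name and content type alongside the string
print(prepared.body.decode("utf-8"))
```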