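requests is a third-party module; install it with the command pip install requests.
1, The get method
requests.get requests a web page and returns a Response object.
1-1, Without parameters
Call requests.get(url) to request the given page.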
In [70]: url = r'https://www.baidu.com'
# Request the given URL; a Response object is returned
In [71]: r = requests.get(url)
In [151]: r.url
Out[151]: 'https://www.baidu.com/'
1-2, With parameters (params)
params can be a dict or a tuple of (key, value) pairs.
1-2-1, params as a dict
# params given as a dict
In [146]: params1 = {'key1': 'value1', 'key2': 'value2'}
In [147]: r1 = requests.get('http://httpbin.org/get', params=params1)
# r1.url returns the final URL, including the query string
In [148]: r1.url
Out[148]: 'http://httpbin.org/get?key1=value1&key2=value2'
# A parameter whose value is None is not sent
In [184]: params1 = {'key1': 'value1', 'key2': None}
In [185]: r1 = requests.get('http://httpbin.org/get', params=params1)
# key2 is None, so it does not appear in the URL
In [186]: r1.url
Out[186]: 'http://httpbin.org/get?key1=value1'
1-2-2, params as a tuple
# params given as a tuple of (key, value) pairs
In [153]: params1 = (('key1', 'value1'), ('key2', 'value2'))
In [154]: r1 = requests.get('http://httpbin.org/get', params=params1)
# r1.url returns the final URL, including the query string
In [155]: r1.url
Out[155]: 'http://httpbin.org/get?key1=value1&key2=value2'
# A parameter whose value is None is not sent
In [156]: params1 = (('key1', 'value1'), ('key2', None))
In [157]: r1 = requests.get('http://httpbin.org/get', params=params1)
In [158]: r1.url
Out[158]: 'http://httpbin.org/get?key1=value1'
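The URL that requests builds from params can also be checked without sending anything, by preparing the request locally. The following is a minimal sketch, assuming the same httpbin URL as above; build_url is a hypothetical helper name, and the list-valued key2 is an extra case not covered by the sessions above.

```python
import requests


def build_url(url, params):
    """Return the final URL requests would use, without sending the request."""
    # Request(...).prepare() encodes params into the query string locally.
    return requests.Request('GET', url, params=params).prepare().url


base = 'http://httpbin.org/get'
url_dict = build_url(base, {'key1': 'value1', 'key2': 'value2'})
url_none = build_url(base, {'key1': 'value1', 'key2': None})  # None is dropped
url_list = build_url(base, {'key1': 'value1', 'key2': ['value2', 'value3']})  # key repeated

print(url_dict)
print(url_none)
print(url_list)
```

A list value repeats the key once per element, which is how multi-valued query parameters are usually sent.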
1-3, Setting headers
Pass headers={'user-agent': 'xxxx'} to send custom request headers. The User-Agent header identifies the client as a browser; without it, a site with anti-crawler measures may refuse the request.
# The given user-agent replaces the default user-agent value; the other default headers are unchanged
In [56]: header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36'}
In [57]: r = requests.get('https://www.baidu.com', headers=header)
To find your browser's user-agent: F12 -> Network -> Request Headers -> User-Agent
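To see which headers would actually go out, the request can be prepared through a Session without sending it. This is only a sketch: the user-agent value 'my-app/0.1' is a placeholder, and Session.prepare_request is used here purely for offline inspection.

```python
import requests

session = requests.Session()
request = requests.Request(
    'GET',
    'https://www.baidu.com',
    headers={'user-agent': 'my-app/0.1'},  # placeholder value (assumption)
)
# prepare_request merges the session's default headers with the custom ones.
prepared = session.prepare_request(request)

# Header names are matched case-insensitively.
print(prepared.headers['User-Agent'])
print(prepared.headers['Accept'])
```

The custom user-agent wins over the default python-requests one, while defaults such as Accept stay in place.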
1-4, Setting cookies
Cookies make it possible to access pages that require login. The cookies argument can be a dict or a RequestsCookieJar object.
1-4-1, cookies as a dict
# Request the given url with cookies={'cookies_are': 'working'}
In [178]: r = requests.get('http://httpbin.org/cookies', cookies={'cookies_are': 'working'})
1-4-2, cookies as a RequestsCookieJar object
# Create a RequestsCookieJar object
In [179]: jar = requests.cookies.RequestsCookieJar()
# Add a cookie to the jar, with its domain and path
In [180]: jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
Out[180]: Cookie(version=0, name='tasty_cookie', value='yum', port=None, port_specified=False, domain='httpbin.org', domain_specified=True, domain_initial_dot=False, path='/cookies', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)
# Request the given url with cookies=jar
In [181]: r = requests.get('http://httpbin.org/cookies', cookies=jar)
In [182]: r.cookies
Out[182]: <RequestsCookieJar[]>
In [183]: r.json()['cookies']
Out[183]: {'tasty_cookie': 'yum'}
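The domain and path given to jar.set decide which requests carry the cookie. A small offline sketch, assuming the same jar as above; cookie_header is a hypothetical helper that prepares a request locally and reads back its Cookie header.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')


def cookie_header(url, cookies):
    """Return the Cookie header requests would send to url, or None."""
    prepared = requests.Request('GET', url, cookies=cookies).prepare()
    return prepared.headers.get('Cookie')


matching = cookie_header('http://httpbin.org/cookies', jar)  # path matches
other_path = cookie_header('http://httpbin.org/get', jar)    # path does not match

print(matching)
print(other_path)
```

A plain dict has no domain or path, so when path scoping matters a jar is the better fit.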
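1-5, Setting a timeout
Pass timeout=xxx to set the timeout in seconds (a number, not a string). The timeout limits how long requests waits for the server to respond, not the total download time.
# Time out after 2.5 s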
In [190]: r = requests.get(r'https://www.baidu.com', timeout=2.5)
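When the timeout expires, requests raises an exception rather than returning a Response, so the call is usually wrapped in try/except. A minimal sketch under stated assumptions: fetch_text is a hypothetical helper, the (3.05, 10) connect/read values are arbitrary, and the get parameter exists only so the function can be exercised without a network.

```python
import requests


def fetch_text(url, timeout=(3.05, 10), get=requests.get):
    """Return the page text, or None if the request timed out.

    timeout may be a single number or a (connect, read) tuple, in seconds.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Base class of both ConnectTimeout and ReadTimeout.
        return None
    return response.text
```

Calling fetch_text('https://www.baidu.com', timeout=2.5) behaves like the session above, except that a timeout yields None instead of a traceback.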
1-6, The response (Response)
1-6-1, The text attribute
r.text returns the content of the Response object as a string.
# r.text returns the response content as a str
In [86]: r.text
Out[86]: '<!DOCTYPE html>\r\n<!--STATUS OK--><html> ... </div> </div> </body> </html>\r\n'
1-6-2, The content attribute
r.content returns the content of the Response object as bytes.
# r.content returns the response content as bytes
In [85]: r.content
Out[85]: b'<!DOCTYPE html>\r\n<!--STATUS OK--><html>...</div> </body> </html>\r\n'
1-6-3, The json() method
r.json() decodes a JSON response body and returns the result; if the body is not valid JSON, an exception is raised.
In [18]: r = requests.get('https://api.github.com/events')
# r.json() returns the decoded JSON result
In [19]: r.json()
Out[19]:
[{'id': '31993150847',
'type': 'PushEvent',
'actor': {'id': 111329684,
....
}]
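Because r.json() raises on an invalid body, callers often guard it. A minimal sketch, assuming a hypothetical helper parse_json; the Response objects are built by hand (setting _content is a testing shortcut, not normal usage) so the example runs without a network.

```python
import requests


def parse_json(response, default=None):
    """Return the decoded JSON body, or default if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # The JSON decode errors raised by requests are ValueError subclasses.
        return default


# Hand-built responses standing in for real ones (testing shortcut).
good = requests.Response()
good._content = b'{"type": "PushEvent"}'
bad = requests.Response()
bad._content = b'<html>not json</html>'

print(parse_json(good))
print(parse_json(bad, default={}))
```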
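1-6-4, The iter_content() method
r.iter_content(chunk_size) iterates over the response content in chunks, so it can be written to a file piece by piece. For large downloads, make the request with stream=True so the body is not loaded into memory all at once.
# Write the content of r to a file; chunk_size can be any size you choose
In [31]: with open(pravate_key_file, 'wb') as f:
    ...:     for chunk in r.iter_content(chunk_size=128):
    ...:         f.write(chunk)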
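The same loop can be wrapped in a reusable function. This is a sketch only: save_stream and fake_response are hypothetical names, and fake_response swaps the network stream for an in-memory buffer so the code can be tried offline.

```python
import io

import requests


def save_stream(response, path, chunk_size=8192):
    """Write the response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def fake_response(body):
    """Build an offline Response whose raw stream is an in-memory buffer."""
    response = requests.Response()
    response.raw = io.BytesIO(body)  # stand-in for the network stream
    return response
```

With a real download, the call would look like save_stream(requests.get(url, stream=True), 'out.bin').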
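1-6-5, The raw attribute
r.raw returns the raw byte stream of the Response object (a urllib3 HTTPResponse). To read from it, make the request with stream=True.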
In [33]: r.raw
Out[33]: <urllib3.response.HTTPResponse at 0x21896d1aa40>
1-6-6, Response headers (headers)
r.headers returns the headers of the response.
In [129]: url = r'https://www.baidu.com'
In [130]: r = requests.get(url)
In [131]: r.headers
Out[131]: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Fri, 22 Sep 2023 13:18:05 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:24:13 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
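r.headers is a case-insensitive dictionary, so a header can be looked up with any capitalization. A small offline sketch that builds the same kind of object by hand; the two header values are copied from the output above.

```python
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; build one by hand to try it offline.
headers = CaseInsensitiveDict({'Content-Type': 'text/html', 'Server': 'bfe/1.0.8.18'})

print(headers['content-type'])
print(headers.get('CONTENT-TYPE'))
print(headers.get('x-missing', 'n/a'))  # missing header falls back to the default
```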
1-6-7, Response cookies
In [129]: url = r'https://www.baidu.com'
In [130]: r = requests.get(url)
In [132]: r.cookies
Out[132]: <RequestsCookieJar[Cookie(version=0, name='BDORZ', value='27315', port=None, port_specified=False, domain='.baidu.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1695475084, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>
1-6-8, Request headers
In [136]: url = 'https://www.baidu.com'
In [137]: r = requests.get(url)
In [138]: r.request.headers
Out[138]: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
1-6-9, The encoding attribute
r.encoding shows the encoding used to decode r.text; assign r.encoding = <encoding> to change it.
# Check the encoding of the Response object with r.encoding
In [88]: r.encoding
Out[88]: 'ISO-8859-1'
# Get the Response object first, then set r.encoding = 'utf-8', then view the content
In [94]: r = requests.get(url)
# Change the encoding of the Response object to 'utf-8'
In [95]: r.encoding = 'utf-8'
# After changing the encoding, the Chinese characters display correctly
In [96]: r.text
Out[96]: '<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type ... 使用百度前必读</a> </body> </html>\r\n'
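The effect of r.encoding can be reproduced offline, since r.text is decoded from r.content each time it is read. A minimal sketch, assuming a hand-built Response (setting _content is a testing shortcut) whose body is UTF-8 but whose declared encoding is ISO-8859-1.

```python
import requests

# Offline stand-in for a page whose body is UTF-8 but whose declared
# encoding is ISO-8859-1.
r = requests.Response()
r._content = '百度'.encode('utf-8')
r.encoding = 'ISO-8859-1'
garbled = r.text  # decoded with the wrong encoding

r.encoding = 'utf-8'
fixed = r.text    # decoded again with the new encoding

print(repr(garbled))
print(fixed)
```

On a real response, r.apparent_encoding offers a guess detected from the body; it can be wrong on short content, so treat it as a hint.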
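2, The post method
requests.post(url, data=... or json=...) sends data in the request body, for example to submit a form.
2-1, The data parameter
2-1-1, data as a dict
requests.post(url, data={k: v}) sends data as form fields.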
In [25]: url
Out[25]: 'http://httpbin.org/post'
# Send data as form fields to the given url with requests.post
In [26]: r = requests.post(url, data={'k1': 'v1', 'k2': 'v2'})
# Convert r.text to a dict with json.loads (requires import json)
In [27]: r = json.loads(r.text)
In [28]: r
Out[28]:
{'args': {},
 'data': '',
 'files': {},
 'form': {'k1': 'v1', 'k2': 'v2'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '11',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.31.0',
  'X-Amzn-Trace-Id': 'Root=1-6506c949-49d62f6f16049bf477544334'},
 'json': None,
 'origin': '110.184.215.54',
 'url': 'http://httpbin.org/post'}
# View the submitted form data
In [29]: r['form']
Out[29]: {'k1': 'v1', 'k2': 'v2'}
2-1-2, data as a tuple
requests.post(url, data=((k1, v1), ..., (kn, vn))) sends data as form fields.
# data given as a tuple whose elements are (key, value) pairs
In [31]: r = requests.post(url, data=(('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')))
In [32]: r = json.loads(r.text)
In [33]: r
Out[33]:
{'args': {},
 'data': '',
 'files': {},
 'form': {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '17',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.31.0',
  'X-Amzn-Trace-Id': 'Root=1-6506ca81-55d3fd0d5a4636046d7e481e'},
 'json': None,
 'origin': '110.184.215.54',
 'url': 'http://httpbin.org/post'}
In [34]: r['form']
Out[34]: {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
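One practical reason to use (key, value) pairs instead of a dict is that the same key can then appear more than once. A minimal offline sketch, assuming the same httpbin URL; the request is only prepared, not sent, and the repeated k1 key is an illustrative case.

```python
import requests

url = 'http://httpbin.org/post'
# Two values for the same key, given as a list of (key, value) tuples.
prepared = requests.Request('POST', url, data=[('k1', 'v1'), ('k1', 'v2')]).prepare()

print(prepared.body)
print(prepared.headers['Content-Type'])
```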
2-2, The json parameter
payload = {'some': 'data'}
# With json=payload, the payload is serialized to JSON automatically
r = requests.post('https://api.github.com/some/endpoint', json=payload)
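The difference between json= and data= shows up in the request body and the Content-Type header. A short sketch that prepares both variants locally, assuming the same placeholder endpoint; nothing is sent.

```python
import json

import requests

payload = {'some': 'data'}
url = 'https://api.github.com/some/endpoint'

as_json = requests.Request('POST', url, json=payload).prepare()
as_form = requests.Request('POST', url, data=payload).prepare()

print(as_json.headers['Content-Type'], as_json.body)
print(as_form.headers['Content-Type'], as_form.body)
```

With json= the body is a JSON document, and with data= the same dict is form-encoded instead.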
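2-3, Uploading files (files)
Pass files={'file': open(path, 'rb')} to upload a file to the given URL. Open the file in binary mode ('rb').
2-3-1, Uploading a file by path
requests.post(url, files={'file': open(path, 'rb')}) uploads the file to the given url.
# Upload the file path_1 to the given page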
In [123]: r = requests.post(r'http://httpbin.org/post', files={'file': open(path_1, 'rb')})
In [124]: r.text
Out[124]: '{\n "args": {}, \n "data": "", \n "files": {\n "file": "111dfas\\r\\n"\n }, \n "form": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Length": "151", \n "Content-Type": "multipart/form-data; boundary=500c09c42ab8abcc80b49a8888a6f6d8", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.31.0", \n "X-Amzn-Trace-Id": "Root=1-650d57bd-0527a16423c69e35786507ac"\n }, \n "json": null, \n "origin": "118.112.139.28", \n "url": "http://httpbin.org/post"\n}\n'
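2-3-2, Uploading a string as a file
With requests.post(url, files={'file': (filename, string)}), the string is uploaded to the given url as the content of a file named filename.
# Upload the string 'aaaahbbfdsa' as a file, using path_1 as the filename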
In [127]: r = requests.post(r'http://httpbin.org/post', files={'file': (path_1, 'aaaahbbfdsa')})
In [128]: r.text
Out[128]: '{\n "args": {}, \n "data": "", \n "files": {\n "file": "aaaahbbfdsa"\n }, \n "form": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Length": "184", \n "Content-Type": "multipart/form-data; boundary=4ef067fdd0df95bd4d69cac4589fec10", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.31.0", \n "X-Amzn-Trace-Id": "Root=1-650d59d9-45a773183cfcbcd9573aa3e9"\n }, \n "json": null, \n "origin": "118.112.139.28", \n "url": "http://httpbin.org/post"\n}\n'
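The multipart body that files= produces can be inspected by preparing the request without sending it. A minimal sketch, assuming the same httpbin URL; the filename 'note.txt' is a made-up example.

```python
import requests

url = 'http://httpbin.org/post'
# (filename, content): the filename 'note.txt' is a made-up example.
files = {'file': ('note.txt', 'aaaahbbfdsa')}
prepared = requests.Request('POST', url, files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type)
print(prepared.body.decode('utf-8'))
```

When uploading a real file, opening it in a with block (with open(path, 'rb') as f: requests.post(url, files={'file': f})) makes sure the handle is closed afterwards.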