Python Web Scraping: Basic Usage of the requests Library

Table of Contents

What is Requests: a simple, easy-to-use HTTP library implemented in Python
Introductory example
Sending requests
Basic GET requests
Basic usage
GET requests with parameters
Parsing JSON
Fetching binary data
Adding headers
Basic POST requests
Responses
Response attributes
Advanced operations
File upload
Getting cookies
Session persistence
Proxy settings
Timeout settings
Exception handling


Where urllib is inconvenient: setting a proxy or handling cookies involves a relatively cumbersome API, and sending a POST request also takes extra work. The Requests library is much simpler by comparison.
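
For a quick sense of the difference, here is the same form POST written with urllib and with requests (a minimal sketch; urllib.request and urllib.parse are standard-library modules, and httpbin.org is used purely as an echo endpoint):

import urllib.parse
import urllib.request

import requests

data = {'name': 'germey', 'age': '22'}

# urllib: encode the form by hand, build a Request object, then open it
encoded = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request('http://httpbin.org/post', data=encoded)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode('utf-8'))

# requests: just pass the dict
print(requests.post('http://httpbin.org/post', data=data).text)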

What is Requests: a simple, easy-to-use HTTP library implemented in Python

Requests is an HTTP library written in Python, built on top of urllib, and released under the Apache2 License.

It is more convenient than urllib, saves a great deal of work, and fully meets the needs of HTTP testing.

Introductory example

import requests
response = requests.get('https://www.baidu.com')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)

With just a few lines we get all the information that took more work with urllib: the response type, status code, body text, and cookies.

Other request methods are just as simple:

requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')

Sending requests

Basic GET requests

Basic usage

import requests
response = requests.get('http://httpbin.org/get')
print(response.text)

This prints the request headers, the client IP address, and the requested URL, as echoed back by httpbin.org.

GET requests with parameters

Parameters written directly in the URL

import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22')
print(response.text)

The GET parameters appear under "args" in the response.

Passing parameters via params

import requests
data = {
    'name':'germey',
    'age':22
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)

Pass the dict-style data to the params argument; requests URL-encodes it automatically.

The result is the same as the previous example.

Parsing JSON

import requests
response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(type(response.json()))

Output:

<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1'}, 'origin': '183.200.46.48', 'url': 'http://httpbin.org/get'}
<class 'dict'>

response.json() is equivalent to json.loads(response.text).
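
A quick check of that equivalence (a small sketch; httpbin.org is only used as a test endpoint):

import json

import requests

response = requests.get('http://httpbin.org/get')
# Both expressions yield the same dict
print(response.json() == json.loads(response.text))  # True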

Fetching binary data

Used for downloading binary content such as images or videos.

import requests
response = requests.get('http://github.com/favicon.ico')
print(type(response.text),type(response.content))
print(response.text)
print(response.content)

.content is the raw response body as bytes, while .text is the decoded string.

Downloading and saving the file

import requests
response = requests.get('http://github.com/favicon.ico')
with open('f.ico','wb') as f:
    f.write(response.content)
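
For larger binary files, requests can also stream the body in chunks instead of loading it all into memory (a sketch using the stream=True and iter_content features of requests; the chunk size is an arbitrary choice):

import requests

response = requests.get('http://github.com/favicon.ico', stream=True)
with open('favicon_streamed.ico', 'wb') as f:
    # Write the response body piece by piece
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)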

Adding headers

Purpose: prevent being blocked. Many sites reject requests that do not carry browser-like headers such as User-Agent.

Zhihu, without adding headers:

import requests
response = requests.get('https://zhihu.com/explore')
print(response.text)

Output: the request fails with an error page

<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>

With a header added:

import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.get('https://zhihu.com/explore',headers=headers)
print(response.text)

The page is now returned normally.

Basic POST requests

import requests
data = {'name':'germey','age':'20'}
response = requests.post('http://httpbin.org/post',data=data)
print(response.text)

Output:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "20",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "json": null,
  "origin": "183.200.46.48",
  "url": "http://httpbin.org/post"
}

Responses

Response attributes

Attribute       Type
status_code     int
headers         requests.structures.CaseInsensitiveDict
cookies         requests.cookies.RequestsCookieJar
url             str
history         list
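
A quick way to confirm those attributes and their types (baidu.com is just the example site used earlier):

import requests

response = requests.get('https://www.baidu.com')
print(response.status_code, type(response.status_code))  # e.g. 200 <class 'int'>
print(type(response.headers))                            # CaseInsensitiveDict
print(type(response.cookies))                            # RequestsCookieJar
print(response.url, type(response.url))                  # final URL, <class 'str'>
print(response.history, type(response.history))          # redirect chain, <class 'list'>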

 

Advanced operations

File upload

import requests
files = {'file': open('f.ico', 'rb')}  # 'file' is the form field name and can be customized
response = requests.post('http://httpbin.org/post',files=files)
print(response.text)
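
If you need to control the uploaded filename or content type, each entry in files can also be a tuple (a sketch; the filename and MIME type below are arbitrary examples):

import requests

# (filename, file object, content type)
files = {'file': ('favicon.ico', open('f.ico', 'rb'), 'image/x-icon')}
response = requests.post('http://httpbin.org/post', files=files)
print(list(response.json()['files'].keys()))  # ['file']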

Getting cookies

import requests
response = requests.get('https://www.baidu.com')
print(response.cookies)
for key,value in response.cookies.items():
    print(key + '=' + value)

Output:

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315

Session persistence

Purpose: simulating login, i.e. keeping a logged-in state across requests.

import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response = s.get('http://httpbin.org/cookies')
print(response.text)

Output:

{
  "cookies": {
    "number": "123456789"
  }
}

requests.Session() is like performing all the operations inside the same browser: cookies set by earlier requests are sent automatically with later ones.
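
For contrast, the same two calls made without a Session share no cookies, so httpbin.org reports an empty cookie jar (a short sketch):

import requests

# Two independent requests: the cookie set by the first is not sent with the second
requests.get('http://httpbin.org/cookies/set/number/123456789')
response = requests.get('http://httpbin.org/cookies')
print(response.text)  # {"cookies": {}}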

Proxy settings

import requests
proxies = {
    'http':'http://127.0.0.2',
    'https':'https://127.0.0.2'
}
response = requests.get('http://www.baidu.com',proxies=proxies)
print(response.status_code)
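
If the proxy requires authentication, the credentials can go directly into the proxy URL (a sketch; the host, port, username, and password below are placeholders, so the request will fail unless a real proxy is running there):

import requests
from requests.exceptions import RequestException

proxies = {
    'http': 'http://user:password@127.0.0.1:9743',
    'https': 'http://user:password@127.0.0.1:9743',
}
try:
    response = requests.get('http://www.baidu.com', proxies=proxies, timeout=5)
    print(response.status_code)
except RequestException:
    print('Proxy is not reachable')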

Timeout settings

import requests
response = requests.get('http://www.taobao.com',timeout=0.1)
print(response.status_code)

Handling the timeout exception:

import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get('http://www.baidu.com',timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print('TimeOut')

Import the exception classes you need from requests.exceptions.

Exception handling

import requests
from requests.exceptions import ReadTimeout,ConnectionError,RequestException
try:
    response = requests.get('http://httpbin.org/get',timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('TimeOut')
except ConnectionError:
    print("ConnectionError")
except RequestException:
    print('Error')

Catch the more specific subclass exceptions first, then the parent class: ReadTimeout and ConnectionError are both subclasses of RequestException, so RequestException works as the final catch-all.
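
That hierarchy can be verified directly (a tiny sketch using the exception classes shipped with requests):

import requests.exceptions as exc

print(issubclass(exc.ReadTimeout, exc.RequestException))       # True
print(issubclass(exc.ConnectionError, exc.RequestException))   # True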
