Web Scraping in Practice, Part 5: requests Explained

Example

import requests
response=requests.get('http://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
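In a real scraper the first request is usually wrapped with a timeout and a status check. The following is a minimal sketch of that idea; `fetch_text` and its 10-second default are illustrative names and values chosen here, not part of requests itself.

```python
import requests

def fetch_text(url, timeout=10):
    """Return the response body as text, or None if the request fails.

    fetch_text is a hypothetical helper written for this example.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
        response.raise_for_status()
    except requests.RequestException as exc:
        # RequestException is the base class for connection errors,
        # timeouts, invalid URLs and HTTP errors raised above
        print('request failed:', exc)
        return None
    return response.text

# Usage (needs network access):
# html = fetch_text('http://www.baidu.com/')
```

Returning None on failure keeps the calling code simple; a stricter design could let the exception propagate instead.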

Other request methods

import requests
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
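Each of these helpers fills in the HTTP method for the same underlying request machinery. As a rough offline illustration, the sketch below builds `requests.Request` objects and prepares them without sending anything, so the method and URL that would go on the wire can be inspected. Reusing the `http://httpbin.org/get` URL for every method is only a convenience for the demo.

```python
import requests

methods = ('get', 'post', 'put', 'delete', 'head', 'options')
prepared_methods = []

for method in methods:
    # prepare() builds the request exactly as it would be sent,
    # but nothing leaves the machine
    prepared = requests.Request(method, 'http://httpbin.org/get').prepare()
    prepared_methods.append(prepared.method)
    print(prepared.method, prepared.url)

# The same call can be made generically (needs network access):
# response = requests.request('POST', 'http://httpbin.org/post')
```

Note that the method name is normalized to upper case during preparation.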

The simplest GET request:

import requests
response=requests.get('http://httpbin.org/get')
print(response.text)

A GET request with parameters:

import requests
response=requests.get('http://httpbin.org/get?name=germey&age=22')
print(response.text)

Passing parameters as a dictionary:

import requests
data={
    'name':'germey',
    'age':22
}
response=requests.get('http://httpbin.org/get',params=data)
print(response.text)
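To see what the `params` dictionary turns into, the request can be prepared without being sent. This small sketch shows that the dictionary form produces the same URL as the hand-written query string above, and that special characters are escaped automatically. The `q` key is made up for the demonstration.

```python
import requests

data = {'name': 'germey', 'age': 22}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22

# Values are URL-encoded, and keys whose value is None are left out
tricky = {'q': 'a b&c', 'skip': None}
escaped = requests.Request('GET', 'http://httpbin.org/get', params=tricky).prepare()
print(escaped.url)  # http://httpbin.org/get?q=a+b%26c
```

This is the main reason to prefer `params` over string concatenation: the escaping is handled for you.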

Parsing JSON

import requests

response=requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(type(response.json()))

Output:

<class 'str'>

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '222.54.10.136', 'url': 'http://httpbin.org/get'}

<class 'dict'>
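`response.json()` raises an exception when the body is not valid JSON, for example when a site returns an HTML error page. A minimal sketch of a defensive wrapper follows; `safe_json` is a hypothetical helper name. It catches `ValueError`, which, as far as I know, the JSON decoding errors raised by requests derive from.

```python
import requests

def safe_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON.

    safe_json is an illustrative helper, not part of requests.
    """
    try:
        return response.json()
    except ValueError:
        # The body was empty or not JSON (e.g. an HTML error page)
        return None

# Usage (needs network access):
# data = safe_json(requests.get('http://httpbin.org/get'))
# if data is not None:
#     print(data['url'])
```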

Fetching binary data, typically needed when downloading videos or images

import requests

response=requests.get('https://github.com/favicon.ico')  # link to an image
print(type(response.text),type(response.content))
print(response.text)

print(response.content)

Saving the image locally:

import requests

response=requests.get('https://github.com/favicon.ico')  # link to an image
with open('favicon.ico','wb') as f:
    # the with block closes the file on exit, so no explicit f.close() is needed
    f.write(response.content)
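Reading `response.content` loads the whole body into memory, which is fine for a favicon but not for a large video. One common alternative is to stream the body in chunks, sketched below. The `download` function name, the 8192-byte chunk size and the 30-second timeout are assumptions made for this example.

```python
import requests

def download(url, path, chunk_size=8192, timeout=30):
    """Stream the body of url into the file at path, one chunk at a time.

    download is a hypothetical helper written for this example.
    """
    # stream=True defers fetching the body until iter_content() is called
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path

# Usage (needs network access):
# download('https://github.com/favicon.ico', 'favicon.ico')
```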

Adding headers. Without them, some sites reject the request.

import requests

response=requests.get('https://www.zhihu.com/explore')
print(response.text)

Output:

<html><body><h1>500 Server Error</h1>

An internal server error occured.

</body></html>

With headers added, the same request returns the page:

import requests
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'

}
response=requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.text)
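A likely reason for the rejection is the default `User-Agent`, which identifies the client as python-requests. The sketch below prints that default and shows one way to set a browser-like value once on a `Session` so that every request made through it carries the header. Nothing is sent over the network; the request is only prepared.

```python
import requests

# What requests sends when no User-Agent is given: python-requests/<version>
default_ua = requests.utils.default_user_agent()
print(default_ua)

browser_ua = ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36')

session = requests.Session()
# Headers set on the session are merged into every request it prepares
session.headers.update({'User-Agent': browser_ua})

prepared = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore'))
print(prepared.headers['User-Agent'])
```

Setting the header on a session avoids repeating the `headers=` argument on every call.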

A basic POST request:

import requests
data={'name':'germey','age':'22'}
response=requests.post('https://httpbin.org/post',data=data)
print(response.text)

A POST request with custom headers:

import requests
data={'name':'germey','age':'22'}
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'

}
response=requests.post('https://httpbin.org/post',data=data,headers=headers)
print(response.text)
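The `data=` argument used in the POST examples above is sent as a form-encoded body. requests also accepts `json=`, which serializes the dictionary as JSON instead. The sketch below prepares both variants offline so the difference in body and `Content-Type` is visible.

```python
import json
import requests

data = {'name': 'germey', 'age': '22'}
url = 'https://httpbin.org/post'

# data= produces an HTML-form style body
form = requests.Request('POST', url, data=data).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # name=germey&age=22

# json= serializes the dictionary and sets a JSON content type
as_json = requests.Request('POST', url, json=data).prepare()
print(as_json.headers['Content-Type'])  # application/json
print(json.loads(as_json.body))
```

Which one to use depends on what the target endpoint expects; HTML forms generally want `data=`, while many APIs want `json=`.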

Advanced usage

File upload:

import requests
# open the file in a with block so it is closed once the upload finishes
with open('favicon.ico','rb') as f:
    response=requests.post('https://httpbin.org/post',files={'file':f})
print(response.text)
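Passing `files=` makes requests encode the body as `multipart/form-data`. The following offline sketch prepares such a request from in-memory bytes, so no file on disk or network access is needed. The file name `hello.txt` and its contents are invented for the demonstration.

```python
import requests

# Each entry may be a (filename, content, content_type) tuple
files = {'file': ('hello.txt', b'hello world', 'text/plain')}
prepared = requests.Request(
    'POST', 'https://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type)  # multipart/form-data; boundary=<generated value>

# The body is bytes containing the part headers and the file content
print(b'filename="hello.txt"' in prepared.body)
print(b'hello world' in prepared.body)
```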

 

Getting cookies

import requests
response=requests.get("https://www.baidu.com")
print(response.cookies)
for key,value in response.cookies.items():
    print(key+'='+value)

Once you have the cookies you can maintain a session, which makes it possible to simulate a login.

Session persistence:

import requests
response=requests.get('http://httpbin.org/cookies/set/number/123456789')  # set a cookie
response=requests.get('http://httpbin.org/cookies')  # read the cookies back
print(response.text)

Output:

{

  "cookies": {}

}

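The cookies come back empty because the two requests.get calls are independent of each other. It is like setting a cookie in one browser and then reading the cookies in a different browser: the two share no cookie state, so the result is empty.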

requests provides a Session object that carries cookies from one request to the next, as if every request were made from the same browser:

import requests
s=requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response=s.get('http://httpbin.org/cookies')
print(response.text)

Output:

{

  "cookies": {

    "number": "123456789"

  }

}
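The difference between a session and a standalone call can also be seen without any network traffic. In this sketch a cookie is placed in the session's cookie jar by hand (standing in for the cookie that httpbin would set), and two requests are prepared: one through the session and one on its own. Only the session-prepared request carries a `Cookie` header.

```python
import requests

url = 'http://httpbin.org/cookies'

with requests.Session() as s:
    # Simulate the cookie the server would have set on an earlier response
    s.cookies.set('number', '123456789', domain='httpbin.org', path='/')
    via_session = s.prepare_request(requests.Request('GET', url))
    session_cookie = via_session.headers.get('Cookie')
    print(session_cookie)  # number=123456789

# A request prepared outside the session knows nothing about that jar
standalone = requests.Request('GET', url).prepare()
standalone_cookie = standalone.headers.get('Cookie')
print(standalone_cookie)  # None
```

Using the session as a context manager also closes its pooled connections when the block ends.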

Certificate verification:

import requests
response=requests.get('https://www.12306.cn')
print(response.status_code)

This raises an SSLError, because the 12306 site's certificate is not trusted and the connection is refused automatically. The check can also be turned off:

import requests
response=requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
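Turning verification off removes protection against tampered connections, so a safer option, when the site's certificate authority is available as a file, is to point `verify` at that CA bundle. The sketch below shows one possible arrangement. `make_session`, `status_or_none` and the path `my_ca.pem` are hypothetical names used only for illustration.

```python
import requests

def make_session(ca_bundle=None):
    """Return a Session that verifies certificates.

    With no argument the default CA bundle is used; a string is
    treated as the path to a CA bundle file.
    """
    session = requests.Session()
    session.verify = ca_bundle if ca_bundle else True
    return session

def status_or_none(session, url, timeout=10):
    """Return the status code, or None if certificate verification fails."""
    try:
        return session.get(url, timeout=timeout).status_code
    except requests.exceptions.SSLError as exc:
        print('certificate problem:', exc)
        return None

# Usage (needs network access and a real CA file):
# s = make_session('my_ca.pem')
# print(status_or_none(s, 'https://www.12306.cn'))
```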

Proxy settings:

import requests
proxies={
    "http":"http://127.0.0.1:9743",
    "https":"https://127.0.0.1:9743"
}
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'

}
response=requests.get('http://www.taobao.com',proxies=proxies,headers=headers)
print(response.status_code)
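The keys of the `proxies` dictionary are matched against the scheme of the URL being requested. As a rough illustration, the sketch below uses `select_proxy` from `requests.utils` to show which entry would be picked for a given URL, and then stores the same mapping on a `Session` so it applies to every request made through it. The addresses are the placeholder values from the example above, and no proxy needs to be running for this code.

```python
import requests
from requests.utils import select_proxy

proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743',
}

# The URL scheme decides which proxy entry is used
http_choice = select_proxy('http://www.taobao.com', proxies)
https_choice = select_proxy('https://www.taobao.com', proxies)
print(http_choice)   # http://127.0.0.1:9743
print(https_choice)  # https://127.0.0.1:9743

# Setting proxies on a session avoids passing proxies= on every call
session = requests.Session()
session.proxies.update(proxies)
```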

(To be continued)

 

 

 

 
