Fetching Information from the Web: Requests

requests

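The requests module is a third-party Python library for sending HTTP requests and handling the responses.

The requests module makes it easy to download files from the web without worrying about complications such as network errors, connection problems, and data compression. It is not part of the Python standard library, so it has to be installed first: run pip install requests from the command line.

Seven commonly used functions in the requests library: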

requests.request(method, url, **kwargs)

r = requests.request(method, url, **kwargs)

**kwargs stands for all the other arguments besides method and url.

Input:

•Method
  •GET/POST/PUT/PATCH/DELETE/HEAD ...
•URL
  •Scheme: http/https
  •Host: IP and port
  •Path: /xxx/xx
  •Query parameters: ?key=value&k2=v2
•Body
  •Form data: key=value&k2=v2
  •JSON data: {"key": "value", "k2": "v2"}
•Header
  •Ordinary key-value pairs

Output:

•Status code                  r.status_code
•Response headers             r.headers
•Response body
  •Text                       r.text
  •JSON data                  r.json()
  •Raw bytes                  r.content

Examples:

r = requests.request('get', 'http://10.168.54.201:8080/admin/login?username=admin&password=admin123')

r = requests.request('get', url, params={'username': 'admin', 'password': 'admin123'})

r = requests.request('post', url, data={"k": "v"}, headers={"Authorization": "xxx"})

r = requests.request('post', url, json={"k": "v"}, headers={"Authorization": "xxx"})
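To see how these inputs end up in the actual HTTP request without touching the network, one option is to build a requests.Request object and call prepare() on it. The sketch below is only an illustration: the URL http://example.com/admin/login is a placeholder that does not come from the examples above, and nothing is sent.

```python
import requests

# Placeholder URL for illustration only; nothing is sent over the network.
url = 'http://example.com/admin/login'

# params are URL-encoded and appended to the URL as the query string.
req = requests.Request('GET', url, params={'username': 'admin', 'password': 'admin123'})
prepared = req.prepare()
print(prepared.method)  # GET
print(prepared.url)     # http://example.com/admin/login?username=admin&password=admin123

# data becomes a form-encoded body; headers are passed through as key-value pairs.
req = requests.Request('POST', url, data={'k': 'v'}, headers={'Authorization': 'xxx'})
prepared_post = req.prepare()
print(prepared_post.headers['Authorization'])  # xxx
print(prepared_post.body)                      # k=v
```

Passing a dict through params produces the same URL as writing the query string by hand, and it also takes care of escaping special characters.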

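Tip: the difference between JSON and a dictionary. The two look very similar, but JSON is a string while a dictionary is an object. JSON can be seen as the string form of a dictionary, and the two can be converted into each other.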
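As a small illustration of that conversion, the standard-library json module can turn a dictionary into a JSON string and back. The variable names here are made up for the example.

```python
import json

payload = {'key': 'value', 'k2': 'v2'}   # a Python dict (an object in memory)

text = json.dumps(payload)               # dict -> JSON string
print(type(text).__name__, text)         # str {"key": "value", "k2": "v2"}

restored = json.loads(text)              # JSON string -> dict
print(restored == payload)               # True
```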

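requests.get() # GET request: retrieve data from the server

def get(url, params=None, **kwargs):

**kwargs stands for all the other arguments besides url and params (the other functions follow the same pattern).

requests.post() # POST request: submit data to the server

def post(url, data=None, json=None, **kwargs):

requests.put() # PUT request: replace the content of the specified resource with the data sent by the client

def put(url, data=None, **kwargs):

requests.delete() # DELETE request: ask the server to delete the specified resource

def delete(url, **kwargs):

requests.head() # HEAD request: fetch only the response headers of a page

def head(url, **kwargs):

requests.patch() # PATCH request: submit a partial modification

def patch(url, data=None, **kwargs):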

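requests parameter   Meaning
url                  URL to request
allow_redirects      Whether to follow redirects
auth                 HTTP authentication credentials
cert                 Client certificate: path to a certificate file, or a (cert, key) tuple
cookies              Dictionary of cookies to send to the URL
headers              Dictionary of HTTP headers to send to the URL
proxies              Dictionary mapping protocol to proxy URL
hooks                Hooks (callbacks invoked on events such as a received response)
stream               Whether the response body is streamed instead of downloaded immediately
timeout              How long to wait for the server to connect or respond, in seconds
verify               Boolean or string controlling verification of the server's TLS certificate
files                File upload (files only). In Postman this corresponds to:
                       Content-Type: multipart/form-data
                       Content-Type: application/octet-stream
data                 Body of a POST or PUT request. In Postman this corresponds to:
                       Content-Type: application/x-www-form-urlencoded
                       Content-Type: text/plain
                       Content-Type: application/javascript
                       Content-Type: text/html
                       Content-Type: application/xml
json                 JSON body of a POST request. In Postman this corresponds to:
                       Content-Type: application/json
params               Query-string parameters of a GET request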
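The link between data, json, files and the Content-Type header can be checked offline by preparing a request and reading the header requests would send. This is a rough sketch: the URL and the helper content_type are hypothetical names introduced for the example, and an in-memory (filename, content) tuple stands in for a real file.

```python
import requests

url = 'http://example.com/upload'  # placeholder; no request is actually sent

def content_type(**kwargs):
    """Prepare a POST request (without sending it) and return its Content-Type."""
    return requests.Request('POST', url, **kwargs).prepare().headers.get('Content-Type')

form_type = content_type(data={'key1': 'value1'})
json_type = content_type(json={'key1': 'value1'})
# An in-memory (filename, content) tuple stands in for an opened file here.
file_type = content_type(files={'file': ('hello.txt', b'hello')})
# For a raw string body, requests does not guess a type, so it is set explicitly.
raw_type = content_type(data='<a/>', headers={'Content-Type': 'application/xml'})

print(form_type)  # application/x-www-form-urlencoded
print(json_type)  # application/json
print(file_type)  # multipart/form-data; boundary=...
print(raw_type)   # application/xml
```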
# GET request
import requests

url = 'https://tse4-mm.cn.bing.net/th/id/OIP-C.w3cHPxIHKpLZodnlBoIZXgHaMx?w=182&h=314&c=7&o=5&dpr=1.45&pid=1.7'
response = requests.get(url)
print(response.status_code)

# Add request headers
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.status_code)

# Pass query parameters
params = {'wd': 'python'}
response = requests.get('https://www.baidu.com/', params=params)
print(response.status_code)

# Proxy settings
proxies = {'http': 'http://127.0.0.1:9743',
           'https': 'https://127.0.0.1:9742'}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)

# SSL certificate verification (verify=False skips it and triggers an InsecureRequestWarning)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

# Timeout settings
from requests.exceptions import Timeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except Timeout:  # base class of both ConnectTimeout and ReadTimeout
    print('timeout')

# Authentication settings
from requests.auth import HTTPBasicAuth
response = requests.get("http://120.27.34.24:9001/", auth=HTTPBasicAuth("user", "123"))
print(response.status_code)
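Timeouts are only one of several ways a request can fail. The following is a minimal sketch of how the common failure cases might be told apart. The helper names fetch_status and unused_local_port are hypothetical, and the demo deliberately targets a local port with nothing listening on it so that it runs without internet access.

```python
import socket
import requests

def fetch_status(url, timeout=(3.05, 10)):
    """Return the HTTP status code, or a short label describing the failure.

    timeout is a (connect, read) tuple in seconds.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()   # raises HTTPError for 4xx/5xx responses
        return response.status_code
    except requests.exceptions.Timeout:
        # Checked first because ConnectTimeout is also a ConnectionError.
        return 'timeout'
    except requests.exceptions.HTTPError as exc:
        return 'http error {}'.format(exc.response.status_code)
    except requests.exceptions.ConnectionError:
        return 'connection error'

def unused_local_port():
    """Ask the OS for a free port, then release it so nothing is listening there."""
    with socket.socket() as s:
        s.bind(('127.0.0.1', 0))
        return s.getsockname()[1]

result = fetch_status('http://127.0.0.1:{}/'.format(unused_local_port()), timeout=2)
print(result)  # connection error
```

Catching the specific exception classes keeps the caller from treating a refused connection, a slow server, and a 404 as the same problem.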

# POST request

import requests

host = 'http://httpbin.org/'
endpoint = 'post'
url = ''.join([host, endpoint])

# POST with form data
data = {'key1': 'value1', 'key2': 'value2'}
response = requests.post(url, data=data)
print(response.status_code)
print(response.text)

# POST with custom headers
headers = {'User-Agent': 'test request headers'}
response = requests.post(url, headers=headers)
print(response.status_code)
print(response.text)

# POST with a JSON body
data = {
    'sites': [
        {'name': 'test', 'url': 'www.test.com'},
        {'name': 'google', 'url': 'www.google.com'},
        {'name': 'weibo', 'url': 'www.weibo.com'}
    ]
}
response = requests.post(url, json=data)
print(response.status_code)
print(response.text)

# POST with query parameters
params = {'key1': 'params1', 'key2': 'params'}
response = requests.post(url, params=params)
print(response.status_code)
print(response.text)

# File upload (the with block closes the file once the request is done)
with open('fisrtgetfile.txt', 'rb') as f:
    files = {'file': f}
    response = requests.post(url, files=files)
print(response.status_code)
print(response.text)

# PUT request
import requests
import json

url = 'http://127.0.0.1:8080'
headers = {'Content-Type': 'application/json'}
param = {'myObjectField': 'hello'}
payload = json.dumps(param)  # serialize the dict to a JSON string

response = requests.put(url, data=payload, headers=headers)
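Serializing with json.dumps and setting the Content-Type header by hand should be equivalent to handing the dictionary to the json argument. The sketch below compares the two prepared requests without sending anything. It reuses the local address from the PUT example purely as a placeholder.

```python
import json
import requests

url = 'http://127.0.0.1:8080'   # placeholder address; nothing is sent
param = {'myObjectField': 'hello'}

# Variant 1: serialize by hand and set the header explicitly.
manual = requests.Request(
    'PUT', url,
    data=json.dumps(param),
    headers={'Content-Type': 'application/json'},
).prepare()

# Variant 2: let requests serialize the dict and set the header.
automatic = requests.Request('PUT', url, json=param).prepare()

print(manual.headers['Content-Type'], automatic.headers['Content-Type'])
print(json.loads(manual.body) == json.loads(automatic.body))  # True
```

The json argument is shorter and avoids forgetting the header, while the manual form leaves full control over how the body is serialized.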

# HEAD request
import requests

response = requests.head('https://pixabay.com/zh/')
print(response.status_code)

# DELETE request
import requests

url = 'https://api.github.com/user/emails'
email = '2475757652@qq.com'

response = requests.delete(url, json=email, auth=('username', 'password'))
print(response.status_code)

# OPTIONS request
import requests

url = 'https://www.baidu.com/s'
response = requests.options(url)
print(response.status_code)

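response

response attribute            Purpose
response.text                 Body decoded as text
response.content              Body as raw bytes
response.json()               Body parsed as JSON
response.reason               Status reason phrase (for example "OK")
response.status_code          Status code
response.headers              Response headers
response.cookies              Cookies
response.cookies.get_dict()   Cookies as a dictionary
response.cookies.items()      Cookies as a list of (name, value) pairs
response.url                  Final URL of the response
response.history              List of redirect responses that led to this one
response.encoding             Encoding used to decode response.text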
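To try these attributes without depending on an external site, one possibility is to start a throwaway HTTP server on localhost and query it. Everything about the server below (the DemoHandler class, the /old and /new paths, the token cookie) is invented for this demonstration and is not part of requests.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class DemoHandler(BaseHTTPRequestHandler):
    """Tiny local stand-in for a real service (hypothetical, demo only)."""

    def do_GET(self):
        if self.path == '/old':
            # Redirect so that response.history has something in it.
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
            return
        body = json.dumps({'msg': 'hello'}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.send_header('Set-Cookie', 'token=abc123')
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(('127.0.0.1', 0), DemoHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:{}'.format(server.server_port)

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
response = session.get(base + '/old', timeout=5)
server.shutdown()
server.server_close()

print(response.status_code, response.reason)       # 200 OK
print(response.url == base + '/new')               # True
print([r.status_code for r in response.history])   # [302]
print(response.headers['Content-Type'])            # application/json; charset=utf-8
print(response.encoding)                           # utf-8
print(response.json())                             # {'msg': 'hello'}
print(response.cookies.get_dict())                 # {'token': 'abc123'}
```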
