Basic Usage of the Requests Library

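This article walks through Cui Qingcai's introduction to Requests, the Python library commonly used for web crawling, covering how it works and the basic concepts behind it.

The article runs to about 1,200 words, takes roughly 10 minutes to read, and tries to combine theory with practice.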



Contents:

I. What is the Requests library?

II. Installation

III. Using Requests in detail



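I. What is the Requests library?

Requests is an HTTP library written in Python, built on top of urllib3 and released under the Apache 2.0 open-source license.

Compared with the standard-library urllib, Requests is far more convenient and saves a great deal of work, and it fully covers typical HTTP testing needs.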



II. Installation

pip install requests


III. Using Requests in detail

1. A quick example

import requests

response = requests.get('http://www.baidu.com/')  # call the get method of requests

print(type(response))        # <class 'requests.models.Response'>
print(response.status_code)  # 200
print(response.text)         # already decoded to str, so it can be printed directly
print(response.cookies)      # <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

2. Request methods (HTTP test site: http://httpbin.org/)

requests.post('http://httpbin.org/post')

requests.put('http://httpbin.org/put')

requests.delete('http://httpbin.org/delete')

requests.head('http://httpbin.org/get')

requests.options('http://httpbin.org/get')

3. Requests

  1. Basic GET request

    import requests

    response = requests.get('http://httpbin.org/get')

    print(response.text)  # unlike urllib, no decode() call is needed
  2. GET request with parameters

    import requests

    # The query string can be written into the URL by hand
    response = requests.get('http://httpbin.org/get?name=Arise&age=22')

    print(response.text)

    import requests

    data = {
        'name': 'Arise',
        'age': 22
    }
    # Passing the dict through the params argument of requests.get does the same thing
    response = requests.get('http://httpbin.org/get', params=data)
    print(response.text)
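One way to see what the params argument does, without touching the network, is to build the request and inspect it before it is sent. This is a minimal sketch; using `requests.Request(...).prepare()` is an addition for illustration and is not part of the original tutorial.

```python
import requests

# Build (but do not send) a GET request with the same parameters as above.
data = {'name': 'Arise', 'age': 22}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()

# The params dict has been URL-encoded and appended as the query string.
print(prepared.url)
```

The printed URL should match the hand-written query string from the first version, which is why the two calls are interchangeable.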
  3. Parsing JSON

    import requests
    import json

    response = requests.get('http://httpbin.org/get')

    print(type(response.text))
    # Compare parsing the text with json.loads against calling the response's own json() method
    print(response.json())
    print(json.loads(response.text))
    print(type(response.json()))
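The comparison between the two parsing routes can also be tried offline. The sketch below builds a Response object by hand, which is an assumption made purely so the example runs without a network: it sets the private `_content` attribute, something suitable for demos and tests only, and the JSON payload is made up.

```python
import json
import requests

# Assumption for illustration: a hand-built Response instead of a real reply.
# _content is a private attribute, so this trick belongs in demos and tests only.
response = requests.models.Response()
response.status_code = 200
response.encoding = 'utf-8'
response._content = b'{"args": {}, "url": "http://httpbin.org/get"}'

parsed_by_method = response.json()            # Requests parses the body itself
parsed_by_module = json.loads(response.text)  # the same parse done by hand

print(type(response.text))
print(type(parsed_by_method))
print(parsed_by_method == parsed_by_module)
```

Both routes give the same dict, so `response.json()` can be read as a shortcut for `json.loads(response.text)`.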
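  4. Fetching binary data (downloading images and videos)

    import requests

    response2 = requests.get('https://github.com/favicon.ico')

    # text is a str, content is the raw bytes
    print(type(response2.text), type(response2.content))

    print(response2.text)

    print(response2.content)

    Saving the image to disk

    import requests

    response2 = requests.get('https://github.com/favicon.ico')

    # The file is written relative to the current working directory, usually the script's own folder
    with open('favicon.ico', 'wb') as f:
        f.write(response2.content)  # the with block closes the file automatically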
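  5. Adding headers

    import requests

    # Without headers the request may be blocked, and the crawl fails
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

    response4 = requests.get('http://www.zhihu.com/explore')

    response5 = requests.get('http://www.zhihu.com/explore', headers=headers)

    print(response4.text)

    print(response5.text)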
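A likely reason the header-less request gets rejected is the User-Agent that Requests sends by default, which identifies the client as a script. The sketch below inspects that header without sending anything; preparing requests through a Session is an addition for illustration, and the shortened User-Agent string is a placeholder.

```python
import requests

url = 'http://www.zhihu.com/explore'
# Placeholder browser-style User-Agent, shortened for readability.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}

session = requests.Session()
# Prepare the two requests without sending them, then compare their headers.
plain = session.prepare_request(requests.Request('GET', url))
custom = session.prepare_request(requests.Request('GET', url, headers=headers))

print(plain.headers['User-Agent'])   # the library default, python-requests/<version>
print(custom.headers['User-Agent'])  # the browser-style value passed in
```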
  6. Basic POST request

    import requests

    # Unlike urllib, the form data does not have to be encoded by hand
    data = {'name': 'Arise', 'age': 22}

    response = requests.post('http://httpbin.org/post', data=data)
    print(response.text)

    import requests

    data = {'name': 'Arise', 'age': 22}

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

    response2 = requests.post('http://httpbin.org/post', data=data, headers=headers)

    print(response2.json())
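To see the encoding that Requests performs on the data dict, the request can be prepared and inspected before sending. This is a minimal offline sketch; `requests.Request(...).prepare()` is used for illustration and is not part of the original tutorial.

```python
import requests

data = {'name': 'Arise', 'age': 22}
# Build (but do not send) the POST request.
prepared = requests.Request('POST', 'http://httpbin.org/post', data=data).prepare()

# The dict is form-encoded and the matching Content-Type is set automatically.
print(prepared.headers['Content-Type'])
print(prepared.body)
```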

4. Responses

  1. Response attributes

    import requests

    # Commonly used response attributes
    response3 = requests.get('http://www.jianshu.com')
    print(type(response3.status_code), response3.status_code)
    print(type(response3.headers), response3.headers)
    print(type(response3.cookies), response3.cookies)
    print(type(response3.url), response3.url)
    print(type(response3.history), response3.history)
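  2. Checking the status code

    import requests

    response4 = requests.get('http://www.jianshu.com')
    # There are two ways to test for success. The first compares against the number 200 directly (other numeric codes exist as well)
    exit() if not response4.status_code == 200 else print('Request Successfully')
    # The second compares against the named constant requests.codes.ok
    exit() if not response4.status_code == requests.codes.ok else print('Request Successfully')

  3. A few examples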
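As a sketch of how the named status codes can be used, the snippet below prints a few entries from `requests.codes` and then shows `raise_for_status()`, an alternative to comparing codes by hand that the original text does not cover. The Response object is built by hand, an assumption made only so the example runs without a network.

```python
import requests

# requests.codes maps readable names to numeric status codes.
print(requests.codes.ok)
print(requests.codes.not_found)
print(requests.codes.internal_server_error)

# Hand-built response, used here only so the example runs offline.
response = requests.models.Response()
response.status_code = 404

try:
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    outcome = 'Request Successfully'
except requests.exceptions.HTTPError:
    outcome = 'Request Failed'

print(outcome)
```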


5. Advanced usage

  1. File upload

    import requests

    # Open the GitHub icon downloaded earlier in binary mode
    file = {'file': open('favicon.ico', 'rb')}

    response = requests.post('http://httpbin.org/post', files=file)

    print(response.text)
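Behind the files argument, Requests builds a multipart/form-data body. The sketch below inspects that body without sending it; the four-byte in-memory payload is an assumption standing in for the real favicon.ico file, and the explicit filename and content-type tuple is one of the forms the files argument accepts.

```python
import requests

# Assumption: a tiny in-memory payload stands in for the favicon.ico file.
files = {'file': ('favicon.ico', b'\x00\x00\x01\x00', 'image/x-icon')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# The body is multipart-encoded and carries the file name alongside the bytes.
print(prepared.headers['Content-Type'])
print(b'filename="favicon.ico"' in prepared.body)
```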
  2. Getting cookies

    import requests

    response5 = requests.get('http://www.baidu.com')
    # Unlike urllib, no cookie jar or handler has to be declared first
    print(response5.cookies)

    for key, value in response5.cookies.items():
        print(key + '=' + value)
  3. Session persistence (simulating a login)

    import requests

    requests.get('http://httpbin.org/cookies/set/number/123456789')  # ask the site to set a cookie

    # This call is independent of the one above, so no cookie information comes back
    response6 = requests.get('http://httpbin.org/cookies')

    print(response6.text)

    import requests

    # Create a Session object and send both GET requests through it (as if they came from the same browser)
    s = requests.Session()

    s.get('http://httpbin.org/cookies/set/number/123456789')

    response = s.get('http://httpbin.org/cookies')

    print(response.text)
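The difference between the two versions comes down to the cookie jar that a Session carries from one request to the next. The offline sketch below makes this visible; setting the cookie on the jar by hand is an assumption that stands in for the Set-Cookie header the server would normally send back.

```python
import requests

url = 'http://httpbin.org/cookies'

# A standalone request starts with an empty cookie jar.
standalone = requests.Request('GET', url).prepare()

# Assumption: set the cookie on the session jar by hand, standing in for
# the Set-Cookie header returned by /cookies/set/number/123456789.
s = requests.Session()
s.cookies.set('number', '123456789')
through_session = s.prepare_request(requests.Request('GET', url))

print('Cookie' in standalone.headers)         # no cookie is attached
print(through_session.headers.get('Cookie'))  # the session cookie is attached
```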
  4. Certificate verification

    import requests
    # verify=False on its own still prints a warning; the next two lines silence it
    from requests.packages import urllib3

    urllib3.disable_warnings()

    # Requests checks whether the certificate is valid first; passing verify=False skips that check
    response = requests.get('https://www.12306.cn', verify=False)
    print(response.status_code)
  5. Proxy settings

    The proxy address can be found in Chrome's settings (for the steps, see https://jingyan.baidu.com/article/d7130635f6c17213fdf475d9.html)

    import requests

    proxies = {
        'http': 'http://127.0.0.1:1080/pac?auth=HgT2fpms98njlh9QGpsP&t=201803030916114202',
        'https': 'https://127.0.0.1:1080/pac?auth=HgT2fpms98njlh9QGpsP&t=201803030916114202',
    }

    response = requests.get('http://www.taobao.com', proxies=proxies)

    print(response.status_code)

    SOCKS proxy

    pip install requests[socks]

    import requests

    proxies = {
        'http': 'socks5://127.0.0.1:1080/pac?auth=HgT2fpms98njlh9QGpsP&t=201803030916114202',
        'https': 'socks5://127.0.0.1:1080/pac?auth=HgT2fpms98njlh9QGpsP&t=201803030916114202',
    }

    response = requests.get('http://www.taobao.com', proxies=proxies)

    print(response.status_code)
  6. Timeout settings

    Lowering the timeout value or requesting an overseas site will trigger a ReadTimeout exception

    import requests
    from requests.exceptions import ReadTimeout

    try:
        response = requests.get('http://www.taobao.com', timeout=0.5)
        print(response.status_code)
    except ReadTimeout:
        print('TIMEOUT')
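  7. Authentication (login verification)

    This only applies when the site exposes an endpoint protected by this kind of authentication

    import requests
    from requests.auth import HTTPBasicAuth
    # The credentials can also be passed as a plain tuple: auth=('user', '123')
    r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))

    print(r.status_code)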
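The explicit HTTPBasicAuth object and the tuple shorthand can be compared by inspecting the Authorization header they generate, without sending anything. This is a minimal sketch; the host is the tutorial's own placeholder address, and decoding the token is an addition for illustration.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

url = 'http://120.27.34.24:9001'  # placeholder host taken from the tutorial

# Build (but do not send) the same request with both ways of passing credentials.
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('user', '123')).prepare()
shorthand = requests.Request('GET', url, auth=('user', '123')).prepare()

print(explicit.headers['Authorization'] == shorthand.headers['Authorization'])

# Basic auth is only Base64-encoded, not encrypted: the token decodes straight back.
token = explicit.headers['Authorization'].split(' ', 1)[1]
decoded = base64.b64decode(token)
print(decoded)
```

Because the credentials are recoverable by anyone who sees the header, Basic auth is normally paired with HTTPS.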
  8. Exception handling

    When an exception occurs, the official documentation lists the available exception types

    http://docs.python-requests.org/en/master/api/#exceptions

    import requests
    # Catch ReadTimeout first, then ConnectionError for network failures, and finally RequestException for anything else
    from requests.exceptions import ReadTimeout, ConnectionError, RequestException

    try:
        response1 = requests.get('http://httpbin.org/get', timeout=0.5)
        print(response1.status_code)
    except ReadTimeout:
        print('TIMEOUT')
    except ConnectionError:
        print('HTTP Error')
    except RequestException:
        print('Error')
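The order of the except clauses matters because the exception classes form a hierarchy, with RequestException at the root. The short sketch below checks those relationships directly; Timeout and ConnectTimeout are additional classes from `requests.exceptions` that the example above does not use.

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    ReadTimeout,
    RequestException,
    Timeout,
)

# ReadTimeout is a kind of Timeout, and every Requests exception
# ultimately derives from RequestException.
print(issubclass(ReadTimeout, Timeout))
print(issubclass(Timeout, RequestException))
print(issubclass(ConnectionError, RequestException))

# ConnectTimeout is both a ConnectionError and a Timeout; ReadTimeout is not a ConnectionError.
print(issubclass(ConnectTimeout, ConnectionError))
print(issubclass(ReadTimeout, ConnectionError))
```

Since RequestException matches everything, placing it first would swallow the more specific cases, which is why it belongs at the end of the chain.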

