Crawl, crawl, crawl: fetching data for a crawler with requests

The requests library is recommended: it is much more concise to use than urllib.

requests sends a request to the target site and returns an HTTP response object.

import requests

requests.get('http://httpbin.org/get')        # GET
requests.post('http://httpbin.org/post')      # POST
requests.put('http://httpbin.org/put')        # PUT
requests.delete('http://httpbin.org/delete')  # DELETE
requests.head('http://httpbin.org/get')       # HEAD (headers only, no body)
requests.options('http://httpbin.org/get')    # OPTIONS (ask which methods are allowed)

http://httpbin.org is a test site that echoes back the requests it receives.

Let's see what a response contains:

import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)  # status code
print(response.url)          # request URL
print(response.headers)      # response headers
print(response.cookies)      # cookies
print(response.text)         # page source as text (str)
print(response.content)      # raw bytes; useful for downloading files
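The difference between text and content can be seen without any network access. The sketch below hand-builds a Response object purely for illustration (setting the private _content attribute is not something real code should do; a real reply fills it for you):

```python
import requests

# Offline sketch: a hand-built Response stands in for a real HTTP reply.
# (_content is a private attribute; set here only for demonstration.)
resp = requests.models.Response()
resp._content = b'hello'   # the raw bytes that would come off the wire
resp.encoding = 'utf-8'

print(resp.content)  # b'hello' -- bytes, what you write when saving a file
print(resp.text)     # hello   -- str, decoded using resp.encoding
```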

A GET request can carry parameters in two ways:

    1. Put the data directly in the request URL.

    2. Put the data in a dict and pass it as the params argument of get().

import requests

response = requests.get('http://httpbin.org/get?name=gemey&age=22')
print(response.text)

'''
Response:
{
  "args": {
    "age": "22", 
    "name": "gemey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.19.1"
  }, 
  "origin": "117.136.46.209", 
  "url": "http://httpbin.org/get?name=gemey&age=22"
}

'''
import requests
data = {
    'name':'lu',
    'age':18
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)

'''
Response:
{
  "args": {
    "age": "18", 
    "name": "lu"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.19.1"
  }, 
  "origin": "117.136.46.209", 
  "url": "http://httpbin.org/get?name=lu&age=18"
}
'''
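You can also see the URL that requests builds from params without sending anything, by preparing the request first (this uses the public requests.Request / PreparedRequest API; the values match the example above):

```python
import requests

# Build the request without sending it, then inspect the final URL.
req = requests.Request('GET', 'http://httpbin.org/get',
                       params={'name': 'lu', 'age': 18})
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?name=lu&age=18
```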

The json() method

import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
print(response.json())  # response.json() is equivalent to json.loads(response.text)
print(type(response.json()))
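As the comment says, response.json() is just json.loads applied to the response body. A minimal offline sketch with a hand-written JSON string (the payload below merely mimics httpbin's echo format):

```python
import json

# json.loads turns a JSON string into Python objects (dict/list/str/...).
payload = '{"args": {"name": "lu", "age": "18"}}'
data = json.loads(payload)
print(data['args']['name'])  # lu
print(type(data))            # <class 'dict'>
```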

Adding request headers

To get a User-Agent string, open your browser's developer tools and copy it from any request on the Network tab. To add it:

import requests
headers = {
    'User-Agent': 'paste the User-Agent string from your browser here'
}
response = requests.get('http://httpbin.org/get',headers=headers)
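To confirm the header really gets attached, you can prepare the request without sending it (offline sketch; the User-Agent value below is just a placeholder):

```python
import requests

# Prepare (but do not send) a request and inspect its headers.
req = requests.Request('GET', 'http://httpbin.org/get',
                       headers={'User-Agent': 'Mozilla/5.0 (example)'})
prepared = req.prepare()
print(prepared.headers['User-Agent'])  # Mozilla/5.0 (example)
```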

Exception handling

import requests
from requests.exceptions import ReadTimeout,HTTPError,RequestException

try:
    response = requests.get('http://www.baidu.com',timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('timeout')
except HTTPError:
    print('httperror')
except RequestException:
    print('reqerror')
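One caveat: requests.get() does not raise HTTPError for a 4xx/5xx status on its own; you have to call response.raise_for_status() inside the try block. An offline sketch with a hand-built Response (status_code is set manually only for illustration):

```python
import requests

# raise_for_status() converts 4xx/5xx statuses into HTTPError.
resp = requests.models.Response()
resp.status_code = 404          # hand-set for the offline sketch
try:
    resp.raise_for_status()
    raised = False
except requests.exceptions.HTTPError:
    raised = True
print('httperror' if raised else 'ok')  # httperror
```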

The post() method:

When sending a POST request with requests.post(), there are two parameters for the payload: data and json. An ordinary HTML form can be submitted directly via the data parameter, whose value is a Python dict.

# post() signature
post(url, data=None, json=None, **kwargs)
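The practical difference between data and json can also be seen offline with a PreparedRequest: data produces a form-encoded body, json a serialized JSON body (the URL and values below are just illustrative):

```python
import requests

# data= sends an application/x-www-form-urlencoded body ...
form = requests.Request('POST', 'http://httpbin.org/post',
                        data={'name': 'lu'}).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # name=lu

# ... while json= serializes the dict and sends application/json.
js = requests.Request('POST', 'http://httpbin.org/post',
                      json={'name': 'lu'}).prepare()
print(js.headers['Content-Type'])    # application/json
print(js.body)
```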


def post_name_money(number, name, money):
    url = "http://newcredit.ezendai.com/credit-admin/offer/offerInfo/insert?r=1556086333984"
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36",
        "Cookie": "{0}".format(cookies),  # cookies is assumed to be defined elsewhere
    }
    data = {
        'loanId': '',
        'isShowPayChannel': 'true',
        'borrowName': '{0}'.format(name),
        'contractNum': '{0}'.format(number),
    }
    # Response objects have no .html attribute; use .text for the body
    html = requests.post(url, data=data, headers=header).text
    return html

 
