Handling Responses

weixin_33827965

Tags: json, web crawler

Original link: https://juejin.im/post/5cbc020451882532bb2d2749

Basic Response API

HTTP status codes
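
To make the pairing of a status code with its reason phrase concrete, here is a minimal sketch that uses only the standard library's http.HTTPStatus. The helper name status_class is hypothetical and introduced purely for illustration; it is not part of Requests.

```python
from http import HTTPStatus


def status_class(code):
    """Return the broad class of an HTTP status code (hypothetical helper)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    # HTTPStatus(code).phrase is the standard reason phrase for that code
    print(code, HTTPStatus(code).phrase, status_class(code))
```

The first digit of the code decides the class, which is usually all a crawler needs to choose between continuing, retrying, or giving up.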

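Response object
>>> dir(response)  # list the attributes and methods of the response object
* status_code   # HTTP status code
* reason    # textual reason phrase for the status code
* url
* history # list of Response objects from the redirects that were followed
* elapsed # time between sending the request and receiving the response (a timedelta)
* request
* encoding
* raw
#: File-like object representation of response (for advanced usage).
#: Use of ``raw`` requires that ``stream=True`` be set on the request.
# This requirement does not apply for use internally to Requests.
* content   # response body as bytes
* text      # response body decoded to str using encoding
* json      # method that parses the body as JSON
* response.json()['str']    # extract a value by key, like any dict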
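
The attributes listed above can be gathered into one dictionary for quick inspection. This is a minimal sketch, not the author's code: summarize_response and fetch_summary are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
def summarize_response(response):
    """Collect the commonly inspected attributes of a Response-like object."""
    return {
        "status_code": response.status_code,
        "reason": response.reason,
        "url": response.url,
        # history holds the intermediate responses of any redirects
        "redirects": [r.status_code for r in response.history],
        # elapsed is a datetime.timedelta
        "elapsed_seconds": response.elapsed.total_seconds(),
        "encoding": response.encoding,
        # content is the raw body as bytes
        "size_bytes": len(response.content),
    }


def fetch_summary(url):
    """Send a GET request and summarize the response (needs network access)."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    return summarize_response(response)
```

Calling fetch_summary('https://api.github.com') would return such a dictionary for the URL used later in this post.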

Downloading images

Use a crawler to download images automatically
Download text files from a remote server

Image download

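Simulating a browser: use Chrome DevTools to find the browser's User-Agent, build the request with it, read the response as a stream, and write the data to a file.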

import requests

def img_download():
    url = 'https://timgsa.baidu.com/timg?image&quality=80&size=b9999_' \
          '10000&sec=1555837805429&di=4daaef4aa7a422aa0da2b5c7a138c988&imgtype=0&src=' \
          'http%3A%2F%2Fcdn.ifanr.cn%2Fwp-content%2Fuploads%2F2014%2F06%2Fgithub.png'
    # User-Agent copied from Chrome DevTools so the request looks like a browser
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    # stream=True defers downloading the body until it is iterated
    response = requests.get(url, headers=header, stream=True)
    print(response.status_code)
    print(response.reason)
    print(response.headers)
    with open('demo.jpg','wb') as fd:
        # write the body to disk 128 bytes at a time
        for chunk in response.iter_content(128):
            fd.write(chunk)

from contextlib import closing

def download_image_improve():
    url = 'https://timgsa.baidu.com/timg?image&quality=80&size=b9999_' \
          '10000&sec=1555837805429&di=4daaef4aa7a422aa0da2b5c7a138c988&imgtype=0&src=' \
          'http%3A%2F%2Fcdn.ifanr.cn%2Fwp-content%2Fuploads%2F2014%2F06%2Fgithub.png'
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    # closing() releases the streamed connection when the block exits
    with closing(requests.get(url, headers=header, stream=True)) as response:
        with open('demo1.jpg', 'wb') as fd:
            # write 128 bytes at a time
            for chunk in response.iter_content(128):
                fd.write(chunk)
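
The list at the top of this section also mentions downloading text files from a remote server, which the post does not show. The following is a minimal sketch that assumes the same streaming approach works for any file type. The names save_stream and download_file are hypothetical, and the chunk size and timeout are arbitrary choices.

```python
def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to path and return the bytes written."""
    # Stop early on 4xx/5xx instead of saving an error page to disk
    response.raise_for_status()
    written = 0
    with open(path, "wb") as fd:
        for chunk in response.iter_content(chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written


def download_file(url, path):
    """Download url to path with a streamed GET (needs network access)."""
    import requests  # third-party: pip install requests

    # A Response can be used as a context manager, which releases the connection on exit
    with requests.get(url, stream=True, timeout=10) as response:
        return save_stream(response, path)
```

Keeping the file-writing step separate from the request makes it possible to reuse it for images and text files alike.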

Event Hooks

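*args packs any extra positional arguments into a tuple for the function body, and **kwargs packs keyword arguments into a dict. Note that the parameters must appear in a fixed order, conventionally (arg, *args, **kwargs). In particular, **kwargs must come last; placing another parameter after it is a SyntaxError.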
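
A short illustration of the packing described above. The function name show_args and the argument values are made up for the example.

```python
def show_args(arg, *args, **kwargs):
    """Return the three kinds of parameters so the packing is visible."""
    return arg, args, kwargs


first, extra, named = show_args(1, 2, 3, verify=True, timeout=5)
print(first)   # 1
print(extra)   # (2, 3)
print(named)   # {'verify': True, 'timeout': 5}
```

This is why a hook callback declared as (response, *args, **kwargs) keeps working whatever additional arguments it is called with.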

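import requests

def get_key_info(response, *args, **kwargs):
    # Callback: called with the response once it has been received
    print(response.headers['Content-Type'])

def main():
    # Register the callback for this request through the hooks argument
    requests.get('https://api.github.com', hooks=dict(response=get_key_info))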
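
The hooks argument also accepts a list of callbacks for the response event. Below is a minimal sketch of a hook that records each response instead of printing it. The names seen, record_response and fetch_with_hook are hypothetical, and the timeout is an arbitrary choice.

```python
seen = []  # filled in by the hook below


def record_response(response, *args, **kwargs):
    """Response hook: remember the status code and URL of every response."""
    seen.append((response.status_code, response.url))
    # Returning None leaves the response unchanged


def fetch_with_hook(url):
    """Send a GET request with the hook registered (needs network access)."""
    import requests  # third-party: pip install requests

    return requests.get(url, hooks={"response": [record_response]}, timeout=10)
```

If a hook returns a value, Requests uses it in place of the response that was passed in, so a hook meant only for logging should return nothing.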

Reposted from: https://juejin.im/post/5cbc020451882532bb2d2749