Fetching images from given links with Python, requests, and PIL

On using Python libraries to download the image behind a given link, the following post summarizes three methods:

https://blog.csdn.net/qq_34504481/article/details/79716106

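See that page for the details. The three methods use two different libraries, urllib.request and requests, which puzzled me, so I looked up how they differ. In short, requests is a higher-level, friendlier HTTP library (it is built on the third-party urllib3 rather than on the standard library's urllib.request), so using requests is generally enough. See this post: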

https://blog.csdn.net/qq_38783948/article/details/88239109

So I went to the requests website to read its usage notes and examples:

https://2.python-requests.org/en/master/

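The very first line of the page reads "Requests: HTTP for Humans", which is heartwarming: a library should be built to make life easier for people, not to make them suffer!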

Start with the "Make a Request" part of the Quickstart:

https://2.python-requests.org//en/master/user/quickstart/#make-a-request

There is a lot of material there, so read selectively.

Now straight to the code:

1. Based on the methods in https://blog.csdn.net/qq_34504481/article/details/79716106, plus some error handling, I wrote the following first version:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# 2019/09/10 by DQ

import os
import requests
from urllib.request import urlretrieve


ImUrlTxtPath='ImUrls.txt'                # input: one image URL per line
NotLoadImUrlTxtPath='NotLoadImUrls.txt'  # output: URLs that could not be downloaded
LogTxtPath='Log.txt'                     # output: one log line per URL
ImSetDir=os.path.join(os.getcwd(),'ImSet')
if not os.path.isdir(ImSetDir):
    os.makedirs(ImSetDir)

with open(ImUrlTxtPath,'r') as FId:
    ImUrls=FId.readlines()
with open(NotLoadImUrlTxtPath,'w') as FId,open(LogTxtPath,'w') as FId2:
    for ImUrl in ImUrls:
        ImUrl = ImUrl.strip()  # drop the trailing newline left by readlines()
        if not ImUrl:
            continue  # skip blank lines
        _, ImName = os.path.split(ImUrl)
        SaveImPath = os.path.join(ImSetDir, ImName)

        try:
            # Attempt 1: urllib.request.urlretrieve
            urlretrieve(ImUrl, SaveImPath)
            MsgTxt='download {} with urlretrieve\n'.format(ImName)
            print(MsgTxt)
            FId2.write(MsgTxt)
        except Exception:
            try:
                # Attempt 2: requests.get, skipping URLs that answer 404
                StatCode = requests.head(ImUrl, timeout=30).status_code
                if StatCode == 404:
                    FId.write(ImUrl + '\n')
                    MsgTxt = 'not download {} 404 error\n'.format(ImName)
                    print(MsgTxt)
                    FId2.write(MsgTxt)
                    continue

                r = requests.get(ImUrl, timeout=30)  # set a request timeout; otherwise it is painfully slow
                with open(SaveImPath, 'wb') as f:
                    f.write(r.content)
                MsgTxt='download {} with requests.get()\n'.format(ImName)
                print(MsgTxt)
                FId2.write(MsgTxt)
            except Exception:
                try:
                    # Attempt 3: stream the response body in small chunks
                    r = requests.get(ImUrl, stream=True, timeout=30)
                    with open(SaveImPath, 'wb') as f:
                        for chunk in r.iter_content(chunk_size=32):
                            f.write(chunk)
                    MsgTxt='download {} with requests.get(stream)\n'.format(ImName)
                    print(MsgTxt)
                    FId2.write(MsgTxt)
                except Exception:
                    # All three attempts failed: record the URL and move on
                    FId.write(ImUrl + '\n')
                    MsgTxt ='not download {}\n'.format(ImName)
                    print(MsgTxt)
                    FId2.write(MsgTxt)

Here ImUrls.txt contains:

http://www.openrussia.ru/imgs/products/logos/31342.jpg
http://www.forestnet.com/timberwest/archives/Nov_Dec_02/pics/nov_dec_2002_11_0002.jpg
http://www.dfzthai.com/images/pic_sales.jpg
http://grid-iron.net/images/equipment/4sale/2003_jcbjs260_01/1.jpg
http://www.nfldkubota.com/Pictures/KX91.gif
http://en.hongwing.com/UserFiles/image/11194744937.jpg
http://www.anguilla-beaches.com/image-files/tropical-building-excavator-1.jpg
http://www.openrussia.ru/imgs/products/logos/31465.jpg
http://s19.photobucket.com/albums/b189/Balacai/th_JCB806excavator.jpg
http://www.whenrecyclingexpo.com/images/100_1057.jpg
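
The script takes the file name from the last path component of each URL via os.path.split. As a hedged aside, the sketch below shows one way to do the same thing that also copes with trailing newlines and query strings. The helper name image_name_from_url and the 'unnamed' fallback are illustrative assumptions, not part of the original script.

```python
import posixpath
from urllib.parse import urlsplit, unquote


def image_name_from_url(url, default='unnamed'):
    """Return the last path component of a URL, ignoring query and fragment."""
    # urlsplit separates the path from any "?query" or "#fragment" part.
    path = urlsplit(url.strip()).path
    # URL paths always use "/", so posixpath is used instead of os.path.
    name = posixpath.basename(unquote(path))
    # Fall back to a placeholder when the URL has no file component.
    return name or default


if __name__ == '__main__':
    print(image_name_from_url('http://www.nfldkubota.com/Pictures/KX91.gif\n'))
```

Whether a fallback name is appropriate depends on the data set; skipping such URLs would be an equally reasonable choice.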

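Running the first version above does download files, but some of the saved files cannot be displayed.

These files cannot be opened because what was saved is not the image itself (for example, the server may have returned an error page instead of image data); the specific causes vary.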
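
One quick way to see whether a saved file is really an image is to look at its first few bytes. The following is a minimal sketch using only the standard library. The names IMAGE_SIGNATURES, sniff_image_type, and sniff_file are hypothetical, and only JPEG, PNG, and GIF signatures are covered, so a None result means "not one of these three formats" rather than "definitely not an image".

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    'jpeg': (b'\xff\xd8\xff',),
    'png': (b'\x89PNG\r\n\x1a\n',),
    'gif': (b'GIF87a', b'GIF89a'),
}


def sniff_image_type(data):
    """Return 'jpeg', 'png', or 'gif' if data starts with a known signature, else None."""
    for name, prefixes in IMAGE_SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if data.startswith(prefixes):
            return name
    return None


def sniff_file(path):
    """Apply sniff_image_type to the first bytes of a file on disk."""
    with open(path, 'rb') as f:
        return sniff_image_type(f.read(16))
```

A file that begins with something like `<html` or `<!DOCTYPE` is typically an error page that was saved under an image file name.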

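I later ran into 403 and other errors as well. As an aside, the following page gives a reasonable explanation of the 403 and 404 errors you may hit when making requests: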

https://zhidao.baidu.com/question/561808693.html
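
For reference, 403 means the server refuses to serve the request (Forbidden) and 404 means the resource was not found (Not Found). A small sketch of turning a numeric status code into a readable log message with the standard library follows. The helper names describe_status and is_client_error are assumptions made for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return e.g. '404 Not Found' for a known HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes that the standard library does not define.
        return '{} (unknown status)'.format(code)
    return '{} {}'.format(status.value, status.phrase)


def is_client_error(code):
    """True for any 4xx status, which includes 403 and 404."""
    return 400 <= code < 500


if __name__ == '__main__':
    for code in (200, 403, 404):
        print(describe_status(code), is_client_error(code))
```

With requests, the code to pass in would be the status_code attribute of the response object.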

Then, in https://2.python-requests.org//en/master/user/quickstart/#make-a-request, I came across the following passage, which made me very happy because it finally resolved what had been bothering me:

Binary Response Content
You can also access the response body as bytes, for non-text requests:

>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
The gzip and deflate transfer-encodings are automatically decoded for you.

For example, to create an image from binary data returned by a request, you can use the following code:

>>> from PIL import Image
>>> from io import BytesIO

>>> i = Image.open(BytesIO(r.content))

An image is exactly this kind of non-text content, so this is just what I needed. Putting it all together gives the following code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# 2019/09/11 by DQ

import os
import requests
from PIL import Image
from io import BytesIO

ImUrlTxtPath='ImUrls.txt'                # input: one image URL per line
NotLoadImUrlTxtPath='NotLoadImUrls.txt'  # output: URLs that could not be downloaded
LogTxtPath='Log.txt'                     # output: one log line per URL
ImSetDir=os.path.join(os.getcwd(),'ImSet')
if not os.path.isdir(ImSetDir):
    os.makedirs(ImSetDir)

with open(ImUrlTxtPath,'r') as FId:
    ImUrls=FId.readlines()
with open(NotLoadImUrlTxtPath,'w') as FId,open(LogTxtPath,'w') as FId2:
    for ImUrl in ImUrls[:10]:  # only the first 10 URLs
        ImUrl = ImUrl.strip()  # drop the trailing newline left by readlines()
        if not ImUrl:
            continue  # skip blank lines
        _, ImName = os.path.split(ImUrl)
        SaveImPath = os.path.join(ImSetDir, ImName)

        try:
            r = requests.get(ImUrl, timeout=30)
        except requests.exceptions.RequestException:
            # Connection refused, timeout, DNS failure, and similar errors
            FId.write(ImUrl + '\n')
            MsgTxt='not download {} request failed\n'.format(ImName)
            print(MsgTxt)
            FId2.write(MsgTxt)
            continue

        StatCode = r.status_code
        if StatCode == 403 or StatCode == 404:
            FId.write(ImUrl + '\n')
            MsgTxt = 'not download {} {} error\n'.format(ImName,StatCode)
            print(MsgTxt)
            FId2.write(MsgTxt)
            continue

        try:
            # Image.open raises if the body is not a recognizable image
            Im=Image.open(BytesIO(r.content))
            Im.save(SaveImPath)
            MsgTxt='download {} with requests.get()\n'.format(ImName)
            print(MsgTxt)
            FId2.write(MsgTxt)
        except Exception:
            FId.write(ImUrl + '\n')
            MsgTxt ='not download {} unknown error\n'.format(ImName)
            print(MsgTxt)
            FId2.write(MsgTxt)

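The code is straightforward, so I won't walk through it. With this version the downloaded images are fine: dead links, failed requests, and responses that are not images are all filtered out, so the unopenable files described above no longer appear.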
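
The filtering relies on Image.open rejecting bytes that are not an image. Here is a minimal sketch of that check in isolation, assuming Pillow is installed. The helper name load_image_or_none is hypothetical, and the extra verify() pass is an addition that the script above does not perform.

```python
from io import BytesIO

from PIL import Image


def load_image_or_none(data):
    """Return a PIL Image decoded from bytes, or None if the bytes are not a readable image."""
    try:
        # verify() checks file integrity; the object cannot be used afterwards.
        Image.open(BytesIO(data)).verify()
        # Reopen from the same bytes to get a usable Image object.
        return Image.open(BytesIO(data))
    except Exception:
        # Pillow raises different exception types depending on the failure.
        return None


if __name__ == '__main__':
    # An HTML error page saved under an image name is rejected.
    print(load_image_or_none(b'<html>403 Forbidden</html>'))
```

Catching Exception broadly mirrors the script's own "unknown error" branch; narrower handling may be preferable when the failure reason matters.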
