Image fetch failure

Today I found this error in the logs:

 

2013-06-06 12:25:13,332 [ERROR]  upload.service.UploadFileService -  image  open error ,url = http://img.xitisi.com/Commodity/BOBOTou_2204/RiXiFaXingNvShengJiaFa_HuaBuWu2011XinKuan_QiLiuHaiBoboBoBoTouXiuLianDuanFaZongSe20120210034904.jpg ,cannot identify image fil

 

I looked at the response headers for the image:

Accept-Ranges: bytes
Content-Encoding: gzip
Content-Length: 452449
Content-Type: image/jpeg
Date: Thu, 06 Jun 2013 05:03:08 GMT
Etag: "8041952b9a50cd1:1a9a"
Last-Modified: Fri, 22 Jun 2012 17:12:15 GMT
Server: Microsoft-IIS/6.0
Vary: Accept-Encoding
X-Powered-By: ASP.NET

 

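It turns out the response body was gzip-compressed (Content-Encoding: gzip), so Image could not identify it. The body has to be decompressed before it is opened as an image.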
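
To illustrate why the image library rejects the data, here is a minimal Python 3 sketch (the code in this post is Python 2). It compares the first bytes of a compressed body with the JPEG start marker. The helper name `looks_gzipped` and the fake image bytes are assumptions made up for this example; they are not part of the original service.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
JPEG_MAGIC = b"\xff\xd8"  # first two bytes of every JPEG file


def looks_gzipped(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Stand-in for real image bytes: the JPEG start marker plus padding.
fake_jpeg = JPEG_MAGIC + b"\x00" * 16

# Roughly what arrives when the server applies Content-Encoding: gzip.
body = gzip.compress(fake_jpeg)

print(looks_gzipped(body))       # the compressed body no longer starts like a JPEG
print(looks_gzipped(fake_jpeg))  # the raw image bytes do
```

An image decoder that inspects the leading bytes sees the gzip header instead of a JPEG marker, which is consistent with the "cannot identify image" error in the log.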

Solutions:

1. Decompress the body with Python's gzip module

    import gzip
    from StringIO import StringIO  # Python 2

    def _read_content(self, response):
        content_type = response.headers.get('Content-Type')
        content_encoding = response.headers.get('Content-Encoding')
        if response.code == 200 and content_type and 'image' in content_type:
            data = StringIO(response.read())
            if content_encoding == 'gzip':
                # Body is gzip-compressed: decompress it before passing it to Image.
                data = StringIO(gzip.GzipFile(fileobj=data).read())
            return data
        else:
            # url is not in scope here, so read it from the response.
            logger.error("can't open image, content type=%s, url=%s" % (content_type, response.geturl()))
            return None
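
For readers on Python 3, the same idea can be sketched as follows. This is an illustrative port under stated assumptions, not the service's actual code: the function name `read_image_bytes` and its plain-value parameters (`status`, `headers`, `body`) are hypothetical stand-ins for the response object used above.

```python
import gzip
import io


def read_image_bytes(status, headers, body):
    """Return a BytesIO with the decoded image bytes, or None if not an image.

    status, headers and body are hypothetical plain values standing in
    for the HTTP response object in the original code.
    """
    content_type = headers.get("Content-Type", "")
    content_encoding = headers.get("Content-Encoding", "").lower()
    if status != 200 or "image" not in content_type:
        return None
    if content_encoding == "gzip":
        # Undo the transfer compression so the image decoder sees raw bytes.
        body = gzip.decompress(body)
    return io.BytesIO(body)


# Usage with made-up bytes in place of a real download.
raw = b"\xff\xd8" + b"abc"
stream = read_image_bytes(
    200,
    {"Content-Type": "image/jpeg", "Content-Encoding": "gzip"},
    gzip.compress(raw),
)
```

The returned stream can then be handed to whatever image library is in use.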

 

2. State in the request headers that gzip is not accepted

        self.headers = {}
        self.headers['User-Agent'] = """Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB6"""
        self.headers['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
        # 'identity' asks the server to send the body without any content coding.
        self.headers['Accept-Encoding'] = 'identity'
        self.headers['Accept-Language'] = "zh,en-us;q=0.7,en;q=0.3"
        self.headers['Accept-Charset'] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
        self.headers['Connection'] = "keep-alive"
        self.headers['Keep-Alive'] = "115"
        self.headers['Cache-Control'] = "no-cache"

    def open(self, url):
        try:
            # Send the custom headers (including Accept-Encoding) with the request.
            response = self.opener.open(urllib2.Request(url, headers=self.headers), timeout=self.timeout)
            return self._read_content(response)
        except Exception as e:
            logger.error(url)
            logger.exception(e)
            return None
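
A rough Python 3 equivalent of the header approach, using `urllib.request` from the standard library, might look like the sketch below. The helper name `build_request`, the shortened User-Agent, and the example URL are assumptions for illustration only. No network call is made here; the snippet only builds the request and inspects its headers.

```python
import urllib.request


def build_request(url):
    """Build a request that asks the server not to compress the body."""
    headers = {
        "User-Agent": "Mozilla/5.0",    # shortened placeholder User-Agent
        "Accept-Encoding": "identity",  # no content coding wanted
    }
    return urllib.request.Request(url, headers=headers)


req = build_request("http://example.com/a.jpg")  # hypothetical URL

# urllib normalizes header names, so compare them case-insensitively.
sent = {name.lower(): value for name, value in req.header_items()}
print(sent["accept-encoding"])
```

Since a server could still ignore the header, keeping the decompression branch from the first approach as a fallback may be worthwhile.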

 
