Pitfalls when handling responses with the Python Requests library

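The Python Requests library (http://docs.python-requests.org/en/latest/) makes handling HTTP/HTTPS requests fairly convenient and is widely used.

However, a few things need special attention when handling the response. In short, it comes down to the difference between the content and text properties of the Response object. The relevant source code is: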

@property
def content(self):
    """Content of the response, in bytes."""

    if self._content is False:
        # Read the contents.
        try:
            if self._content_consumed:
                raise RuntimeError(
                    'The content for this response was already consumed')

            if self.status_code == 0:
                self._content = None
            else:
                self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

        except AttributeError:
            self._content = None

    self._content_consumed = True
    # don't need to release the connection; that's been handled by urllib3
    # since we exhausted the data.
    return self._content

@property
def text(self):
    """Content of the response, in unicode.

    if Response.encoding is None and chardet module is available, encoding
    will be guessed."""

    # Try charset from content-type
    content = None
    encoding = self.encoding

    if not self.content:
        return str('')

    # Fallback to auto-detected encoding.
    if self.encoding is None:
        encoding = self.apparent_encoding

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')
    except (LookupError, TypeError):
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # A TypeError can be raised if encoding is None
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content

@property
def apparent_encoding(self):
    """The apparent encoding, provided by the lovely Charade library
    (Thanks, Ian!)."""
    return chardet.detect(self.content)['encoding']

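As the code shows, the text property decodes the raw bytes into a string, and when encoding is None it falls back to apparent_encoding, which runs chardet over the whole body.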
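The decoding fallback inside text can be reproduced with plain bytes, without any network access. The following is a minimal sketch; `to_text` is a hypothetical helper name, not part of Requests, and it only mirrors the try/except shown in the listing above.

```python
def to_text(raw, encoding):
    """Decode raw bytes the way the text property above does (sketch only)."""
    try:
        # Normal path: decode with the declared or guessed encoding.
        return str(raw, encoding, errors='replace')
    except (LookupError, TypeError):
        # LookupError: unknown codec name. TypeError: encoding is None.
        # Either way, fall back to Python's default (UTF-8) decoding.
        return str(raw, errors='replace')


print(to_text('é'.encode('utf-8'), None))   # encoding missing
print(to_text(b'abc', 'not-a-codec'))       # encoding misspelled
print(repr(to_text(b'\xff', 'utf-8')))      # undecodable byte is replaced
```

Because errors='replace' is used, a wrong encoding does not raise; it silently produces replacement characters or mojibake, which is part of why the result of text can be surprising.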

The response's encoding attribute is assigned in the build_response method of HTTPAdapter in adapters.py:

def build_response(self, req, resp):
    """Builds a :class:`Response` object from a urllib3
    response. This should not be called from user code, and is only exposed
    for use when subclassing the
    :class:`HTTPAdapter`

    :param req: The :class:`PreparedRequest` used to generate the response.
    :param resp: The urllib3 response object."""
    response = Response()

    # Fallback to None if there's no status_code, for whatever reason.
    response.status_code = getattr(resp, 'status', None)

    # Make headers case-insensitive.
    response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))

    # Set encoding.
    response.encoding = get_encoding_from_headers(response.headers)
    response.raw = resp
    response.reason = response.raw.reason

    if isinstance(req.url, bytes):
        response.url = req.url.decode('utf-8')
    else:
        response.url = req.url

    # Add new cookies from the server.
    extract_cookies_to_jar(response.cookies, req, resp)

    # Give the Response some context.
    response.request = req
    response.connection = self

    return response

As the line response.encoding = get_encoding_from_headers(response.headers) shows, the encoding is obtained by parsing the response headers:

def get_encoding_from_headers(headers):
    """Returns encodings from given HTTP Header Dict.

    :param headers: dictionary to extract encoding from."""
    content_type = headers.get('content-type')

    if not content_type:
        return None

    content_type, params = cgi.parse_header(content_type)

    if 'charset' in params:
        return params['charset'].strip("'\"")

    if 'text' in content_type:
        return 'ISO-8859-1'
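The listing relies on the cgi module, which has been removed from the standard library as of Python 3.13. To experiment with the same three outcomes (explicit charset, ISO-8859-1 default for text types, None otherwise), here is a rough sketch built on email.message instead. `charset_from_content_type` is a hypothetical name, and unlike the listing it returns the charset in lower case, because that is what `Message.get_content_charset` does.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Approximate the header logic above using only the standard library."""
    if not content_type:
        return None

    msg = Message()
    msg['content-type'] = content_type

    # Explicit charset parameter wins (returned lower-cased, quotes removed).
    charset = msg.get_content_charset()
    if charset is not None:
        return charset

    # No charset given: text/* types fall back to ISO-8859-1.
    if 'text' in msg.get_content_type():
        return 'ISO-8859-1'

    return None


print(charset_from_content_type('text/html; charset=UTF-8'))
print(charset_from_content_type('text/html'))
print(charset_from_content_type('application/json'))
```

The second case is the one that matters for this article: a text/html response without a charset is treated as ISO-8859-1, regardless of how the body is actually encoded.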

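To keep Requests from using chardet to guess the response's encoding, use the text property with caution. It is usually better to read the content property directly and decode the bytes yourself as needed.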
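As a small illustration of decoding the bytes yourself, the sketch below uses a hard-coded byte string as a stand-in for `r.content`, so it runs without a network connection. The sample text and the choice of GBK are assumptions made for the example; in practice the encoding has to come from what you know about the site.

```python
# Stand-in for r.content: a page body that the server encoded as GBK.
body = '新浪首页'.encode('gbk')

# Decoding with the encoding you know to be correct recovers the text.
correct = body.decode('gbk', errors='replace')

# Decoding with the ISO-8859-1 default (what text would use for a
# text/* response without a charset) yields mojibake instead.
wrong = body.decode('ISO-8859-1')

print(correct)
print(wrong == correct)
```

With a real response the same idea reads as `r.content.decode('gbk', errors='replace')`, which skips both the header default and the chardet guess.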

For a response where the server does not explicitly specify a charset, the difference between using text and content is shown below.

Code:

import time

import requests

print(time.time())
print('begin request')
r = requests.get(r'http://www.sina.com.cn')
# erase response encoding so that text has to guess it
r.encoding = None
r.text
# r.content
print('request end')
print(time.time())
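Note that the timestamps in the snippet above also include the network round trip, so part of the measured time has nothing to do with decoding. One way to isolate the cost is to time only the step of interest. The sketch below does this with a hypothetical `timed` helper and a synthetic body standing in for the downloaded bytes; it is not the measurement from the original post.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in for the downloaded body; with Requests this would be r.content.
raw = ('新浪首页' * 1000).encode('utf-8')

decoded, decode_seconds = timed(lambda: raw.decode('utf-8', errors='replace'))
_, access_seconds = timed(lambda: raw)

print('decode: %.6f s, raw access: %.6f s' % (decode_seconds, access_seconds))
```

With a real response, one could fetch the page first and then compare `timed(lambda: r.content)` with `timed(lambda: r.text)` after setting `r.encoding = None`, so that only the encoding detection and decoding are measured.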

Time taken when using text: [screenshot of the timing output, not preserved]

Time taken when using content: [screenshot of the timing output, not preserved]
