Fixing garbled output when Python urllib2 receives a response with Content-Encoding: gzip

biboshouyu

Category: python. Tags: python, garbled encoding, gzip

Article link: https://blog.csdn.net/biboshouyu/article/details/72501542

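Use Chrome's developer tools to inspect the page's headers. If the response headers contain Content-Encoding: gzip, the body is gzip-compressed. urllib2 does not decompress it automatically, so decoding the raw bytes produces garbled text.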
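To see why the raw body looks garbled, here is a minimal sketch that needs no network access. It is written for Python 3 (not the Python 2 urllib2 setup of this article), and `page_text` is a made-up stand-in for a real response body.

```python
import gzip

# Hypothetical page text standing in for a server response body
page_text = "<html>你好, gzip</html>"
raw_body = gzip.compress(page_text.encode("utf-8"))

# A gzip stream always begins with the two magic bytes 0x1f 0x8b
looks_gzipped = raw_body[:2] == b"\x1f\x8b"

# Decoding the compressed bytes directly yields junk rather than the page
garbled = raw_body.decode("utf-8", "ignore")

# Decompressing first recovers the original text
restored = gzip.decompress(raw_body).decode("utf-8")
```

Checking the magic bytes can serve as a fallback when the header is missing, but the Content-Encoding header remains the primary signal.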

The body has to be decompressed with the gzip module. The full method is as follows:

import gzip
import StringIO
import urllib2

# url is the address to request
yresponse = urllib2.urlopen(url)
rspheaders = yresponse.info()
yread = yresponse.read()
# The header object matches names case-insensitively, so one lookup covers
# both 'Content-Encoding' and 'content-encoding'
if rspheaders.get('Content-Encoding', '').lower() == 'gzip':
	# Wrap the compressed bytes in a file-like object and decompress them
	ydata = StringIO.StringIO(yread)
	ygz = gzip.GzipFile(fileobj=ydata)
	yread = ygz.read()
	ygz.close()
# Same conversion in both cases: the page encoding (here utf8) to the
# target encoding (here GB2312); adjust both to your situation
ystr = yread.decode('utf8', 'ignore').encode('GB2312')

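In the code, url is the address to request. Choose the arguments to decode() and encode() according to your situation: the page's actual encoding for decode(), and the encoding your console or output file expects for encode().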
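For readers on Python 3, where urllib2 has become urllib.request, the same idea can be sketched as a small helper. This is an illustration under assumptions, not the article's original code: `decode_body`, its parameters, and the sample bytes are hypothetical names introduced here, and the demonstration uses simulated bodies instead of a live request.

```python
import gzip
import io


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the response body as text, gunzipping it first if needed.

    raw: the bytes read from the response
    content_encoding: value of the Content-Encoding header, or None
    charset: encoding of the page itself (an assumption; adjust as needed)
    """
    if (content_encoding or "").strip().lower() == "gzip":
        # Wrap the compressed bytes in a file-like object and decompress
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
            raw = gz.read()
    return raw.decode(charset, "ignore")


# Simulated response bodies, so the sketch runs without network access
plain = "标题 title".encode("utf-8")
zipped = gzip.compress(plain)

text_from_gzip = decode_body(zipped, "gzip")
text_from_plain = decode_body(plain, None)
```

With a real response from urllib.request.urlopen, the header value would come from `resp.headers.get("Content-Encoding")`, and `resp.headers.get_content_charset()` can supply the charset when the server declares one.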