UnicodeEncodeError: 'gbk' codec can't encode character


两步一脚印


Original article: https://blog.csdn.net/haoxizh/article/details/44598001


Fetching the Chongqing University library home page "http://lib.cqu.edu.cn/newversion/index.htm" with Python. The page is encoded as "UTF-8".

Tools: Python 3.4.2 on Windows

The source code:

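from urllib import request, parse

url = 'http://lib.cqu.edu.cn/newversion/index.htm'

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'ZH',
          'location': 'CQU',
          'language': 'Python'}

headers = {'User-Agent': user_agent}
# Passing data to Request makes urlopen send a POST request
data = parse.urlencode(values).encode('UTF-8')
req = request.Request(url, data, headers)
response = request.urlopen(req)
# The page is UTF-8, so decode the raw bytes into a str
page = response.read().decode('UTF-8')
# page is already a str (str has no .decode() in Python 3);
# on a cmd console using code page 936 this line raises UnicodeEncodeError
print(page)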
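The script hard-codes UTF-8 for decoding, which happens to match this page. As a sketch (an assumption, not part of the original fix), the charset can instead be read from the `Content-Type` header. The response object's `headers` is an `email.message.Message` subclass, so in the real script `response.headers.get_content_charset('utf-8')` should work; the helpers below (`charset_from_content_type` and `decode_body` are hypothetical names) build a `Message` by hand so the idea can be tried offline.

```python
from email.message import Message


def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset declared in a Content-Type value, or `default`."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() lowercases the charset and returns the
    # fallback argument when no charset parameter is present
    return msg.get_content_charset(default)


def decode_body(raw, content_type, default='utf-8'):
    """Decode raw response bytes using the charset from the header."""
    return raw.decode(charset_from_content_type(content_type, default))


print(charset_from_content_type('text/html; charset=UTF-8'))  # utf-8
print(charset_from_content_type('text/html'))                 # utf-8 (fallback)
```

Falling back to UTF-8 when the server sends no charset is itself an assumption; some pages only declare their encoding in a `<meta>` tag.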

But it fails with the error in the title:

UnicodeEncodeError: 'gbk' codec can't encode character ......

Clearly this is an encoding problem. I tried encoding and decoding with 'GBK' and 'UTF-8' many times, but nothing fixed it.

Then came a lot of searching online......

Finally I found the cause:

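To print text, Python must first encode the str into bytes in the console's encoding. The local console is cmd on Win7, whose default code page is CP936, i.e. GBK, so print() encodes the Unicode string `page` as GBK before cmd can display it. `page` contains some characters that GBK cannot represent, and that is what triggers the "'gbk' codec can't encode" error.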
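The mechanism can be reproduced without any network access. The sketch below is an illustration, not the original author's code: `simulate_console_print` is a hypothetical helper that performs the same strict encode step print() performs on a GBK console, and the sample string assumes a non-breaking space (`'\xa0'`), a character common in HTML that GBK has no mapping for.

```python
import sys


def simulate_console_print(text, console_encoding='gbk'):
    """Mimic the step print() performs for a console: encode str -> bytes.

    The default 'strict' error handler raises UnicodeEncodeError as soon as
    it meets a character the console codec cannot represent.
    """
    return text.encode(console_encoding)


# The codec print() will use for this process's stdout (cp936 in the
# original Win7 cmd setup); getattr guards against stdout replacements
print('stdout encoding:', getattr(sys.stdout, 'encoding', None))

sample = 'CQU library\xa0home'  # '\xa0' is a non-breaking space
try:
    simulate_console_print(sample)
except UnicodeEncodeError as exc:
    # The message names the codec, the offending character and its position
    print('UnicodeEncodeError:', exc)
```

This also explains why repeated decoding and re-encoding of the downloaded data did not help: the failure happens at the final str-to-console encode, not when the page is decoded.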

Once the root cause was clear, the fix was easy:

from urllib import request, parse

url = 'http://lib.cqu.edu.cn/newversion/index.htm'

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'ZH',
          'location': 'CQU',
          'language': 'Python'}

headers = {'User-Agent': user_agent}
data = parse.urlencode(values).encode('UTF-8')
req = request.Request(url, data, headers)
response = request.urlopen(req)
page = response.read().decode('UTF-8')
# Encode to GBK, dropping characters GBK cannot represent, then decode back
# to str so print() shows readable text instead of a b'...' bytes literal
localprint = page.encode('gbk', 'ignore').decode('gbk')
print(localprint)
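The `'ignore'` handler silently discards characters, so the printed page can differ from the real one without any sign of it. A small variation (a sketch; `to_console_text` is a hypothetical name, and GBK as the target encoding is assumed from the setup above) wraps the round trip and lets you pick the error handler, with `'replace'` substituting `?` so the losses stay visible.

```python
def to_console_text(text, encoding='gbk', errors='replace'):
    """Round-trip text through the console codec so print() cannot fail.

    errors='ignore'  drops unencodable characters (the fix used above);
    errors='replace' substitutes '?' so the loss is visible in the output.
    """
    return text.encode(encoding, errors).decode(encoding)


print(to_console_text('CQU\xa0library'))                   # CQU?library
print(to_console_text('CQU\xa0library', errors='ignore'))  # CQUlibrary
```

Either way only the console output is changed; `page` itself still holds the full text and can be saved losslessly with `open(path, 'w', encoding='utf-8')`. Note that since Python 3.6, PEP 528 makes writes directly to the Windows console use UTF-8, so this error is far less common there than on the Python 3.4.2 used here.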