When fetching a web page with Python's urllib2, the Chinese text came out garbled:
import urllib2
# Look up the home region of a mobile phone number
url = "http://www.ip138.com:8080/search.asp?action=mobile&mobile=1380013"
request = urllib2.Request(url)
request.add_header('User-Agent', 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
response = urllib2.urlopen(request).read()
print response
The printed result is garbled, so the first attempt was to decode it as UTF-8:
print response.decode("UTF-8")
That raises an error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa1 in position 251: invalid start byte
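In the end it turned out that the page returned by this interface uses the gb2312 character set, not UTF-8, as shown in the figure:
The correct approach is to decode with that charset: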
print response.decode("gb2312")
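Hard-coding "gb2312" works for this page, but the charset can also be read from the response instead of guessed. Below is a minimal sketch, not part of the original fix: the helper names detect_charset and decode_body are hypothetical, and it assumes the charset is declared either in the Content-Type header or in a meta tag near the top of the page. It also decodes gb2312/gbk pages with the gb18030 codec, which is a superset of both, to reduce the chance of failing on characters outside plain gb2312.

```python
# -*- coding: utf-8 -*-
import re


def detect_charset(content_type, body, default="utf-8"):
    """Guess the charset of a page (hypothetical helper).

    Looks at the Content-Type header value first, then at a
    charset=... declaration in the first 2 KB of the raw body.
    """
    match = re.search(r"charset=([\w-]+)", content_type or "", re.I)
    if match:
        return match.group(1).lower()
    match = re.search(br"charset=[\"']?([\w-]+)", body[:2048], re.I)
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_body(content_type, body):
    """Decode raw bytes to unicode using the detected charset."""
    charset = detect_charset(content_type, body)
    # Assumption: treating gb2312/gbk as gb18030 is acceptable here,
    # since gb18030 covers both.
    if charset in ("gb2312", "gbk"):
        charset = "gb18030"
    # "replace" keeps going on bad bytes instead of raising
    # UnicodeDecodeError.
    return body.decode(charset, "replace")


# Usage with the urllib2 code above (Python 2):
#   resp = urllib2.urlopen(request)
#   text = decode_body(resp.info().getheader("Content-Type"), resp.read())
#   print text
```

The header is checked first because it is cheap; the meta tag is the fallback for servers that send only "text/html" without a charset.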