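I recently started learning Python. While working through the networking libraries, I ran into a problem: I fetched the Baidu home page and tried to save its content to a text file, but writing to the file raised "UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 1226: illegal multibyte sequence". This is clearly a character-encoding error, but which step caused it? I searched Baidu first, and every result blamed the encoding of the network request. I tried those fixes, with no luck. Finally I read the exception message line by line and found that it was raised while writing the file, so the file's encoding had to be the problem. When open() is called without an encoding, it falls back to the locale's preferred encoding, which was GBK on my system, and the character '\xbb' cannot be encoded in GBK. After I opened the file as UTF-8, the problem went away. Here is my code: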
from urllib.request import urlopen


class Network:
    def __init__(self, url):
        self.url = url

    def getData(self):
        # Open the output file as UTF-8 explicitly. The call that failed was
        # open("baidu.html", "w+"), which uses the locale's default encoding (GBK here).
        with open("baidu.html", "w+", buffering=50, encoding="UTF-8") as baiduHtml:
            for line in urlopen(self.url):
                # Each line arrives as bytes; decode it before printing and writing.
                line = line.decode("utf-8")
                print(line)
                baiduHtml.write(line)


network = Network("http://www.baidu.com")
network.getData()
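
The same failure can be reproduced without any network request, which makes it easier to see that the file's encoding is what matters. The following is only a minimal sketch: the helper name try_write and the variable sample are made up for illustration and are not part of the code above. It writes the character from the error message to a temporary file, once as GBK and once as UTF-8.

```python
import locale
import os
import tempfile


def try_write(text, encoding):
    """Write text to a temporary file using the given encoding.

    Returns None on success, or the UnicodeEncodeError that was raised.
    (Hypothetical helper, for illustration only.)
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return None
    except UnicodeEncodeError as exc:
        return exc
    finally:
        # The file is already closed here, so it can be removed safely.
        os.remove(path)


# The encoding open() uses when no encoding argument is given.
print("default encoding:", locale.getpreferredencoding(False))

sample = "\xbb"  # the character named in the exception message
print("gbk   ->", try_write(sample, "gbk"))
print("utf-8 ->", try_write(sample, "utf-8"))
```

If the first line printed shows a GBK-family encoding such as cp936, then open("baidu.html", "w+") without an encoding behaves like the "gbk" case here, which is why passing the encoding explicitly fixes the problem.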