While writing a crawler today, I used BeautifulSoup to process a document:
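from bs4 import BeautifulSoup  # assumed import; BeautifulSoup 3 uses: from BeautifulSoup import BeautifulSoup
html = open('index-1.txt').read()  # read the saved page as raw bytes
soup = BeautifulSoup(html)
print soup.prettify()  # Python 2 print statement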
I found that if the document contains Chinese characters, the call to the prettify method fails with:
UnicodeEncodeError: 'ascii' codec can't encode characters in position xxx-xxx: ordinal not in range(128)
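The cause is that Python (Python 2 here) uses the ascii codec by default when it converts the character stream, and it raises this error as soon as the stream contains characters outside the ASCII range (ordinals 0-127).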
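The failure can be reproduced without BeautifulSoup. This is a minimal sketch, assuming only that the text contains characters above ordinal 127. The sample string and the variable names are made up for illustration, and the snippet should run on both Python 2.7 and Python 3:

```python
# Hypothetical sample text: two Chinese characters written as escapes,
# so the source file itself stays pure ASCII.
text = u'\u6c49\u5b57'

try:
    # Encoding with the ascii codec is the step that fails for such text.
    text.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should have the same form as the traceback above, with the position range filled in.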
Solution:
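import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')  # Python 2 only: default codec becomes UTF-8 instead of ascii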
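Because sys.setdefaultencoding changes the default for the whole process, a narrower option may be to encode the text explicitly before it is output. The following is only a sketch of that idea, not the fix used above. `to_utf8_bytes` is a hypothetical helper, and the sample string stands in for the result of soup.prettify():

```python
def to_utf8_bytes(text):
    """Encode a unicode string to UTF-8 bytes explicitly (hypothetical helper)."""
    return text.encode('utf-8')

# Stand-in for soup.prettify(); assumed to be a unicode string.
pretty = u'<p>\u6c49\u5b57</p>'

data = to_utf8_bytes(pretty)

# The bytes decode back to the original text, so nothing was lost.
round_trip = data.decode('utf-8')
print(round_trip == pretty)
```

With this approach no implicit ascii conversion is involved, since the codec is named at the point where the conversion happens.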