UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 34: illegal multibyte sequence
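This error usually means the file was opened without an explicit encoding, so Python fell back to the locale default (GBK on a Chinese-language Windows system) while the file itself is stored in a different encoding. A minimal sketch of one common fix follows, assuming the data file is UTF-8; that is an assumption, not something the traceback confirms. The helper name `read_pairs` is hypothetical and only illustrates passing `encoding` to `open`.

```python
def read_pairs(path, encoding="utf-8"):
    """Read a tab-separated text file with an explicit encoding.

    Assumption: the file is UTF-8. Pass a different `encoding`
    if the file was saved in another one.
    """
    rows = []
    # Naming the encoding stops open() from using the locale default (e.g. gbk).
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            # Same splitting as the original loop: strip, then split on tabs.
            rows.append(line.strip().split("\t"))
    return rows
```

If the file turns out not to be UTF-8, the same call raises a `UnicodeDecodeError` naming the `utf-8` codec instead, which is a sign that a different `encoding` value is needed.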
Original reading code:
with open(file, 'r') as f:
    for line in f:
        line = line.strip().split("\t")
        en.append(["BOS"] + nltk.word_tokenize(line[0