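The GloVe word-vector file is UTF-8 encoded. In Python 3, on a system whose default encoding is GBK (for example Chinese-locale Windows), opening it without specifying an encoding makes Python decode it as GBK, and reading fails: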
glove = open('glove.6B.100d.txt', 'r')  # no encoding given, so the platform default (GBK here) is used
word = list()
word_vector = list()
line = glove.readline()  # read one line at a time; returns a str
while line:
    line = list(line.split())
    word.append(line[0])
    word_vector.append(line[1:])
    line = glove.readline()
Result:
  File "F:/data set/NLP/experiment1.py", line 9, in <module>
    line = glove.readline()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5456: illegal multibyte sequence
Value of line when the error occurred (the last line parsed successfully):
['political', '-0.33926', '0.068714'<
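
One way to avoid the error is to pass the encoding explicitly to `open()` instead of relying on the platform default. The sketch below is only an illustration under a few assumptions: the helper name `load_glove` is made up here, each line is assumed to be a token followed by space-separated numbers, and the vector components are converted to `float` (the original code keeps them as strings).

```python
def load_glove(path):
    """Read a GloVe-style text file into parallel lists of words and vectors.

    Hypothetical helper for illustration; not part of any GloVe tooling.
    """
    words = []
    vectors = []
    # encoding='utf-8' overrides the platform default (GBK in the traceback above)
    with open(path, 'r', encoding='utf-8') as glove:
        for line in glove:
            # Assumed layout: token, then space-separated numbers
            parts = line.rstrip().split(' ')
            if not parts or parts[0] == '':
                continue  # skip blank lines
            words.append(parts[0])
            vectors.append([float(x) for x in parts[1:]])
    return words, vectors
```

Usage would look like `word, word_vector = load_glove('glove.6B.100d.txt')`. Using a `with` block also closes the file automatically, which the original loop does not do.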