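Reading Stop Words
Useful when segmenting Chinese text: load a stop-word list so that uninformative words can be removed after segmentation.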
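As a rough illustration of what "removing redundant information" means here, the sketch below filters an already-segmented sentence against a stop-word set. The sample words, the token list, and the helper name `remove_stop_words` are all made up for the example; in practice the tokens would come from a segmenter and the set from the file read further down.

```python
# Hypothetical sample data: a tiny stop-word set and an already-segmented sentence.
stop_words = {"的", "了", "是", "在"}
tokens = ["我", "在", "北京", "的", "大学", "学习", "了", "三", "年"]


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not stop words, preserving their order."""
    return [t for t in tokens if t not in stop_words]


filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Keeping the stop words in a `set` makes each `in` check a constant-time lookup on average, which matters once the list holds thousands of entries.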
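Two versions of the code follow. They differ in the encoding used to read the file: UTF-8 in the first, GB18030 in the second.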
Stop-word TXT file
Code 1 (UTF-8 encoded file)
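# Read a UTF-8 encoded stop-word file, one word per line
stop_words = []
with open('stop_words.txt', "r", encoding="UTF-8") as f:
    line = f.readline()
    while line:
        # rstrip('\n') removes only the line break, so a final line
        # without a trailing newline keeps its last character
        stop_words.append(line.rstrip('\n'))
        line = f.readline()
stop_words = set(stop_words)  # deduplicate; also gives fast membership tests
print('Stop words loaded: {n} words'.format(n=len(stop_words)))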
Code 2 (GB18030 encoded file)
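# Read a GB18030 encoded stop-word file, one word per line
stop_words = []
# errors='ignore' silently drops any bytes that cannot be decoded
with open("stop_words.txt", encoding='gb18030', errors='ignore') as f:
    line = f.readline()
    while line:
        # rstrip('\n') removes only the line break, so a final line
        # without a trailing newline keeps its last character
        stop_words.append(line.rstrip('\n'))
        line = f.readline()
stop_words = set(stop_words)  # deduplicate; also gives fast membership tests
print('Stop words loaded: {n} words'.format(n=len(stop_words)))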
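If the encoding of the stop-word file is not known in advance, the two versions above could be folded into one helper that tries each encoding in turn. This is only a sketch: the function name `load_stop_words`, the order of the encodings, and the demo file contents are assumptions made for illustration. Unlike the code above, it uses `strip()`, so surrounding whitespace and blank lines are discarded.

```python
import os
import tempfile


def load_stop_words(path, encodings=("utf-8-sig", "gb18030")):
    """Read one stop word per line, trying each encoding until one decodes cleanly."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                # strip() removes the line break and stray spaces; blank lines are skipped
                return {line.strip() for line in f if line.strip()}
        except UnicodeDecodeError:
            continue  # wrong encoding for this file, try the next one
    raise ValueError("could not decode {} with any of {}".format(path, encodings))


# Demo on a temporary GB18030 file with hypothetical content.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stop_words.txt")
    with open(path, "w", encoding="gb18030") as f:
        f.write("的\n了\n\n是")  # blank line and missing final newline on purpose
    stop_words = load_stop_words(path)

print('Stop words loaded: {n} words'.format(n=len(stop_words)))
```

UTF-8 is tried first because it is strict: Chinese text saved as GB18030 will almost always fail UTF-8 decoding and fall through to the second attempt, whereas the reverse order could decode a UTF-8 file into garbage without raising. `utf-8-sig` is used so that a byte-order mark at the start of the file does not end up glued to the first word.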
Dictionary