Part one: data_process.py
1.
for line in stop_words_file.readlines():
    stopwords_list.append(line.decode('gdk')[:-1])
Change to:
for line in stop_words_file.readlines():
    stopwords_list.append(line[:-1])
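Why the decode call can go: in Python 3 a file opened in text mode already yields str, so the decoding happens in open(). A minimal sketch follows, assuming the stop-word file holds one word per line and is GBK-encoded (a guess based on the original decode call). load_stopwords and the demo file are hypothetical names, not part of data_process.py. It also uses rstrip instead of [:-1], because [:-1] cuts a real character when the last line has no trailing newline.

```python
import os
import tempfile


def load_stopwords(path, encoding="utf-8"):
    """Return the stop words stored one per line in `path` (hypothetical helper)."""
    stopwords_list = []
    # Text mode + explicit encoding: each line is already a str, no decode() needed.
    with open(path, "r", encoding=encoding) as stop_words_file:
        for line in stop_words_file:
            word = line.rstrip("\r\n")  # strip only the line ending
            if word:
                stopwords_list.append(word)
    return stopwords_list


# Demo with a throwaway GBK file whose last line has no trailing newline.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "stopwords.txt")
    with open(demo_path, "w", encoding="gbk") as f:
        f.write("的\n了\n是")
    demo_stopwords = load_stopwords(demo_path, encoding="gbk")

print(demo_stopwords)
```

If the real file is UTF-8 rather than GBK, pass encoding="utf-8" instead.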
2.
with open(train_path, 'r') as f:
Change to:
with open(train_path, 'r', encoding='utf-8') as f:
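Without an explicit encoding, Python 3 falls back to a platform-dependent default, which is often not UTF-8 on Chinese-locale Windows. A small sketch of the fixed call, assuming the training file is UTF-8. The sample line and the temporary path are made up for illustration.

```python
import os
import tempfile

# Hypothetical training line: label, tab, text.
sample_text = "体育\t这是一条训练样本\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "train.txt")
    # Write raw UTF-8 bytes so the file content does not depend on the locale.
    with open(train_path, "wb") as f:
        f.write(sample_text.encode("utf-8"))
    # Read it back as text; the explicit encoding makes the result portable.
    with open(train_path, "r", encoding="utf-8") as f:
        lines = f.readlines()

print(lines)
```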
3.
if A[label].has_key(word):
Change to:
if word in A[label]:
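dict.has_key() was removed in Python 3; the in operator does the same membership test and also works in Python 2. A sketch of the pattern, assuming A maps each label to a dict of per-word counts (which the A[label] lookup suggests). The sample data is invented.

```python
# Hypothetical (label, words) samples standing in for the real training data.
samples = [
    ("sports", ["ball", "goal", "ball"]),
    ("tech", ["chip", "ball"]),
]

A = {}
for label, words in samples:
    A.setdefault(label, {})  # make sure the inner dict exists
    for word in words:
        if word in A[label]:  # Python 3 replacement for A[label].has_key(word)
            A[label][word] += 1
        else:
            A[label][word] = 1

print(A)
```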
4.
Check whether the code uses
^
for exponentiation. If so, change it to
**
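In Python, ^ is bitwise XOR, not a power operator, so a formula written with ^ either gives a wrong number or raises TypeError on floats. The sketch below shows the difference, then a chi-square helper where the square matters. chi_square and its arguments a, b, c, d are hypothetical names using the standard 2x2 formula, not necessarily the exact expression in data_process.py.

```python
# XOR versus exponentiation on the same operands.
xor_result = 3 ^ 2     # bitwise XOR: 0b11 ^ 0b10 == 0b01
power_result = 3 ** 2  # exponentiation

# XOR is not defined for floats, so a float formula fails outright.
try:
    2.5 ^ 2
    float_xor_fails = False
except TypeError:
    float_xor_fails = True


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table (hypothetical helper)."""
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator  # ** 2, not ^ 2


chi = chi_square(10, 5, 3, 20)
print(xor_result, power_result, float_xor_fails, chi)
```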
5.
a = sorted(CHI.iteritems(), key=lambda t: t[1], reverse=True)[:100]
Change to:
a = sorted(CHI.items(), key=lambda t: t[1], reverse=True)[:100]
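dict.iteritems() no longer exists in Python 3; items() returns a view that sorted() accepts directly. A sketch of the top-k selection, assuming CHI maps each word to its score. The scores are invented, and k is 2 here instead of 100 to keep the demo short. heapq.nlargest is shown as an optional alternative that avoids sorting the whole dict.

```python
import heapq

# Hypothetical word -> chi-square score mapping.
CHI = {"ball": 11.6, "goal": 7.2, "the": 0.1, "chip": 9.4}
k = 2

# Direct Python 3 port of the original line.
a = sorted(CHI.items(), key=lambda t: t[1], reverse=True)[:k]

# Alternative: select the k largest without a full sort.
a_heap = heapq.nlargest(k, CHI.items(), key=lambda t: t[1])

print(a)
```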