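Read a txt file, count word occurrences, and build a word-frequency dictionary
Sample data from the text file (each line holds a word followed by its part-of-speech tag):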
Namun CC
aparat NN
yang PRL
berjaga VB
langsung RB
sigap NN
dan CC
… and so on
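Because every line pairs a word with its tag, it can help to split each line into its two fields before counting. A minimal sketch, assuming the fields are separated by whitespace (a tab or a space) and that blank lines may occur in the file; the helper name `parse_line` is hypothetical:

```python
def parse_line(line):
    """Split a 'word TAG' line into a (word, tag) tuple; return None for blank or malformed lines."""
    # split() with no argument splits on any run of spaces/tabs and drops the trailing newline
    parts = line.split()
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


sample = ["Namun\tCC\n", "aparat NN\n", "\n"]
pairs = [p for p in (parse_line(s) for s in sample) if p is not None]
print(pairs)  # [('Namun', 'CC'), ('aparat', 'NN')]
```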
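from collections import Counter

# Read all lines of the training file
with open("Ind_train.txt", encoding="utf-8") as f:
    ori_data = f.readlines()

n_list = []
for line in ori_data:
    # Turn the tab and the trailing newline into spaces,
    # so a line such as "Namun\tCC\n" becomes "Namun CC "
    a = line.replace("\t", " ").replace("\n", " ")
    n_list.append(a)

# Count: Counter maps each element (here a whole "word TAG " line) to its number of occurrences
counter = Counter(n_list)
print(counter)

# Write one "element<TAB>count" entry per line. Open the file once, in "w" mode,
# so re-running the script overwrites save.txt instead of appending duplicates
with open("save.txt", "w", encoding="utf-8") as f:
    for k, v in counter.items():
        f.write(k + "\t" + str(v) + "\n")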
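The script above counts whole lines, so the keys keep a trailing space and the same word with two different tags is counted twice. A minimal variant sketch, assuming whitespace-separated "word TAG" lines, that strips the whitespace, skips blank lines, counts bare words and (word, tag) pairs separately, and writes the pairs most-frequent first; the function names `count_tokens` and `write_counts` are hypothetical:

```python
from collections import Counter


def count_tokens(lines):
    """Return (word_counts, pair_counts) from an iterable of 'word TAG' lines.

    word_counts counts each word regardless of its tag; pair_counts counts each
    (word, tag) combination, which is what the original script counts.
    Blank or malformed lines are skipped.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for line in lines:
        parts = line.split()  # drops the newline and any tab/space separators
        if len(parts) != 2:
            continue
        word, tag = parts
        word_counts[word] += 1
        pair_counts[(word, tag)] += 1
    return word_counts, pair_counts


def write_counts(pair_counts, path):
    """Write 'word<TAB>tag<TAB>count' lines, most frequent first, overwriting path."""
    with open(path, "w", encoding="utf-8") as f:
        for (word, tag), n in pair_counts.most_common():
            f.write(f"{word}\t{tag}\t{n}\n")


if __name__ == "__main__":
    # Hypothetical toy input in the same format as the sample data
    sample = ["yang PRL\n", "di IN\n", "yang PRL\n", "\n", "dan CC\n"]
    words, pairs = count_tokens(sample)
    print(words["yang"])          # 2
    print(pairs.most_common(1))   # [(('yang', 'PRL'), 2)]
```

To run it on the real data, pass the open file object, since iterating over a file yields its lines: `with open("Ind_train.txt", encoding="utf-8") as f: words, pairs = count_tokens(f)`, then `write_counts(pairs, "save.txt")`. Note that this writes three tab-separated columns instead of the two the original script writes.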
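Output of print(counter) (each key is a whole "word TAG " line, so the same word with different tags is counted separately):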
Counter({'yang PRL ': 3413, 'di IN ': 3174, "'' Z ": 3092, 'dan CC ': 2494,
'ini DT ': 1393, 'itu DT ': 1279, 'tidak RB ': 1257})
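The printed Counter lists entries from most to least frequent because its repr is built from `most_common()`. To keep only the top entries, or to turn raw counts into relative frequencies, a minimal sketch on hypothetical toy data in the same "word TAG " format:

```python
from collections import Counter

# Hypothetical toy input mimicking the keys produced by the script above
n_list = ["yang PRL ", "di IN ", "yang PRL ", "dan CC ", "yang PRL ", "di IN "]
counter = Counter(n_list)

top2 = counter.most_common(2)                       # two most frequent entries, highest count first
total = sum(counter.values())                       # total number of tokens counted
freq = {k: v / total for k, v in counter.items()}   # relative frequency of each entry

print(top2)   # [('yang PRL ', 3), ('di IN ', 2)]
print(total)  # 6
print(freq["dan CC "])
```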
Resulting word-frequency file (save.txt):
…
Kami PRP 132
mengharapkan VB 9
supaya SC 22
kedua CD 38
…
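To reuse the saved counts later without recounting, the file can be read back into a dictionary. A minimal sketch, assuming save.txt was written as UTF-8 with one "key<TAB>count" entry per line, as the script above produces; the function name `load_counts` is hypothetical. Splitting on the last tab keeps each key exactly as it was counted, including its trailing space:

```python
import os
import tempfile


def load_counts(path):
    """Read 'key<TAB>count' lines back into a dict mapping key -> int count."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the LAST tab; the script replaced tabs in the keys with spaces
            key, _, value = line.rpartition("\t")
            counts[key] = int(value)
    return counts


if __name__ == "__main__":
    # Hypothetical demo: write two entries in the same format, then read them back
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "save.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Kami PRP \t132\nsupaya SC \t22\n")
        print(load_counts(path))  # {'Kami PRP ': 132, 'supaya SC ': 22}
```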