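I need to count how often every key of the dictionary `d` occurs in the folder "individual articles", which contains about 20,000 txt files named 1, 2, 3, 4, and so on. For example, given d[Britain]=[5,76,289], the result must be the number of times Britain occurs in the files 5.txt, 76.txt and 289.txt of the folder "individual articles". I also need its frequency across all files in that same folder. My code so far:
import collections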
import sys
import os
import re
sys.stdout=open('dictionary.txt','w')
from collections import Counter
from glob import glob
folderpath='d:/individual-articles'
counter=Counter()
filepaths = glob(os.path.join(folderpath,'*.txt'))
def words_generator(fileobj):
    # Yield every whitespace-separated word of an open file, line by line.
    for line in fileobj:
        for word in line.split():
            yield word

# Maps each word to its count.
word_count_dict = {}
for file in filepaths:
    f = open(file,"r")
    words = words_generator(f)
    for word in words:
        if word not in word_count_dict: