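I have a .csv file with a column of messages I have collected, and I want a word-frequency list for every word in that column. Here is what I have so far. I am not sure where I am making a mistake, so any help would be appreciated. Edit: the expected output is the full list of words with their counts (no duplicates), written to another .csv file.

import csv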
from collections import Counter
from collections import defaultdict

output_file = 'comments_word_freqency.csv'
input_stream = open('comments.csv')
reader = csv.reader(input_stream, delimiter=',')
reader.next() #skip header
csvrow = [row[3] for row in reader] #Get the fourth column only
with open(output_file, 'rb') as csvfile:
    for row in reader:
        freq_dict = defaultdict(int) # the "int" part
        # means that the VALUES of the dictionary are integers.
        for line in csvrow:
            words = line.split(" ")
            for word in words:
                word = word.lower() # ignores case type
                freq_dict[word] += 1
    writer = csv.writer(open(output_file, "wb+")) # this is what lets you write the csv file.
    for key, value in freq_dict.items():
        # this iterates through your dictionary and writes each pair as its own line.
        writer.writerow([key, value])
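
For comparison, here is a minimal Python 3 sketch of what the code above seems to be aiming for. It assumes the messages are in the fourth column (index 3), as in the question, and that words are separated by whitespace. The function name `count_words`, the `word,count` header row, and the UTF-8 encoding are hypothetical choices, not part of the original code. Two things in the original look likely to cause trouble: the list comprehension that builds `csvrow` already consumes `reader`, so a later `for row in reader` loop has nothing left to iterate over, and the output file is opened in read mode (`'rb'`) before anything has been written to it. The sketch also uses `split()` rather than `split(" ")`, so runs of spaces do not produce empty "words".

```python
import csv
from collections import Counter


def count_words(input_path, output_path, column=3):
    """Count lowercase, whitespace-separated words in one CSV column."""
    counts = Counter()
    # newline='' is what the csv module documentation recommends for files.
    with open(input_path, newline='', encoding='utf-8') as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) > column:  # ignore short or empty rows
                counts.update(row[column].lower().split())
    with open(output_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        writer.writerow(['word', 'count'])  # hypothetical header row
        # most_common() yields (word, count) pairs, most frequent first
        writer.writerows(counts.most_common())
    return counts
```

With the file names from the question, the call would be `count_words('comments.csv', 'comments_word_freqency.csv')`. `Counter` is used here instead of `defaultdict(int)` only because it is already imported in the original and handles the counting in a single `update` call; either approach should give the same counts.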