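Requirement: find the 10 most frequent words in a file that are at least 2 characters long, then print each word with its count in descending order of frequency, one word per line, with an ASCII colon joining the word and its count. Example: 我们:5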
I handed this straight to ChatGPT and made small adjustments based on its responses, which gave:
import jieba
from collections import Counter


def count_top_words(file_path):
    # Read the file with the appropriate encoding
    with open(file_path, 'r', encoding='gbk') as file:
        text = file.read()
    # Perform word segmentation using jieba
    words = jieba.lcut(text)
    # Filter out words with a length less than 2 characters
    words = [word for word in words if len(word) >= 2]
    # Count the frequencies of the words
    word_freq = Counter(words)
    # Get the top 10 most frequent words (sorted by descending count)
    top_words = word_freq.most_common(10)
    # Display the words and frequencies, one "word:count" pair per line
    for word, freq in top_words:
        print(f"{word}:{freq}")


# Provide the path to your file
file_path = 'path/to/your/file.txt'
# Call the function to count top words
count_top_words(file_path)
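To check the counting logic without a real file or jieba installed, the filter-and-count step can be pulled out on its own. The sketch below is only an illustration: the helper names `top_words` and `format_lines` and the sample token list are made up here, and stripping whitespace before measuring length is an extra assumption that the script above does not make.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Return the n most frequent tokens that are at least min_len characters long."""
    # Assumption (not in the original script): ignore surrounding whitespace
    # when measuring length, so whitespace-only tokens are never counted.
    kept = [t for t in tokens if len(t.strip()) >= min_len]
    # most_common sorts by descending count; ties keep first-seen order.
    return Counter(kept).most_common(n)


def format_lines(pairs):
    """Format (word, count) pairs as 'word:count' with an ASCII colon."""
    return [f"{word}:{freq}" for word, freq in pairs]


# Hypothetical, already-segmented input standing in for jieba.lcut(text)
tokens = ["我们", "的", "我们", "今天", "，", "我们", "今天", "去", "学校"]
for line in format_lines(top_words(tokens)):
    print(line)
```

Splitting segmentation from counting this way makes it easy to test the output format on a small hand-written list before running the full script on a large file.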
The whole thing took 5 minutes. The future is already here.