Computing Word Frequencies in a txt Document (Dictionary Operations)

Abalagang

Article link: https://blog.csdn.net/li860238659/article/details/115743792

# -*- coding: utf-8 -*-

def wordcount(filepath):
    # Read the file and collect every word
    words = []
    with open(filepath, "r", encoding="utf-8") as text:
        for line in text:
            words += line.split()  # split() with no argument splits on any whitespace

    dic = dict()  # create an empty dictionary
    # Walk the list and build the word -> count dictionary
    for w in words:
        dic[w] = dic.get(w, 0) + 1
    items = list(dic.items())

    # Sort twice: alphabetically first, then by frequency, so output runs from most to least frequent
    items.sort()  # alphabetical sort, so words with equal counts stay in alphabetical order
    items.sort(key=byFreq, reverse=True)  # reverse=True sorts in descending order; ties keep their order

    n = int(input("Output analysis of how many words? "))
    for word, count in items[:n]:  # slicing is safe even when n exceeds the number of distinct words
        # Alignment: word left-aligned in a 15-character field, count right-aligned in a 5-character field
        print("{0:<15}{1:>5}".format(word, count))

def byFreq(pair):  # key function: takes a (word, count) pair and returns its second item, the count
    return pair[1]

if __name__ == '__main__':
    wordcount("path")  # placeholder: replace with the path to your txt file
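The double sort above works because Python's list sort is stable, even with reverse=True: items with equal counts keep the alphabetical order from the first pass. As a rough sketch of that idea, the snippet below compares the two-pass approach with a single sort on a compound key. The helper names (count_words, rank_two_pass, rank_single_pass) and the sample string are made up for illustration and are not part of the original program.

```python
def count_words(text):
    """Count whitespace-separated words in a string using dict.get."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def rank_two_pass(counts):
    """Two stable sorts: alphabetical first, then by descending frequency."""
    items = list(counts.items())
    items.sort()  # ties in the next sort keep this alphabetical order
    items.sort(key=lambda pair: pair[1], reverse=True)
    return items


def rank_single_pass(counts):
    """One sort on a compound key: descending count, then ascending word."""
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


sample = "b a c a b a d"  # hypothetical sample text
counts = count_words(sample)

# Both rankings should agree on this sample.
assert rank_two_pass(counts) == rank_single_pass(counts)

for word, count in rank_single_pass(counts):
    print("{0:<15}{1:>5}".format(word, count))
```

The single-pass version avoids the second sort, but it relies on negating the count, so it only suits numeric sort keys. The two-pass version is the more general pattern.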

Reference: Python Programming (3rd edition), Section 11.7, Non-Sequential Collections
