Text Word Frequency Counting

Latest recommended article published on 2022-03-10 16:50:42

akak008

Article link: https://blog.csdn.net/akak008/article/details/118671580

Based on the Python basics course taught by Song Tian at Beijing Institute of Technology (BIT).

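def getText():                                      # Define a function: getText
    with open("hamlet.txt", "r") as f:              # Open the file; it is closed automatically after reading
        txt = f.read()
    txt = txt.lower()                               # str.lower() converts everything to lowercase
    for ch in "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'": # Iterate over the special characters
        txt = txt.replace(ch, " ")                  # str.replace(ch, " ") replaces each special character with a space
    return txt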

hamletTxt = getText()
words = hamletTxt.split()                           # str.split() splits on whitespace by default and returns a list of words
counts = {}                                         # Start with an empty dictionary
for word in words:                                  # Iterate over the words; the list looks roughly like ["with", "mirth", "in", ...]
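    counts[word] = counts.get(word, 0) + 1
    # This line took me a long time to understand. It breaks down into three ideas,
    # and the goal is to build a dictionary such as {"the": 133},
    # that is, {key = word: value = number of occurrences}.
    # 1. Adding an entry to a dictionary: d["key"] = value, e.g. d["a"] = 1 gives {"a": 1}.
    # 2. d.get(key, default) returns the value for key if the key exists, otherwise default.
    #    For example, with a = {"mage": "王昭君"}, a.get("mage", 0) returns "王昭君".
    # 3. While looping over the words: if the word is not in the dictionary yet,
    #    get() returns 0 and the entry {word: 0 + 1} is created; if it already exists,
    #    its value is increased by 1, so every occurrence adds 1 to the count.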
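items = list(counts.items())
# dict.items() returns an iterable view of the key-value pairs; list() turns it into a list such as [('the', 133), ('she', 222), ...]
items.sort(key=lambda x: x[1], reverse=True)
# list.sort(reverse=True) sorts in descending order; key=lambda x: x[1] sorts by the second element of each tuple, i.e. the count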
for i in range(10):
    word, count = items[i]               # Take the top ten; each item is a (word, count) tuple, so unpack it into two variables
    print("{0:<10}{1:>5}".format(word, count))
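To try the counts.get(word, 0) + 1 pattern and the sort without needing hamlet.txt, here is a minimal sketch on a made-up sentence. The sample text and the variable name sample are assumptions for illustration only; the original program reads the file instead.

```python
# Hypothetical sample text; the original program reads hamlet.txt instead.
sample = "the cat and the dog and the bird"

counts = {}
for word in sample.split():
    # get() returns the current count, or 0 if the word has not been seen yet.
    counts[word] = counts.get(word, 0) + 1

# Turn the dictionary into a list of (word, count) tuples
# and sort it by count, largest first.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Print the three most frequent words in two aligned columns.
for word, count in items[:3]:
    print("{0:<10}{1:>5}".format(word, count))
```

Slicing with items[:3] instead of indexing with range() also avoids an IndexError when the text has fewer distinct words than the number requested.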

Notes clearing up the confusing parts of this word frequency program
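As an alternative to the hand-written dictionary counting, the standard library can do most of the work. The sketch below swaps in str.translate for the replace loop and collections.Counter for the counts dictionary; this is a different technique from the course code, and the function name count_words and the sample sentence are hypothetical.

```python
import string
from collections import Counter


def count_words(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs."""
    # Build a table that maps every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    # Counter counts the words; most_common() sorts by count, largest first.
    return Counter(words).most_common(top_n)


# Hypothetical sample; pass the contents of hamlet.txt for the real count.
result = count_words("To be, or not to be: that is the question.", top_n=2)
for word, count in result:
    print("{0:<10}{1:>5}".format(word, count))
```

Writing the loop by hand, as the course does, is still worth doing once, because it shows exactly what Counter does internally.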