Common jieba functions and practical applications: counting word frequencies in text


A_tiny_fish_

Tags: python

Article link: https://blog.csdn.net/qq_45752450/article/details/106577483


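This article introduces the jieba library for Python. Its examples show how to count the most frequent words in Hamlet and the most frequently mentioned character names in Romance of the Three Kingdoms, illustrating how jieba can be used in text processing.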


Common jieba functions

Text word-count examples

1. Count the 10 most frequent words in Hamlet

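# Normalize the text: lowercase it and replace punctuation with spaces
def getText():
    with open('hamlet.txt', 'r') as f:
        txt = f.read()
    txt = txt.lower()
    # Use a space (not an empty string) so words joined by punctuation do not merge
    for ch in '!@、";:,.()[]{}<>=-_*&^%$#`~/?|\\\'':
        txt = txt.replace(ch, ' ')
    return txt

hamletTxt = getText()
words = hamletTxt.split()
# Count how many times each word occurs
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
# Sort the (word, count) pairs by count, highest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Print the 10 most frequent words (fewer if the text has fewer distinct words)
for word, count in items[:10]:
    print('{0:<10}{1:>5}'.format(word, count))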
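
The same counting can be written more compactly with the standard library. The sketch below is one possible alternative, not the author's code: the helper name `top_words` and the inline sample sentence are made up for illustration, and it strips only ASCII punctuation (`string.punctuation`), so it is not an exact match for the character list used above.

```python
from collections import Counter
import string


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Map every ASCII punctuation character to a space so that words
    # separated only by punctuation are still split apart.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    words = text.lower().translate(table).split()
    # Counter does the dict.get(word, 0) + 1 bookkeeping; most_common sorts.
    return Counter(words).most_common(n)


# Hypothetical sample input; replace with the contents of hamlet.txt.
sample = "To be, or not to be: that is the question."
for word, count in top_words(sample, 3):
    print('{0:<10}{1:>5}'.format(word, count))
```

Note that mapping the apostrophe to a space splits contractions such as "don't" into two tokens. If that matters for your text, leave the apostrophe out of the translation table.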