Python: frequency counts and frequency distributions


纤雀


This article introduces frequency distributions: how to compute them, a few practical techniques, worked examples, and points to watch out for.

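Text processing often requires counting how frequently each word occurs in a body of text. One way to do this is to tokenize the text with word_tokenize(), collect the tokens in a list, and count how many times each token appears in that list, as the following program shows.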

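from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

# Requires the NLTK "gutenberg" corpus and the tokenizer data (see nltk.download)
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)

# Keep only the first 50 tokens
wlist = token[:50]

# For each token, count how many times it occurs among those 50 tokens
wordfreq = [wlist.count(w) for w in wlist]

# zip() returns an iterator in Python 3, so turn it into a list before printing
print("Pairs\n" + str(list(zip(wlist, wordfreq))))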

Running the program above produces the following output:

[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
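The list above repeats a pair for every occurrence of a token, and calling list.count() once per token rescans the whole list each time. As a minimal sketch of an alternative, the standard library's collections.Counter (which NLTK 3's FreqDist builds on) produces one count per distinct token in a single pass, and dividing by the total gives relative frequencies. The sample text is hypothetical, and plain whitespace splitting stands in for word_tokenize() so that the example runs without any NLTK data.

```python
from collections import Counter

# Hypothetical sample text; str.split() is a stand-in for word_tokenize()
text = "piping down the valleys wild piping songs of pleasant glee"
tokens = text.split()

# Absolute counts: a single pass over the tokens
counts = Counter(tokens)

# Relative frequencies: each count divided by the total number of tokens
total = sum(counts.values())
rel_freq = {word: n / total for word, n in counts.items()}

print(counts.most_common(1))   # the most frequent token with its count
print(rel_freq["piping"])      # share of all tokens that are "piping"
```

With real corpus text, the same two steps apply unchanged to the token list returned by word_tokenize().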

Conditional frequency distribution

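A conditional frequency distribution is used when words should be counted separately for each condition, for example counting specific words within each category of a collection of texts.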

import nltk
from nltk.corpus import brown

# Build (condition, sample) pairs: the genre is the condition, the word is the sample
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre))

categories = ['hobbies', 'romance', 'humor']
searchwords = ['may', 'might', 'must', 'will']

# Print a table of counts: one row per genre, one column per word
cfd.tabulate(conditions=categories, samples=searchwords)

Running the program above produces the following output:

          may  might  must  will
hobbies   131     22    83   264
romance    11     51    45    43
humor       8      8     9    13
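To make the idea behind the table concrete, here is a minimal sketch of a conditional frequency distribution built with the standard library only: one Counter per condition, filled from (condition, word) pairs. It is not NLTK's implementation, and the pairs below are hypothetical stand-ins for the (genre, word) pairs drawn from the Brown corpus.

```python
from collections import Counter, defaultdict

# Hypothetical (condition, word) pairs standing in for (genre, word) from brown
pairs = [
    ("romance", "might"), ("romance", "will"), ("romance", "might"),
    ("hobbies", "will"), ("hobbies", "may"), ("hobbies", "will"),
]

# One Counter per condition, created on first use
cfd = defaultdict(Counter)
for condition, word in pairs:
    cfd[condition][word] += 1

# Print one row per condition and one column per search word
searchwords = ["may", "might", "will"]
print("condition", *searchwords)
for condition in sorted(cfd):
    print(condition, *(cfd[condition][w] for w in searchwords))
```

A word that never occurs under a condition simply reads as 0, because Counter returns 0 for missing keys.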
