Natural Language Processing with Python, Chapter 2

W&J

Article link: https://blog.csdn.net/hangzuxi8764/article/details/72902966

Chapter 2: Accessing Text Corpora and Lexical Resources

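import nltk
from nltk.corpus import brown

# Count how often each word occurs within each genre of the Brown corpus.
# No line-continuation backslashes are needed inside the parentheses.
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre))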

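# Restrict the pairs to two genres and build the (genre, word) list up front
genre_word = [(genre, word)
              for genre in ['news', 'romance']
              for word in brown.words(categories=genre)]

cfd = nltk.ConditionalFreqDist(genre_word)

# Frequency distribution of the words seen under the 'news' condition
cfd['news']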

FreqDist({u'sunbonnet': 1,
          u'Elevated': 1,
          u'narcotic': 2,
          u'four': 73,
          u'woods': 4,
          u'railing': 1,
          u'Until': 5,
          u'aggression': 1,
          u'marching': 2,
          u'increase': 24,
          u'eligible': 4,
          ...})

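Generators
A list comprehension creates a whole list at once. Memory is limited, though, so a list can only grow so large. Building a list of one million elements takes a lot of storage, and if we only need the first few elements, the space held by all the remaining ones is wasted.
If the elements can be computed by some rule, we can instead compute each one as the loop reaches it. The complete list never has to exist, which saves a great deal of space. In Python, this compute-while-iterating mechanism is called a generator.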
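The difference between a list and a generator can be sketched in a few lines of plain Python. This is only an illustration: the helper name `squares` and the variable names are made up for this example and do not come from NLTK or the book.

```python
import sys

def squares(n):
    """Yield the squares 0, 1, 4, ... one at a time instead of storing them all."""
    for i in range(n):
        yield i * i

# A list comprehension materialises every element immediately.
square_list = [i * i for i in range(1000000)]

# A generator expression computes elements only when they are requested.
square_gen = (i * i for i in range(1000000))

# The generator object stays small no matter how many elements it can produce.
list_is_bigger = sys.getsizeof(square_list) > sys.getsizeof(square_gen)

# Only the first three elements are ever computed here.
first_three = [next(square_gen) for _ in range(3)]

# A generator can be consumed only once: a second pass finds it empty.
gen = squares(4)
first_pass = list(gen)
second_pass = list(gen)
```

The single-pass behaviour matters for the code that follows: once a generator has been handed to a consumer that iterates over all of it, looping over it again yields nothing.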

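text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)             # a generator of adjacent word pairs
cfd = nltk.ConditionalFreqDist(bigrams)  # iterates over (and exhausts) the generator

# print() with a single argument works in both Python 2 and Python 3
print(bigrams)
print(cfd)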

<generator object bigrams at 0x000000000751F090>
<ConditionalFreqDist with 2789 conditions>

cfd['living']

FreqDist({u',': 1,
          u'.': 1,
          u'creature': 7,
          u'soul': 1,
          u'substance': 2,
          u'thing': 4})
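To make clear what the conditional frequency distribution over bigrams is counting, here is a rough plain-Python sketch that needs neither NLTK nor the corpus data. It is not NLTK's implementation: `bigrams` and `conditional_freq` are hypothetical stand-ins written for this example, `collections.Counter` plays the role of `FreqDist`, and the toy sentence is invented.

```python
from collections import Counter, defaultdict

def bigrams(words):
    """Yield adjacent (w1, w2) pairs lazily, as a generator."""
    previous = None
    first = True
    for word in words:
        if not first:
            yield (previous, word)
        previous = word
        first = False

def conditional_freq(pairs):
    """Map each condition (first word) to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for condition, word in pairs:
        table[condition][word] += 1
    return table

# Invented toy text standing in for the Genesis corpus (an assumption).
toy_text = "the living creature and the living thing and the living creature".split()
toy_cfd = conditional_freq(bigrams(toy_text))

# The word that most often follows "living" in the toy text.
most_common_after_living = toy_cfd["living"].most_common(1)[0][0]
```

Here `toy_cfd["living"]` holds the counts of words that follow "living", which is the same kind of table that `cfd['living']` shows above for the real corpus.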
