import nltk
from nltk.corpus import brown

# Original example (NLTK 2 API): on NLTK 3 this raises AttributeError at .inc()
suffix_fdist = nltk.FreqDist()
for word in brown.words():
    word = word.lower()
    suffix_fdist.inc(word[-1:])
    suffix_fdist.inc(word[-2:])
    suffix_fdist.inc(word[-3:])
common_suffixes = suffix_fdist.keys()[:100]
print(common_suffixes)
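While running an example from page 212, I got the error AttributeError: 'FreqDist' object has no attribute 'inc'. A quick search on Baidu showed that this is an NLTK version issue: NLTK 3 removed FreqDist.inc(), so every "freqdist.inc(sample, count)" has to be changed to "freqdist[sample] += count". I then modified the code myself: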
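import nltk
from nltk.corpus import brown  # needs the corpus data: nltk.download('brown')

suffix_fdist = nltk.FreqDist()
for word in brown.words():
    word = word.lower()
    # NLTK 3: fdist[sample] += 1 replaces the removed fdist.inc(sample)
    suffix_fdist[word[-1:]] += 1
    suffix_fdist[word[-2:]] += 1
    suffix_fdist[word[-3:]] += 1
# In NLTK 3, keys() is not sorted by frequency; most_common() is
common_suffixes = [suffix for suffix, count in suffix_fdist.most_common(100)]
print(common_suffixes)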
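For reference, here is a minimal sketch of the same suffix counting wrapped in a function and run on a small hand-made word list, so it works without downloading the Brown corpus. The names count_suffixes and count_suffixes_update and the max_len parameter are hypothetical. It also shows that update(), which FreqDist inherits from collections.Counter in NLTK 3, produces the same counts as the += loop; if NLTK is not installed, the sketch falls back to Counter itself.

```python
# Prefer NLTK's FreqDist; fall back to collections.Counter, which FreqDist
# subclasses in NLTK 3, so the calls used below behave the same on both.
try:
    from nltk import FreqDist
except ImportError:
    from collections import Counter as FreqDist


def count_suffixes(words, max_len=3):
    """Count the last 1..max_len letters of each lower-cased word (hypothetical helper)."""
    fdist = FreqDist()
    for word in words:
        word = word.lower()
        for n in range(1, max_len + 1):
            # NLTK 3 replacement for the removed fdist.inc(sample)
            fdist[word[-n:]] += 1
    return fdist


def count_suffixes_update(words, max_len=3):
    """Same counts, built in one call with update() (hypothetical helper)."""
    fdist = FreqDist()
    fdist.update(
        word.lower()[-n:] for word in words for n in range(1, max_len + 1)
    )
    return fdist


words = ["Running", "jumping", "cats", "dogs", "sing"]
print(count_suffixes(words)["ing"])                                # 3
print(count_suffixes(words) == count_suffixes_update(words))      # True
```

The explicit loop mirrors the fixed code line for line; update() with a generator is shorter and avoids Python-level indexing per suffix.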
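Problem solved. The goal is to output the 100 most frequent suffixes (the last one, two, and three letters of each word). One caveat: in NLTK 3, FreqDist is a subclass of collections.Counter, so keys() is no longer sorted by frequency, and list(suffix_fdist.keys())[:100] returns the first 100 distinct suffixes in the order they were first counted, not the most frequent ones. Use [suffix for suffix, count in suffix_fdist.most_common(100)] to rank them by frequency.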
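A minimal sketch of the keys() versus most_common() difference, using a few made-up suffix samples so it runs without the Brown corpus. As above, it falls back to collections.Counter when NLTK is missing, relying on FreqDist being a Counter subclass in NLTK 3.

```python
try:
    from nltk import FreqDist
except ImportError:
    from collections import Counter as FreqDist

fdist = FreqDist()
for suffix in ["ed", "ing", "ing", "s", "s", "s"]:
    fdist[suffix] += 1

# keys() follows insertion order (dict semantics), not frequency
print(list(fdist.keys())[:2])   # ['ed', 'ing']
# most_common(n) returns (sample, count) pairs sorted by decreasing count
print(fdist.most_common(2))     # [('s', 3), ('ing', 2)]
top_suffixes = [suffix for suffix, count in fdist.most_common(2)]
print(top_suffixes)             # ['s', 'ing']
```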