Try:
>>> from __future__ import division
>>> from nltk.corpus import inaugural
>>> total_lens = 0
>>> for i, sent in enumerate(inaugural.sents()):
...     total_lens += len(sent)
...
>>> total_lens
145735
>>> i
4867
>>> avg_sent_len = total_lens / i
>>> avg_sent_len
29.943497020752
>>> avg_sent_len = total_lens / (i+1)
>>> avg_sent_len
29.9373459326212
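Note that the +1 hardly matters when the denominator is large enough. Strictly, enumerate() counts from 0, so i ends as the index of the last sentence and the sentence count is i + 1, which makes the second figure the exact average.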
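The off-by-one can be avoided by counting sentences from 1 in the same pass. The following is a minimal sketch, assuming Python 3 (so `/` is already true division); `average_sentence_length` and `toy_sents` are hypothetical names used only for illustration, and the toy generator stands in for `inaugural.sents()`.

```python
def average_sentence_length(sents):
    """Mean number of tokens per sentence, computed in a single pass."""
    total_tokens = 0
    num_sents = 0
    # start=1 makes num_sents the running sentence count, not the last index.
    for num_sents, sent in enumerate(sents, start=1):
        total_tokens += len(sent)
    if num_sents == 0:
        raise ValueError("no sentences to average")
    return total_tokens / num_sents


# Toy stand-in for a corpus: a generator of tokenized sentences.
toy_sents = (s.split() for s in ["We the people", "Thank you .", "Four score and seven"])
avg = average_sentence_length(toy_sents)
```

Because the input is consumed only once, this also works when the sentences come from a generator that cannot be rewound.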
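Micro-average sentence length across all texts
The following is a one-liner, but it is discouraged because it may materialize the sentence generator twice: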
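A minimal sketch of such a one-liner, assuming it simply divides the total token count by the sentence count. The function `sents` here is a hypothetical stand-in for `inaugural.sents()`, used so that the example runs without the corpus installed.

```python
# Hypothetical stand-in for inaugural.sents(): every call rebuilds the sentences.
def sents():
    return [
        ["We", "the", "people"],
        ["Thank", "you", "."],
        ["Four", "score", "and", "seven"],
    ]


# One-liner form: sents() is evaluated twice, once for the sum and once for
# the count, which is the double materialization the answer warns about.
micro_avg = sum(len(sent) for sent in sents()) / len(sents())
```

With NLTK, replacing `sents()` by `inaugural.sents()` gives the same shape of expression, at the cost of reading the corpus twice.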
Macro-average sentence length across all texts:
>>> sum(sum(len(sent) for sent in inaugural.sents(fileids=[fileid])) / len(inaugural.sents(fileids=[fileid])) for fileid in inaugural.fileids()) / len(inaugural.file
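The difference between the two averages is easiest to see on a small example. This is a sketch under stated assumptions: `toy_corpus` and `mean_len` are hypothetical names, and the dictionary stands in for the per-file sentences that `inaugural.sents(fileids=[fileid])` would return.

```python
# Hypothetical toy corpus: file id -> list of tokenized sentences.
toy_corpus = {
    "doc_a": [["Short", "one"], ["A", "bit", "longer", "here"]],
    "doc_b": [["One", "single", "six", "token", "sentence", "."]],
}


def mean_len(sents):
    """Mean sentence length of one collection of sentences."""
    sents = list(sents)  # materialize once so it can be summed and counted
    return sum(len(s) for s in sents) / len(sents)


# Micro-average: pool every sentence, then average once.
all_sents = [s for sents in toy_corpus.values() for s in sents]
micro_avg = mean_len(all_sents)

# Macro-average: average each file first, then average those per-file means.
macro_avg = sum(mean_len(sents) for sents in toy_corpus.values()) / len(toy_corpus)

print(micro_avg, macro_avg)  # prints: 4.0 4.5
```

The micro-average weights every sentence equally, so long documents dominate; the macro-average weights every document equally regardless of its length.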