While learning natural language processing, I hit an error when calling the collocations() method:
```
>>> text1.collocations()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "D:\python\test02\venv\lib\site-packages\nltk\text.py", line 444, in collocations
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
  File "D:\python\test02\venv\lib\site-packages\nltk\text.py", line 444, in <listcomp>
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
ValueError: too many values to unpack (expected 2)
```
So I opened D:\python\test02\venv\lib\site-packages\nltk\text.py and went to line 444:
```python
collocation_strings = [
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
]
```
Since collocation_strings is built from collocation_list, I looked up the definition of collocation_list:
```python
def collocation_list(self, num=20, window_size=2):
    """
    Return collocations derived from the text, ignoring stopwords.

    :param num: The maximum number of collocations to return.
    :type num: int
    :param window_size: The number of tokens spanned by a collocation (default=2)
    :type window_size: int
    """
    if not (
        "_collocations" in self.__dict__
        and self._num == num
        and self._window_size == window_size
    ):
        self._num = num
        self._window_size = window_size

        # print("Building collocations list")
        from nltk.corpus import stopwords

        ignored_words = stopwords.words("english")
        finder = BigramCollocationFinder.from_words(self.tokens, window_size)
        finder.apply_freq_filter(2)
        finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in ignored_words)
        bigram_measures = BigramAssocMeasures()
        self._collocations = finder.nbest(bigram_measures.likelihood_ratio, num)
    return [w1 + " " + w2 for w1, w2 in self._collocations]
```
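The finder pipeline above can be sketched without NLTK at all: collect adjacent word pairs, filter out short words and rare pairs, and keep the most frequent ones. This is only a rough, dependency-free stand-in (top_bigrams and the toy token list are made up here; the real code scores pairs by likelihood ratio rather than raw frequency, and also drops stopwords):

```python
from collections import Counter

def top_bigrams(tokens, num=20, min_freq=2):
    # Pair each token with its neighbor, skipping words shorter than
    # 3 characters (mirroring the apply_word_filter length check)
    pairs = [
        (a, b) for a, b in zip(tokens, tokens[1:])
        if len(a) >= 3 and len(b) >= 3
    ]
    counts = Counter(pairs)
    # Keep the most common pairs seen at least min_freq times
    # (mirroring apply_freq_filter(2))
    return [p for p, c in counts.most_common(num) if c >= min_freq]

tokens = ["the", "sperm", "whale", "saw", "the", "sperm", "whale"]
print(top_bigrams(tokens))  # [('the', 'sperm'), ('sperm', 'whale')]
```

The real finder ranks candidates with bigram_measures.likelihood_ratio, which favors pairs that co-occur more often than their individual frequencies would predict, but the filtering structure is the same.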
So collocation_list returns a flat list of strings such as "sperm whale", not a list of (w1, w2) tuples. The for loop in collocations() therefore tries to unpack each string into w1 and w2, which unpacks the string character by character and raises the ValueError.
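The failure is easy to reproduce in isolation (the list below is a hypothetical return value of collocation_list, hard-coded for illustration):

```python
# collocation_list() returns joined strings, not (w1, w2) tuples
pairs = ["sperm whale", "Moby Dick"]

try:
    # Mirrors line 444: unpacking a multi-character string into two
    # names fails, because every character counts as one value
    [w1 + " " + w2 for w1, w2 in pairs]
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)
```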
Solutions
- Option 1
  - Call collocation_list() directly instead of collocations().
- Option 2
  - Patch the source in D:\python\test02\venv\lib\site-packages\nltk\text.py as follows:
```python
# collocation_strings = [
#     w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
# ]
collocation_strings = self.collocation_list(num, window_size)
```
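The patched logic amounts to using the ready-made "w1 w2" strings as-is. A minimal standalone sketch (collocations_fixed is a made-up name; the real method goes on to print the joined result with "; " between entries):

```python
def collocations_fixed(collocation_list, sep="; "):
    # The list already holds joined "w1 w2" strings,
    # so there is nothing to unpack — just join and return
    return sep.join(collocation_list)

print(collocations_fixed(["sperm whale", "Moby Dick"]))  # sperm whale; Moby Dick
```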