While learning natural language processing, I hit an error when calling the collocations() method:
```
>>> text1.collocations()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "D:\python\test02\venv\lib\site-packages\nltk\text.py", line 444, in collocations
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
  File "D:\python\test02\venv\lib\site-packages\nltk\text.py", line 444, in <listcomp>
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
ValueError: too many values to unpack (expected 2)
```
So I opened D:\python\test02\venv\lib\site-packages\nltk\text.py and went to line 444:
```python
collocation_strings = [
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
]
```
Since collocation_strings is built from collocation_list, I looked up the definition of collocation_list:
```python
def collocation_list(self, num=20, window_size=2):
    """
    Return collocations derived from the text, ignoring stopwords.

    :param num: The maximum number of collocations to return.
    :type num: int
    :param window_size: The number of tokens spanned by a collocation (default=2)
    :type window_size: int
    """
    if not (
        "_collocations" in self.__dict__
        and self._num == num
        and self._window_size == window_size
    ):
        self._num = num
        self._window_size = window_size

        # print("Building collocations list")
        from nltk.corpus import stopwords

        ignored_words = stopwords.words("english")
        finder = BigramCollocationFinder.from_words(self.tokens, window_size)
        finder.apply_freq_filter(2)
        finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in ignored_words)
        bigram_measures = BigramAssocMeasures()
        self._collocations = finder.nbest(bigram_measures.likelihood_ratio, num)
    return [w1 + " " + w2 for w1, w2 in self._collocations]
```
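The finder pipeline above can be sketched without NLTK at all: collect adjacent word pairs, filter out short words and rare pairs, and keep the most frequent ones. This is only a rough, dependency-free stand-in (top_bigrams and the toy token list are made up here; the real code scores pairs by likelihood ratio rather than raw frequency, and also drops stopwords):

```python
from collections import Counter

def top_bigrams(tokens, num=20, min_freq=2):
    # Pair each token with its neighbor, skipping words shorter than
    # 3 characters (mirroring the apply_word_filter length check)
    pairs = [
        (a, b) for a, b in zip(tokens, tokens[1:])
        if len(a) >= 3 and len(b) >= 3
    ]
    counts = Counter(pairs)
    # Keep the most common pairs seen at least min_freq times
    # (mirroring apply_freq_filter(2))
    return [p for p, c in counts.most_common(num) if c >= min_freq]

tokens = ["the", "sperm", "whale", "saw", "the", "sperm", "whale"]
print(top_bigrams(tokens))  # [('the', 'sperm'), ('sperm', 'whale')]
```

The real finder ranks candidates with bigram_measures.likelihood_ratio, which favors pairs that co-occur more often than their individual frequencies would predict, but the filtering structure is the same.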
So collocation_list returns a flat list of strings such as "sperm whale", not a list of (w1, w2) tuples. The for loop in collocations() therefore tries to unpack each string into w1 and w2, which unpacks the string character by character and raises the ValueError.
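The failure is easy to reproduce in isolation (the list below is a hypothetical return value of collocation_list, hard-coded for illustration):

```python
# collocation_list() returns joined strings, not (w1, w2) tuples
pairs = ["sperm whale", "Moby Dick"]

try:
    # Mirrors line 444: unpacking a multi-character string into two
    # names fails, because every character counts as one value
    [w1 + " " + w2 for w1, w2 in pairs]
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)
```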
Solutions
- Option 1
  - Call collocation_list() directly instead of collocations().
- Option 2
  - Patch the source in D:\python\test02\venv\lib\site-packages\nltk\text.py as follows:
```python
# collocation_strings = [
#     w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
# ]
collocation_strings = self.collocation_list(num, window_size)
```
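The patched logic amounts to using the ready-made "w1 w2" strings as-is. A minimal standalone sketch (collocations_fixed is a made-up name; the real method goes on to print the joined result with "; " between entries):

```python
def collocations_fixed(collocation_list, sep="; "):
    # The list already holds joined "w1 w2" strings,
    # so there is nothing to unpack — just join and return
    return sep.join(collocation_list)

print(collocations_fixed(["sperm whale", "Moby Dick"]))  # sperm whale; Moby Dick
```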