jieba encoding problems under Python 2.x

damaohao88

Published 2020-02-24 22:29:45

Article link: https://blog.csdn.net/maoersong/article/details/87271640

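Under Python 3, jieba word segmentation does not run into UnicodeEncodeError, because the cut function calls strdecode on its input to deal with encoding. Under Python 2 this handling is not done.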

def cut(self, sentence, cut_all=False, HMM=True):
    '''
    The main function that segments an entire sentence that contains
    Chinese characters into separated words.

    Parameter:
        - sentence: The str(unicode) to be segmented.
        - cut_all: Model type. True for full pattern, False for accurate pattern.
        - HMM: Whether to use the Hidden Markov Model.
    '''
    # Normalize the input to text before segmentation.
    sentence = strdecode(sentence)
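
To make the decoding step concrete, here is a minimal sketch of a helper that normalizes input to text before it reaches the segmenter. The name `to_text` is hypothetical, and the "try UTF-8, then fall back to GBK" order is an assumption about what a strdecode-style function does, not a copy of jieba's code. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
# The conditional picks one branch only, so `unicode` is never looked up on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_text(sentence):
    """Return `sentence` as text, decoding byte strings if needed.

    Hypothetical helper for illustration: tries UTF-8 first and,
    if that fails, falls back to GBK while ignoring undecodable bytes.
    """
    if not isinstance(sentence, text_type):
        try:
            sentence = sentence.decode("utf-8")
        except UnicodeDecodeError:
            sentence = sentence.decode("gbk", "ignore")
    return sentence


original = "我爱自然语言处理"
decoded_utf8 = to_text(original.encode("utf-8"))  # UTF-8 bytes in, text out
decoded_gbk = to_text(original.encode("gbk"))     # GBK bytes take the fallback path
already_text = to_text(original)                  # text passes through unchanged
```

Under Python 2, one way to apply this would be to call `jieba.cut(to_text(raw_bytes))` and to encode each resulting word explicitly (for example with `word.encode("utf-8")`) before writing it to a file or the console, so that no implicit ASCII conversion takes place.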