A Usage Comparison of jieba and thulac

微电子学与固体电子学-俞驰


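# coding=utf-8
import time

import jieba
import thulac

# Longer sample sentence, left disabled:
# test = '我们还提供更复杂、完善和精确的分词和词性标注联合模型Model_3和分词词表。该模型是由多语料联合训练训练得到（语料包括来自多文体的标注文本和人民日报标注文本等）'

# jieba.cut returns a lazy generator, so join it before taking the end timestamp.
start2 = time.time()
a = ' '.join(jieba.cut('我想听邓紫棋的忘情水'))
end2 = time.time()  # end2 - start2 includes jieba's dictionary loading on the first call
print(a)

# seg_only=True: word segmentation only, no part-of-speech tagging.
thu1 = thulac.thulac(seg_only=True)
# Note: this sentence uses 要 where the jieba input above uses 想.
start1 = time.time()
text = thu1.cut('我要听邓紫棋的忘情水', text=True)
end1 = time.time()  # end1 - start1 is the thulac segmentation time (model already loaded)
print(text)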

Output:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.645 seconds.
Prefix dict has been built succesfully.
我想听邓紫棋的忘情水
Model loaded succeed
我要听邓紫棋的忘情水
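
The script takes timestamps around each call but never reports them, and the first jieba call also pays for dictionary loading. Below is a minimal sketch of a timing harness that runs any tokenizer on the same sentence after a warm-up call. The names time_segmenter and char_segmenter are hypothetical helpers introduced here, and char_segmenter is only a stand-in so the sketch runs without either library installed.

```python
import time


def time_segmenter(segment, sentence, repeats=5):
    """Return (tokens, best_seconds) for a callable mapping str -> iterable of tokens."""
    segment(sentence)  # warm-up call so lazy dictionary/model loading is not timed
    best = float('inf')
    tokens = []
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = list(segment(sentence))  # list() forces lazy generators to run
        best = min(best, time.perf_counter() - start)
    return tokens, best


def char_segmenter(sentence):
    """Hypothetical stand-in: one token per character."""
    return list(sentence)


if __name__ == '__main__':
    sentence = '我想听邓紫棋的忘情水'
    segmenters = {'char baseline': char_segmenter}
    # With the libraries installed, the real tokenizers could be added like this:
    # import jieba, thulac
    # thu = thulac.thulac(seg_only=True)
    # segmenters['jieba'] = jieba.lcut
    # segmenters['thulac'] = lambda s: thu.cut(s, text=True).split()
    for name, segment in segmenters.items():
        tokens, seconds = time_segmenter(segment, sentence)
        print(name, ' '.join(tokens), '%.6f s' % seconds)
```

Feeding both tokenizers the identical sentence and taking the best of several runs keeps one-off loading cost and scheduling noise out of the comparison.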

On this example, jieba's segmentation is somewhat better.
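
A judgment like this can be made measurable by scoring each tokenizer's output against a reference segmentation. The sketch below computes word-level precision, recall and F1 over character spans. The reference segmentation and the sample prediction are assumptions written by hand for illustration; they are not actual jieba or thulac output.

```python
def spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    result, pos = set(), 0
    for tok in tokens:
        result.add((pos, pos + len(tok)))
        pos += len(tok)
    return result


def segmentation_f1(predicted, gold):
    """Word-level precision, recall and F1 of predicted tokens against gold tokens."""
    if ''.join(predicted) != ''.join(gold):
        raise ValueError('both segmentations must cover the same text')
    pred_spans, gold_spans = spans(predicted), spans(gold)
    if not pred_spans or not gold_spans:
        return 0.0, 0.0, 0.0
    correct = len(pred_spans & gold_spans)  # words with identical boundaries
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Assumed reference segmentation, written by hand for illustration only.
gold = ['我', '想', '听', '邓紫棋', '的', '忘情水']
# Hypothetical tokenizer output, not an actual result from jieba or thulac.
predicted = ['我', '想', '听', '邓', '紫棋', '的', '忘情水']

if __name__ == '__main__':
    print(segmentation_f1(predicted, gold))
```

A word counts as correct only when both of its boundaries match the reference, so splitting a name such as 邓紫棋 into two tokens lowers precision and recall together.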