TextGrocery: Chinese Text Classification

Latest recommended article published 2022-04-19 13:14:02

weixin_33752045

Tags: artificial intelligence, python

Detailed usage instructions: http://textgrocery.readthedocs.io/zh/latest/index.html

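TextGrocery is a short-text classification tool built on LibLinear and the jieba (结巴) word segmenter. It is efficient and easy to use, and it supports both Chinese and English corpora.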
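To give a rough feel for what "word segmentation plus a linear classifier" means, here is a minimal stdlib-only sketch. It is not TextGrocery's implementation: character bigrams stand in for jieba segmentation, a simple margin perceptron stands in for LibLinear, and the names `char_bigrams` and `TinyLinearClassifier` are made up for illustration. It is written for Python 3.

```python
from collections import defaultdict


def char_bigrams(text):
    """Split text into overlapping character bigrams (crude stand-in for jieba)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class TinyLinearClassifier(object):
    """Hypothetical margin-perceptron classifier; NOT TextGrocery's own code."""

    def __init__(self):
        # One sparse weight vector per label: weights[label][feature] -> float
        self.weights = defaultdict(lambda: defaultdict(float))
        self.labels = []

    def _score(self, label, feats):
        # Linear score: sum of the label's weights over the observed features.
        w = self.weights[label]
        return sum(w.get(f, 0.0) for f in feats)

    def train(self, samples, epochs=10):
        # samples is a list of (label, text) tuples; at least two labels are assumed.
        self.labels = sorted({label for label, _ in samples})
        for _ in range(epochs):
            updated = False
            for gold, text in samples:
                feats = char_bigrams(text)
                rivals = [l for l in self.labels if l != gold]
                rival = max(rivals, key=lambda l: self._score(l, feats))
                # Update whenever the correct label does not win by a margin of 1.
                if self._score(gold, feats) - self._score(rival, feats) < 1.0:
                    for f in feats:
                        self.weights[gold][f] += 1.0
                        self.weights[rival][f] -= 1.0
                    updated = True
            if not updated:
                break  # every sample is classified with sufficient margin
        return self

    def predict(self, text):
        feats = char_bigrams(text)
        return max(self.labels, key=lambda l: self._score(l, feats))


train_src = [
    ('education', '名师指导托福语法技巧：名词的复数形式'),
    ('education', '中国高考成绩海外认可 是“狼来了”吗？'),
    ('sports', '图文：法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
    ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与'),
]
clf = TinyLinearClassifier().train(train_src)
print(clf.predict('考生必读：新托福写作考试评分标准'))
```

In this toy version the test sentence is assigned to `education` only because it shares the bigram 托福 with one training sentence. A real segmenter and a properly regularized linear model should generalize far better than this.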

GitHub project link

Installation:

pip install tgrocery

Usage:

>>> from tgrocery import Grocery
# Open a new grocery (don't forget to give it a name)
>>> grocery = Grocery('sample')
# Training text can be passed in as a list
>>> train_src = [
...     ('education', '名师指导托福语法技巧：名词的复数形式'),
...     ('education', '中国高考成绩海外认可 是“狼来了”吗？'),
...     ('sports', '图文：法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
...     ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与')
... ]
>>> grocery.train(train_src)
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.125 seconds.
Prefix dict has been built succesfully.
*
optimization finished, #iter = 3
Objective value = -1.092381
nSV = 8
<tgrocery.Grocery object at 0x7f23cf243b50>
>>> grocery.save()
>>> new_grocery = Grocery('sample')
>>> new_grocery.load()
>>> new_grocery.predict('考生必读：新托福写作考试评分标准')
<tgrocery.base.GroceryPredictResult object at 0x4490d50>
>>> new_grocery.predict('考生必读：新托福写作考试评分标准')
<tgrocery.base.GroceryPredictResult object at 0x4490d90>
>>> result = new_grocery.predict('考生必读：新托福写作考试评分标准')
>>> print result
education
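The session above passes the training data as a list of `(label, text)` tuples. For a larger corpus it may be more convenient to keep the data in a file and build that list on the fly. The sketch below assumes a UTF-8 file with one `label<TAB>text` pair per line. The helper name `load_labeled_lines` and the demo file name are hypothetical, and the code does not import TextGrocery itself.

```python
import io
import os
import tempfile


def load_labeled_lines(path, delimiter='\t'):
    """Read 'label<delimiter>text' lines into a list of (label, text) tuples."""
    pairs = []
    with io.open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.rstrip('\r\n')
            if not line.strip():
                continue  # skip blank lines
            # Split only on the first delimiter so the text may contain it too.
            label, text = line.split(delimiter, 1)
            pairs.append((label, text))
    return pairs


# Write a small demo file (assumed layout: label, tab, text).
demo_path = os.path.join(tempfile.mkdtemp(), 'train_demo.txt')
with io.open(demo_path, 'w', encoding='utf-8') as handle:
    handle.write(u'education\t名师指导托福语法技巧：名词的复数形式\n')
    handle.write(u'sports\t四川丹棱举行全国长距登山挑战赛 近万人参与\n')

train_src = load_labeled_lines(demo_path)
print(train_src)
```

The resulting `train_src` has the same shape as the list in the session, so it could be handed to `grocery.train(train_src)` in the same way.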