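I've been iterating on feature engineering again lately, and ran into something to watch out for when extracting feature words (keywords) with jieba. Let's go straight to an example.
Example 1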
>>> import jieba
>>> import jieba.posseg as pseg
>>> import jieba.analyse
>>> s = '我们喜欢支付宝, 苹果'
>>> ws = pseg.cut(s)
>>> for i in ws:
...     print(i)
...
我们/r
喜欢/v
支付宝/nr
,/x
/x
苹果/n
>>> allow_pos = ('nr',)
>>> tags = jieba.analyse.extract_tags(s, topK=10, withWeight=False, allowPOS=allow_pos)
>>> for t in tags:
...     print(t)
...
支付宝
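For intuition, here is a rough sketch of what the allowPOS filter does to the tokens above. It is not jieba's actual code: `keep_by_pos` is a made-up helper, the (word, flag) pairs are copied by hand from the pseg.cut output, and the TF-IDF ranking step is left out entirely. It only assumes that candidates are kept when their POS flag is in allowPOS and that very short tokens are dropped.

```python
# Simplified illustration of POS filtering (hypothetical helper, not jieba's API).
# The pairs below are copied by hand from the pseg.cut output above.
PAIRS = [
    ("我们", "r"),
    ("喜欢", "v"),
    ("支付宝", "nr"),
    (",", "x"),
    (" ", "x"),
    ("苹果", "n"),
]


def keep_by_pos(pairs, allow_pos):
    """Return the words whose POS flag is listed in allow_pos."""
    allowed = frozenset(allow_pos)
    kept = []
    for word, flag in pairs:
        if flag not in allowed:
            continue  # POS flag not allowed, skip this token
        if len(word.strip()) < 2:
            continue  # assumption: single characters and whitespace are dropped
        kept.append(word)
    return kept


result = keep_by_pos(PAIRS, ("nr",))
print(result)
```

With allow_pos = ('nr',) only 支付宝 survives, which matches the extract_tags result above.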
>>> allow_pos = ('nr'