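1 Introduction to models
This version of the part-of-speech tagger ships with a models folder that contains two types of files: .tagger and .props. A .tagger file is a model file produced by training the POS tagger, and a .props file is its corresponding properties file. All files in the models folder are shown in the figure below:
The main models in this folder are described in turn below.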
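Before turning to the individual models, the pairing between model files and properties files can be checked with a short script. The following is a minimal sketch, not part of the tagger distribution. The function name `pair_models` is hypothetical, and the sketch assumes that a model named `X.tagger` has its properties file stored as `X.tagger.props`; adjust the rule if your copy of the folder uses a different naming scheme.

```python
from pathlib import Path


def pair_models(models_dir):
    """Map each .tagger file name to its .props file name, or None if absent.

    Assumption (not confirmed by the text above): the properties file for
    "X.tagger" is named "X.tagger.props" and sits in the same folder.
    """
    folder = Path(models_dir)
    # Collect the names of all regular files in the folder.
    names = {p.name for p in folder.iterdir() if p.is_file()}

    pairs = {}
    for name in sorted(names):
        if name.endswith(".tagger"):
            props_name = name + ".props"  # assumed naming convention
            pairs[name] = props_name if props_name in names else None
    return pairs
```

Calling `pair_models("models")` on the unpacked folder would return a dictionary whose keys are the model files; any model mapped to `None` has no properties file under the assumed naming rule.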
1.1 arabic.tagger
Trained on the *entire* ATB p1-3.
When trained on the train part of the ATB p1-3 split done for the 2005
JHU Summer Workshop (Diab split), using (augmented) Bies tags, it gets
the following performance:
96.26% on test portion according to Diab split
(80.14% on unknown words)
1.2 chinese-distsim.tagger
Trained on a combination of CTB7 texts from Chinese and Hong Kong
sources with distributional similarity clusters.
LDC Chinese Treebank POS t