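Weka's StringToWordVector class converts documents supplied in the required format into a vector space model (VSM) representation, which is a necessary step for text classification. As Weka requires, first generate the text in ARFF format: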
@relation D__java_weka_data
@attribute text string
@attribute class {test1,test2,test3}
@data
'here we go go go go to do ',test1
'Mostly, I expect we are interested in indexing XPath queries',test1
'so what do you think you can do anything?',test2
'Sparse ARFF files are very similar to ARFF files',test3
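To make the idea of a vector space model concrete, here is a minimal plain-Java sketch of a bag-of-words conversion applied to two of the sample sentences above. It is not Weka's implementation and does not call the Weka API: the class name `BagOfWordsSketch`, its methods, the lower-casing tokenizer, and the alphabetical column order are all assumptions made for illustration. StringToWordVector has its own tokenizer and options, which this sketch does not try to reproduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper class: illustrates the word-vector idea only.
public class BagOfWordsSketch {

    // Lower-case the text and split on anything that is not a letter or digit.
    // (Assumed tokenizer; Weka's default tokenizer behaves differently.)
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect every distinct word and assign it a column index in alphabetical order.
    static Map<String, Integer> buildVocabulary(List<String> docs) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : docs) {
            words.addAll(tokenize(doc));
        }
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String w : words) {
            vocab.put(w, vocab.size());
        }
        return vocab;
    }

    // Turn one document into a vector of word counts over the vocabulary.
    static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] vector = new int[vocab.size()];
        for (String t : tokenize(doc)) {
            Integer idx = vocab.get(t);
            if (idx != null) {
                vector[idx]++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "here we go go go go to do ",
                "so what do you think you can do anything?");
        Map<String, Integer> vocab = buildVocabulary(docs);
        System.out.println(vocab.keySet());
        for (String doc : docs) {
            System.out.println(Arrays.toString(vectorize(doc, vocab)));
        }
    }
}
```

Each document becomes a fixed-length numeric row whose columns correspond to words, which is the kind of input a classifier can work with. The class label (test1, test2, ...) would be carried alongside each row unchanged.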
Following the command format of the StringToWordVector class, set the opti