Preface
Having seen how powerful word2vec is, it is time to put it into practice and solve a real problem.
Hands-on
Add the dependency
<dependency>
    <groupId>com.medallia.word2vec</groupId>
    <artifactId>word2vecjava_2.11</artifactId>
    <version>1.0-ALLENAI-4</version>
</dependency>
Train the model
Because the corpus is very small, all of the training parameters have been turned down.
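To see why the vocabulary threshold in particular has to come down, here is a minimal sketch that uses only the JDK and counts token frequencies in the same toy sentence. The class name `VocabFrequencySketch` and its helper methods are hypothetical and exist only for this illustration; they are not part of the word2vec library. The cutoff of 5 is used for comparison because that is the default `min-count` of the original word2vec command-line tool.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VocabFrequencySketch {

    // Splits one sentence on single spaces, mirroring the split(" ") call used in this post.
    static List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.split(" "));
    }

    // Counts how often each token occurs across all sentences.
    static Map<String, Integer> countTokens(List<String> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {
            for (String token : tokenize(sentence)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Number of distinct tokens whose count reaches the given minimum frequency.
    static long survivors(Map<String, Integer> counts, int minFrequency) {
        return counts.values().stream().filter(c -> c >= minFrequency).count();
    }

    public static void main(String[] args) {
        List<String> data = List.of("anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english anarchism originated as a term of abuse first");
        Map<String, Integer> counts = countTokens(data);
        System.out.println("distinct tokens: " + counts.size());
        System.out.println("kept at min frequency 1: " + survivors(counts, 1));
        System.out.println("kept at min frequency 5: " + survivors(counts, 5));
    }
}
```

On this sentence the most frequent token ("of") appears only three times, so a cutoff of 5 would leave an empty vocabulary, while a cutoff of 1 keeps all 18 distinct tokens. That is the motivation for `setMinVocabFrequency(1)` in the training code.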
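import com.google.common.collect.Lists;
import com.medallia.word2vec.Word2VecModel;
import com.medallia.word2vec.neuralnetwork.NeuralNetworkType;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.util.Arrays;
import java.util.List;

@Service
@Slf4j
public class Word2vecService {
    public Word2VecModel train() {
        try {
            // Toy corpus: a single sentence.
            List<String> data = List.of("anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english anarchism originated as a term of abuse first");
            // The trainer expects each sentence as a list of tokens.
            List<List<String>> list = Lists.transform(data, sentence -> Arrays.asList(sentence.split(" ")));
            Word2VecModel word2VecModel = Word2VecModel.trainer()
                    .setMinVocabFrequency(1)       // keep every word, even those seen once
                    .useNumThreads(4)
                    .setWindowSize(1)              // small context window
                    .type(NeuralNetworkType.CBOW)
                    .setLayerSize(12)              // 12-dimensional vectors
                    .useNegativeSamples(5)
                    .setDownSamplingRate(1e-4)
                    .setNumIterations(5)
                    .setListener((var1, var2)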