Official documentation
https://spark.apache.org/docs/latest/ml-features.html#word2vec
Usage
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SQLContext

// Run locally. SQLContext still works, but since Spark 2.0 it is deprecated in favor of SparkSession.
val conf = new SparkConf().setAppName("Word2Vec example").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
// Input data: Each row is a bag of words from a sentence or document.
val documentDF = sqlContext.createDataFrame(Seq(
  "Hi I heard about Spark spark".split(" "),
  "I wish Java could use case classes".split(" "),
  "Logistic regression models are neat".split(" ")
).map(Tuple1.apply)).toDF("text")
// Learn a mapping from words to Vectors.
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(3)
  .setMinCount(0)
val model = word2Vec.fit(documentDF)
// Print the learned word-to-vector table without truncating the columns.
model.getVectors.show(false)
// Append a "result" column holding one vector per input row.
val result = model.transform(documentDF)
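Spark's Word2Vec does more than compute a vector for each word: it can also return a list of related words for a given word, and compute a vector for a whole line of text.
If you only need the word vectors, call model.getVectors directly. It returns a DataFrame with a word column and a vector column.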
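To get the list of words related to a given word, call
model.findSynonyms("Spark", 10)
This returns a DataFrame with up to 10 of the closest words and their cosine similarity to "Spark". The lookup is case-sensitive, so "Spark" and "spark" are separate vocabulary entries.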
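The idea behind a related-word lookup can be sketched in plain Scala, without Spark. This is a minimal illustration, not Spark's implementation: it assumes the word vectors are already available as a Map, ranks every other word by cosine similarity to the query word, and keeps the top few. The object name SynonymSketch, the map toyVectors, and its values are hypothetical and made up for the example.

```scala
object SynonymSketch {
  // Toy vectors invented for illustration; they are NOT real model output.
  val toyVectors: Map[String, Array[Double]] = Map(
    "spark"  -> Array(1.0, 0.0),
    "hadoop" -> Array(0.9, 0.1),
    "java"   -> Array(0.0, 1.0)
  )

  // Cosine similarity between two vectors of equal length.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Rank all words except `word` by similarity to it and keep the top `num`.
  def findSynonyms(vectors: Map[String, Array[Double]],
                   word: String,
                   num: Int): Seq[(String, Double)] = {
    val query = vectors(word) // throws if the word is not in the vocabulary
    vectors.toSeq
      .filter { case (w, _) => w != word }
      .map { case (w, v) => (w, cosine(query, v)) }
      .sortBy { case (_, sim) => -sim }
      .take(num)
  }
}
```

For example, SynonymSketch.findSynonyms(SynonymSketch.toyVectors, "spark", 2) ranks "hadoop" ahead of "java", because the toy vector for "hadoop" points in almost the same direction as the one for "spark".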
To get the vector for each line of text:
val result = model.transform(documentDF)
result.select("result").take(3).foreach(println)
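The Spark documentation describes the text vector as the average of the vectors of all words in the document. A minimal plain-Scala sketch of that averaging follows. The object name DocVectorSketch and the function average are hypothetical, and the handling of unknown words (skipped in the sum, but still counted in the divisor) is an assumption of this sketch rather than a statement about Spark's behavior.

```scala
object DocVectorSketch {
  // Average the vectors of the tokens in one document.
  // Assumption: tokens missing from `vectors` add nothing to the sum,
  // but the sum is still divided by the total number of tokens.
  def average(tokens: Seq[String],
              vectors: Map[String, Array[Double]],
              dim: Int): Array[Double] = {
    if (tokens.isEmpty) {
      Array.fill(dim)(0.0) // empty document: all-zero vector
    } else {
      val sum = Array.fill(dim)(0.0)
      // Add each known word's vector into the running sum, component by component.
      for (t <- tokens; v <- vectors.get(t); i <- 0 until dim) sum(i) += v(i)
      sum.map(_ / tokens.size)
    }
  }
}
```

With toy vectors a = (1.0, 2.0) and b = (3.0, 4.0), the document Seq("a", "b") averages to (2.0, 3.0). A larger setVectorSize only changes dim; the averaging itself stays the same.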