Spark Word2Vec usage notes

Latest recommended article published on 2024-01-07 09:19:49

iwanttolearn_java

Category: Essays. Tags: spark

Article link: https://blog.csdn.net/iwanttolearn_java/article/details/105768729

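This post walks through computing word vectors with the Word2Vec algorithm in Apache Spark: configuring the Spark environment, building the input DataFrame, training the Word2Vec model, and then retrieving word vectors, lists of related words, and text vectors.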

Official documentation

https://spark.apache.org/docs/latest/ml-features.html#word2vec

Usage

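import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SQLContext

// Run locally in a single JVM. SQLContext is the pre-2.0 entry point;
// newer code usually uses SparkSession instead.
val conf = new SparkConf().setAppName("Word2Vec example").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Input data: Each row is a bag of words from a sentence or document.
val documentDF = sqlContext.createDataFrame(Seq(
  "Hi I heard about Spark spark".split(" "),
  "I wish Java could use case classes".split(" "),
  "Logistic regression models are neat".split(" ")
).map(Tuple1.apply)).toDF("text")

// Learn a mapping from words to Vectors.
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(3) // dimensionality of each word vector
  .setMinCount(0)   // keep every word, even those that appear only once
val model = word2Vec.fit(documentDF)

// Print the learned vector for every word in the vocabulary.
model.getVectors.show(false)

// Append a "result" column holding one vector per document
// (the average of the vectors of the words in that document).
val result = model.transform(documentDF)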
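
The three lookups mentioned in the summary (a single word's vector, a list of related words, and per-text vectors) can be sketched roughly as follows. This is a minimal sketch rather than the author's original code: it assumes Spark 2.x or later and uses `SparkSession` in place of `SQLContext`, and the object name `Word2VecLookups` and the helpers `toyDocuments` and `train` are hypothetical names introduced only for this illustration. With such a tiny corpus the similarity scores are not meaningful; the point is only the shape of the API calls.

```scala
import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, used only for this illustration.
object Word2VecLookups {

  // Same three toy sentences as in the example above.
  def toyDocuments(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(
      "Hi I heard about Spark spark".split(" "),
      "I wish Java could use case classes".split(" "),
      "Logistic regression models are neat".split(" ")
    ).map(Tuple1.apply)).toDF("text")

  // Same settings as above: 3-dimensional vectors, no minimum word count.
  def train(docs: DataFrame): Word2VecModel =
    new Word2Vec()
      .setInputCol("text")
      .setOutputCol("result")
      .setVectorSize(3)
      .setMinCount(0)
      .fit(docs)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Word2Vec lookups")
      .master("local[*]")
      .getOrCreate()

    val docs = toyDocuments(spark)
    val model = train(docs)

    // 1. Word vector: getVectors has columns "word" and "vector".
    model.getVectors.filter(col("word") === "Spark").show(false)

    // 2. Related words: findSynonyms returns columns "word" and "similarity".
    //    The query word must be in the vocabulary, otherwise Spark throws.
    model.findSynonyms("Spark", 3).show(false)

    // 3. Text vectors: transform adds the "result" column, one vector per row.
    model.transform(docs).collect().foreach { row =>
      val words = row.getSeq[String](0)
      val vector = row.getAs[Vector]("result")
      println(s"Text: [${words.mkString(", ")}] => Vector: $vector")
    }

    spark.stop()
  }
}
```

Note that lookups are case-sensitive, so "Spark" and "spark" in the first sentence are separate vocabulary entries with separate vectors.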