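Scala Word2Vec: a summary of exceptions hit in a production job

Spark's Word2Vec, called from Scala, showed some very odd behavior on the cluster. The code was as follows: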

import org.apache.spark.ml.feature.Word2Vec
// Needed for toDF and as[...]; assumes the SparkSession is named spark.
import spark.implicits._

// sentence holds the tokenized input sequences (Seq[String]), one per row.
val documentDF = sentence.map(Tuple1.apply)
  .toDF("user_item")
  .repartition(15)

documentDF.show(3, false)

val model = new Word2Vec()
  .setInputCol("user_item")
  .setOutputCol("vector")
  .setVectorSize(64)
  .setWindowSize(2)
  .setMinCount(1)
  .setMaxIter(20)
  .setStepSize(0.025)
  .setNumPartitions(62) // too large; see Exception 1
  .fit(documentDF)

// val modelPath = "/Model"
// model.write.overwrite().save(modelPath)
// model.findSynonyms("fdc2d9ef27bc4d149e3b4b65915c7cf5", 20)
//   .show(20, false)
println("save w2v ...")
// w2v (a case class) and saveMethod are project helpers not shown here.
val word2Vec = model.getVectors.select("word", "vector")
  .as[w2v]
  .rdd
  .repartition(64)
  .map(x => (x.word, x.vector.drop(1).dropRight(1)))
  .toDF("word", "vector")
val w2vPath = "/wordVector"
saveMethod(word2Vec.toDF, w2vPath)
word2Vec.unpersist()

Exception 1: the output word vectors contain Infinity

scala> model.getVectors.show()
+-------------+--------------------+
|         word|              vector|
+-------------+--------------------+
|     Unspoken|[-Infinity,-Infin...|
|       Talent|[Infinity,-Infini...|
|    Hourglass|[1.09657520526310...|
|Nickelodeon's|[2.20436549446219...|
|      Priests|[-1.9625896848389...|
|    Religion:|[-3.8815759928213...|
|           Bu|[-7.9722236466752...|
|      Totoro:|[-4.1829056206528...|
|     Trouble,|[2.51985378203136...|
|       Hatter|[8.49108115961009...|
|          '79|[-5.4560309784650...|
|         Vile|[-1.2059769646379...|
|         9/11|[Infinity,-Infini...|
|      Santino|[6.30405421282099...|
|      Motives|[1.96207712570869...|
|          '13|[-1.7641987324084...|
|       Fierce|[-Infinity,Infini...|
|       Stover|[5.10057474120744...|
|          'It|[1.08629989605664...|
|        Butts|[Infinity,Infinit...|
+-------------+--------------------+
only showing top 20 rows

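Searching turned up two Spark reports: "Word2Vec generate infinity vectors when numIterations are large" and "Fix infinity vectors produced by Word2Vec when numIterations are large". Taken together, they pointed to setNumPartitions being set too high (62 in the code above). After lowering it to 15, the problem was solved.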
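Because the Infinity values only show up when the vectors are inspected, it can help to check for them explicitly after training. The following is a minimal sketch, not part of the original job: the object name VectorSanity and its two methods are hypothetical, and it assumes the model comes from org.apache.spark.ml.feature.Word2Vec, so that getVectors returns a "word" column and a "vector" column.

```scala
import org.apache.spark.ml.feature.Word2VecModel
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper, not from the original post.
object VectorSanity {
  // True when every component is a finite number (no NaN, no +/-Infinity).
  def allFinite(values: Array[Double]): Boolean =
    values.forall(v => !v.isNaN && !v.isInfinity)

  // Rows of model.getVectors whose vector has at least one non-finite component.
  def nonFiniteWords(model: Word2VecModel): DataFrame = {
    val isBroken = udf((v: Vector) => !allFinite(v.toArray))
    model.getVectors.filter(isBroken(col("vector")))
  }
}
```

With the model trained above, VectorSanity.nonFiniteWords(model).count() gives the number of affected words. A non-zero count after training would be a signal to retry with a smaller setNumPartitions value, as was done here.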

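Exception 2: the job terminates for no apparent reason while running. It reports either an internal Word2Vec error or an error in .fit(). Sometimes training aborts outright, sometimes saving the word vectors fails, and occasionally a run succeeds (the intermittency is the worst part). Searching the web turned up nothing. The cause turned out to be that the job exceeded its memory limit; after increasing the memory size in the scheduling parameters, it ran successfully.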
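The post does not say which memory parameters were changed or to what values. As a rough sketch only, executor memory can be raised when the SparkSession is built. The object name W2vSession and the sizes "8g" and "2g" are placeholders, not the author's settings. spark.executor.memoryOverhead is the key name in Spark 2.3 and later.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper; the memory sizes are placeholders, not the original settings.
object W2vSession {
  val memorySettings: Map[String, String] = Map(
    "spark.executor.memory" -> "8g",         // heap per executor
    "spark.executor.memoryOverhead" -> "2g"  // off-heap allowance per executor
  )

  // Applies every entry of memorySettings to the builder, then creates the session.
  def build(appName: String): SparkSession = {
    val builder = SparkSession.builder().appName(appName)
    memorySettings
      .foldLeft(builder) { case (b, (key, value)) => b.config(key, value) }
      .getOrCreate()
  }
}
```

Driver memory is a separate matter: in client mode it cannot be changed from inside the application, because the driver JVM has already started, so it has to be passed to spark-submit with --driver-memory. Word2Vec keeps the vocabulary and the vectors on the driver, so the driver may need extra memory as well.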
