Spark MLlib


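1. Read the source files.
2. Build feature vectors (Vector) from the source-file RDDs.
3. Build labeled points (LabeledPoint) from the feature vectors.
4. Combine the labeled points into the training data (trainingData).
5. Create a LogisticRegressionWithSGD algorithm object.
6. Pass trainingData to the run method of LogisticRegressionWithSGD to obtain the model (the fitted formula).
7. Use the model to make predictions on new data.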
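Step 2 relies on the hashing trick: each word is hashed into a fixed number of buckets, and the vector stores how many words landed in each bucket. The sketch below illustrates the idea in plain Scala. It is a simplified stand-in, not Spark's HashingTF implementation: the names `HashingSketch`, `bucket`, and `termFrequencies` are hypothetical, and it uses `String.hashCode` rather than whatever hash function Spark applies.

```scala
// Simplified illustration of the hashing trick (NOT Spark's HashingTF code).
object HashingSketch {
  // Map a term to a bucket in [0, numFeatures); shift negative remainders into range.
  def bucket(term: String, numFeatures: Int): Int = {
    val raw = term.hashCode % numFeatures
    if (raw < 0) raw + numFeatures else raw
  }

  // Count the terms that fall into each bucket; only non-zero buckets are kept (sparse form).
  def termFrequencies(terms: Seq[String], numFeatures: Int): Map[Int, Double] =
    terms
      .groupBy(term => bucket(term, numFeatures))
      .map { case (index, hits) => index -> hits.size.toDouble }
}
```

For example, `HashingSketch.termFrequencies("send money send".split(" ").toSeq, 10000)` yields a sparse map with one entry per distinct bucket. Different words can collide in the same bucket, which is the price paid for a fixed vector size.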
Code:
Maven dependency:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>
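For an sbt build, the same dependency could be declared roughly as follows. This is a sketch that assumes an sbt project (the original only shows Maven); the `scalaVersion` value is an assumption chosen to match the `_2.11` suffix of the artifact above.

```scala
// build.sbt (sketch): %% appends the Scala binary version, giving spark-mllib_2.11
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.4.4"
```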
package com.km.sparkdemo.mlib


import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
  * @Author Lucas
  * @Date 2020/7/6 0:45
  * @Version 1.0
  */
object TestMLib {
  val spamFile = "c:/sparkdata/spam.txt"
  val normalFile = "c:/sparkdata/ham.txt"
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("TestMLib")
    val sc = new SparkContext(conf)
    val spam: RDD[String] = sc.textFile(spamFile)
    val normal: RDD[String] = sc.textFile(normalFile)

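    // Hash each word into one of 10000 feature buckets (term frequencies)
    val tf = new HashingTF(10000)
    // Feature vectors for the spam emails
    val spamVec: RDD[linalg.Vector] = spam.map(line => tf.transform(line.split(" ")))
    // Feature vectors for the normal (ham) emails
    val normalVec: RDD[linalg.Vector] = normal.map(line => tf.transform(line.split(" ")))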

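    // Labeled points for the spam emails (label 1)
    val spamLab: RDD[LabeledPoint] = spamVec.map(features => LabeledPoint(1, features))
    // Labeled points for the normal emails (label 0)
    val normalLab: RDD[LabeledPoint] = normalVec.map(features => LabeledPoint(0, features))

    // Combine both sets into the training data
    val trainingData: RDD[LabeledPoint] = spamLab.union(normalLab)
    // Cache it, because SGD is iterative and scans the data many times
    trainingData.persist()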



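    // Train a logistic regression model.
    // Note: LogisticRegressionWithSGD is deprecated since Spark 2.0.0 but still available in 2.4.4.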
    val logisticRWS: LogisticRegressionWithSGD = new LogisticRegressionWithSGD()
    val model: LogisticRegressionModel = logisticRWS.run(trainingData)



    // Test with a spam-like message (a prediction of 1.0 means spam, 0.0 means normal)
    val infoVec1: linalg.Vector = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
    val info1: Double = model.predict(infoVec1)
    println(info1)

    // Test with a normal message
    val info2Vec: linalg.Vector = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))
    val info2: Double = model.predict(info2Vec)
    println(info2)


    sc.stop()

  }
}
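
The "formula" behind the model in steps 6 and 7 can be sketched as follows: a logistic regression model computes a weighted sum of the features plus an intercept, squashes it through the sigmoid function, and compares the result with a threshold. The code below is a minimal plain-Scala illustration of that idea, not MLlib's internal code. The names `LogisticSketch`, `sigmoid`, `score`, and `predict` are hypothetical, and features are represented as a sparse `Map[Int, Double]` for simplicity.

```scala
// Illustrative sketch of thresholded logistic prediction (hypothetical names, not MLlib internals).
object LogisticSketch {
  // Squash any real number into the interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability-like score: sigmoid(w . x + b), with x given as index -> value.
  def score(weights: Array[Double], intercept: Double, features: Map[Int, Double]): Double = {
    val margin = features.foldLeft(intercept) { case (acc, (i, v)) => acc + weights(i) * v }
    sigmoid(margin)
  }

  // Return 1.0 when the score exceeds the threshold, otherwise 0.0.
  def predict(weights: Array[Double], intercept: Double,
              features: Map[Int, Double], threshold: Double = 0.5): Double =
    if (score(weights, intercept, features) > threshold) 1.0 else 0.0
}
```

This is why `model.predict` above prints a class label (1.0 or 0.0) rather than a probability: MLlib's `LogisticRegressionModel` applies a threshold of 0.5 by default, and calling `clearThreshold()` on the model makes `predict` return the raw score instead.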

Files:
spam.txt

Dear sir, I am a Prince in a far kingdom you have not heard of.  I want to send you money via wire transfer so please ...
Get Viagra real cheap!  Send money right away to ...
Oh my gosh you can be really strong too with these drugs found in the rainforest. Get them cheap right now ...
YOUR COMPUTER HAS BEEN INFECTED!  YOU MUST RESET YOUR PASSWORD.  Reply to this email with your password and SSN ...
THIS IS NOT A SCAM!  Send money and get access to awesome stuff really cheap and never have to ...

ham.txt

Dear Spark Learner, Thanks so much for attending the Spark Summit 2014!  Check out videos of talks from the summit at ...
Hi Mom, Apologies for being late about emailing and forgetting to send you the package.  I hope you and bro have been ...
Wow, hey Fred, just heard about the Spark petabyte sort.  I think we need to take time to try it out immediately ...
Hi Spark user list, This is my first question to this list, so thanks in advance for your help!  I tried running ...
Thanks Tom for your email.  I need to refer you to Alice for this one.  I haven't yet figured out that part either ...
Good job yesterday!  I was attending your talk, and really enjoyed it.  I want to try out GraphX ...
Summit demo got whoops from audience!  Had to let you know. --Joe
