Quick Start with Spark-MLlib, Part 12 (Logistic Regression: Spam Classification)

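This article shows how to classify spam email with the logistic regression algorithm in Spark-MLlib. It reads two text files, ham and spam, converts each email's content into a feature vector, builds the positive and negative sample sets, and trains a model with LogisticRegressionWithSGD. Finally, it runs predictions on test data to show how the model performs.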
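
The steps in this summary (text to feature vector, labeled samples, training, prediction) can be sketched in plain Python without Spark. This is a minimal illustration of the idea, not MLlib's API and not the article's own code: `featurize`, `train`, `predict`, `NUM_FEATURES`, the hashing trick as the featurization method, the step size, and the sample sentences are all assumptions made for the example. Training here uses full-batch gradient descent on the average log loss.

```python
import math
import re
import zlib

NUM_FEATURES = 10000  # assumed size of the hashed feature space


def featurize(text, num_features=NUM_FEATURES):
    """Map an email to a sparse term-frequency vector {index: count} via the hashing trick."""
    vec = {}
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        idx = zlib.crc32(word.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def probability(weights, vec):
    """P(label = 1 | vec) under the logistic model."""
    z = sum(weights.get(i, 0.0) * v for i, v in vec.items())
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, num_iterations=100, step_size=0.1):
    """Fit weights by full-batch gradient descent on the average log loss.

    samples: list of (label, sparse_vector) pairs, label 1 (spam) or 0 (ham).
    """
    weights = {}
    for _ in range(num_iterations):
        grad = {}
        for label, vec in samples:
            error = probability(weights, vec) - label
            for i, v in vec.items():
                grad[i] = grad.get(i, 0.0) + error * v
        for i, g in grad.items():
            weights[i] = weights.get(i, 0.0) - step_size * g / len(samples)
    return weights


def predict(weights, vec):
    """Return 1 (spam) if the predicted probability exceeds 0.5, else 0 (ham)."""
    return 1 if probability(weights, vec) > 0.5 else 0


# Illustrative samples only, not the contents of the article's files.
spam = ["cheap pills, buy cheap pills now", "win money now, claim your prize"]
ham = ["meeting notes attached, see you tomorrow", "thanks for the code review"]
samples = [(1, featurize(t)) for t in spam] + [(0, featurize(t)) for t in ham]
model = train(samples)
for text in ["claim your cheap prize now", "notes from the code review meeting"]:
    print(text, "->", predict(model, featurize(text)))
```

Sparse dictionaries keep the cost of each update proportional to the number of distinct words in an email rather than to the size of the feature space.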

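A logistic classifier is suited to yes/no questions, for example "Is this email spam?", "Is this a sports news story?", or "Does this patient have the flu?".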
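
To make the yes/no decision concrete: a logistic classifier turns a weighted sum of features into a probability with the logistic (sigmoid) function, then compares it to a threshold. The sketch below assumes a threshold of 0.5 and uses made-up weights; `sigmoid` and `classify` are illustrative names, not part of any library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(weights, features, threshold=0.5):
    """Return 1 (the "yes" class) if P(y = 1 | x) exceeds the threshold, else 0."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(score) > threshold else 0


# Hypothetical weights: feature 0 pushes toward "yes", feature 1 toward "no".
weights = [2.0, -1.0]
print(classify(weights, [1.0, 0.0]))
print(classify(weights, [0.0, 1.0]))
```

A score of exactly zero gives a probability of 0.5, which this sketch assigns to the "no" class.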

(1) Training data

ham.txt

Dear Spark Learner, Thanks so much for attending the Spark Summit 2014! Check out videos of talks from the summit at ...

Hi Mom, Apologies for being late about emailing and forgetting to send you the package. I hope you and bro have been ...

Wow, hey Fred, just heard about the Spark petabyte sort. I think we need to take time to try it out immediately ...

Hi Spark user list, This is my first question to this list, so thanks in advance for your help! I tried running ...

Thanks Tom for your email. I need to refer you to Alice for this one. I haven't yet figured out that part either ...

Good job yesterday! I was attending your talk, and really enjoyed it. I want to try out GraphX ...

Summit demo got whoops from audience! Had to let you know. --Joe

spam.txt

Dear sir, I am a Prince in a far kingdom you have
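
Assuming each file holds one email per line, as the samples above suggest, the two files could be turned into a labeled sample set along these lines. The function names, the default paths, and the convention of labeling spam as 1 and ham as 0 are assumptions for this sketch. Blank lines are skipped.

```python
from pathlib import Path


def load_labeled(path, label):
    """Read one email per line from `path` and pair each non-empty line with `label`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [(label, line.strip()) for line in lines if line.strip()]


def load_dataset(ham_path="ham.txt", spam_path="spam.txt"):
    """Label ham lines 0 and spam lines 1; both paths are assumed to be local files."""
    return load_labeled(ham_path, 0) + load_labeled(spam_path, 1)
```

Calling `load_dataset()` with no arguments expects both files in the current working directory. The result is a list of `(label, text)` pairs ready to be converted into feature vectors.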
