Spark MLlib Spam Classification Example

汀桦坞

Categories: Machine Learning, Big Data. Tags: Spark

Article link: https://blog.csdn.net/wiborgite/article/details/83313629

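This post is a hands-on run-through of the Spark machine learning material in 《Spark快速大数据分析》 (the Chinese edition of Learning Spark). The main code comes from the book's sample code; I only prepared my own data and ran it to see how it behaves.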

Data used in this post: https://download.csdn.net/download/wiborgite/10739730

The example is written in Scala and can be run directly in spark-shell. The code is as follows:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.classification.SVMWithSGD

val spam = sc.textFile("/tmp/spam.txt")
val normal = sc.textFile("/tmp/normal.txt")


// Create a HashingTF instance that maps email text to vectors of 10,000 features
val tf = new HashingTF(numFeatures = 10000)
// Each email is split into words, and each word is mapped to one feature
val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
val normalFeatures = normal.map(email => tf.transform(email.split(" ")))


// Create LabeledPoint datasets for the positive (spam) and negative (normal email) examples
val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
val negativeExamples = normalFeatures.map(features => LabeledPoint(0, features))
val trainingData = positiveExamples.union(negativeExamples)
trainingData.cache() // Logistic regression is an iterative algorithm, so cache the training data RDD


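// Run logistic regression using SGD.
// Note: LogisticRegressionWithSGD is deprecated since Spark 2.0; newer releases
// recommend LogisticRegressionWithLBFGS or spark.ml's LogisticRegression.
val lrModel = new LogisticRegressionWithSGD().run(trainingData)


// Train an SVM using SGD (kept under a separate name so it does not shadow lrModel)
val svmModel = new SVMWithSGD().run(trainingData)


// Test on one positive (spam) example and one negative (normal email) example
val posTest = tf.transform(
  "O M G GET cheap stuff by sending money to ...".split(" "))
val negTest = tf.transform(
  "Hi Dad, I started studying Spark the other ...".split(" "))
println("LR prediction for positive test example: " + lrModel.predict(posTest))
println("LR prediction for negative test example: " + lrModel.predict(negTest))
println("SVM prediction for positive test example: " + svmModel.predict(posTest))
println("SVM prediction for negative test example: " + svmModel.predict(negTest))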
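
The HashingTF step in the listing turns each email into a fixed-length term-frequency vector without building a vocabulary. The idea can be sketched in plain Scala as below. This is only an illustration of the hashing trick: `HashingSketch`, `indexOf`, `termFrequencies` and `tfSketch` are hypothetical names, the sketch uses `String.hashCode`, and Spark's own hash function and vector type may differ.

```scala
// Illustrative only: a plain-Scala sketch of the hashing trick.
// This is NOT Spark's HashingTF; the hash function here is an assumption.
object HashingSketch {
  // Map a term to a bucket in [0, numFeatures). hashCode can be negative,
  // so shift negative remainders back into range.
  def indexOf(term: String, numFeatures: Int): Int = {
    val raw = term.hashCode % numFeatures
    if (raw < 0) raw + numFeatures else raw
  }

  // Count how often each bucket is hit. The result is a sparse
  // term-frequency vector written as bucket index -> count.
  def termFrequencies(words: Seq[String], numFeatures: Int): Map[Int, Double] =
    words
      .groupBy(w => indexOf(w, numFeatures))
      .map { case (idx, ws) => idx -> ws.size.toDouble }
}

val tfSketch = HashingSketch.termFrequencies(
  "cheap cheap stuff".split(" ").toSeq, numFeatures = 10000)
```

Two different words can land in the same bucket, which is why a reasonably large feature count such as the 10000 used above helps keep collisions rare.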

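The `predict` calls in the listing print either 1.0 (spam) or 0.0 (normal). A minimal sketch of how linear models reach that decision follows, assuming the usual defaults as I understand them: logistic regression thresholds the sigmoid score at 0.5, while a linear SVM thresholds the raw margin at 0.0. The weights and feature maps are made up for illustration and are not values learned by Spark; `DecisionRuleSketch` and its methods are hypothetical names.

```scala
// Illustrative only: hypothetical weights, not values learned by Spark.
object DecisionRuleSketch {
  // Dot product of a sparse feature vector (index -> value) with dense
  // weights, plus the intercept.
  def margin(features: Map[Int, Double], weights: Array[Double], intercept: Double): Double =
    features.iterator.map { case (i, v) => weights(i) * v }.sum + intercept

  // Logistic regression: squash the margin through a sigmoid, then threshold.
  def predictLogistic(m: Double, threshold: Double = 0.5): Double = {
    val score = 1.0 / (1.0 + math.exp(-m))
    if (score > threshold) 1.0 else 0.0
  }

  // Linear SVM: threshold the raw margin directly.
  def predictSvm(m: Double, threshold: Double = 0.0): Double =
    if (m > threshold) 1.0 else 0.0
}

val sketchWeights = Array(2.0, -1.5, 0.0, 0.5) // hypothetical model weights
val spamLike = Map(0 -> 2.0, 3 -> 1.0)         // hits positively weighted features
val normalLike = Map(1 -> 3.0)                 // hits a negatively weighted feature

val spamMargin = DecisionRuleSketch.margin(spamLike, sketchWeights, intercept = 0.0)
val normalMargin = DecisionRuleSketch.margin(normalLike, sketchWeights, intercept = 0.0)

println("Logistic: " + DecisionRuleSketch.predictLogistic(spamMargin) +
  " / " + DecisionRuleSketch.predictLogistic(normalMargin))
println("SVM: " + DecisionRuleSketch.predictSvm(spamMargin) +
  " / " + DecisionRuleSketch.predictSvm(normalMargin))
```

Because both models share the same linear margin and differ only in the loss used during training and the threshold applied at prediction time, they often agree on clear-cut examples like the two test emails.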