Java spam classifier: spam email classification with Spark (Scala + Java)

weixin_39777543

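Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.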

Article link: https://blog.csdn.net/weixin_39777543/article/details/114744166

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.regression.LabeledPoint;

/**
 * Created by hui on 2017/11/29.
 */
public class MLlib {

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("JavaBookExample").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Load 2 types of emails from text files: spam and ham (non-spam).
        // Each line has text from one email.
        JavaRDD<String> spam = sc.textFile("files/spam.txt");
        JavaRDD<String> ham = sc.textFile("files/ham.txt");

        // Create a HashingTF instance to map email text to vectors of 100 features.
        final HashingTF tf = new HashingTF(100);

        // Each email is split into words, and each word is mapped to one feature.
        // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
        JavaRDD<LabeledPoint> positiveExamples = spam.map(new Function<String, LabeledPoint>() {
            @Override
            public LabeledPoint call(String email) {
                return new LabeledPoint(1, tf.transform(Arrays.asList(email.split(" "))));
            }
        });
        JavaRDD<LabeledPoint> negativeExamples = ham.map(new Function<String, LabeledPoint>() {
            @Override
            public LabeledPoint call(String email) {
                return new LabeledPoint(0, tf.transform(Arrays.asList(email.split(" "))));
            }
        });

        JavaRDD<LabeledPoint> trainingData = positiveExamples.union(negativeExamples);
        trainingData.cache(); // Cache data since Logistic Regression is an iterative algorithm.

        // Create a Logistic Regression learner which uses the SGD optimizer.
        // Note: LogisticRegressionWithSGD is deprecated since Spark 2.0;
        // LogisticRegressionWithLBFGS is the suggested replacement in spark.mllib.
        LogisticRegressionWithSGD lrLearner = new LogisticRegressionWithSGD();

        // Run the actual learning algorithm on the training data.
        LogisticRegressionModel model = lrLearner.run(trainingData.rdd());

        // Test on a positive example (spam) and a negative one (ham).
        // First apply the same HashingTF feature transformation used on the training data.
        Vector posTestExample =
            tf.transform(Arrays.asList("O M G GET cheap stuff by sending money to ...".split(" ")));
        Vector negTestExample =
            tf.transform(Arrays.asList("Hi Dad, I started studying Spark the other ...".split(" ")));

        // Now use the learned model to predict spam/ham for new emails.
        System.out.println("Prediction for positive test example: " + model.predict(posTestExample));
        System.out.println("Prediction for negative test example: " + model.predict(negTestExample));

        sc.stop();
    }
}
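
The listing above relies on HashingTF to turn each email into a vector of 100 features. To make that step less opaque, here is a minimal plain-Java sketch of the underlying idea (the "hashing trick"): hash each word to a bucket index and count how many words land in each bucket. The class name `HashingTfSketch` and the use of `String.hashCode()` are assumptions made for illustration only; Spark's own HashingTF may use a different hash function, so the bucket indices it produces can differ from the ones computed here.

```java
import java.util.Arrays;
import java.util.List;

public class HashingTfSketch {

    // Maps a list of terms to a fixed-length term-frequency vector.
    // Assumption: String.hashCode() stands in for the hash function;
    // this is an illustration, not Spark's actual implementation.
    static double[] transform(List<String> terms, int numFeatures) {
        double[] vector = new double[numFeatures];
        for (String term : terms) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(term.hashCode(), numFeatures);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cheap stuff cheap money".split(" "));
        double[] features = transform(words, 100);
        System.out.println(Arrays.toString(features));
    }
}
```

With only 100 buckets, different words can collide in the same bucket. That is the trade-off of the hashing trick: no vocabulary has to be stored, at the cost of some lost precision, and a larger feature count reduces collisions.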
