Spark GraphX Study Notes: Spam Detection with LogisticRegressionWithSGD

Spam detection: LogisticRegressionWithSGD (Stochastic Gradient Descent)

	1) Build the graph for the training set

import org.apache.spark.graphx._
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
// Each vertex carries an (Int, Int, Boolean) attribute; the Boolean is the spam label
// that augment (step 2) turns into the training label. Edge attributes are unused.
val trainV = sc.makeRDD(Array((1L, (0,1,false)), (2L, (0,0,false)), (3L, (1,0,false)), (4L, (0,0,false)), (5L, (0,0,false)), (6L, (0,0,false)), (7L, (0,0,false)), (8L, (0,0,false)), (9L, (0,1,false)), (10L,(0,0,false)), (11L,(5,2,true)), (12L,(0,0,true)), (13L,(1,0,false))))
val trainE = sc.makeRDD(Array(Edge(1L,9L,""), Edge(2L,3L,""), Edge(3L,10L,""), Edge(4L,9L,""), Edge(4L,10L,""), Edge(5L,6L,""), Edge(5L,11L,""), Edge(5L,12L,""), Edge(6L,11L,""), Edge(6L,12L,""), Edge(7L,8L,""), Edge(7L,11L,""), Edge(7L,12L,""), Edge(7L,13L,""), Edge(8L,11L,""), Edge(8L,12L,""), Edge(8L,13L,""), Edge(9L,2L,""), Edge(9L,13L,""), Edge(10L,13L,""), Edge(12L,9L,"")))
val trainG = Graph(trainV, trainE)

	2) Prepare the data for logistic regression

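import org.apache.spark.graphx.lib.PageRank
import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.mllib.regression.LabeledPoint
// Turn each vertex into a LabeledPoint: label = the Boolean flag,
// features = the two Int attributes plus the ratio of the vertex's
// PageRank after 5 iterations to its PageRank after 1 iteration.
def augment(g:Graph[Tuple3[Int,Int,Boolean],String]) =
	g.vertices.join(
		// PageRank runs on g, so a test graph is ranked on its own structure.
		PageRank.run(g, 1).vertices.join(
			PageRank.run(g, 5).vertices
		).map(x => (x._1,x._2._2/x._2._1))
	).map(x => LabeledPoint(
		if (x._2._1._3) 1 else 0,
		new DenseVector(Array(x._2._1._1, x._2._1._2, x._2._2))))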

	3) Train the logistic regression model

val trainSet = augment(trainG)
// The second argument is the number of SGD iterations.
val model = LogisticRegressionWithSGD.train(trainSet, 10)
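
To make the prediction step less of a black box, here is a minimal plain-Scala sketch of the decision rule a binary logistic regression model applies: a weighted sum of the features plus an intercept, passed through the sigmoid and compared with a threshold. The weights and intercept below are hypothetical values made up for illustration, not the ones learned from trainSet, and the helper names sigmoid and predictSketch are not part of MLlib. The 0.5 threshold matches the MLlib default.

```scala
// Hypothetical weights and intercept, chosen only for illustration.
val weights = Array(0.8, 0.5, -0.3)
val intercept = -1.0

// Logistic (sigmoid) function: maps any real margin into (0, 1).
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Returns 1.0 (spam) when the estimated probability exceeds the threshold, else 0.0.
def predictSketch(features: Array[Double], threshold: Double = 0.5): Double = {
  val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  if (sigmoid(margin) > threshold) 1.0 else 0.0
}

// Feature layout mirrors augment: two counts plus the PageRank ratio.
val spamLike = predictSketch(Array(5.0, 2.0, 1.0)) // margin = 3.7
val hamLike = predictSketch(Array(0.0, 0.0, 1.0))  // margin = -1.3
```

With these made-up weights, the first vector is classified as 1.0 and the second as 0.0; the real model's output depends on the weights it learns.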

	4) Evaluate the logistic regression model

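import org.apache.spark.rdd.RDD
// Accuracy in percent: predictions and labels are both 0.0 or 1.0, so
// |prediction - label| is 1 for every misclassified point and 0 otherwise.
def perf(s:RDD[LabeledPoint]) = 100 * (s.count - s.map(x => math.abs(model.predict(x.features)-x.label)).reduce(_ + _)) / s.count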

perf(trainSet)
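
The arithmetic inside perf can be checked without Spark. The following is a small sketch, assuming a local Seq of (label, prediction) pairs stands in for the RDD; accuracyPercent and the sample data are hypothetical names and values introduced only for this illustration.

```scala
// Local stand-in for perf: each pair is (label, prediction), both 0.0 or 1.0.
def accuracyPercent(pairs: Seq[(Double, Double)]): Double = {
  val n = pairs.size
  // Every wrong prediction contributes |prediction - label| = 1, every correct one 0.
  val errors = pairs.map { case (label, pred) => math.abs(pred - label) }.sum
  100.0 * (n - errors) / n
}

// Four made-up points, one of which is misclassified.
val sample = Seq((1.0, 1.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0))
val acc = accuracyPercent(sample) // 3 of 4 correct
```

Like perf, this sketch does not guard against an empty input, where the division is undefined.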

	5) Build a graph from the test data set and evaluate the model on it

val testV = sc.makeRDD(Array((1L, (0,1,false)), (2L, (0,0,false)), (3L, (1,0,false)), (4L, (5,4,true)), (5L, (0,1,false)), (6L, (0,0,false)), (7L, (1,1,true))))
val testE = sc.makeRDD(Array(Edge(1L,5L,""), Edge(2L,5L,""), Edge(3L,6L,""), Edge(4L,6L,""), Edge(5L,7L,""), Edge(6L,7L,"")))

perf(augment(Graph(testV,testE)))

Code language: Scala
Reference book: Spark GraphX in Action

 
