Recommendation Algorithm Examples


weixin_44617428


Article link: https://blog.csdn.net/weixin_44617428/article/details/102774906


Data (c://data/ml/als.txt, one rating per line: userId itemId score, separated by spaces)

1 11 2
1 12 3
1 13 1
1 14 0
1 15 1
2 11 1
2 12 2
2 13 2
2 14 1
2 15 4
3 11 2
3 12 3
3 14 0
3 15 1
4 11 1
4 12 2
4 14 1
4 15 4
5 11 1
5 12 2
5 13 2
5 14 1
5 15 4

Code

package cn.tedu.als

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.mllib.recommendation.ALS

object Driver {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local").setAppName("als")

    val sc = new SparkContext(conf)

    val data = sc.textFile("c://data/ml/als.txt")

    //--Convert to the input format the recommender needs: RDD[String:line] -> RDD[Rating(userId, itemId, score)]
    val ratings = data.map { line =>
      val info = line.split(" ")
      val userId = info(0).toInt
      val itemId = info(1).toInt
      val score = info(2).toDouble

      Rating(userId, itemId, score)
    }

    //--Build the recommendation model, trained under the hood with the ALS algorithm
    //--Arg 1: the dataset  Arg 2: rank, the number of latent factors K; keep it low, below the numbers of users and items
    //--Arg 3: maximum number of iterations  Arg 4: λ, the regularization parameter, which guards against overfitting
    val model = ALS.train(ratings, 3, 10, 0.01)

    //--Recommend 1 item to user 3
    val u3Result = model.recommendProducts(3, 1)
    u3Result.foreach(println)

    //--Recommend 2 users for item 12
    val item12Result = model.recommendUsers(12, 2)
    item12Result.foreach(println)

    //--Predict user 3's score for item 12
    val u3Item12Result = model.predict(3, 12)
    println(u3Item12Result)

    sc.stop()
  }
}
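The sketch below makes the three ALS.train parameters (rank K, iterations, λ) concrete. It implements explicit-feedback alternating least squares in plain Scala on a single machine, with no Spark dependency. It is an illustration and not Spark's implementation. The names AlsSketch, update, loss and recommend are hypothetical, the factors start from uniform random values, and Spark differs in details such as data partitioning and how λ is weighted.

Each half-step fixes one side's factor vectors and solves a small ridge regression for every user (or item). A predicted score is the dot product of a user vector and an item vector. That is also how the trained MLlib model computes model.predict(3, 12).

```scala
import scala.util.Random

// Simplified single-machine sketch of explicit-feedback ALS; not Spark's implementation.
object AlsSketch {
  type Factors = Map[Int, Array[Double]]

  def dot(u: Array[Double], v: Array[Double]): Double =
    u.indices.map(i => u(i) * v(i)).sum

  // Solves A x = b for a small square A by Gaussian elimination with partial pivoting.
  def solve(a0: Array[Array[Double]], b0: Array[Double]): Array[Double] = {
    val n = b0.length
    val a = a0.map(_.clone)
    val b = b0.clone
    for (col <- 0 until n) {
      val p = (col until n).maxBy(r => math.abs(a(r)(col)))
      val rowTmp = a(col); a(col) = a(p); a(p) = rowTmp
      val bTmp = b(col); b(col) = b(p); b(p) = bTmp
      for (r <- col + 1 until n) {
        val f = a(r)(col) / a(col)(col)
        for (c <- col until n) a(r)(c) -= f * a(col)(c)
        b(r) -= f * b(col)
      }
    }
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      val s = (r + 1 until n).map(c => a(r)(c) * x(c)).sum
      x(r) = (b(r) - s) / a(r)(r)
    }
    x
  }

  // With the other side's factors fixed, solves for each id the ridge regression
  //   min_x sum_j (r_j - x . y_j)^2 + lambda * |x|^2
  // via its normal equations (sum_j y_j y_j^T + lambda * I) x = sum_j r_j * y_j.
  def update(ratings: Seq[(Int, Int, Double)], other: Factors, rank: Int, lambda: Double): Factors =
    ratings.groupBy(_._1).map { case (id, rs) =>
      val a = Array.tabulate(rank, rank)((i, j) => if (i == j) lambda else 0.0)
      val b = new Array[Double](rank)
      for ((_, o, r) <- rs) {
        val y = other(o)
        for (i <- 0 until rank) {
          b(i) += r * y(i)
          for (j <- 0 until rank) a(i)(j) += y(i) * y(j)
        }
      }
      id -> solve(a, b)
    }

  // Alternates: solve all user vectors with items fixed, then all item vectors with users fixed.
  def train(ratings: Seq[(Int, Int, Double)], rank: Int, iterations: Int, lambda: Double,
            seed: Long = 42L): (Factors, Factors) = {
    val rnd = new Random(seed)
    var itemF: Factors =
      ratings.map(_._2).distinct.map(i => i -> Array.fill(rank)(rnd.nextDouble())).toMap
    var userF: Factors = Map.empty
    val byItem = ratings.map { case (u, i, r) => (i, u, r) }
    for (_ <- 1 to iterations) {
      userF = update(ratings, itemF, rank, lambda)
      itemF = update(byItem, userF, rank, lambda)
    }
    (userF, itemF)
  }

  // The regularized squared-error objective; each half-step above can only lower it.
  def loss(ratings: Seq[(Int, Int, Double)], userF: Factors, itemF: Factors, lambda: Double): Double =
    ratings.map { case (u, i, r) => math.pow(r - dot(userF(u), itemF(i)), 2) }.sum +
      lambda * (userF.values.map(x => dot(x, x)).sum + itemF.values.map(y => dot(y, y)).sum)

  // A predicted score is the dot product of the user and item factor vectors.
  def predict(userF: Factors, itemF: Factors, user: Int, item: Int): Double =
    dot(userF(user), itemF(item))

  // Top-n items for a user by predicted score; this sketch ranks all items, rated or not.
  def recommend(userF: Factors, itemF: Factors, user: Int, n: Int): Seq[(Int, Double)] =
    itemF.keys.toSeq.map(i => i -> predict(userF, itemF, user, i)).sortBy(-_._2).take(n)
}
```

For example, AlsSketch.train(ratings, 3, 10, 0.01) mirrors the call in the code above, where ratings is a Seq[(Int, Int, Double)] holding the rows of als.txt. Alternating works because fixing one side turns the objective into an ordinary regularized least-squares problem with a closed-form solution. As a result, the loss never increases from one sweep to the next.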

Data

logistic.txt (training set: three independent variables X1 X2 X3, then the 0/1 label Y)
17 1 1 1
44 0 0 1
48 1 0 1
55 0 0 1
75 1 1 1
35 0 1 0
42 1 1 0
57 0 0 0
28 0 1 0
20 0 1 0
38 1 0 0
45 0 1 0
47 1 1 0
52 0 0 0
55 0 1 0
68 1 0 1
18 1 0 1
68 0 0 1
48 1 1 1
17 0 0 1

testlogistic.txt (test set: the independent variables X1 X2 X3 only)
17 0 0
44 1 1
48 0 1
55 1 0

Code

package cn.tedu.logistic

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

object Driver {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("logistic")

    val sc = new SparkContext(conf)

    val data = sc.textFile("c://data/ml/logistic.txt")

    //--Convert to the modeling input format: RDD[String:line] -> RDD[LabeledPoint(Y, Vector(X1, X2, ...))]
    val r1 = data.map { line =>
      //--Split on any run of whitespace, so both tab- and space-separated files parse
      val info = line.trim.split("\\s+")
      //--The last column is the label Y (0 or 1)
      val Y = info.last.toDouble
      //--The first three columns are the independent variables
      val XArr = info.take(3).map { num => num.toDouble }

      LabeledPoint(Y, Vectors.dense(XArr))
    }

    //--Logistic regression whose coefficients are solved by stochastic gradient descent
    //--Args: dataset, number of iterations, step size
    //val model = LogisticRegressionWithSGD.train(r1, 10, 0.7)

    //--Logistic regression whose coefficients are solved by the quasi-Newton method L-BFGS
    //--L-BFGS approaches the true solution numerically (iteratively), needs no step size, and converges quickly.
    //--However, its computational cost becomes large when the dataset is large.
    val model = new LogisticRegressionWithLBFGS().run(r1)

    //--Coefficients of the independent variables
    val coef = model.weights
    println(coef)

    //--Predict on the training set again; each result is 0 or 1
    val prediction = model.predict(r1.map { labeledpoint => labeledpoint.features })

    val testData = sc.textFile("c://data/ml/testlogistic.txt")

    //--RDD[String] -> RDD[Vector(X1, X2, X3)]
    val testRDD = testData.map { line =>
      val XArr = line.trim.split("\\s+").map { num => num.toDouble }
      Vectors.dense(XArr)
    }

    val testPrediction = model.predict(testRDD)

    testPrediction.foreach { println }

    sc.stop()
  }
}
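LogisticRegressionModel.predict returns 0 or 1 because it compares the predicted probability with a threshold, 0.5 by default. The plain-Scala sketch below has no Spark dependency, and LogisticSketch and its functions are hypothetical. It shows that rule for a model without an intercept, P(Y = 1 | x) = 1 / (1 + e^(-w·x)).

To keep the example short, it fits w with plain batch gradient descent on the log-loss. This is a swapped-in technique, not the L-BFGS method used in the code above.

```scala
// Plain-Scala sketch of binary logistic regression without an intercept,
// fitted by batch gradient descent on the log-loss (not L-BFGS).
object LogisticSketch {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // P(y = 1 | x) = sigmoid(w . x); predict class 1 when it exceeds the threshold.
  def predict(w: Array[Double], x: Array[Double], threshold: Double = 0.5): Double =
    if (sigmoid(dot(w, x)) > threshold) 1.0 else 0.0

  // The log-loss gradient is sum_i (sigmoid(w . x_i) - y_i) * x_i; averaged over the samples here.
  def train(data: Seq[(Double, Array[Double])], iterations: Int, stepSize: Double): Array[Double] = {
    val dim = data.head._2.length
    val w = new Array[Double](dim)
    for (_ <- 1 to iterations) {
      val grad = new Array[Double](dim)
      for ((y, x) <- data) {
        val err = sigmoid(dot(w, x)) - y
        for (j <- 0 until dim) grad(j) += err * x(j)
      }
      for (j <- 0 until dim) w(j) -= stepSize * grad(j) / data.size
    }
    w
  }
}
```

For example, training on the one-feature points x = -2, -1, 1, 2 with labels 0, 0, 1, 1 yields a positive weight, so predict reproduces the training labels. The threshold parameter plays the same role as the model threshold in MLlib.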

Data (c://data/ml/testSGD.txt, one sample per line: Y,X1 X2)

1,0 1
2,0 2
3,0 3
5,1 4
7,6 1
9,4 5
6,3 3

Code

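package cn.tedu.sgd

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

object Driver {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local").setAppName("sgd")

    val sc = new SparkContext(conf)

    val data = sc.textFile("c://data/ml/testSGD.txt")

    //--Convert to the modeling input format: RDD[String:line] -> RDD[LabeledPoint(Y, Vector(X1, X2))]
    val r1 = data.map { line =>
      val info = line.split(",")
      val Y = info(0).toDouble
      //--Array of independent variables
      val XArr = info(1).split(" ").map { num => num.toDouble }

      LabeledPoint(Y, Vectors.dense(XArr))
    }

    //--Linear regression whose coefficients are solved by gradient descent
    //--Arg 1: dataset  Arg 2: maximum number of iterations  Arg 3: step size
    val model = LinearRegressionWithSGD.train(r1, 10, 0.05)

    //--Coefficients of the independent variables
    //--Y = β1X1 + β2X2 -> Y = 0.98X1 + 1.0004X2
    val coef = model.weights
    println(coef)

    //--Predict on the original samples and print the results
    val prediction = model.predict(r1.map { labelpoint => labelpoint.features })
    prediction.foreach { println }

    sc.stop()
  }
}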
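The comment in the code above reports β1 ≈ 0.98 and β2 ≈ 1.0004 after 10 iterations. Every sample in testSGD.txt satisfies Y = X1 + X2 exactly, so the least-squares optimum is β = (1, 1), and more iterations should move the estimate toward it.

The plain-Scala sketch below has no Spark dependency, and LinearGdSketch and its functions are hypothetical. It runs full-batch gradient descent with a constant step size on the same seven samples. MLlib's optimizer also shrinks the step size as iterations proceed, so its numbers after 10 iterations will not match this sketch.

```scala
// Plain-Scala sketch: least-squares linear regression without an intercept,
// fitted by full-batch gradient descent with a constant step size.
object LinearGdSketch {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  // Parses one line of the form "Y,X1 X2" into (label, features).
  def parse(line: String): (Double, Array[Double]) = {
    val parts = line.split(",")
    (parts(0).toDouble, parts(1).split(" ").map(_.toDouble))
  }

  // Minimizes (1 / 2n) * sum_i (w . x_i - y_i)^2, whose gradient is
  // (1 / n) * sum_i (w . x_i - y_i) * x_i.
  def train(data: Seq[(Double, Array[Double])], iterations: Int, stepSize: Double): Array[Double] = {
    val dim = data.head._2.length
    val w = new Array[Double](dim)
    for (_ <- 1 to iterations) {
      val grad = new Array[Double](dim)
      for ((y, x) <- data) {
        val err = dot(w, x) - y
        for (j <- 0 until dim) grad(j) += err * x(j) / data.size
      }
      for (j <- 0 until dim) w(j) -= stepSize * grad(j)
    }
    w
  }

  // The seven samples from testSGD.txt.
  val samples: Seq[String] = Seq("1,0 1", "2,0 2", "3,0 3", "5,1 4", "7,6 1", "9,4 5", "6,3 3")
}
```

With 200 iterations and step size 0.05, LinearGdSketch.train(LinearGdSketch.samples.map(LinearGdSketch.parse), 200, 0.05) ends with both coefficients within 1e-6 of 1 on this data. A constant step works here because 0.05 is below the stability limit set by the largest eigenvalue of the averaged XᵀX, which is about 14.6 for these samples.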
