Implementing ALS with IDEA + MR (Spark on YARN)

1. Environment
Import spark-assembly-1.4.1-hadoop2.6.0.jar from the lib directory of the spark-1.4.1-bin-hadoop2.6 archive as a project library.
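If the project is managed with sbt instead of a manually imported assembly jar, a minimal build.sbt sketch along these lines should pull in the same API (the versions below are assumptions matching the archive above):

// build.sbt: hypothetical sbt equivalent of importing the assembly jar
name := "als"
scalaVersion := "2.10.4"  // Spark 1.4.1 is built against Scala 2.10 by default
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.4.1" % "provided",  // provided by the cluster at runtime
  "org.apache.spark" %% "spark-mllib" % "1.4.1" % "provided"
)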


2. IDEA code
package demo


import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.mllib.recommendation._
/**
 * Created by tipdm101 on 2016/12/7.
 */
object ALSTrainer {
  def main(args: Array[String]): Unit = {
    if (args.length != 5) {
      println("Usage: demo.ALSTrainer <input> <output> <rank> <iteration> <lambda>")
      System.exit(1)
    }
    val input = args(0)
    val output =args(1)
    val rank = args(2).toInt
    val iteration = args(3).toInt
    val lambda = args(4).toDouble


    // Initialize the SparkContext
    val sc = new SparkContext(new SparkConf().setAppName("ALS Model Trainer"))

    // Load the data and key each record by its timestamp (field 3), sorted ascending
    val original = sc.textFile(input)
      .map { x => val f = x.split("::"); (f(3).toInt, (f(0), f(1), f(2))) }
      .sortByKey()

    // Split on the timestamp bounding the earliest 5% of ratings:
    // the earliest 5% become the test set, the remaining 95% the training set
    val splitNum = (original.count * 0.05).toInt
    val splitTimeStamp = original.take(splitNum).toList.last._1
    val train = original.filter(x => x._1 > splitTimeStamp)
      .map(x => Rating(x._2._1.toInt, x._2._2.toInt, x._2._3.toDouble))
    val test = original.filter(x => x._1 <= splitTimeStamp)
      .map(x => (x._2._1.toInt, x._2._2.toInt, x._2._3.toDouble))

    // Train the model
    val model = ALS.train(train, rank, iteration, lambda)

    // RMSE: predict on the test (user, item) pairs, join the predictions with
    // the actual ratings, then take the root of the mean squared error
    def computeRMSE(model: MatrixFactorizationModel,
                    test: org.apache.spark.rdd.RDD[(Int, Int, Double)]): Double =
      math.sqrt(model.predict(test.map(x => (x._1, x._2)))
        .map(x => ((x.user, x.product), x.rating))
        .join(test.map(x => ((x._1, x._2), x._3)))
        .map(x => (x._2._1 - x._2._2) * (x._2._1 - x._2._2))
        .sum / test.count)

    val rmse = computeRMSE(model, test)

    // Save the model and the RMSE under the output directory
    model.save(sc, output + "/model")
    sc.parallelize(List(rmse), 1).saveAsTextFile(output + "/rmse")
    sc.stop()
  }
}
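Note that the loading code above splits each line on "::" and expects the fields user, item, rating, timestamp in that order, i.e. MovieLens-style ratings.dat. A couple of hypothetical sample lines:

1::1193::5::978300760
2::661::3::978302109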


3. Build the jar: click the project structure icon in the top-right corner of IDEA, open Artifacts, and create als.jar containing only the 'als' compile output; click OK to exit.
From the Build menu at the top of the main window, choose Build Artifacts, then select als → Build.
Find als.jar in the out folder and right-click Show in Explorer.


4. From the directory opened by Show in Explorer, drag als.jar into the /opt directory on the shell.


5. Start only the Hadoop cluster; the job is submitted to YARN with spark-submit, so a standalone Spark cluster is not needed:


./spark-submit --master yarn --class demo.ALSTrainer /opt/als.jar /root/ratings.dat /root/als_output 10 10 0.01
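After the job completes, the saved model can be loaded back for serving recommendations. A minimal sketch, assuming the output path from the command above and a hypothetical user id 1:

package demo

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

object ALSRecommender {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ALS Recommender"))
    // Load the model saved by ALSTrainer (path matches the spark-submit example above)
    val model = MatrixFactorizationModel.load(sc, "/root/als_output/model")
    // Print the top 10 recommended products for user 1 (hypothetical id)
    model.recommendProducts(1, 10).foreach(println)
    sc.stop()
  }
}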

