【Spark】Training with Implicit Preference (Recommendation)

Training with Implicit Preference (Recommendation)


There are two types of user preferences:

  • explicit preference (also referred to as "explicit feedback"), such as a "rating" given to an item by a user.
  • implicit preference (also referred to as "implicit feedback"), such as "view" and "buy" history.

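For illustration only (not part of the template), the two kinds of feedback might look like the following event records; the SampleEvent class and the sample values are made up for this sketch.

case class SampleEvent(user: String, event: String, item: String, rating: Option[Double])

val sampleEvents = Seq(
  SampleEvent("u1", "rate", "i1", Some(4.0)), // explicit preference: the user rated the item
  SampleEvent("u1", "view", "i2", None),      // implicit preference: the user viewed the item
  SampleEvent("u2", "buy",  "i2", None)       // implicit preference: the user bought the item
)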

MLlib ALS provides the setImplicitPrefs() function to set whether to use implicit preference. The ALS algorithm takes RDD[Rating] as training data input. The Rating class is defined in the Spark MLlib library as:


case class Rating(user: Int, product: Int, rating: Double)
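As a minimal sketch (assuming a SparkContext named sc and made-up sample tuples), an RDD[Rating] suitable as ALS training input can be built like this:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.Rating

// Build an RDD[Rating] from (user, item, value) tuples; the tuples here are sample data.
def toRatings(sc: SparkContext): RDD[Rating] = {
  val raw = Seq((1, 10, 4.0), (1, 11, 2.0), (2, 10, 5.0))
  sc.parallelize(raw).map { case (user, item, value) => Rating(user, item, value) }
}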

By default, the recommendation template sets setImplicitPrefs() to false, which expects explicit rating values that the user has given to the item.

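For contrast, a minimal sketch of the default explicit-feedback case (the parameter values here are arbitrary examples, not the template's defaults):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// Train on explicit ratings with rank = 10, iterations = 10, lambda = 0.01 (example values).
def trainExplicit(ratings: RDD[Rating]): MatrixFactorizationModel =
  ALS.train(ratings, 10, 10, 0.01)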

To handle implicit preference, you can set setImplicitPrefs() to true. In this case, the "rating" value input to ALS is used to calculate the confidence level that the user likes the item. A higher "rating" means a stronger indication that the user likes the item.

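As a rough illustration of the idea (a sketch, not the library's internal code): with implicit preference, ALS treats the rating value as a confidence weight of roughly 1 + alpha * rating, so a larger aggregated value gives a stronger signal.

// Confidence grows with the aggregated "rating" value (e.g. the view count).
val alpha = 1.0                                     // same value as als.setAlpha(1.0) below
def confidence(rating: Double): Double = 1.0 + alpha * rating

confidence(1.0)  // 2.0 - item viewed once
confidence(5.0)  // 6.0 - item viewed five times: stronger indication of interest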

The following provides an example of using implicit preference. You can find the complete modified source code here.


Training with view events


For example, the more times the user has viewed an item, the higher the confidence that the user likes that item. We can aggregate the number of views and use this as the "rating" value.


First, we can modify DataSource.scala to aggregate the number of views of the user on the same item:


  // takes a SparkContext and returns the aggregated ratings
  def getRatings(sc: SparkContext): RDD[Rating] = {

    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      entityType = Some("user"),
      eventNames = Some(List("view")), // MODIFIED
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)

    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      try {
        val ratingValue: Double = event.event match {
          case "view" => 1.0 // MODIFIED
          case _ => throw new Exception(s"Unexpected event ${event} is read.")
        }
        // MODIFIED
        // key is (user id, item id)
        // value is the rating value, which is 1.
        ((event.entityId, event.targetEntityId.get), ratingValue)
      } catch {
        case e: Exception => {
          logger.error(s"Cannot convert ${event} to Rating. Exception: ${e}.")
          throw e
        }
      }
    }
    // MODIFIED
    // sum all values for the same user id and item id key
    .reduceByKey { case (a, b) => a + b }
    .map { case ((uid, iid), r) =>
      Rating(uid, iid, r)
    }.cache()

    ratingsRDD
  }

  override
  def readTraining(sc: SparkContext): TrainingData = {
    new TrainingData(getRatings(sc))
  }

You may put the view count aggregation logic in ALSAlgorithm's train() instead, depending on your needs, as sketched below.
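A hedged sketch of that alternative (assuming the template's usual MLlibRating alias and the mllibRatings RDD built inside train(); names may differ in your version): sum the per-view values by (user, item) key just before handing them to ALS.

// Inside ALSAlgorithm.train(), before als.run(...):
val aggregatedRatings = mllibRatings
  .map(r => ((r.user, r.product), r.rating))   // key by (user index, item index)
  .reduceByKey(_ + _)                          // sum the view counts per pair
  .map { case ((u, i), sum) => MLlibRating(u, i, sum) }
  .cache()

// then: val m = als.run(aggregatedRatings)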

Then, we can modify ALSAlgorithm.scala to set setImplicitPrefs to true:


class ALSAlgorithm(val ap: ALSAlgorithmParams)
  extends PAlgorithm[PreparedData, ALSModel, Query, PredictedResult] {

  ...

  def train(sc: SparkContext, data: PreparedData): ALSModel = {

    ...

    // If you only have one type of implicit event (e.g. "view" event only),
    // set implicitPrefs to true
    // MODIFIED
    val implicitPrefs = true
    val als = new ALS()
    als.setUserBlocks(-1)
    als.setProductBlocks(-1)
    als.setRank(ap.rank)
    als.setIterations(ap.numIterations)
    als.setLambda(ap.lambda)
    als.setImplicitPrefs(implicitPrefs)
    als.setAlpha(1.0)
    als.setSeed(seed)
    als.setCheckpointInterval(10)
    val m = als.run(mllibRatings)

    new ALSModel(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures,
      userStringIntMap = userStringIntMap,
      itemStringIntMap = itemStringIntMap)
  }

  ...

}

 

Now the recommendation engine can train a model with implicit preference events.

Next: Filter Recommended Items by Blacklist in Query

https://predictionio.apache.org/templates/recommendation/training-with-implicit-preference/

 

If the rating matrix is derived from another source of information (e.g., it is inferred from other signals), you can use the trainImplicit method to get better results. For example:

val alpha = 0.01
val lambda = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

alpha - a constant used for computing confidence in implicit ALS (default 1.0)
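A small usage sketch (the user id 42 and the top-10 count are arbitrary examples): the model returned by trainImplicit is a MatrixFactorizationModel, so you can ask it for recommendations directly.

// Top 10 recommended products for user 42, ordered by predicted preference.
val topN = model.recommendProducts(42, 10)
topN.foreach(println)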

https://spark.apache.org/docs/1.6.1/mllib-collaborative-filtering.html
