【Spark】Training with Implicit Preference (Recommendation)

Training with Implicit Preference (Recommendation)


There are two types of user preferences:

  • explicit preference (also referred to as "explicit feedback"), such as a "rating" given to an item by a user.
  • implicit preference (also referred to as "implicit feedback"), such as "view" and "buy" history.

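For illustration only (not part of the template), the two kinds of feedback might look like the following event records; the SampleEvent class and the sample values are made up for this sketch.

case class SampleEvent(user: String, event: String, item: String, rating: Option[Double])

val sampleEvents = Seq(
  SampleEvent("u1", "rate", "i1", Some(4.0)), // explicit preference: the user rated the item
  SampleEvent("u1", "view", "i2", None),      // implicit preference: the user viewed the item
  SampleEvent("u2", "buy",  "i2", None)       // implicit preference: the user bought the item
)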

MLlib ALS provides the setImplicitPrefs() function to set whether to use implicit preference. The ALS algorithm takes RDD[Rating] as training data input. The Rating class is defined in the Spark MLlib library as:


case class Rating(user: Int, product: Int, rating: Double)
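As a minimal sketch (assuming a SparkContext named sc and made-up sample tuples), an RDD[Rating] suitable as ALS training input can be built like this:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.Rating

// Build an RDD[Rating] from (user, item, value) tuples; the tuples here are sample data.
def toRatings(sc: SparkContext): RDD[Rating] = {
  val raw = Seq((1, 10, 4.0), (1, 11, 2.0), (2, 10, 5.0))
  sc.parallelize(raw).map { case (user, item, value) => Rating(user, item, value) }
}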

By default, the recommendation template sets setImplicitPrefs() to false, which expects explicit rating values that the user has given to the item.

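For contrast, a minimal sketch of the default explicit-feedback case (the parameter values here are arbitrary examples, not the template's defaults):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// Train on explicit ratings with rank = 10, iterations = 10, lambda = 0.01 (example values).
def trainExplicit(ratings: RDD[Rating]): MatrixFactorizationModel =
  ALS.train(ratings, 10, 10, 0.01)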

To handle implicit preference, you can set setImplicitPrefs() to true. In this case, the "rating" value input to ALS is used to calculate the confidence level that the user likes the item. A higher "rating" means a stronger indication that the user likes the item.

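As a rough illustration of the idea (a sketch, not the library's internal code): with implicit preference, ALS treats the rating value as a confidence weight of roughly 1 + alpha * rating, so a larger aggregated value gives a stronger signal.

// Confidence grows with the aggregated "rating" value (e.g. the view count).
val alpha = 1.0                                     // same value as als.setAlpha(1.0) below
def confidence(rating: Double): Double = 1.0 + alpha * rating

confidence(1.0)  // 2.0 - item viewed once
confidence(5.0)  // 6.0 - item viewed five times: stronger indication of interest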

The following provides an example of using implicit preference. You can find the complete modified source code here.


Training with view events


For example, the more times the user has viewed an item, the higher the confidence that the user likes that item. We can aggregate the number of views and use this as the "rating" value.


First, we can modify DataSource.scala to aggregate the number of views of the user on the same item:


  // takes a SparkContext and returns the aggregated ratings
  def getRatings(sc: SparkContext): RDD[Rating] = {

    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      entityType = Some("user"),
      eventNames = Some(List("view")), // MODIFIED
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)

    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      try {
        val ratingValue: Double = event.event match {
          case "view" => 1.0 // MODIFIED
          case _ => throw new Exception(s"Unexpected event ${event} is read.")
        }
        // MODIFIED
        // key is (user id, item id)
        // value is the rating value, which is 1.
        ((event.entityId, event.targetEntityId.get), ratingValue)
      } catch {
        case e: Exception => {
          logger.error(s"Cannot convert ${event} to Rating. Exception: ${e}.")
          throw e
        }
      }
    }
    // MODIFIED
    // sum all values for the same user id and item id key
    .reduceByKey { case (a, b) => a + b }
    .map { case ((uid, iid), r) =>
      Rating(uid, iid, r)
    }.cache()

    ratingsRDD
  }

  override
  def readTraining(sc: SparkContext): TrainingData = {
    new TrainingData(getRatings(sc))
  }

You may put the view count aggregation logic in ALSAlgorithm's train() instead, depending on your needs, as sketched below.
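A hedged sketch of that alternative (assuming the template's usual MLlibRating alias and the mllibRatings RDD built inside train(); names may differ in your version): sum the per-view values by (user, item) key just before handing them to ALS.

// Inside ALSAlgorithm.train(), before als.run(...):
val aggregatedRatings = mllibRatings
  .map(r => ((r.user, r.product), r.rating))   // key by (user index, item index)
  .reduceByKey(_ + _)                          // sum the view counts per pair
  .map { case ((u, i), sum) => MLlibRating(u, i, sum) }
  .cache()

// then: val m = als.run(aggregatedRatings)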

Then, we can modify ALSAlgorithm.scala to set setImplicitPrefs to true:


class ALSAlgorithm(val ap: ALSAlgorithmParams)
  extends PAlgorithm[PreparedData, ALSModel, Query, PredictedResult] {

  ...

  def train(sc: SparkContext, data: PreparedData): ALSModel = {

    ...

    // If you only have one type of implicit event (e.g. "view" event only),
    // set implicitPrefs to true
    // MODIFIED
    val implicitPrefs = true
    val als = new ALS()
    als.setUserBlocks(-1)
    als.setProductBlocks(-1)
    als.setRank(ap.rank)
    als.setIterations(ap.numIterations)
    als.setLambda(ap.lambda)
    als.setImplicitPrefs(implicitPrefs)
    als.setAlpha(1.0)
    als.setSeed(seed)
    als.setCheckpointInterval(10)
    val m = als.run(mllibRatings)

    new ALSModel(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures,
      userStringIntMap = userStringIntMap,
      itemStringIntMap = itemStringIntMap)
  }

  ...

}

 

Now the recommendation engine can train a model with implicit preference events.

Next: Filter Recommended Items by Blacklist in Query

https://predictionio.apache.org/templates/recommendation/training-with-implicit-preference/

 

If the rating matrix is derived from another source of information (e.g., it is inferred from other signals), you can use the trainImplicit method to get better results. For example:

val alpha = 0.01
val lambda = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

alpha - a constant used for computing confidence in implicit ALS (default 1.0)
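A small usage sketch (the user id 42 and the top-10 count are arbitrary examples): the model returned by trainImplicit is a MatrixFactorizationModel, so you can ask it for recommendations directly.

// Top 10 recommended products for user 42, ordered by predicted preference.
val topN = model.recommendProducts(42, 10)
topN.foreach(println)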

https://spark.apache.org/docs/1.6.1/mllib-collaborative-filtering.html
