Enterprise User Profiling: The RFE User Activity Model

Article link: https://blog.csdn.net/weixin_43563705/article/details/109032678

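A few words first:
I am a data analysis intern, and I use this blog to record what I learn. I hope it also helps others who are studying the same material.
Everyone runs into all kinds of difficulties and hardships in life. Running away solves nothing; the only way forward is to meet life's challenges with optimism.
Youth fades quickly while learning comes slowly, so no moment should be wasted.
My favorite saying: finish today's work today.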

Preliminary setup

Enterprise 360° user profiling: tag development (preliminary setup)

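Introducing the RFE model

The RFE model is a variant of the RFM model. It is built on users' ordinary behavior (non-conversion, non-transaction behavior) and, like RFM, evaluates value along three dimensions.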

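RFE in detail

The RFE model computes an RFE score for each member from three dimensions:

R (Recency): time of the most recent visit
F (Frequency): visit frequency
E (Engagements): page engagement

where:
Recency (R): the time at which the member most recently visited or arrived at the site.
Frequency (F): how often the user visited or arrived at the site within a given period.
Engagements (E): the definition depends on how users interact with the particular business or industry; it can be, for example, time spent on pages, number of products viewed, number of videos played, number of likes, or number of shares.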
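
To make the three definitions concrete, here is a minimal sketch in plain Scala (no Spark) that derives the raw R, F, and E values from a small visit log. The `Visit` and `Rfe` case classes, the `compute` function, and the sample rows are all hypothetical names and data chosen for illustration; E is assumed here to be the number of distinct pages visited, which is only one of the possible definitions listed above.

```scala
import java.time.LocalDate

object RfeMetricsSketch {
  // One page visit; the fields mirror the log columns global_user_id, loc_url, log_time.
  final case class Visit(userId: String, locUrl: String, logDate: LocalDate)

  // Raw (not yet scored) RFE values for one user.
  final case class Rfe(recencyDays: Long, frequency: Int, engagements: Int)

  // Groups visits by user and derives the three raw metrics.
  def compute(visits: Seq[Visit], today: LocalDate): Map[String, Rfe] =
    visits.groupBy(_.userId).map { case (userId, vs) =>
      val lastVisit = vs.map(_.logDate.toEpochDay).max
      val recency = today.toEpochDay - lastVisit        // days since the latest visit
      val frequency = vs.size                           // total page visits
      val engagements = vs.map(_.locUrl).distinct.size  // distinct pages visited (assumed definition of E)
      userId -> Rfe(recency, frequency, engagements)
    }

  // Hypothetical sample rows, not taken from the real log table.
  val sample: Seq[Visit] = Seq(
    Visit("424", "http://m.eshop.com/a", LocalDate.of(2019, 8, 10)),
    Visit("424", "http://m.eshop.com/a", LocalDate.of(2019, 8, 13)),
    Visit("424", "http://m.eshop.com/b", LocalDate.of(2019, 8, 12)),
    Visit("619", "http://m.eshop.com/a", LocalDate.of(2019, 7, 29))
  )
}
```

Calling `RfeMetricsSketch.compute(RfeMetricsSketch.sample, LocalDate.of(2019, 8, 15))` gives one `Rfe` per user. The Spark implementation later in this post does the same aggregation with `groupBy` and `agg`.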

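Because the RFE model does not require a transaction, it can be used to analyze the behavioral value of anonymous users who have not logged in or registered, as well as identified users. The model is commonly used for activity segmentation or value differentiation, and also for member analysis at content businesses (for example forums, news, and information sites).

RFM and RFE follow the same implementation approach; only the metrics differ. RFE data can come from the company's own user behavior logs or from a third-party web analytics tool.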

Applying the RFE model in practice

Once users' RFE scores are available, there are two ways to apply them, much as with RFM:

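1: Segment and interpret user groups by the values of the three dimensions to analyze user activity. For example, a member with an RFE score of 313 visits infrequently but interacts well on every visit, so the priority is to raise return frequency, for instance through event invitations, targeted advertising, or member activity recommendations.

2: Use the aggregated RFE score to evaluate the activity value of all members and rank them by activity. The score can also be fed, together with other dimensions, into other analysis and data mining models as an input variable, providing a basis for modeling.
For example:

6 Loyal (2 or more visits within 1 day, with no page repeated across visits)
5 Active (at least 1 visit within 2 days)
4 Returning (at least 1 visit within 3 days)
3 New (registered and visited)
2 Inactive (no visit within 7 days)
1 Churned (no visit for more than 7 days)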
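
The two approaches can be sketched in plain Scala as follows. This is only an illustration: `Score`, `code`, `total`, and `rank` are hypothetical names, and the R and F thresholds are the ones given in the comments of the code listing in this post (0-15 days = 5 points and so on), not universal values.

```scala
object RfeScoreSketch {
  // Per-dimension scores on a 1-5 scale.
  final case class Score(r: Int, f: Int, e: Int) {
    def code: String = s"$r$f$e" // three-digit reading, e.g. "313"
    def total: Int = r + f + e   // aggregated score used for ranking
  }

  // R thresholds from the listing: 0-15 days = 5, 16-30 = 4, 31-45 = 3, 46-60 = 2, 61+ = 1.
  def recencyScore(days: Long): Int =
    if (days <= 15) 5 else if (days <= 30) 4 else if (days <= 45) 3 else if (days <= 60) 2 else 1

  // F thresholds from the listing: >=400 = 5, 300-399 = 4, 200-299 = 3, 100-199 = 2, <=99 = 1.
  def frequencyScore(visits: Long): Int =
    if (visits >= 400) 5 else if (visits >= 300) 4 else if (visits >= 200) 3 else if (visits >= 100) 2 else 1

  // Ranks users by aggregated score, highest first; ties are broken by user id.
  def rank(scores: Map[String, Score]): Seq[(String, Int)] =
    scores.toSeq
      .map { case (id, s) => (id, s.total) }
      .sortBy { case (id, total) => (-total, id) }
}
```

With this sketch, `Score(3, 1, 3).code` reads as the "313" member from point 1 (low frequency, decent recency and engagement), while `rank` gives the activity ranking described in point 2.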

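Requirements analysis

Based on:

6 Loyal (2 or more visits within 1 day, with no page repeated across visits)
5 Active (at least 1 visit within 2 days)
4 Returning (at least 1 visit within 3 days)
3 New (registered and visited)
2 Inactive (no visit within 7 days)
1 Churned (no visit for more than 7 days)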


Rule for the 活跃度 (activity level) tag: inType=HBase##zkHosts=192.168.10.20##zkPort=2181##hbaseTable=tbl_logs##family=detail##selectFields=global_user_id,loc_url,log_time
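
The rule string is a list of `key=value` pairs separated by `##`. How `PublicStaticCode` actually parses it is not shown in this post, so the following is only a sketch of one way to do it in plain Scala; `RuleParserSketch`, `parse`, and `selectFields` are hypothetical names.

```scala
object RuleParserSketch {
  // Splits "k1=v1##k2=v2" into a Map; entries without '=' are skipped.
  def parse(rule: String): Map[String, String] =
    rule.split("##").toSeq.flatMap { kv =>
      kv.split("=", 2) match {
        case Array(k, v) => Some(k.trim -> v.trim)
        case _           => None
      }
    }.toMap

  // Comma-separated column list from the selectFields entry, empty if absent.
  def selectFields(meta: Map[String, String]): Seq[String] =
    meta.get("selectFields").toSeq.flatMap(_.split(",").map(_.trim).toSeq)
}
```

Applied to the rule above, `parse` yields six entries (`inType`, `zkHosts`, `zkPort`, `hbaseTable`, `family`, `selectFields`), and `selectFields` returns the three columns that the job reads from HBase.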

Code implementation

package cn.itcast.userprofile.up24.newexcavate

import cn.itcast.userprofile.up24.public.PublicStaticCode
import org.apache.spark.sql.{DataFrame, SparkSession}

object RFE extends PublicStaticCode{
  override def SetAppName: String = "RFE"

  override def Four_Name: String = "活跃度"

  override def compilerAdapterFactory(spark: SparkSession, five: DataFrame, tblUser: DataFrame): DataFrame = {
    five.show()

    /**
     * +------+----+
     * |tagsId|rule|
     * +------+----+
     * |    46|   1|
     * |    47|   2|
     * |    48|   3|
     * |    49|   4|
     * +------+----+
     */
    tblUser.show()

    /**
     * loc_url   current page URL
     * log_time  visit time
     * +--------------+--------------------+-------------------+
     * |global_user_id|             loc_url|           log_time|
     * +--------------+--------------------+-------------------+
     * |           424|http://m.eshop.co...|2019-08-13 03:03:55|
     * |           619|http://m.eshop.co...|2019-07-29 15:07:41|
     * |           898|http://m.eshop.co...|2019-08-14 09:23:44|
     * |           642|http://www.eshop....|2019-08-11 03:20:17|
     * |           130|http://www.eshop....|2019-08-12 11:59:28|
     * |           515|http://www.eshop....|2019-07-23 14:39:25|
     * |           274|http://www.eshop....|2019-07-24 15:37:12|
     * |           772|http://ck.eshop.c...|2019-07-24 07:56:49|
     * |           189|http://m.eshop.co...|2019-07-26 19:17:00|
     * |           529|http://m.eshop.co...|2019-07-25 23:18:37|
     * |           177|http://m.eshop.co...|2019-07-23 21:01:26|
     * |           247|http://m.vip.esho...|2019-07-22 06:58:05|
     * |           702| http://m.eshop.com/|2019-08-04 11:43:11|
     * |           871|http://vip.eshop....|2019-08-11 09:38:00|
     * |           349|http://www.eshop....|2019-07-22 12:17:54|
     * |           538|http://www.eshop....|2019-07-30 11:16:57|
     * |            81|http://www.eshop....|2019-08-06 09:10:37|
     * |           308|http://www.eshop....|2019-08-06 03:54:04|
     * |           344|http://m.eshop.co...|2019-08-09 07:25:10|
     * |           796|http://member.esh...|2019-08-07 15:16:28|
     * +--------------+--------------------+-------------------+
     */
    tblUser
  }

  def main(args: Array[String]): Unit = {
    startMain()

  }
}

package cn.itcast.userprofile.up24.newexcavate

import cn.itcast.userprofile.up24.public.PublicStaticCode
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{MinMaxScaler, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

import scala.collection.immutable
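/**
 * Desc: user activity model (RFE)
 * Recency: time since the user's most recent visit
 * Frequency: visit frequency, the total number of page visits by the user within a period
 * Engagements: page engagement, the number of distinct pages the user visited within a period; it can also be defined as page views, downloads, videos played, etc.
 */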

object RFE extends PublicStaticCode{
  override def SetAppName: String = "RFE"

  override def Four_Name: String = "活跃度"
  /**
   * Start the computation
   * inType=HBase##zkHosts=192.168.10.20##zkPort=2181##
   * hbaseTable=tbl_logs##family=detail##selectFields=global_user_id,loc_url,log_time
   * @param five  level-5 rules from MySQL: id, rule
   * @param tblUser data read from HBase according to selectFields
   * @return userId, tagsId
   */

  override def compilerAdapterFactory(spark: SparkSession, five: DataFrame, tblUser: DataFrame): DataFrame = {
//    five.show()

    /**
     * +------+----+
     * |tagsId|rule|
     * +------+----+
     * |    46|   1|
     * |    47|   2|
     * |    48|   3|
     * |    49|   4|
     * +------+----+
     */
//    tblUser.show()

    /**
     * loc_url   current page URL
     * log_time  visit time
     * +--------------+--------------------+-------------------+
     * |global_user_id|             loc_url|           log_time|
     * +--------------+--------------------+-------------------+
     * |           424|http://m.eshop.co...|2019-08-13 03:03:55|
     * |           619|http://m.eshop.co...|2019-07-29 15:07:41|
     * |           898|http://m.eshop.co...|2019-08-14 09:23:44|
     * |           642|http://www.eshop....|2019-08-11 03:20:17|
     * |           130|http://www.eshop....|2019-08-12 11:59:28|
     * |           515|http://www.eshop....|2019-07-23 14:39:25|
     * |           274|http://www.eshop....|2019-07-24 15:37:12|
     * |           772|http://ck.eshop.c...|2019-07-24 07:56:49|
     * |           189|http://m.eshop.co...|2019-07-26 19:17:00|
     * |           529|http://m.eshop.co...|2019-07-25 23:18:37|
     * |           177|http://m.eshop.co...|2019-07-23 21:01:26|
     * |           247|http://m.vip.esho...|2019-07-22 06:58:05|
     * |           702| http://m.eshop.com/|2019-08-04 11:43:11|
     * |           871|http://vip.eshop....|2019-08-11 09:38:00|
     * |           349|http://www.eshop....|2019-07-22 12:17:54|
     * |           538|http://www.eshop....|2019-07-30 11:16:57|
     * |            81|http://www.eshop....|2019-08-06 09:10:37|
     * |           308|http://www.eshop....|2019-08-06 03:54:04|
     * |           344|http://m.eshop.co...|2019-08-09 07:25:10|
     * |           796|http://member.esh...|2019-08-07 15:16:28|
     * +--------------+--------------------+-------------------+
     */
    import spark.implicits._
    import org.apache.spark.sql.functions._
    // 0. Define column-name constants to avoid typos later
    val recencyStr = "recency"
    val frequencyStr = "frequency"
    val engagementsStr = "engagements"
    val featureStr = "feature"
    val scaleFeatureStr = "scaleFeature"
    val predictStr = "predict"
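    // 1. Aggregate by user id to get the raw RFE values
    // Recency: days since the user's most recent visit, reference date - max(log_time)
    // Frequency: total number of page visits within the period, count(loc_url)
    // Engagements: number of distinct pages visited within the period, countDistinct(loc_url); it can also be defined as page views, downloads, videos played, etc.
    // Note: the reference date is the current time shifted back 361 days, presumably so that the 2019 sample logs look recent; with live data, use the current date directly
    val recencyAggColumn = datediff(date_sub(current_timestamp(), 361), max("log_time")) as recencyStr
    val frequencyAggColumn = count("loc_url") as frequencyStr
    val engagementsAggColumn = countDistinct("loc_url") as engagementsStr

    val rfe_result = tblUser.groupBy("global_user_id").agg(recencyAggColumn, frequencyAggColumn, engagementsAggColumn)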
//    rfe_result.show()
    /**
     * +--------------+-------+---------+-----------+
     * |global_user_id|recency|frequency|engagements|
     * +--------------+-------+---------+-----------+
     * |           296|     61|      380|        227|
     * |           467|     61|      405|        267|
     * |           675|     61|      370|        240|
     * |           691|     61|      387|        244|
     * |           829|     61|      404|        269|
     * |           125|     61|      375|        246|
     * |           451|     61|      347|        224|
     * +--------------+-------+---------+-----------+
     */
    // 2. Score each dimension on a 1-5 scale
    // R: 0-15 days = 5, 16-30 days = 4, 31-45 days = 3, 46-60 days = 2, 61 days or more = 1
    // F: ≥400 = 5, 300-399 = 4, 200-299 = 3, 100-199 = 2, ≤99 = 1
    // E: ≥250 = 5, 230-249 = 4, 210-229 = 3, 200-209 = 2, ≤199 = 1
    val recencyScore = when(col(recencyStr).between(0, 15), 5)
      .when(col(recencyStr).between(16, 30), 4)
      .when(col(recencyStr).between(31, 45), 3)
      .when(col(recencyStr).between(46, 60), 2)
      .when(col(recencyStr).geq(61), 1)
      .as(recencyStr)
    val frequencyScore = when(col(frequencyStr).geq(400), 5)
      .when(col(frequencyStr).between(300, 399), 4)
      .when(col(frequencyStr).between(200, 299), 3)
      .when(col(frequencyStr).between(100, 199), 2)
      .when(col(frequencyStr).leq(99), 1)
      .as(frequencyStr)
    // The lowest E bucket covers everything up to 199; leq(1) would leave 2-199 unscored (null)
    val engagementsScore = when(col(engagementsStr).geq(250), 5)
      .when(col(engagementsStr).between(230, 249), 4)
      .when(col(engagementsStr).between(210, 229), 3)
      .when(col(engagementsStr).between(200, 209), 2)
      .when(col(engagementsStr).leq(199), 1)
      .as(engagementsStr)

    val ref_Score_Result = rfe_result.select('global_user_id, recencyScore, frequencyScore, engagementsScore)
//    ref_Score_Result.show()

    /**
     * +--------------+-------+---------+-----------+
     * |global_user_id|recency|frequency|engagements|
     * +--------------+-------+---------+-----------+
     * |           296|      1|        4|          3|
     * |           467|      1|        5|          5|
     * |           675|      1|        4|          4|
     * |           691|      1|        4|          4|
     * |           829|      1|        5|          5|
     * |           125|      1|        4|          4|
     * |           451|      1|        4|          3|
     * |           800|      1|        4|          4|
     * |           853|      1|        4|          5|
     */
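    // 3. Clustering
    // Downstream models expect their input as a feature vector, so the relevant columns are converted into a single, consistently named vector column; VectorAssembler does this.
    // VectorAssembler is a transformer that combines multiple columns into a single vector column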
    val vectorDF: DataFrame = new VectorAssembler()
      .setInputCols(Array(recencyStr, frequencyStr, engagementsStr))
      .setOutputCol(featureStr)
      .transform(ref_Score_Result)

//    vectorDF.show()

    /**
     * +--------------+-------+---------+-----------+-------------+
     * |global_user_id|recency|frequency|engagements|      feature|
     * +--------------+-------+---------+-----------+-------------+
     * |           296|      1|        4|          3|[1.0,4.0,3.0]|
     * |           467|      1|        5|          5|[1.0,5.0,5.0]|
     * |           675|      1|        4|          4|[1.0,4.0,4.0]|
     * |           691|      1|        4|          4|[1.0,4.0,4.0]|
     * |           829|      1|        5|          5|[1.0,5.0,5.0]|
     * |           125|      1|        4|          4|[1.0,4.0,4.0]|
     * |           451|      1|        4|          3|[1.0,4.0,3.0]|
     * |           800|      1|        4|          4|[1.0,4.0,4.0]|
     * |           853|      1|        4|          5|[1.0,4.0,5.0]|
     * |           944|      1|        4|          5|[1.0,4.0,5.0]|
     * |           666|      1|        5|          5|[1.0,5.0,5.0]|
     * |           870|      1|        5|          5|[1.0,5.0,5.0]|
     * |           919|      1|        5|          5|[1.0,5.0,5.0]|
     */
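 /*   // Min-max normalization rescales each feature to a fixed range, usually [0, 1]
    // (X - X.min) / (X.max - X.min)
    // Normalizing puts all feature dimensions on the same scale so they carry equal weight in the objective, and speeds up convergence of the iterative solver
    val scalerModel = new MinMaxScaler()
      .setInputCol(featureStr)
      .setOutputCol(scaleFeatureStr)
      .fit(vectorDF)
    val scalerDF = scalerModel.transform(vectorDF)*/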
//    scalerDF.show()

    /**
     * +--------------+-------+---------+-----------+-------------+-------------+
     * |global_user_id|recency|frequency|engagements|      feature| scaleFeature|
     * +--------------+-------+---------+-----------+-------------+-------------+
     * |           296|      1|        4|          3|[1.0,4.0,3.0]|[0.5,0.0,0.0]|
     * |           467|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           675|      1|        4|          4|[1.0,4.0,4.0]|[0.5,0.0,0.5]|
     * |           691|      1|        4|          4|[1.0,4.0,4.0]|[0.5,0.0,0.5]|
     * |           829|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           125|      1|        4|          4|[1.0,4.0,4.0]|[0.5,0.0,0.5]|
     * |           451|      1|        4|          3|[1.0,4.0,3.0]|[0.5,0.0,0.0]|
     * |           800|      1|        4|          4|[1.0,4.0,4.0]|[0.5,0.0,0.5]|
     * |           853|      1|        4|          5|[1.0,4.0,5.0]|[0.5,0.0,1.0]|
     * |           944|      1|        4|          5|[1.0,4.0,5.0]|[0.5,0.0,1.0]|
     * |           666|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           870|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           919|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           926|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           124|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           447|      1|        4|          5|[1.0,4.0,5.0]|[0.5,0.0,1.0]|
     * |            51|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           591|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |             7|      1|        5|          5|[1.0,5.0,5.0]|[0.5,1.0,1.0]|
     * |           307|      1|        4|          3|[1.0,4.0,3.0]|[0.5,0.0,0.0]|
     * +--------------+-------+---------+-----------+-------------+-------------+
     */
    // 4. Train the model
    val model = new KMeans()
      .setK(4)
      .setMaxIter(10)
      .setSeed(10)
      .setFeaturesCol(featureStr)
      .setPredictionCol(predictStr)
      .fit(vectorDF)
    // 5. Predict
    val result: DataFrame = model.transform(vectorDF)
//    result.show()

    /**
     * +--------------+-------+---------+-----------+-------------+-------+
     * |global_user_id|recency|frequency|engagements|      feature|predict|
     * +--------------+-------+---------+-----------+-------------+-------+
     * |           296|      1|        4|          3|[1.0,4.0,3.0]|      3|
     * |           467|      1|        5|          5|[1.0,5.0,5.0]|      0|
     * |           675|      1|        4|          4|[1.0,4.0,4.0]|      1|
     * |           691|      1|        4|          4|[1.0,4.0,4.0]|      1|
     * |           829|      1|        5|          5|[1.0,5.0,5.0]|      0|
     * |           125|      1|        4|          4|[1.0,4.0,4.0]|      1|
     * |           451|      1|        4|          3|[1.0,4.0,3.0]|      3|
     * |           800|      1|        4|          4|[1.0,4.0,4.0]|      1|
     * |           853|      1|        4|          5|[1.0,4.0,5.0]|      2|
     * |           944|      1|        4|          5|[1.0,4.0,5.0]|      2|
     */

    // 6. Check the clustering result while testing
    val ds = result
      .groupBy(predictStr)
      .agg(max(col(recencyStr) + col(frequencyStr) + col(engagementsStr)), min(col(recencyStr) + col(frequencyStr) + col(engagementsStr)))
      .sort(col(predictStr).asc)
//    ds.show()

    /**
     * +-------+------------------------------------------+------------------------------------------+
     * |predict|max(((recency + frequency) + engagements))|min(((recency + frequency) + engagements))|
     * +-------+------------------------------------------+------------------------------------------+
     * |      0|                                        11|                                        10|
     * |      1|                                         9|                                         9|
     * |      2|                                        10|                                        10|
     * |      3|                                         8|                                         8|
     * +-------+------------------------------------------+------------------------------------------+
     */

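    // Problem: cluster ids come out in no particular order, but mapping clusters to rules requires an ordering
    // 7. Sort by cluster center: the larger the center (sum of its coordinates), the more valuable that group of users
    // [(cluster id, center sum)]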
    val center: immutable.IndexedSeq[(Int, Double)] = for (i <- model.clusterCenters.indices) yield (i, model.clusterCenters(i).toArray.sum)
//    println(center)
    /**
     * Vector((0,10.935802469135801), (1,9.0), (2,10.0), (3,8.0))
     */
    val centerSort: immutable.IndexedSeq[(Int, Double)] = center.sortBy(_._2).reverse
//    println(centerSort)
    /**
     * Vector((0,10.935802469135801), (2,10.0), (1,9.0), (3,8.0))
     */
    // [(cluster id, rule value)]
    val centerAndRule: immutable.Seq[(Int, Int)] = for (i <- centerSort.indices) yield (centerSort(i)._1, i + 1)
//    println(centerAndRule)
    /**
     * Vector((0,1), (2,2), (1,3), (3,4))
     */
    val centerDF: DataFrame = centerAndRule.toDF(predictStr, "rule")
//    centerDF.show()
    /**
     * +-------+----+
     * |predict|rule|
     * +-------+----+
     * |      0|   1|
     * |      2|   2|
     * |      1|   3|
     * |      3|   4|
     * +-------+----+
     */
    // 8. Match rule against the level-5 rules
    val ruleTagDF: DataFrame = centerDF.join(five, "rule")
//    ruleTagDF.show()
    /**
     * +----+-------+------+
     * |rule|predict|tagsId|
     * +----+-------+------+
     * |   1|      0|    46|
     * |   2|      2|    47|
     * |   3|      1|    48|
     * |   4|      3|    49|
     * +----+-------+------+
     */
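    // 9. Match the result of step 8 against the K-Means predictions
    val ruleMap: Map[String, String] = ruleTagDF.map(row => {
      // An explicit type parameter keeps getAs from inferring Nothing
      val predict = row.getAs[Any]("predict").toString
      val tagsId = row.getAs[Any]("tagsId").toString
      (predict, tagsId)
    }).collect().toMap

    // Look up the tag id for a predicted cluster id
    val predict_UDF = udf((predict: String) => ruleMap(predict))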

    val new_tag = result.select('global_user_id as ("userId"), predict_UDF('predict).as("tagsId"))
//    new_tag.show()
    /**
     * +------+------+
     * |userId|tagsId|
     * +------+------+
     * |   296|    49|
     * |   467|    46|
     * |   675|    48|
     * |   691|    48|
     * |   829|    46|
     * |   125|    48|
     * |   451|    49|
     * |   800|    48|
     * |   853|    47|
     * |   944|    47|
     * |   666|    46|
     * |   870|    46|
     * |   919|    46|
     * |   926|    46|
     * |   124|    46|
     * |   447|    47|
     * |    51|    46|
     * |   591|    46|
     * |     7|    46|
     * |   307|    49|
     * +------+------+
     */
    new_tag


  }

  def main(args: Array[String]): Unit = {
    startMain()

  }
}
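
Steps 7 and 8 are the least obvious part of the listing, so here is the same idea isolated in plain Scala, without Spark. It is a sketch under assumptions: `ClusterRankSketch`, `rankCenters`, and `clusterToTag` are hypothetical names, and the rule-to-tag map stands in for the `five` DataFrame.

```scala
object ClusterRankSketch {
  // Ranks cluster ids by the sum of their center coordinates, largest first,
  // and assigns rule numbers 1..k in that order.
  def rankCenters(centers: Seq[Array[Double]]): Seq[(Int, Int)] =
    centers.zipWithIndex
      .map { case (center, clusterId) => (clusterId, center.sum) }
      .sortBy { case (_, sum) => -sum }
      .zipWithIndex
      .map { case ((clusterId, _), rank) => (clusterId, rank + 1) }

  // Joins (cluster id, rule) pairs with a rule -> tag id lookup;
  // clusters whose rule has no tag are dropped, like an inner join.
  def clusterToTag(ranked: Seq[(Int, Int)], ruleToTag: Map[Int, Int]): Map[Int, Int] =
    ranked.flatMap { case (clusterId, rule) =>
      ruleToTag.get(rule).map(tagId => clusterId -> tagId)
    }.toMap
}
```

Given four centers whose coordinate sums are ordered like those printed in step 7 (cluster 0 largest, then 2, 1, 3), `rankCenters` returns the pairs (0,1), (2,2), (1,3), (3,4), and `clusterToTag` with the rules 1 to 4 mapped to tags 46 to 49 reproduces the table from step 8. Note that summing the coordinates treats R, F, and E as equally important; weighting them differently would change the ranking.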