Recommender Systems (Engineering Track): A Unified Recall Platform

学无止境-逆流而上

Tags: java, big data, spring, database, hadoop

Article link: https://blog.csdn.net/ITbasketplayer/article/details/122682629

I. Background

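In a recommender system, recall is a critical stage. It should retrieve as many relevant results as possible, plus a measured number of exploratory ones, and it sets the upper bound on recommendation quality.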

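Recall can be split into modules:

1. u2i, k2i, i2i, v2i (collectively x2i)

2. Index recall (inverted-index based, implemented with Lucene, Solr, ES, Redis, and the like)

3. New-and-hot recall (leaderboards)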

...

We have in fact already built these modules in

https://zhuanlan.zhihu.com/p/355510794

so why do we still need a unified recall platform?

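The strategy platform can integrate recall modules, but it is also the outward-facing service through which business callers fetch recommendation results. As a tier-1 service it can only be operated by engineers, not by algorithm engineers. We want the recall platform to be fully operable by algorithm engineers: even if they misconfigure something, the strategy platform has a fallback strategy, so recommendations do not fail.
The strategy platform has a rich set of modules, but for recall it is not focused enough. It is neither simple nor standardized to use, and onboarding carries some cost.
The unified recall platform integrates components such as user profile, filtering, profile preprocessing, recall, ranking, and re-ranking, and wires them together as a configured pipeline (not a DAG). This greatly simplifies operation and makes the platform easy for algorithm engineers to use.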
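
To make the "pipeline rather than DAG" idea concrete, here is a minimal Python sketch of a linear pipeline: each stage receives the request state and hands it to the next one, in the configured order. The stage names, the `STAGES` registry, and the sample data are all hypothetical illustrations, not the platform's actual code.

```python
from typing import Callable, Dict, List

# A stage takes the shared request state and returns the updated state.
Stage = Callable[[Dict], Dict]


def load_profile(state: Dict) -> Dict:
    # Hypothetical stage: attach a (made-up) user profile to the state.
    state["profile"] = {"liked": ["i1"]}
    return state


def recall(state: Dict) -> Dict:
    # Hypothetical stage: produce candidates with made-up scores.
    state["items"] = [("i1", 0.9), ("i2", 0.4), ("i3", 0.7)]
    return state


def filter_seen(state: Dict) -> Dict:
    # Drop items the user has already interacted with.
    seen = set(state["profile"]["liked"])
    state["items"] = [(i, s) for i, s in state["items"] if i not in seen]
    return state


def rank(state: Dict) -> Dict:
    # Sort the remaining candidates by score, highest first.
    state["items"] = sorted(state["items"], key=lambda x: x[1], reverse=True)
    return state


# Hypothetical registry mapping configured stage names to implementations.
STAGES: Dict[str, Stage] = {
    "profile": load_profile,
    "recall": recall,
    "filter": filter_seen,
    "rank": rank,
}


def run_pipeline(stage_names: List[str], state: Dict) -> Dict:
    # Stages run strictly in the listed order; there is no branching.
    for name in stage_names:
        state = STAGES[name](state)
    return state


result = run_pipeline(["profile", "recall", "filter", "rank"], {"user_id": "u1"})
print([item for item, _ in result["items"]])  # ['i3', 'i2']
```

Because the order is just a list, changing the flow means editing that list, with no dependency graph to reason about.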

II. Implementation

1. Define the recall components and organize them with JSON

Core components:

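globals: the global scope, for components that take effect globally. Currently only filter components are configured here (history filter, attribute filter, blacklist/whitelist filter).
define: the definition scope. It currently covers user profile, profile preprocessing, recall group, recall, rank group, rank, merge, deduplication, and so on. Components are defined here first and then referenced by later components via ref. A referenced component must be defined in define.
contexts: the context scope. It currently holds only user profiles (p_profile, u_profile) and the request context query_context.
main_recall: the main recall. It can directly ref the recalls in define, and can also contain multiple recall groups, rank groups, recalls, ranks, item attributes, filter groups, merges, and so on.
f_recall: similar to the main recall, invoked when the main recall does not return enough items.
deduplications: deduplication.
collector: scattering and diversity.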
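
The interplay between main_recall, f_recall, and deduplication can be sketched roughly as follows. This is only an illustration: the `want` threshold that decides when the main recall is "not enough", the function names, and the sample items are assumptions, since the article does not spell out these details.

```python
from typing import Callable, List


def dedupe(items: List[str]) -> List[str]:
    # Keep the first occurrence of each item and preserve order.
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def recall_with_fallback(
    main_recall: Callable[[], List[str]],
    f_recall: Callable[[], List[str]],
    want: int,
) -> List[str]:
    # Run the main recall first; call the fallback only if it falls short.
    items = dedupe(main_recall())
    if len(items) < want:
        items = dedupe(items + f_recall())
    return items[:want]


def main_items() -> List[str]:
    # Hypothetical main recall that returns a duplicate.
    return ["a", "b", "a"]


def fallback_items() -> List[str]:
    # Hypothetical fallback recall.
    return ["b", "c", "d", "e"]


print(recall_with_fallback(main_items, fallback_items, 4))  # ['a', 'b', 'c', 'd']
```

Main-recall items keep their position ahead of fallback items, so the fallback only fills the remaining slots.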

A simple JSON configuration example:

{
  "globals": [
    {
      "name": "f1",
      "plugin_type": "filter_attr",
      "join": "or",
      "attrs": [
        {
          "name": "v1",
          "value": "1",
          "relation": "="
        }
      ]
    }
  ],
  "define": [
    {
      "ranks": [
        {
          "model_version": "v1",
          "model_name": "ttt",
          "feature": "sex,up,60_p",
          "u_features": {
            "ref": "pro1",
            "feature_names": "a1,a2"
          },
          "name": "r1",
          "plugin_type": "remote_rank",
          "m_type": "simple_rank"
        }
      ],
      "name": "rank_group_1",
      "plugin_type": "rank_group",
      "cal_score": {
        "ref": "expr_score"
      }
    },
    {
      "recalls": [
        {
          "top": 100,
          "seed": {
            "ref": "pro1",
            "seed_attr": "30_plylist"
          },
          "name": "recall1",
          "plugin_type": "x2i",
          "table_name": "test_x2i_1"
        }
      ],
      "item_attr": {
        "table_name": "test_attr",
        "attr_names": "a1,a2,a3"
      },
      "name": "recall1",
      "plugin_type": "recall_group",
      "rank": {
        "ref": "rank_group_1"
      }
    },
    {
      "cal_recall_score": false,
      "name": "expr_score",
      "plugin_type": "expr_score",
      "expr": "s1*s2*s3",
      "item_score_field": "p1,p2,p3"
    }
  ],
  "contexts": [
    {
      "d_type": "string",
      "name": "query1",
      "plugin_type": "query_context",
      "data_path": "query.pageNo"
    },
    {
      "name": "pro1",
      "plugin_type": "u_profile"
    }
  ],
  "main_recall": {
    "ref": "recall1",
    "name": "main_recall",
    "plugin_type": "main_recall"
  },
  "collector": {
    "top": 100,
    "c_num": 2,
    "name": "c1",
    "plugin_type": "col_default",
    "attrs": [
      {
        "name": "n1",
        "per": 9,
        "pros": [
          {
            "value": "a1",
            "per": 5
          },
          {
            "value": "a2",
            "per": 3
          }
        ]
      }
    ]
  }
}
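
One way to read the ref mechanism in this sample is as a name lookup: index the named components, then check that every ref points at one of them. The Python sketch below does this on a trimmed copy of the config above. Note the assumption: in the sample, "pro1" is referenced from define but declared under contexts, so the sketch indexes both define and contexts. The helper names (`index_components`, `iter_refs`, `unresolved_refs`) are hypothetical and not part of the platform.

```python
import json
from typing import Any, Dict, Iterator, List

# Trimmed copy of the sample config, keeping only the fields that matter for refs.
CONFIG = json.loads("""
{
  "define": [
    {"name": "rank_group_1", "plugin_type": "rank_group",
     "cal_score": {"ref": "expr_score"}},
    {"name": "recall1", "plugin_type": "recall_group",
     "recalls": [{"name": "recall1", "plugin_type": "x2i",
                  "seed": {"ref": "pro1", "seed_attr": "30_plylist"}}],
     "rank": {"ref": "rank_group_1"}},
    {"name": "expr_score", "plugin_type": "expr_score", "expr": "s1*s2*s3"}
  ],
  "contexts": [{"name": "pro1", "plugin_type": "u_profile"}],
  "main_recall": {"ref": "recall1", "name": "main_recall",
                  "plugin_type": "main_recall"}
}
""")


def index_components(config: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    # Map name -> top-level component declared under define or contexts
    # (indexing contexts as well is an assumption based on the sample).
    index = {}
    for scope in ("define", "contexts"):
        for component in config.get(scope, []):
            index[component["name"]] = component
    return index


def iter_refs(node: Any) -> Iterator[str]:
    # Walk the whole config tree and yield every "ref" value found.
    if isinstance(node, dict):
        ref = node.get("ref")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from iter_refs(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_refs(value)


def unresolved_refs(config: Dict[str, Any]) -> List[str]:
    # Return the sorted names of refs that point at no declared component.
    index = index_components(config)
    return sorted({ref for ref in iter_refs(config) if ref not in index})


index = index_components(CONFIG)
print(index[CONFIG["main_recall"]["ref"]]["plugin_type"])  # recall_group
print(unresolved_refs(CONFIG))  # []
```

A check like this could run when a config is saved, so a mistyped ref is reported before the config reaches serving.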