Step by Step: Building a Movie Recommender System with TensorFlow Recommenders

Latest recommended article published 2025-04-22 22:53:12

Coderabo

Tags: tensorflow, neo4j, artificial intelligence

Article link: https://blog.csdn.net/tombosky/article/details/147309105

Step by Step: Building a Movie Recommender System with TensorFlow Recommenders (Complete Working Code Included)

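1. Introduction to TensorFlow Recommenders

TensorFlow Recommenders (TFRS) is Google's official library for building recommender systems, and it lowers the barrier to developing recommendation algorithms considerably. The parts of a recommender that usually cause the most pain (feature processing, model training, and evaluation metrics) are all packaged as ready-made modules. For example, building embedding vectors for users and items, which can take dozens of lines when written by hand, comes down to two or three lines.

The library's biggest selling point is how well it fits into the TensorFlow ecosystem: data preprocessing can be done with TFX, deployment with TensorFlow Serving, and monitoring with TensorBoard, so the whole pipeline connects end to end. It is especially friendly to beginners, who can put together a production-style recommender quickly without reinventing the wheel.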

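2. Environment Setup and Installation

Let's set up the basic environment first. Python 3.8 or later is recommended to avoid version compatibility problems. Installation is simple: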

pip install tensorflow-recommenders
pip install tensorflow-datasets  # needed for the example dataset

Check that the installation succeeded:

import tensorflow_recommenders as tfrs
print(tfrs.__version__)  # prints the installed TFRS version, e.g. 0.7.x

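3. Preparing the Dataset

Here we use the public MovieLens 100K dataset, which contains 100,000 movie ratings. Loading it with TensorFlow Datasets (TFDS) is very convenient: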

import tensorflow_datasets as tfds

ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

# Keep only the user ID, movie title, and rating
ratings = ratings.map(lambda x: {
    "user_id": x["user_id"],
    "movie_title": x["movie_title"],
    "user_rating": x["user_rating"]
})

movies = movies.map(lambda x: x["movie_title"])

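4. Building the Recommendation Model

We will implement the classic two-tower architecture. The user tower turns user features into an embedding, the item tower does the same for movie features, and the model scores a user-movie pair by the similarity of the two embeddings.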
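To make the idea concrete before bringing in TFRS, here is a minimal plain-Python sketch of two-tower scoring. The user IDs, movie titles, and embedding values are all made up for illustration, and the dot product stands in for whatever similarity the trained model learns.

```python
# Toy two-tower scoring in plain Python (no TensorFlow needed).
# All IDs, titles, and embedding values below are made up for illustration.

# "User tower": maps a user ID to an embedding vector.
user_embeddings = {
    "u1": [0.9, 0.1],
    "u2": [0.1, 0.8],
}

# "Movie tower": maps a movie title to an embedding vector of the same size.
movie_embeddings = {
    "Action Movie": [1.0, 0.0],
    "Romance Movie": [0.0, 1.0],
}


def affinity(user_id, movie_title):
    """Dot product between the user and movie embeddings."""
    u = user_embeddings[user_id]
    m = movie_embeddings[movie_title]
    return sum(a * b for a, b in zip(u, m))


def best_movie(user_id):
    """Return the movie with the highest affinity score for this user."""
    return max(movie_embeddings, key=lambda title: affinity(user_id, title))


print(best_movie("u1"))  # Action Movie
print(best_movie("u2"))  # Romance Movie
```

In the real model the two lookup tables are learned embedding layers, but the scoring step has the same shape: one vector per side, combined into a single number.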

Step 1: Build the feature vocabularies

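import tensorflow as tf

# Collect the raw ID and title strings that the lookup layers will adapt to
user_ids = ratings.batch(1000).map(lambda x: x["user_id"])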
movie_titles = ratings.batch(1000).map(lambda x: x["movie_title"])

user_ids_vocab = tf.keras.layers.StringLookup()
user_ids_vocab.adapt(user_ids)

movie_titles_vocab = tf.keras.layers.StringLookup()
movie_titles_vocab.adapt(movie_titles)
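It helps to see what a string lookup does conceptually. The sketch below is a plain-Python stand-in, not the Keras implementation: it assumes a single out-of-vocabulary slot at index 0 and frequency-ordered indices, and the tie-breaking rule and sample IDs are our own choices.

```python
from collections import Counter

OOV_INDEX = 0  # index reserved for out-of-vocabulary strings


def build_vocab(values):
    """Order distinct values by descending frequency and assign indices from 1."""
    counts = Counter(values)
    # Sort by frequency (descending), then alphabetically to break ties.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: i + 1 for i, value in enumerate(ordered)}


def lookup(vocab, value):
    """Return the integer index for value, or OOV_INDEX if it was never seen."""
    return vocab.get(value, OOV_INDEX)


observed_user_ids = ["42", "7", "42", "138", "7", "42"]  # made-up sample
vocab = build_vocab(observed_user_ids)

print(vocab)                  # {'42': 1, '7': 2, '138': 3}
print(lookup(vocab, "42"))    # 1
print(lookup(vocab, "9999"))  # 0 (unseen user)
print(len(vocab) + 1)         # 4 rows needed in the embedding table
```

The last line is why the embedding layer is sized from the vocabulary size including the out-of-vocabulary slot: every index the lookup can return needs its own row.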

Step 2: Define the two-tower model

class RecModel(tfrs.Model):
    
    def __init__(self):
        super().__init__()
        # User tower
        self.user_model = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=user_ids_vocab.get_vocabulary()),
            tf.keras.layers.Embedding(input_dim=user_ids_vocab.vocabulary_size(), 
                                     output_dim=32)
        ])
        
        # Movie tower
        self.movie_model = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=movie_titles_vocab.get_vocabulary()),
            tf.keras.layers.Embedding(input_dim=movie_titles_vocab.vocabulary_size(),
                                     output_dim=32)
        ])
        
        # Set up the task (a retrieval task here)
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.movie_model)
            )
        )
    
    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_title"])
        
        return self.task(user_embeddings, movie_embeddings)
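What does the retrieval task compute from the two batches of embeddings? As far as the TFRS documentation describes it, the default is an in-batch softmax: each user's rated movie is the positive, and the other movies in the same batch act as negatives. The following plain-Python sketch illustrates that idea under that assumption; the embeddings are made up and this is not the library's code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(user_embs, movie_embs):
    """Summed cross-entropy where row i's positive is movie i in the same batch."""
    total = 0.0
    for i, user in enumerate(user_embs):
        # Score this user against every movie in the batch.
        scores = [dot(user, movie) for movie in movie_embs]
        # Numerically stable log-sum-exp over the scores.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of the movie the user actually rated.
        total += log_norm - scores[i]
    return total


# Made-up 2-D embeddings for a batch of two (user, movie) pairs.
users = [[1.0, 0.0], [0.0, 1.0]]
aligned_movies = [[2.0, 0.0], [0.0, 2.0]]   # each user matches their own movie
swapped_movies = [[0.0, 2.0], [2.0, 0.0]]   # each user matches the other movie

aligned_loss = in_batch_softmax_loss(users, aligned_movies)
swapped_loss = in_batch_softmax_loss(users, swapped_movies)
print(aligned_loss)
print(swapped_loss)
```

The loss is small when each user's embedding lines up with their own movie and large when it lines up with someone else's, which is the pressure that pulls matching pairs together during training.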

5. Training and Evaluation

Split the data into training and test sets

tf.random.set_seed(42)
# Fix the shuffled order so that take/skip yield disjoint train and test sets
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)
train_size = int(len(shuffled) * 0.8)
train = shuffled.take(train_size)
test = shuffled.skip(train_size)
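The property that matters for a take/skip split is that the shuffled order is fixed once, so the two parts can never overlap. With tf.data that is what `reshuffle_each_iteration=False` on `shuffle` is for. A small plain-Python sketch of the same idea, using integers as stand-ins for rating rows:

```python
import random


def split_once(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records once, then cut it into train and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 rating rows
train_a, test_a = split_once(records)
train_b, test_b = split_once(records)

print(len(train_a), len(test_a))          # 80 20
print(set(train_a).isdisjoint(test_a))    # True
print(train_a == train_b)                 # True
```

If the order changed between passes, a row that landed in the test part on one pass could land in the training part on the next, and the evaluation numbers would look better than they should.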

Configure training

model = RecModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

# Batch and cache the datasets to speed up training
cached_train = train.shuffle(1000).batch(512).cache()
cached_test = test.batch(256).cache()

# Start training
model.fit(cached_train, epochs=3)

# Evaluate the model
model.evaluate(cached_test, return_dict=True)
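The dictionary returned by `evaluate` should contain top-K accuracy entries (in TFRS they have names along the lines of `factorized_top_k/top_10_categorical_accuracy`). As a rough illustration of what such a number means, here is a plain-Python sketch with made-up rankings; it is not the library's implementation.

```python
def top_k_accuracy(ranked_lists, true_items, k):
    """Fraction of examples whose true item is among the first k ranked items."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_items) if truth in ranked[:k]
    )
    return hits / len(true_items)


# Made-up rankings for three test interactions (best candidate first).
ranked = [
    ["A", "B", "C", "D"],
    ["B", "D", "A", "C"],
    ["C", "A", "D", "B"],
]
truth = ["A", "A", "B"]  # the movie each user actually rated

print(top_k_accuracy(ranked, truth, k=1))
print(top_k_accuracy(ranked, truth, k=3))
```

Top-1 counts only exact first-place hits (one of three here), while top-3 also credits the second example, so the value can only grow as K grows.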

6. Generating Recommendations

With the trained model, recommendations can be generated like this:

# Build a brute-force index over all candidate movies for top-K lookup
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    movies.batch(100).map(lambda title: (title, model.movie_model(title)))
)

# Recommend the top 3 movies for the user with ID 42
user_id = "42"
_, titles = index(tf.constant([user_id]))
print(f"Recommendations for user {user_id}: {titles[0, :3]}")
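Conceptually, a brute-force index scores the query against every candidate and keeps the K best. The plain-Python sketch below shows that logic with made-up embeddings standing in for the outputs of the two towers; it is an illustration, not how TFRS implements the layer.

```python
import heapq


def brute_force_top_k(query, candidates, k):
    """Score every candidate against the query and return the k best titles."""
    scored = (
        (sum(q * c for q, c in zip(query, emb)), title)
        for title, emb in candidates.items()
    )
    return [title for _, title in heapq.nlargest(k, scored)]


# Made-up candidate embeddings standing in for the movie tower's output.
candidates = {
    "Movie A": [0.9, 0.1],
    "Movie B": [0.5, 0.5],
    "Movie C": [0.1, 0.9],
    "Movie D": [-0.3, 0.2],
}
user_embedding = [1.0, 0.0]  # stand-in for the user tower's output

print(brute_force_top_k(user_embedding, candidates, k=2))  # ['Movie A', 'Movie B']
```

Exhaustive scoring is exact and perfectly fine for a catalog of this size. For much larger catalogs an approximate index such as the ScaNN layer that TFRS also provides is the usual alternative.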

7. Exporting and Deploying the Model

Export the index in SavedModel format so it is easy to deploy:

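# Wrap the index call in a tf.function so it can be exported as a named signature
@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.string, name="user_id")])
def recommend(user_id):
    scores, titles = index(user_id)
    return {"scores": scores, "titles": titles}

# TensorFlow Serving expects a numeric version subdirectory, hence "export/1"
tf.saved_model.save(index, "export/1", signatures={"recommend": recommend})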

Start the service with TensorFlow Serving:

docker run -p 8501:8501 --name recommender \
-v `pwd`/export:/models/recommender -e MODEL_NAME=recommender tensorflow/serving
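Once the container is running, the model can be queried over TensorFlow Serving's REST API. The sketch below only builds the request, so it runs without a server. It assumes the host and port from the docker command, the model name `recommender`, and an exported signature named `recommend` that takes a batch of user ID strings.

```python
import json
import urllib.request

# Assumed values: the host/port from the docker command, the model name
# "recommender", and a signature named "recommend" taking a batch of user IDs.
SERVING_URL = "http://localhost:8501/v1/models/recommender:predict"


def build_predict_request(user_ids, signature_name="recommend"):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    payload = {"signature_name": signature_name, "instances": list(user_ids)}
    return urllib.request.Request(
        SERVING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(["42"])
print(request.full_url)
print(request.data.decode("utf-8"))

# To actually call a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["predictions"])
```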

8. Complete Example Code

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# Load the data
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

# Preprocess: keep only the fields the model uses
ratings = ratings.map(lambda x: {
    "user_id": x["user_id"],
    "movie_title": x["movie_title"]
})
movies = movies.map(lambda x: x["movie_title"])

# Build the feature vocabularies
user_ids = ratings.batch(1000).map(lambda x: x["user_id"])
movie_titles = ratings.batch(1000).map(lambda x: x["movie_title"])

user_ids_vocab = tf.keras.layers.StringLookup()
user_ids_vocab.adapt(user_ids)

movie_titles_vocab = tf.keras.layers.StringLookup()
movie_titles_vocab.adapt(movie_titles)

# Define the model
class RecModel(tfrs.Model):
    def __init__(self):
        super().__init__()
        self.user_model = tf.keras.Sequential([
            user_ids_vocab,
            tf.keras.layers.Embedding(user_ids_vocab.vocabulary_size(), 32)
        ])
        self.movie_model = tf.keras.Sequential([
            movie_titles_vocab,
            tf.keras.layers.Embedding(movie_titles_vocab.vocabulary_size(), 32)
        ])
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.movie_model)
            )
        )
    
    def compute_loss(self, features, training=False):
        user_emb = self.user_model(features["user_id"])
        movie_emb = self.movie_model(features["movie_title"])
        return self.task(user_emb, movie_emb)

# Training configuration
model = RecModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

# Split the data (shuffle once with a fixed order so train and test stay disjoint)
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)
train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

cached_train = train.shuffle(10000).batch(512).cache()
cached_test = test.batch(512).cache()

# Train the model
model.fit(cached_train, epochs=3)

# Evaluate
model.evaluate(cached_test, return_dict=True)

# Build the recommendation index
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    movies.batch(100).map(lambda title: (title, model.movie_model(title)))
)

# Generate recommendations
user_id = "42"
_, titles = index(tf.constant([user_id]))
print(f"Recommendations: {titles[0, :5]}")

9. Advanced Tips

Cold start: use the information users provide at sign-up to fill in missing features

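# Hypothetical inputs: a 32-dim ID embedding and 8 numeric sign-up features
id_embedding = tf.keras.Input(shape=(32,))
signup_features = tf.keras.Input(shape=(8,))
merged = tf.keras.layers.Concatenate(axis=1)([id_embedding, signup_features])
output = tf.keras.layers.Dense(32, activation="relu")(merged)
user_model = tf.keras.Model([id_embedding, signup_features], output)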
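Why does concatenating sign-up features help with cold start? A brand-new user has no learned ID embedding, but the sign-up features still give the dense layer something to work with. The plain-Python sketch below illustrates this. The feature names (`age_bucket`, `is_premium`), the weights, and the embedding values are all hypothetical.

```python
EMBEDDING_DIM = 2
ZERO_VECTOR = [0.0] * EMBEDDING_DIM

# Made-up learned ID embeddings; new users have no entry here.
id_embeddings = {"42": [0.7, -0.2]}


def user_features(user_id, age_bucket, is_premium):
    """Concatenate the ID embedding (zeros if unknown) with sign-up features."""
    id_part = id_embeddings.get(user_id, ZERO_VECTOR)
    signup_part = [float(age_bucket), 1.0 if is_premium else 0.0]
    return id_part + signup_part


def dense_relu(inputs, weights, bias):
    """One fully connected layer followed by ReLU (one weight row per output)."""
    return [
        max(0.0, sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, bias)
    ]


# Hypothetical layer parameters: 4 inputs -> 2 outputs.
weights = [[0.5, 0.5, 0.1, 0.3], [-0.4, 0.2, 0.2, -0.1]]
bias = [0.0, 0.1]

known = dense_relu(user_features("42", age_bucket=3, is_premium=True), weights, bias)
brand_new = dense_relu(user_features("new-user", age_bucket=3, is_premium=True), weights, bias)
print(known)
print(brand_new)
```

The new user still ends up with a non-zero representation driven entirely by the sign-up features, so the retrieval step has a signal to rank against even before any ratings exist.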

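Multi-objective optimization: optimize several targets at once, such as click-through rate and watch time. The fragment below belongs inside the model's __init__ and adds a ranking task as a second objective, using rating prediction as the example.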

self.rating_task = tfrs.tasks.Ranking(
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.RootMeanSquaredError()]
)
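With two tasks, the model's total loss is typically a weighted sum of the individual task losses. The sketch below shows that arithmetic in plain Python. The weights, the retrieval loss value, and the ratings are made-up numbers chosen only for illustration.

```python
def mean_squared_error(labels, predictions):
    """Average squared difference, as used by the ranking (rating) objective."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def combined_loss(retrieval_loss, rating_loss, retrieval_weight=1.0, rating_weight=1.0):
    """Weighted sum of the two task losses; the weights are tunable assumptions."""
    return retrieval_weight * retrieval_loss + rating_weight * rating_loss


true_ratings = [4.0, 3.0, 5.0]       # made-up labels
predicted_ratings = [3.5, 3.0, 4.0]  # made-up model outputs

rating_loss = mean_squared_error(true_ratings, predicted_ratings)
total = combined_loss(retrieval_loss=2.0, rating_loss=rating_loss, rating_weight=0.5)
print(rating_loss)
print(total)
```

The weights control the trade-off: setting one of them to zero recovers a single-task model, which gives a convenient baseline when tuning.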

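Keeping the index fresh: rebuild it periodically, for example every hour

# Rebuild the index from the refreshed catalog (a full rebuild, not an incremental update)
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    updated_movies.batch(100).map(lambda title: (title, model.movie_model(title)))
)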
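One way to schedule such refreshes is to wrap the index in a small holder that rebuilds it once it is older than a chosen age. This is a plain-Python sketch of the pattern only: `RefreshingIndex` and `fake_build` are hypothetical names, and the string returned by the build function stands in for a real index object.

```python
import time


class RefreshingIndex:
    """Holds a candidate index and rebuilds it when it is older than max_age."""

    def __init__(self, build_index, max_age_seconds=3600, clock=time.monotonic):
        self._build_index = build_index  # callable that returns a fresh index
        self._max_age = max_age_seconds
        self._clock = clock
        self._index = build_index()
        self._built_at = clock()

    def get(self):
        """Return the current index, rebuilding it first if it has expired."""
        if self._clock() - self._built_at >= self._max_age:
            # Build the replacement fully, then swap the reference in one step.
            self._index = self._build_index()
            self._built_at = self._clock()
        return self._index


# Usage with a fake clock and a counter in place of a real index build.
now = [0.0]
builds = []


def fake_build():
    builds.append(len(builds) + 1)
    return f"index-v{len(builds)}"


holder = RefreshingIndex(fake_build, max_age_seconds=3600, clock=lambda: now[0])
first = holder.get()
now[0] = 3600.0
second = holder.get()
print(first)   # index-v1
print(second)  # index-v2
```

Building the replacement before swapping means callers never see a half-built index. In a real service the build function would construct and populate a new TFRS index.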

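With this complete example we have walked through the core workflow of a recommender system. From data preparation to model deployment, every stage has a ready-made solution. Once you understand the principles, try adjusting the hyperparameters and watching how the metrics change: for instance, raise the embedding dimension from 32 to 64, or switch to a different optimizer.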