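DeepFM Factorization Machine Core Techniques Explained: From Mathematical Principles to Industrial-Scale Recommender Systems in Practice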

Originally published 2025-02-20 11:29:54 · 856 views

License: CC 4.0 BY-SA

Tags: #python #deep-learning #pytorch

I. Technical Principles and Mathematical Formulas

1. FM (Factorization Machines): Core Idea
Formula:
$\hat{y}(x) = w_0 + \sum_{i=1}^n w_i x_i + \sum_{i=1}^n \sum_{j=i+1}^n \langle v_i, v_j \rangle x_i x_j$

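Low-rank factorization: the inner product $\langle v_i, v_j \rangle$ of two latent vectors models each second-order feature interaction, which cuts the number of interaction parameters from $O(n^2)$ to $O(kn)$ (where $k$ is the latent vector dimension).
Example: in an e-commerce setting, the cross feature between a user ID and an item ID (e.g. user_123 × item_456) needs no hand-crafted weight; FM learns the latent association automatically through the two latent vectors.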
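
The pairwise term does not have to be computed with a double loop. A standard rewrite is $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_i v_{i,f} x_i \right)^2 - \sum_i v_{i,f}^2 x_i^2 \right]$, which costs $O(kn)$ time. The following is a minimal pure-Python sketch that checks the two forms agree on random data. The function names `fm_pairwise_naive` and `fm_pairwise_fast` and the toy sizes are illustrative assumptions, not part of the original article.

```python
import random

def fm_pairwise_naive(x, v):
    """Sum over all pairs i < j of <v_i, v_j> * x_i * x_j (O(k * n^2))."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2] (O(k * n))."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))          # sum, squared below
        s2 = sum((v[i][f] * x[i]) ** 2 for i in range(n))  # sum of squares
        total += s * s - s2
    return 0.5 * total

# Hypothetical toy data: n = 6 sparse binary features, latent dimension k = 3.
random.seed(0)
n, k = 6, 3
x = [random.choice([0.0, 1.0]) for _ in range(n)]
v = [[random.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]

# The two forms should match up to floating-point rounding.
assert abs(fm_pairwise_naive(x, v) - fm_pairwise_fast(x, v)) < 1e-9
```

This "square of sum minus sum of squares" form is the one the PyTorch `interaction` computation in the implementation section relies on.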

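2. DeepFM Architecture

Two parallel branches: FM (explicit feature interactions) + DNN (implicit high-order interactions)
Shared input layer: the feature embeddings feed both the FM and DNN modules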
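
As a rough summary of the two points above, the branches each produce one scalar logit, and the two logits are added before a sigmoid. The symbols $y_{\mathrm{FM}}$, $y_{\mathrm{DNN}}$, $e_i$ and $m$ are notation assumed here for illustration: $e_i$ is the embedding of the $i$-th field and $m$ is the number of fields.

$$
\hat{y} = \sigma\left(y_{\mathrm{FM}} + y_{\mathrm{DNN}}\right), \qquad y_{\mathrm{DNN}} = \mathrm{MLP}\left([e_1; e_2; \dots; e_m]\right)
$$

Here $y_{\mathrm{FM}}$ is the FM output $\hat{y}(x)$ from the formula in part 1, and the concatenated embeddings $[e_1; \dots; e_m]$ are the same vectors that FM uses in its inner products.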

II. Implementation (PyTorch Code Snippets)

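import torch
import torch.nn as nn

class FM(nn.Module):
    """Factorization machine over categorical features given as integer indices."""

    def __init__(self, feature_size, k):
        super(FM, self).__init__()
        self.w0 = nn.Parameter(torch.zeros(1))  # global bias w_0
        self.w = nn.Embedding(feature_size, 1)  # first-order weights w_i
        self.v = nn.Embedding(feature_size, k)  # latent vectors v_i

    def forward(self, x):
        # x: LongTensor of shape (batch, num_fields) holding feature indices.
        # Every indexed feature is one-hot, so its value x_i is 1 and drops out.
        # First-order term: (batch, num_fields, 1) -> (batch,)
        linear_term = torch.sum(self.w(x), dim=1).squeeze(1)
        # Second-order term: 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2]
        v = self.v(x)                                      # (batch, num_fields, k)
        square_of_sum = torch.pow(torch.sum(v, dim=1), 2)  # (batch, k)
        sum_of_square = torch.sum(torch.pow(v, 2), dim=1)  # (batch, k)
        interaction = 0.5 * torch.sum(square_of_sum - sum_of_square, dim=1)  # (batch,)
        return self.w0 + linear_term + interaction         # (batch,) logit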

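# Full DeepFM model
class DeepFM(nn.Module):
    def __init__(self, feature_size, num_fields, k, hidden_units):
        # num_fields: number of feature fields (columns of x) per sample
        super(DeepFM, self).__init__()
        self.fm = FM(feature_size, k)
        # Shared input layer: the DNN reads the FM latent vectors (self.fm.v),
        # so no separate embedding table is created.
        layers, in_dim = [], k * num_fields
        for units in hidden_units:
            layers += [nn.Linear(in_dim, units), nn.ReLU()]
            in_dim = units
        layers.append(nn.Linear(in_dim, 1))
        self.dnn = nn.Sequential(*layers)

    def forward(self, x):
        fm_output = self.fm(x)                        # (batch,)
        embedded = self.fm.v(x).view(x.size(0), -1)   # (batch, num_fields * k)
        dnn_output = self.dnn(embedded).squeeze(1)    # (batch,)
        return torch.sigmoid(fm_output + dnn_output)  # predicted probability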
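
The model indexes a single embedding table of size `feature_size`, so every field's categorical IDs have to be mapped into one shared index space before they are passed in as `x`. The article does not show this step. One common approach, sketched here as an assumption, is to give each field a starting offset. The helper names `build_offsets` and `to_global_indices` and the vocabulary sizes are hypothetical.

```python
from itertools import accumulate

def build_offsets(field_dims):
    """Return the starting index of each field inside one shared embedding table."""
    return [0] + list(accumulate(field_dims))[:-1]

def to_global_indices(local_ids, offsets):
    """Shift each field-local ID by its field offset so that IDs never collide."""
    return [offset + local_id for offset, local_id in zip(offsets, local_ids)]

# Hypothetical vocabulary sizes: 1000 users, 500 items, 7 weekdays.
field_dims = [1000, 500, 7]
offsets = build_offsets(field_dims)      # [0, 1000, 1500]
feature_size = sum(field_dims)           # 1507 rows in the shared table
num_fields = len(field_dims)             # 3

# One sample: user 123, item 456, weekday 3.
row = to_global_indices([123, 456, 3], offsets)  # [123, 1456, 1503]
```

A batch of such rows, converted to a `LongTensor` of shape `(batch, num_fields)`, is the kind of input the `forward` methods above index with.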