DeepFM: A Factorization-Machine based Neural Network for CTR Prediction [Paper Notes]

1 Abstract

  • DeepFM combines the power of factorization machines for recommendation and deep learning for feature learning.

2 Introduction

  • CNN-based models are biased toward interactions between neighboring features, while RNN-based models are better suited to click data with sequential dependency.

  • PNN and FNN, like other deep models, capture few low-order feature interactions.

  • DeepFM can be trained efficiently because its wide part and deep part share the same input and the same embedding vectors.

3 Method

[Figure: DeepFM architecture]

  • The FM component and the deep component share the same input.

3.1 FM Component

  • In previous approaches, the parameter of an interaction between features $i$ and $j$ can be trained only when feature $i$ and feature $j$ both appear in the same data record.

    In FM, it is instead measured via the inner product of their latent vectors $V_i$ and $V_j$. Thanks to this flexible design, FM can train the latent vector $V_i$ (or $V_j$) whenever $i$ (or $j$) appears in a data record.

[Figure: FM component]

  • The FM output is the sum of an addition unit and a set of inner-product units:

    $$y_{FM}=\langle w, x\rangle+\sum_{j_{1}=1}^{d} \sum_{j_{2}=j_{1}+1}^{d}\left\langle V_{j_{1}}, V_{j_{2}}\right\rangle x_{j_{1}} \cdot x_{j_{2}} \tag{1}$$

    The addition unit captures order-1 feature importance; the inner-product units capture order-2 feature interactions.
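As a minimal sketch (not the paper's code; the function name and toy dimensions are my own), Eq. (1) can be computed directly in NumPy. The quadratic pairwise loop mirrors the formula literally, and FM's well-known $O(kd)$ identity is used at the end to cross-check it.

```python
import numpy as np

def fm_component(x, w, V):
    """Eq. (1): an addition unit (order-1) plus pairwise
    inner-product units (order-2).
    x: (d,) input, w: (d,) order-1 weights, V: (d, k) latent vectors."""
    linear = w @ x                      # addition unit
    d = x.shape[0]
    pairwise = 0.0
    for j1 in range(d):                 # inner-product units
        for j2 in range(j1 + 1, d):
            pairwise += (V[j1] @ V[j2]) * x[j1] * x[j2]
    return linear + pairwise

x = np.array([1.0, 2.0, 0.0, 3.0])
w = np.array([0.5, -1.0, 2.0, 0.1])
V = np.arange(12.0).reshape(4, 3)
# O(kd) identity: sum_{j1<j2} <V_j1, V_j2> x_j1 x_j2
#   = 0.5 * (||x @ V||^2 - sum_j x_j^2 ||V_j||^2)
fast = w @ x + 0.5 * (((x @ V) ** 2 - (x ** 2) @ (V ** 2)).sum())
assert np.isclose(fm_component(x, w, V), fast)
```

The identity in the cross-check is what makes the FM part linear-time in practice, so real implementations avoid the explicit double loop.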

3.2 Deep Component

[Figure: DNN component with embedding layer]

  • We would like to point out two interesting features of this network structure:

    • While the lengths of different input field vectors can differ, their embeddings are all of the same size $k$.

    • The latent feature vectors $V$ in FM now serve as network weights, which are learned and used to compress the input field vectors into the embedding vectors.

  • In [1], $V$ is pre-trained by FM and used as initialization. DeepFM eliminates the need for pre-training by FM and instead jointly trains the overall network in an end-to-end manner.
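A sketch of the embedding layer described above (field sizes and names are assumptions for illustration, plain NumPy): each field's one-hot vector, whatever its length, is compressed by its slice of $V$ into a $k$-dimensional embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
field_sizes = [3, 5, 2]                 # input fields of different lengths
# the FM latent vectors double as the embedding-layer weights
V = [rng.normal(size=(n, k)) for n in field_sizes]

def embed(field_onehots):
    # every field is compressed to the same size k, then concatenated
    return np.concatenate([Vi.T @ xi for Vi, xi in zip(V, field_onehots)])

x_fields = [np.eye(n)[0] for n in field_sizes]  # first category of each field
e = embed(x_fields)
assert e.shape == (len(field_sizes) * k,)       # 3 fields * k = 12
```

For a one-hot field, `Vi.T @ xi` is just a row lookup in `Vi`, which is why the same $V$ can serve both the FM inner products and the deep network's input.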

4 Related Work

[Figure: architectures of related deep CTR models]

  • FNN: the FM pre-training strategy results in two limitations: 1) the embedding parameters might be overly affected by FM; 2) efficiency is reduced by the overhead introduced by the pre-training stage.

  • PNN: the outer product is less reliable than the inner product, since the approximated computation of the outer product loses much information, making the result unstable.

    Although the inner product is more reliable, it still suffers from high computational complexity, because the output of the product layer is connected to all neurons of the first hidden layer.

    Like FNN, all PNN variants ignore low-order feature interactions.

  • Wide&Deep: the input to the "wide" part requires expert feature engineering.

  • The sharing strategy of feature embedding influences (via backpropagation) the feature representation through both low- and high-order feature interactions, which models the representation more precisely.

5 Experiments

5.1 Time Cost

[Figure: training-time comparison]
The vertical axis is the ratio $\frac{\text{training time of deep CTR model}}{\text{training time of LR}}$.

  • IPNN and PNN* take much more time than the other models.

5.2 Effectiveness

[Figure: performance comparison]

  • Learning high- and low-order feature interactions simultaneously, while sharing the same feature embedding for both, improves the performance of the CTR prediction model.

6 Conclusion

  • DeepFM trains a deep component and an FM component jointly. It gains performance improvement from these advantages:

    • it does not need any pre-training;
    • it learns both high- and low-order feature interactions;
    • it introduces a sharing strategy of feature embedding to avoid feature engineering.
  • One future direction is exploring strategies (such as introducing pooling layers) to strengthen the ability to learn the most useful high-order feature interactions.
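The joint architecture summarized in these notes can be sketched end-to-end (toy dimensions and weight names are my own, not the paper's implementation): the FM part and the deep part read the same input and latent vectors $V$, and their outputs are summed before a sigmoid.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, h = 6, 3, 8                       # features, embedding size, hidden units
w = rng.normal(size=d)                  # order-1 weights
V = rng.normal(size=(d, k))             # shared latent vectors / embedding weights
W1, b1 = rng.normal(size=(d * k, h)), np.zeros(h)
W2, b2 = rng.normal(size=h), 0.0

def deepfm_forward(x):
    # FM component: order-1 term + order-2 interactions (O(kd) identity)
    y_fm = w @ x + 0.5 * (((x @ V) ** 2 - (x ** 2) @ (V ** 2)).sum())
    # deep component: the same embeddings x_j * V_j feed a small MLP
    emb = (x[:, None] * V).reshape(-1)
    hidden = np.maximum(0.0, emb @ W1 + b1)       # ReLU
    y_dnn = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-(y_fm + y_dnn)))  # sigmoid -> CTR estimate

p = deepfm_forward(rng.random(d))
assert 0.0 < p < 1.0
```

No pre-training step appears anywhere: both components' gradients flow into the same $V$, which is exactly the sharing strategy the conclusion lists as an advantage.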


  1. Weinan Zhang, Tianming Du, and Jun Wang. Deep learning over multi-field categorical data - A case study on user response prediction. In ECIR, 2016.
