DeepFM: A Factorization-Machine based Neural Network for CTR Prediction [Paper Notes]

1 Abstract

  • DeepFM combines the power of factorization machines for recommendation and deep learning for feature learning.

2 Introduction

  • CNN-based models are biased toward interactions between neighboring features, while RNN-based models are better suited to click data with sequential dependency.

  • PNN and FNN, like other deep models, capture few low-order feature interactions.

  • DeepFM can be trained efficiently because its wide part and deep part share the same input and the same embedding vectors.

3 Method

[Figure: DeepFM architecture]

  • The FM component and the deep component share the same input.

3.1 FM Component

  • In previous approaches, the parameter of an interaction between features $i$ and $j$ can be trained only when feature $i$ and feature $j$ both appear in the same data record.

    In FM, it is instead measured via the inner product of their latent vectors $V_i$ and $V_j$. Thanks to this flexible design, FM can train the latent vector $V_i$ (or $V_j$) whenever $i$ (or $j$) appears in a data record.

[Figure: FM component]

  • The FM output is the sum of an addition unit and a set of inner-product units:

    $$y_{FM}=\langle w, x\rangle+\sum_{j_{1}=1}^{d} \sum_{j_{2}=j_{1}+1}^{d}\left\langle V_{j_{1}}, V_{j_{2}}\right\rangle x_{j_{1}} \cdot x_{j_{2}} \tag{1}$$

    The addition unit captures order-1 feature importance; the inner-product units capture order-2 feature interactions.
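As a minimal sketch (not the paper's code; the function name and toy dimensions are my own), Eq. (1) can be computed directly in NumPy. The quadratic pairwise loop mirrors the formula literally, and FM's well-known $O(kd)$ identity is used at the end to cross-check it.

```python
import numpy as np

def fm_component(x, w, V):
    """Eq. (1): an addition unit (order-1) plus pairwise
    inner-product units (order-2).
    x: (d,) input, w: (d,) order-1 weights, V: (d, k) latent vectors."""
    linear = w @ x                      # addition unit
    d = x.shape[0]
    pairwise = 0.0
    for j1 in range(d):                 # inner-product units
        for j2 in range(j1 + 1, d):
            pairwise += (V[j1] @ V[j2]) * x[j1] * x[j2]
    return linear + pairwise

x = np.array([1.0, 2.0, 0.0, 3.0])
w = np.array([0.5, -1.0, 2.0, 0.1])
V = np.arange(12.0).reshape(4, 3)
# O(kd) identity: sum_{j1<j2} <V_j1, V_j2> x_j1 x_j2
#   = 0.5 * (||x @ V||^2 - sum_j x_j^2 ||V_j||^2)
fast = w @ x + 0.5 * (((x @ V) ** 2 - (x ** 2) @ (V ** 2)).sum())
assert np.isclose(fm_component(x, w, V), fast)
```

The identity in the cross-check is what makes the FM part linear-time in practice, so real implementations avoid the explicit double loop.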

3.2 Deep Component

[Figure: DNN component with embedding layer]

  • We would like to point out two interesting features of this network structure:

    • While the lengths of different input field vectors can differ, their embeddings are all of the same size $k$.

    • The latent feature vectors $V$ in FM now serve as network weights, which are learned and used to compress the input field vectors into the embedding vectors.

  • In [1], $V$ is pre-trained by FM and used as initialization. DeepFM eliminates the need for pre-training by FM and instead jointly trains the overall network in an end-to-end manner.
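A sketch of the embedding layer described above (field sizes and names are assumptions for illustration, plain NumPy): each field's one-hot vector, whatever its length, is compressed by its slice of $V$ into a $k$-dimensional embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
field_sizes = [3, 5, 2]                 # input fields of different lengths
# the FM latent vectors double as the embedding-layer weights
V = [rng.normal(size=(n, k)) for n in field_sizes]

def embed(field_onehots):
    # every field is compressed to the same size k, then concatenated
    return np.concatenate([Vi.T @ xi for Vi, xi in zip(V, field_onehots)])

x_fields = [np.eye(n)[0] for n in field_sizes]  # first category of each field
e = embed(x_fields)
assert e.shape == (len(field_sizes) * k,)       # 3 fields * k = 12
```

For a one-hot field, `Vi.T @ xi` is just a row lookup in `Vi`, which is why the same $V$ can serve both the FM inner products and the deep network's input.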

4 Related Work

[Figure: architectures of related deep CTR models]

  • FNN: the FM pre-training strategy results in two limitations: 1) the embedding parameters might be overly affected by FM; 2) efficiency is reduced by the overhead introduced by the pre-training stage.

  • PNN: the outer product is less reliable than the inner product, since the approximated computation of the outer product loses much information, making the result unstable.

    Although the inner product is more reliable, it still suffers from high computational complexity, because the output of the product layer is connected to all neurons of the first hidden layer.

    Like FNN, all PNN variants ignore low-order feature interactions.

  • Wide&Deep: the input to the "wide" part requires expert feature engineering.

  • The sharing strategy of feature embedding influences (via backpropagation) the feature representation through both low- and high-order feature interactions, which models the representation more precisely.

5 Experiments

5.1 Time Cost

[Figure: training-time comparison]
The vertical axis is the ratio $\frac{\text{training time of deep CTR model}}{\text{training time of LR}}$.

  • IPNN and PNN* take much more time than the other models.

5.2 Effectiveness

[Figure: performance comparison]

  • Learning high- and low-order feature interactions simultaneously, while sharing the same feature embedding for both, improves the performance of the CTR prediction model.

6 Conclusion

  • DeepFM trains a deep component and an FM component jointly. It gains performance improvement from these advantages:

    • it does not need any pre-training;
    • it learns both high- and low-order feature interactions;
    • it introduces a sharing strategy of feature embedding to avoid feature engineering.
  • One future direction is exploring strategies (such as introducing pooling layers) to strengthen the ability to learn the most useful high-order feature interactions.
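The joint architecture summarized in these notes can be sketched end-to-end (toy dimensions and weight names are my own, not the paper's implementation): the FM part and the deep part read the same input and latent vectors $V$, and their outputs are summed before a sigmoid.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, h = 6, 3, 8                       # features, embedding size, hidden units
w = rng.normal(size=d)                  # order-1 weights
V = rng.normal(size=(d, k))             # shared latent vectors / embedding weights
W1, b1 = rng.normal(size=(d * k, h)), np.zeros(h)
W2, b2 = rng.normal(size=h), 0.0

def deepfm_forward(x):
    # FM component: order-1 term + order-2 interactions (O(kd) identity)
    y_fm = w @ x + 0.5 * (((x @ V) ** 2 - (x ** 2) @ (V ** 2)).sum())
    # deep component: the same embeddings x_j * V_j feed a small MLP
    emb = (x[:, None] * V).reshape(-1)
    hidden = np.maximum(0.0, emb @ W1 + b1)       # ReLU
    y_dnn = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-(y_fm + y_dnn)))  # sigmoid -> CTR estimate

p = deepfm_forward(rng.random(d))
assert 0.0 < p < 1.0
```

No pre-training step appears anywhere: both components' gradients flow into the same $V$, which is exactly the sharing strategy the conclusion lists as an advantage.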


  1. Weinan Zhang, Tianming Du, and Jun Wang. Deep learning over multi-field categorical data - A case study on user response prediction. In ECIR, 2016.
