DeepFM: Network Introduction and Source Code Analysis
Preface
Alright, I admit it: I clearly had energy to spare this weekend. Once this post is done, I will have cranked out three blog posts in total (shameless plug: FNN 网络介绍与源码浅析 and NFM 网络介绍与源码浅析, hahaha 🤣 🤣 🤣; duang~~~). The main reason is that I read quite a few papers this week and want to write down my impressions and thoughts while they are fresh, so that the core ideas come back quickly when I look back later. 😎
Announcements
You can follow my WeChat official account by searching for "珍妮的算法之路" or "world4458" in WeChat. You can also check out my Zhihu column PoorMemory-机器学习; future articles will be posted there as well.
DeepFM
Paper Information
- Title: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
- Paper: https://arxiv.org/abs/1703.04247
- Code: https://github.com/ChenglongChen/tensorflow-DeepFM
- Published: 2017
- Authors: Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
- Affiliations: Harbin Institute of Technology, Huawei
Core Ideas
DeepFM adopts the same overall architecture as Wide & Deep, essentially replacing the Wide part with an FM. The FM automatically learns first-order features and second-order feature interactions, so no hand-crafted cross features (and thus no manual feature engineering) are needed, while the Deep part learns high-order interactions. The network is trained end to end and learns low-order and high-order feature interactions simultaneously.
Core Ideas in Detail
The DeepFM network structure (Figure 1 of the paper) consists of two parts: the FM component learns low-order feature interactions, while the Deep component learns high-order ones. The model output is

$$\hat{y}=\operatorname{sigmoid}\left(y_{FM}+y_{DNN}\right)$$
The FM component (for background on FM, see FM 算法介绍以及 libFM 源码简析) is formulated as

$$y_{FM}=\langle w, x\rangle+\sum_{j_{1}=1}^{d} \sum_{j_{2}=j_{1}+1}^{d}\left\langle V_{j_{1}}, V_{j_{2}}\right\rangle x_{j_{1}} \cdot x_{j_{2}}$$

where $\langle w, x\rangle$ captures the importance of first-order features, and $\left\langle V_{j_{1}}, V_{j_{2}}\right\rangle x_{j_{1}} \cdot x_{j_{2}}$ captures the effect of second-order feature interactions. (I am not spelling out the symbols in detail here because they were all covered in earlier posts, and I would rather not keep repeating the same content 🤣 🤣 🤣)
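To make the notation concrete, here is a minimal brute-force numpy transcription of $y_{FM}$ (all sizes and variable names here are illustrative, not from the repo); the efficient reformulation that the actual code uses is derived later in this post:

```python
import numpy as np

d, k = 6, 4                      # number of features and latent size (illustrative)
rng = np.random.default_rng(0)
w = rng.normal(size=d)           # first-order weights
V = rng.normal(size=(d, k))      # latent vectors V_j
x = rng.normal(size=d)           # input feature vector

first_order = w @ x              # <w, x>
second_order = sum(V[j1] @ V[j2] * x[j1] * x[j2]
                   for j1 in range(d)
                   for j2 in range(j1 + 1, d))
y_fm = first_order + second_order
print(y_fm)
```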
The Deep component is straightforward: the embeddings of all fields are concatenated and fed into a feed-forward network.
The core code is shown below, taken from https://github.com/ChenglongChen/tensorflow-DeepFM/blob/master/DeepFM.py:
```python
# model
self.embeddings = tf.nn.embedding_lookup(self.weights["feature_embeddings"],
                                         self.feat_index)  # None * F * K
feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
self.embeddings = tf.multiply(self.embeddings, feat_value)

# ---------- first order term ----------
self.y_first_order = tf.nn.embedding_lookup(self.weights["feature_bias"], self.feat_index)  # None * F * 1
self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2)  # None * F
self.y_first_order = tf.nn.dropout(self.y_first_order, self.dropout_keep_fm[0])  # None * F

# ---------- second order term ----------
# sum_square part
self.summed_features_emb = tf.reduce_sum(self.embeddings, 1)  # None * K
self.summed_features_emb_square = tf.square(self.summed_features_emb)  # None * K

# square_sum part
self.squared_features_emb = tf.square(self.embeddings)
self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K

# second order
self.y_second_order = 0.5 * tf.subtract(self.summed_features_emb_square, self.squared_sum_features_emb)  # None * K
self.y_second_order = tf.nn.dropout(self.y_second_order, self.dropout_keep_fm[1])  # None * K

# ---------- Deep component ----------
self.y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.embedding_size])  # None * (F*K)
self.y_deep = tf.nn.dropout(self.y_deep, self.dropout_keep_deep[0])
for i in range(0, len(self.deep_layers)):
    self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights["layer_%d" % i]),
                         self.weights["bias_%d" % i])  # None * layer[i]
    if self.batch_norm:
        self.y_deep = self.batch_norm_layer(self.y_deep, train_phase=self.train_phase,
                                            scope_bn="bn_%d" % i)  # None * layer[i]
    self.y_deep = self.deep_layers_activation(self.y_deep)
    self.y_deep = tf.nn.dropout(self.y_deep, self.dropout_keep_deep[1 + i])  # dropout at each Deep layer

# ---------- DeepFM ----------
if self.use_fm and self.use_deep:
    concat_input = tf.concat([self.y_first_order, self.y_second_order, self.y_deep], axis=1)
elif self.use_fm:
    concat_input = tf.concat([self.y_first_order, self.y_second_order], axis=1)
elif self.use_deep:
    concat_input = self.y_deep
self.out = tf.add(tf.matmul(concat_input, self.weights["concat_projection"]), self.weights["concat_bias"])
```
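For context on the inputs (my reading of the repo's data format, so treat the specifics as an assumption): self.feat_index holds one feature id per field and self.feat_value holds the corresponding value, 1.0 for one-hot categorical fields and the raw value for numeric fields, so tf.multiply(self.embeddings, feat_value) scales each field's embedding by its value. A small numpy illustration:

```python
import numpy as np

vocab_size, F, K = 100, 3, 4                # illustrative sizes
emb_table = np.random.randn(vocab_size, K)  # plays the role of weights["feature_embeddings"]

feat_index = np.array([[2, 17, 42]])        # None * F: one feature id per field
feat_value = np.array([[1.0, 1.0, 0.73]])   # 1.0 for one-hot fields, raw value for numeric

embeddings = emb_table[feat_index]               # None * F * K, like embedding_lookup
embeddings = embeddings * feat_value[..., None]  # scale each field's vector by its value
print(embeddings.shape)                          # (1, 3, 4)
```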
Note that the code does not add the FM output and the DNN output as in the paper's formula. Instead, it concatenates the first-order term, the second-order term, and the deep output, and applies a learned linear projection (concat_projection) plus a bias to obtain the logit.
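One way to see that this concatenation-plus-projection generalizes the paper's plain sum: if the projection weights were all ones and the bias zero, the logit would reduce to (sum of the F first-order terms) + (sum of the K second-order components) + (sum of the last hidden layer's units), i.e. $y_{FM} + y_{DNN}$ with the DNN's scalar output layer folded into the projection. A minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

F, K, H = 4, 8, 32                         # fields, embedding size, last hidden width (illustrative)
rng = np.random.default_rng(0)
y_first_order = rng.normal(size=(1, F))    # None * F
y_second_order = rng.normal(size=(1, K))   # None * K
y_deep = rng.normal(size=(1, H))           # None * H

concat_input = np.concatenate([y_first_order, y_second_order, y_deep], axis=1)

# with all-ones projection weights, the logit is just the plain sum of all components
w_ones = np.ones((F + K + H, 1))
logit = concat_input @ w_ones
assert np.allclose(logit, y_first_order.sum() + y_second_order.sum() + y_deep.sum())
```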
In addition, when computing the second-order part of the FM, the following code is used:
```python
# ---------- second order term ----------
# sum_square part
self.summed_features_emb = tf.reduce_sum(self.embeddings, 1)  # None * K
self.summed_features_emb_square = tf.square(self.summed_features_emb)  # None * K

# square_sum part
self.squared_features_emb = tf.square(self.embeddings)
self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K

# second order
self.y_second_order = 0.5 * tf.subtract(self.summed_features_emb_square, self.squared_sum_features_emb)  # None * K
self.y_second_order = tf.nn.dropout(self.y_second_order, self.dropout_keep_fm[1])  # None * K
```
It relies on the following identity (see FM 算法介绍以及 libFM 源码简析 for the detailed derivation):

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n}\left\langle\mathbf{v}_{i}, \mathbf{v}_{j}\right\rangle x_{i} x_{j}=\frac{1}{2} \sum_{f=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i, f} x_{i}\right)^{2}-\sum_{j=1}^{n} v_{j, f}^{2} x_{j}^{2}\right)$$
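As a quick sanity check of this identity, here is a tiny numpy verification (variable names and sizes are illustrative, not from the repo): the left side sums the pairwise interactions explicitly, the right side is the sum-square-minus-square-sum form used in the TensorFlow code above.

```python
import numpy as np

n, k = 5, 3                      # number of features and embedding size (illustrative)
rng = np.random.default_rng(42)
V = rng.normal(size=(n, k))      # latent vectors v_i
x = rng.normal(size=n)           # feature values x_i

# left side: explicit O(k * n^2) sum over all pairs i < j
lhs = sum(V[i] @ V[j] * x[i] * x[j]
          for i in range(n)
          for j in range(i + 1, n))

# right side: 0.5 * (square_of_sum - sum_of_squares), the O(k * n) form in the code
vx = V * x[:, None]              # v_{i,f} * x_i, shape n * k
rhs = 0.5 * np.sum(np.sum(vx, axis=0) ** 2 - np.sum(vx ** 2, axis=0))

assert np.isclose(lhs, rhs)
```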
Summary
OK, finished at 11:51 at night. Time to do something naughty: 🤣 🤣 🤣 Happy weekend~~