An Introduction to the NFM Network and a Brief Look at Its Source Code

Author: 珍妮的选择


Category: Machine Learning. Tags: deep learning, python, tensorflow, NFM, CTR

Article link: https://blog.csdn.net/Eric_1993/article/details/108164105


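Preface

OK, the weekend grind continues!!! Last night I finished the post "An Introduction to the FNN Network and a Brief Look at Its Source Code".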

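Announcement

You can search for “珍妮的算法之路” or “world4458” in WeChat to follow my official account. You can also check out my Zhihu column PoorMemory-机器学习, where future posts will be published as well.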

NFM (Neural Factorization Machines)

Paper Information

Paper title: Neural Factorization Machines for Sparse Predictive Analytics
Paper URL: https://arxiv.org/abs/1708.05027
Code: https://github.com/hexiangnan/neural_factorization_machine
Published: SIGIR 2017
Authors: Xiangnan He, Tat-Seng Chua
Affiliation: National University of Singapore

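Key Ideas

The paper introduces the Bi-Interaction Pooling Layer, which models second-order feature interactions. Unlike FM, which models them with inner products, the Bi-Interaction Pooling Layer takes the element-wise product of every pair of feature embeddings, which yields $\frac{n\times (n - 1)}{2}$ interaction vectors, and then sums them (presumably this is what "Pooling" in the name refers to). In practice there is no need to compute all $\frac{n\times (n - 1)}{2}$ interaction vectors explicitly: as in FM, the formula can be rearranged to reduce the computational cost. The output of the Bi-Interaction Pooling Layer is then fed into an MLP to learn richer, higher-order feature interactions.

In addition, to make use of low-order features, the network also contains a linear part (not drawn in the architecture diagram), similar to the Wide & Deep structure.

Also, in my view, what the Deep part of the network ultimately does is feed FM's second-order interaction features into a DNN~~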

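Key Ideas in Detail

The Deep part of the NFM network is shown in the figure below (note that this is only the Deep part; the linear part does not appear in the figure):

Here $\mathcal{V}_{x} = \left\{x_{1} \mathbf{v}_{1}, \ldots, x_{n} \mathbf{v}_{n}\right\}$ denotes the embedding vectors of the input features $\mathbf{x}$. These embeddings are then fed into the Bi-Interaction Layer, which produces: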

$f_{B I}\left(\mathcal{V}_{x}\right)=\sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{i} \mathbf{v}_{i} \odot x_{j} \mathbf{v}_{j}$

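where $\odot$ denotes the element-wise product. Note that the Bi-Interaction Layer acts like a pooling operation: it sums many interaction vectors into a single vector. The expression above can be computed in linear time by rewriting it as: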

$f_{B I}\left(\mathcal{V}_{x}\right)=\frac{1}{2}\left[\left(\sum_{i=1}^{n} x_{i} \mathbf{v}_{i}\right)^{2}-\sum_{i=1}^{n}\left(x_{i} \mathbf{v}_{i}\right)^{2}\right]$

where $\mathbf{v}^2$ denotes $\mathbf{v}\odot\mathbf{v}$. Smells a lot like the familiar FM, doesn't it 🤣 🤣 🤣
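
The equivalence of the two formulas is easy to check numerically. Below is a minimal plain-Python sketch, not taken from the paper or the repository: the function names and the toy values of x and V are made up for illustration. It computes the pairwise definition directly and compares it with the linear-time rewrite.

```python
from itertools import combinations

def bi_interaction_naive(x, V):
    """Sum of (x_i v_i) * (x_j v_j), element-wise, over all pairs i < j: O(n^2 * k)."""
    k = len(V[0])
    out = [0.0] * k
    for i, j in combinations(range(len(x)), 2):
        for d in range(k):
            out[d] += x[i] * V[i][d] * x[j] * V[j][d]
    return out

def bi_interaction_fast(x, V):
    """0.5 * [(sum_i x_i v_i)^2 - sum_i (x_i v_i)^2], element-wise: O(n * k)."""
    k = len(V[0])
    sum_vec = [0.0] * k   # sum_i x_i v_i
    sum_sq = [0.0] * k    # sum_i (x_i v_i)^2
    for x_i, v_i in zip(x, V):
        for d in range(k):
            t = x_i * v_i[d]
            sum_vec[d] += t
            sum_sq[d] += t * t
    return [0.5 * (s * s - q) for s, q in zip(sum_vec, sum_sq)]

# Toy input (made-up values): n = 4 features, embedding size k = 3.
x = [1.0, 0.0, 2.0, 1.0]
V = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]

naive = bi_interaction_naive(x, V)
fast = bi_interaction_fast(x, V)
# Both forms agree up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(naive, fast))
```

The naive version visits all $\frac{n\times (n - 1)}{2}$ pairs, while the rewritten version makes a single pass over the n features, which is where the linear-time claim comes from.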

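Next, the output of the Bi-Interaction Layer is fed into an MLP, which yields the vector $\mathbf{z}_L$. To obtain the prediction score at the output layer, a weight vector $\mathbf{h}$ is used to turn $\mathbf{z}_L$ into a scalar: $\mathbf{h}^T\mathbf{z}_L$.

In addition, NFM handles low-order features with a linear layer. The complete NFM model can therefore be written as: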

$\begin{aligned} \hat{y}_{N F M}(\mathbf{x}) &=w_{0}+\sum_{i=1}^{n} w_{i} x_{i} \\ &+\mathbf{h}^{T} \sigma_{L}\left(\mathbf{W}_{L}\left(\ldots \sigma_{1}\left(\mathbf{W}_{1} f_{B I}\left(\mathcal{V}_{x}\right)+\mathbf{b}_{1}\right) \ldots\right)+\mathbf{b}_{L}\right) \end{aligned}$

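The parameters of the whole model are $\Theta=\left\{w_{0},\left\{w_{i}, \mathbf{v}_{i}\right\}, \mathbf{h},\left\{\mathbf{W}_{l}, \mathbf{b}_{l}\right\}\right\}$. Compared with FM, the extra parameters in NFM are mainly $\left\{\mathbf{W}_{l}, \mathbf{b}_{l}\right\}$, i.e. the MLP parameters, which are used to learn higher-order interactions between features.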
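
To make the formula concrete, here is a minimal plain-Python sketch of the forward pass. It is an illustration under assumptions, not the authors' implementation: the names nfm_predict, dense, relu and bi_interaction are hypothetical, ReLU is assumed for every $\sigma_l$, the toy parameter values are made up, and batch normalization and dropout are left out.

```python
def relu(v):
    """Element-wise ReLU, standing in for the activation sigma_l (an assumption)."""
    return [max(0.0, t) for t in v]

def dense(W, b, v):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w_rc * t for w_rc, t in zip(row, v)) + b_r for row, b_r in zip(W, b)]

def bi_interaction(x, V):
    """f_BI in its linear-time form: 0.5 * [(sum x_i v_i)^2 - sum (x_i v_i)^2]."""
    k = len(V[0])
    sum_vec = [0.0] * k
    sum_sq = [0.0] * k
    for x_i, v_i in zip(x, V):
        for d in range(k):
            t = x_i * v_i[d]
            sum_vec[d] += t
            sum_sq[d] += t * t
    return [0.5 * (s * s - q) for s, q in zip(sum_vec, sum_sq)]

def nfm_predict(x, w0, w, V, layers, h):
    """y = w0 + sum_i w_i x_i + h^T MLP(f_BI(V_x))."""
    linear = w0 + sum(w_i * x_i for w_i, x_i in zip(w, x))
    z = bi_interaction(x, V)
    for W_l, b_l in layers:            # z_l = sigma_l(W_l z_{l-1} + b_l)
        z = relu(dense(W_l, b_l, z))
    deep = sum(h_i * z_i for h_i, z_i in zip(h, z))
    return linear + deep

# Toy parameters (made-up values): n = 3 features, k = 2, one hidden layer.
x = [1.0, 0.0, 1.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]   # identity weights, zero bias
h = [0.1, 0.1]

y_hat = nfm_predict(x, w0, w, V, layers, h)          # approximately 2.6

# With no hidden layers and h set to all ones, the deep part is just the sum
# of the Bi-Interaction vector, i.e. the second-order term of FM.
y_fm = nfm_predict(x, w0, w, V, [], [1.0, 1.0])      # approximately 17.9
```

With these toy values the linear part is 0.9 and the Bi-Interaction vector is [5, 12], so the two calls give roughly 0.9 + 1.7 and 0.9 + 17 respectively.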

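Now let's look at the code:

It comes from https://github.com/hexiangnan/neural_factorization_machine/blob/master/NeuralFM.py

The core of the NFM implementation is shown below (the excerpt targets an early TensorFlow release: tf.sub and the keep_dims argument were later renamed to tf.subtract and keepdims):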

# Model.
# _________ sum_square part _____________
# get the summed up embeddings of features.
nonzero_embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'], self.train_features)
self.summed_features_emb = tf.reduce_sum(nonzero_embeddings, 1) # None * K
# get the element-multiplication
self.summed_features_emb_square = tf.square(self.summed_features_emb)  # None * K

# _________ square_sum part _____________
self.squared_features_emb = tf.square(nonzero_embeddings)
self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K

# ________ FM __________
self.FM = 0.5 * tf.sub(self.summed_features_emb_square, self.squared_sum_features_emb)  # None * K
if self.batch_norm:
    self.FM = self.batch_norm_layer(self.FM, train_phase=self.train_phase, scope_bn='bn_fm')
self.FM = tf.nn.dropout(self.FM, self.dropout_keep[-1]) # dropout at the bilinear interaction layer

# ________ Deep Layers __________
for i in range(0, len(self.layers)):
    self.FM = tf.add(tf.matmul(self.FM, self.weights['layer_%d' %i]), self.weights['bias_%d'%i]) # None * layer[i]
    if self.batch_norm:
        self.FM = self.batch_norm_layer(self.FM, train_phase=self.train_phase, scope_bn='bn_%d' %i) # None * layer[i]
    self.FM = self.activation_function(self.FM)
    self.FM = tf.nn.dropout(self.FM, self.dropout_keep[i]) # dropout at each Deep layer
self.FM = tf.matmul(self.FM, self.weights['prediction'])     # None * 1

# _________out _________
Bilinear = tf.reduce_sum(self.FM, 1, keep_dims=True)  # None * 1
self.Feature_bias = tf.reduce_sum(tf.nn.embedding_lookup(self.weights['feature_bias'], self.train_features) , 1)  # None * 1
Bias = self.weights['bias'] * tf.ones_like(self.train_labels)  # None * 1
self.out = tf.add_n([Bilinear, self.Feature_bias, Bias])  # None * 1
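
Note that the excerpt contains no explicit $x_i$: the name nonzero_embeddings suggests that self.train_features holds the indices of the active features of each sample, so every looked-up embedding is implicitly weighted by $x_i = 1$. The following plain-Python sketch mimics the shapes of the first three blocks on a tiny batch. The helpers embedding_lookup, reduce_sum_axis1 and square are hypothetical stand-ins for tf.nn.embedding_lookup, tf.reduce_sum(..., 1) and tf.square, and the table values and feature ids are made up; it also assumes every sample has the same number M of active features.

```python
def embedding_lookup(table, ids):
    """Gather rows of `table` for a batch of index lists: batch * M * K."""
    return [[table[i] for i in sample] for sample in ids]

def reduce_sum_axis1(batch):
    """Sum over the M (feature) axis of a batch * M * K nested list: batch * K."""
    return [[sum(column) for column in zip(*sample)] for sample in batch]

def square(t):
    """Element-wise square of a nested list of any depth."""
    if isinstance(t, list):
        return [square(item) for item in t]
    return t * t

# Embedding table with 5 features and K = 2 (made-up values).
feature_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
# A batch of 2 samples, each with M = 2 active feature ids.
train_features = [[0, 2], [1, 4]]

nonzero_embeddings = embedding_lookup(feature_embeddings, train_features)  # 2 * 2 * 2
summed_square = square(reduce_sum_axis1(nonzero_embeddings))   # (sum v)^2,  2 * 2
squared_sum = reduce_sum_axis1(square(nonzero_embeddings))     # sum (v^2),  2 * 2
bi_out = [[0.5 * (a - b) for a, b in zip(row_a, row_b)]
          for row_a, row_b in zip(summed_square, squared_sum)]
print(bi_out)  # [[5.0, 12.0], [27.0, 40.0]]
```

With only two active features per sample, each output row is simply the element-wise product of the two looked-up embeddings, which makes the result easy to verify by hand.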

The code is not complicated. In particular, this part:

nonzero_embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'], self.train_features)
self.summed_features_emb = tf.reduce_sum(nonzero_embeddings, 1) # None * K
# get the element-multiplication
self.summed_features_emb_square = tf.square(self.summed_features_emb)  # None * K

# _________ square_sum part _____________
self.squared_features_emb = tf.square(nonzero_embeddings)
self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K

# ________ FM __________
self.FM = 0.5 * tf.sub(self.summed_features_emb_square, self.squared_sum_features_emb)  # None * K

is exactly the same as FM~ The only difference is that FM additionally sums over the embedding dimension (the K dimension in the comments) at the end.
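
This relationship can be checked with a few lines of plain Python. The sketch below uses hypothetical function names and made-up toy values: it computes FM's second-order term from its inner-product definition and compares it with the Bi-Interaction vector summed over the K dimension.

```python
from itertools import combinations

def fm_second_order(x, V):
    """FM pairwise term: sum over i < j of <v_i, v_j> * x_i * x_j (a scalar)."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        total += dot * x[i] * x[j]
    return total

def bi_interaction(x, V):
    """Bi-Interaction vector: 0.5 * [(sum x_i v_i)^2 - sum (x_i v_i)^2]."""
    k = len(V[0])
    sum_vec = [0.0] * k
    sum_sq = [0.0] * k
    for x_i, v_i in zip(x, V):
        for d in range(k):
            t = x_i * v_i[d]
            sum_vec[d] += t
            sum_sq[d] += t * t
    return [0.5 * (s * s - q) for s, q in zip(sum_vec, sum_sq)]

# Toy input (made-up values): n = 4 features, K = 2, feature 2 inactive.
x = [1.0, 1.0, 0.0, 1.0]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

bi_vec = bi_interaction(x, V)       # a K-dimensional vector
fm_scalar = fm_second_order(x, V)   # a scalar
# Summing the Bi-Interaction vector over K recovers FM's second-order term.
assert abs(sum(bi_vec) - fm_scalar) < 1e-9
```

NFM keeps the K-dimensional vector and hands it to the MLP, whereas FM collapses it into a single number right away.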

Summary

Good. NFM both differs from and is related to DeepFM; I will cover DeepFM in a later post. Now off to exercise 🤣 🤣 🤣
