Table of Contents
1: Research Background
2: Network Architecture
3: Code in Practice
1: Research Background
FM improves the generalization ability of a linear regression model by introducing second-order feature interactions, but it models all feature combinations with the same weight. In practice, many combinations of useless features only introduce noise and hurt performance. Against this background, the paper proposes the Attentional Factorization Machine (AFM), which uses a neural attention network to learn the importance of each feature combination. Like NFM, AFM is a serial FM & DNN structure. At prediction time, FM assigns each feature a single fixed embedding vector, and that same vector is used whenever the feature is crossed with any other feature. This is unreasonable, because different feature interactions differ in importance. The previously introduced FFM model is one way to capture this difference; AFM, which incorporates an attention mechanism, is another: it assigns each feature combination its own weight, and that weight is learnable, reflecting how much attention the model pays to each combination. My understanding is that feature combinations that contribute more to the final classification are given higher weights.
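As a quick reminder, FM's prediction function is

$$\hat{y}_{FM}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,$$

where every second-order cross term enters the sum on equal footing, with no per-interaction importance factor; AFM keeps this structure but multiplies each cross term by a learned attention weight, as formalized in the next section.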
2: Network Architecture
(Figure: AFM network architecture)
Formula:
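Written out, the prediction function of AFM is

$$\hat{y}_{AFM}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \mathbf{p}^{T}\sum_{i=1}^{n}\sum_{j=i+1}^{n} a_{ij}\,(\mathbf{v}_i \odot \mathbf{v}_j)\, x_i x_j,$$

where $\odot$ is the element-wise product and the attention weight $a_{ij}$ of each interaction is produced by a one-layer attention network followed by a softmax:

$$a'_{ij} = \mathbf{h}^{T}\,\mathrm{ReLU}\big(\mathbf{W}\,(\mathbf{v}_i \odot \mathbf{v}_j)\, x_i x_j + \mathbf{b}\big), \qquad a_{ij} = \frac{\exp(a'_{ij})}{\sum_{(i,j)} \exp(a'_{ij})}.$$

In the code below, `attention_w`, `attention_b`, `attention_h`, and `attention_p` correspond to $\mathbf{W}$, $\mathbf{b}$, $\mathbf{h}$, and $\mathbf{p}$.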
3: Code in Practice
Here only the attention part of the implementation is shown (TensorFlow 1.x); `self.embeddings`, the `weights`/`biases` dictionaries, and the linear term are assumed to be defined elsewhere in the model class:
with tf.name_scope('Pair-wise_Interaction_Layer'):
    # element-wise product of every pair of field embeddings
    pair_wise_product_list = []
    for i in range(self.field_size):
        for j in range(i + 1, self.field_size):
            pair_wise_product_list.append(
                tf.multiply(self.embeddings[:, i, :], self.embeddings[:, j, :]))  # [None, embedding_size]
    self.pair_wise_product = tf.stack(pair_wise_product_list)  # [field_size*(field_size - 1)/2, None, embedding_size]
    self.pair_wise_product = tf.transpose(self.pair_wise_product, perm=[1, 0, 2],
                                          name='pair_wise_product')  # [None, field_size*(field_size - 1)/2, embedding_size]
    self.pair_wise_product = tf.nn.dropout(self.pair_wise_product, self.dropout_keep_fm[1])
with tf.name_scope('attention_net'):
    # attention-network parameters W, b, h, p (notation follows the AFM paper)
    glorot = np.sqrt(2.0 / (self.attention_size + self.embedding_size))
    weights['attention_w'] = tf.Variable(
        np.random.normal(loc=0, scale=glorot, size=(self.embedding_size, self.attention_size)),
        dtype=tf.float32, name='attention_w')
    biases['attention_b'] = tf.Variable(
        np.random.normal(loc=0, scale=glorot, size=(1, self.attention_size)),
        dtype=tf.float32, name='attention_b')
    weights['attention_h'] = tf.Variable(
        np.random.normal(loc=0, scale=1, size=(1, self.attention_size)),
        dtype=tf.float32, name='attention_h')
    weights['attention_p'] = tf.Variable(
        np.random.normal(loc=0, scale=1, size=(self.embedding_size, 1)),
        dtype=tf.float32, name='attention_p')  # if p is all ones, this reduces to the plain FM second-order term
    num_interactions = self.pair_wise_product.shape.as_list()[1]
    # w*x + b
    self.attention_wx_plus_b = tf.add(
        tf.matmul(tf.reshape(self.pair_wise_product, shape=[-1, self.embedding_size]),
                  weights['attention_w']),
        biases['attention_b'])
    self.attention_wx_plus_b = tf.reshape(
        self.attention_wx_plus_b,
        shape=[-1, num_interactions, self.attention_size])  # [None, field_size*(field_size - 1)/2, attention_size]
    # relu(w*x + b)
    self.attention_relu_wx_plus_b = tf.nn.relu(
        self.attention_wx_plus_b)  # [None, field_size*(field_size - 1)/2, attention_size]
    # h*relu(w*x + b)
    self.attention_h_mul_relu_wx_plus_b = tf.multiply(
        self.attention_relu_wx_plus_b,
        weights['attention_h'])  # [None, field_size*(field_size - 1)/2, attention_size]
    # exp(h*relu(w*x + b)), summed over the attention dimension
    self.attention_exp = tf.exp(tf.reduce_sum(
        self.attention_h_mul_relu_wx_plus_b, axis=2,
        keep_dims=True))  # [None, field_size*(field_size - 1)/2, 1]
    # sum(exp(h*relu(w*x + b))) over all interactions
    self.attention_exp_sum = tf.reduce_sum(self.attention_exp, axis=1, keep_dims=True)  # [None, 1, 1]
    # softmax: exp(h*relu(w*x + b)) / sum(exp(h*relu(w*x + b)))
    self.attention_out = tf.div(self.attention_exp, self.attention_exp_sum,
                                name='attention_out')  # [None, field_size*(field_size - 1)/2, 1]
    # attention-weighted sum of the pair-wise interactions
    self.attention_product = tf.multiply(
        self.attention_out,
        self.pair_wise_product)  # [None, field_size*(field_size - 1)/2, embedding_size]
    self.attention_product = tf.reduce_sum(self.attention_product, axis=1)  # [None, embedding_size]
    # p * (attention-weighted pair-wise sum)
    self.attention_net_out = tf.matmul(self.attention_product, weights['attention_p'])  # [None, 1]
    if self.batch_norm:
        self.attention_net_out = self.batch_norm_layer(
            self.attention_net_out, train_phase=self.train_phase, scope_bn='bn1')
with tf.name_scope('out'):
    # y_AFM = w0 + w*x + attention_net(x)
    self.out = tf.add_n([self.w0, self.linear_out, self.attention_net_out])
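To make the tensor shapes above concrete, here is a minimal NumPy sketch (my own addition, not part of the original model; the sizes `batch=2`, `field_size=4`, `embedding_size=8`, `attention_size=16` are arbitrary) that mirrors the pair-wise interaction and attention computation on random data:

import numpy as np

batch, field_size, embedding_size, attention_size = 2, 4, 8, 16
emb = np.random.randn(batch, field_size, embedding_size)

# pair-wise interaction layer: element-wise product of every pair of field embeddings
pairs = np.stack([emb[:, i, :] * emb[:, j, :]
                  for i in range(field_size)
                  for j in range(i + 1, field_size)], axis=1)    # [batch, 6, embedding_size]

# attention-net parameters (random here; learned variables in the graph above)
W = np.random.randn(embedding_size, attention_size)
b = np.random.randn(1, attention_size)
h = np.random.randn(1, attention_size)
p = np.random.randn(embedding_size, 1)

scores = np.maximum(pairs @ W + b, 0) @ h.T                       # h^T * relu(W*x + b): [batch, 6, 1]
a = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)    # softmax over the 6 interactions
attention_net_out = (a * pairs).sum(axis=1) @ p                   # p^T * weighted sum: [batch, 1]
print(attention_net_out.shape)                                    # (2, 1)

With 4 fields there are 4*3/2 = 6 interactions, so the attention weights have shape [batch, 6, 1] and sum to 1 over the 6 pairs; the graph code above computes the same softmax manually with tf.exp / tf.reduce_sum / tf.div.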