1. Activation Unit Diagram
2. Activation Unit Structure
(1) Input layer:
- Embeddings of the items the user has interacted with (e.g., clicked or purchased), used as the keys;
- The candidate item's embedding, used as the query;
(2) Out Product layer:
- Computes the element-wise product between the query and key matrices;
(3) Concat layer:
- Concatenates query, key, query-key, and query*key (the element-wise product from the Out Product layer);
(4) Dense layer:
- A fully connected layer, with PReLU or Dice as the activation function;
(5) Linear layer (output layer):
- A fully connected layer with a single output unit, which yields the weight for the (query, key) pair;
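The five layers above can be strung together into one small function. The sketch below is a minimal illustration, not the paper's exact implementation: the hidden size of 8 is an arbitrary choice, and standard Keras `Dense`/`PReLU` layers stand in for the paper's PReLU/Dice activation.

```python
import numpy as np
import tensorflow as tf

def activation_unit(query, keys, hidden_units=8):
    """Minimal sketch of the Activation Unit.

    query: (batch, 1, dim)  candidate item embedding
    keys:  (batch, T, dim)  historical behavior item embeddings
    Returns (batch, T) un-normalized attention weights.
    """
    T = keys.shape[1]
    # Input layer: repeat the query so it lines up with every key
    queries = tf.tile(query, [1, T, 1])
    # Concat layer: query, key, query-key, query*key (element-wise product)
    att_input = tf.concat([queries, keys, queries - keys, queries * keys], axis=-1)
    # Dense layer with PReLU activation (the paper uses PReLU or Dice)
    hidden = tf.keras.layers.Dense(hidden_units)(att_input)
    hidden = tf.keras.layers.PReLU()(hidden)
    # Linear output layer: one unit -> one weight per (query, key) pair
    logits = tf.keras.layers.Dense(1)(hidden)        # (batch, T, 1)
    return tf.squeeze(logits, axis=-1)               # (batch, T)

q = tf.constant(np.random.rand(1, 1, 4), dtype=tf.float32)
k = tf.constant(np.random.rand(1, 3, 4), dtype=tf.float32)
print(activation_unit(q, k).shape)  # (1, 3): one weight per historical item
```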
3. Activation Unit Formula
$$v_U(A) = f(v_A, e_1, e_2, \ldots, e_H) = \sum_{j=1}^{H} a(e_j, v_A)\, e_j = \sum_{j=1}^{H} w_j e_j$$
where:
- $\{e_1, e_2, \ldots, e_H\}$ are the embeddings of user $U$'s historical behavior items, with length $H$;
- $v_A$ is the candidate item's embedding;
- $a(\cdot)$ is a feed-forward network whose inputs are combined via the out product; its output is the weight value;
- $\sum_{j=1}^{H} w_j = 1$ in conventional attention; DIN relaxes this constraint by dropping the softmax normalization;
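The weighted sum $v_U(A) = \sum_j w_j e_j$ can be checked numerically. The embeddings and weights below are hypothetical values chosen only for illustration:

```python
import numpy as np

# Hypothetical historical item embeddings e_1..e_H (H = 3)
e = np.array([[1., 1., 2., 4.],
              [4., 1., 1., 3.],
              [1., 1., 2., 1.]])
# Hypothetical weights w_j; they sum to 1 under the conventional constraint
w = np.array([0.5, 0.3, 0.2])

# v_U(A) = sum_j w_j * e_j
v_U = (w[:, None] * e).sum(axis=0)
print(v_U)  # [1.9 1.  1.7 3.1]
```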
4. Activation Unit Code Walkthrough
(1) Define the query and keys
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# query: one candidate item's embedding
query = tf.convert_to_tensor(np.asarray([[[1., 1., 1., 3.]]]), dtype=tf.double)
# key_list: embeddings of three historical behavior items
key_list = tf.convert_to_tensor(np.asarray([[[1., 1., 2., 4.],
                                             [4., 1., 1., 3.],
                                             [1., 1., 2., 1.]]]), dtype=tf.double)
# Repeat the query along axis 1 so it lines up with every key
queries = K.repeat_elements(query, key_list.get_shape()[1], 1)
print("queries: \n", queries)
print()
print("key_list: \n", key_list)
```
Result:
- `query` is a single vector, simulating the embedding of one candidate item;
- `key_list` holds three vectors, simulating the embeddings of three items the user purchased in the past;
(2) Concatenate query, key, query-key, query*key
```python
# Concat layer: query, key, their difference, and their element-wise product
att_input = tf.concat(
    [queries, key_list, queries - key_list, queries * key_list], axis=-1)
print("att_input: \n", att_input)
```
Result:
(3) Define the first Dense layer's weights and bias, and run the computation
```python
# Weights of the first Dense layer: 16 inputs -> 4 units
dense_16to4_weights = tf.convert_to_tensor([[1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 2., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 2., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 2., 1.]], dtype=tf.double)
dense_4_bias = tf.convert_to_tensor([1., 2., 3., 10.], dtype=tf.double)
# Affine transform: att_input @ W + b
dense_16to4 = tf.nn.bias_add(
    tf.tensordot(att_input, dense_16to4_weights, axes=(-1, 0)),
    dense_4_bias)
print("dense_16to4: \n\n", dense_16to4)
```
Result:
- The 16-dimensional vectors are compressed to 4 dimensions;
(4) Define the last Dense layer's weights and bias, and run the computation
```python
# Output (Linear) layer: 4 inputs -> 1 unit
dense_4to1_weights = tf.convert_to_tensor([1., 2., 3., 10.], dtype=tf.double)
dense_1_bias = tf.constant([0.2], dtype=tf.double)
dense_4to1 = tf.tensordot(dense_16to4, dense_4to1_weights, axes=(-1, 0)) + dense_1_bias
print("dense_4to1: \n\n", dense_4to1)
```
Result:
- [601.2 576.2 439.2] are the attention weights for the embeddings of the three items the user purchased in the past;
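The first of these weights can be reproduced by hand from the document's own numbers, tracing one (query, key) pair through the two layers:

```python
import numpy as np

# First key [1, 1, 2, 4] vs query [1, 1, 1, 3]
q = np.array([1., 1., 1., 3.])
k = np.array([1., 1., 2., 4.])
x = np.concatenate([q, k, q - k, q * k])   # the 16-dim Concat-layer input

# First Dense layer: all-ones weights, except column 2 has weight 2 at rows 6, 11, 15
W = np.ones((16, 4))
W[[6, 11, 15], 2] = 2.
b = np.array([1., 2., 3., 10.])
h = x @ W + b
print(h)       # [29. 30. 44. 38.]

# Output layer: weights [1, 2, 3, 10], bias 0.2
logit = h @ np.array([1., 2., 3., 10.]) + 0.2
print(logit)   # 601.2 -- matches the first entry of dense_4to1
```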
The code below still applies a softmax to normalize the weights before pooling the keys; note, however, that the DIN paper actually abandons this step: "That is, normalization with softmax on the output of a(·) is abandoned."
```python
# Softmax-normalize the weights, then take the weighted sum of the keys
dense_4to1_softmax = tf.nn.softmax(dense_4to1)
attention_result = tf.matmul(dense_4to1_softmax, key_list)
print("dense_4to1_softmax: \n\n", dense_4to1_softmax)
print()
print("attention_result: \n\n", attention_result)
```
In other words, the user's final interest embedding is given by `attention_result`.
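As a sanity check on the last step: with logits as far apart as 601.2, 576.2, and 439.2, the softmax collapses almost all of the mass onto the first key, so `attention_result` is essentially the first key's embedding:

```python
import numpy as np

# Softmax over the three logits from the output layer
logits = np.array([601.2, 576.2, 439.2])
w = np.exp(logits - logits.max())
w /= w.sum()
print(w)          # first weight is ~1, the others are vanishingly small

keys = np.array([[1., 1., 2., 4.],
                 [4., 1., 1., 3.],
                 [1., 1., 2., 1.]])
print(w @ keys)   # ~[1. 1. 2. 4.] -- dominated by the first key
```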