1. Activation Unit Diagram
2. Activation Unit Structure
(1) Input layer:
- Embeddings of the items the user has interacted with (e.g., clicked or purchased), used as the keys;
- The candidate item's embedding, used as the query;
(2) Out Product layer:
- Computes the element-wise product between the query and key matrices;
(3) Concat layer:
- Concatenates query, key, query-key, and query*key (the element-wise product from the Out Product layer);
(4) Dense layer:
- A fully connected layer, with PReLU or Dice as the activation function;
(5) Linear layer (output layer):
- A fully connected layer with a single output unit, which yields the weight for the (query, key) pair;
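The five layers above can be strung together into one small function. The sketch below is a minimal illustration, not the paper's exact implementation: the hidden size of 8 is an arbitrary choice, and standard Keras `Dense`/`PReLU` layers stand in for the paper's PReLU/Dice activation.

```python
import numpy as np
import tensorflow as tf

def activation_unit(query, keys, hidden_units=8):
    """Minimal sketch of the Activation Unit.

    query: (batch, 1, dim)  candidate item embedding
    keys:  (batch, T, dim)  historical behavior item embeddings
    Returns (batch, T) un-normalized attention weights.
    """
    T = keys.shape[1]
    # Input layer: repeat the query so it lines up with every key
    queries = tf.tile(query, [1, T, 1])
    # Concat layer: query, key, query-key, query*key (element-wise product)
    att_input = tf.concat([queries, keys, queries - keys, queries * keys], axis=-1)
    # Dense layer with PReLU activation (the paper uses PReLU or Dice)
    hidden = tf.keras.layers.Dense(hidden_units)(att_input)
    hidden = tf.keras.layers.PReLU()(hidden)
    # Linear output layer: one unit -> one weight per (query, key) pair
    logits = tf.keras.layers.Dense(1)(hidden)        # (batch, T, 1)
    return tf.squeeze(logits, axis=-1)               # (batch, T)

q = tf.constant(np.random.rand(1, 1, 4), dtype=tf.float32)
k = tf.constant(np.random.rand(1, 3, 4), dtype=tf.float32)
print(activation_unit(q, k).shape)  # (1, 3): one weight per historical item
```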
3. Activation Unit Formula
$$v_U(A) = f(v_A, e_1, e_2, \ldots, e_H) = \sum_{j=1}^{H} a(e_j, v_A)\, e_j = \sum_{j=1}^{H} w_j e_j$$
where:
- $\{e_1, e_2, \ldots, e_H\}$ are the embeddings of user $U$'s historical behavior items, with length $H$;
- $v_A$ is the candidate item's embedding;
- $a(\cdot)$ is a feed-forward network whose inputs are combined via the out product; its output is the weight value;
- $\sum_{j=1}^{H} w_j = 1$ in conventional attention; DIN relaxes this constraint by dropping the softmax normalization;
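The weighted sum $v_U(A) = \sum_j w_j e_j$ can be checked numerically. The embeddings and weights below are hypothetical values chosen only for illustration:

```python
import numpy as np

# Hypothetical historical item embeddings e_1..e_H (H = 3)
e = np.array([[1., 1., 2., 4.],
              [4., 1., 1., 3.],
              [1., 1., 2., 1.]])
# Hypothetical weights w_j; they sum to 1 under the conventional constraint
w = np.array([0.5, 0.3, 0.2])

# v_U(A) = sum_j w_j * e_j
v_U = (w[:, None] * e).sum(axis=0)
print(v_U)  # [1.9 1.  1.7 3.1]
```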
4. Activation Unit Code Walkthrough
(1) Define the query and keys
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# query: one candidate item's embedding
query = tf.convert_to_tensor(np.asarray([[[1., 1., 1., 3.]]]), dtype=tf.double)
# key_list: embeddings of three historical behavior items
key_list = tf.convert_to_tensor(np.asarray([[[1., 1., 2., 4.],
                                             [4., 1., 1., 3.],
                                             [1., 1., 2., 1.]]]), dtype=tf.double)
# Repeat the query along axis 1 so it lines up with every key
queries = K.repeat_elements(query, key_list.get_shape()[1], 1)
print("queries: \n", queries)
print()
print("key_list: \n", key_list)
```
Result:
- `query` is a single vector, simulating the embedding of one candidate item;
- `key_list` holds three vectors, simulating the embeddings of three items the user purchased in the past;
(2) Concatenate query, key, query-key, query*key
```python
# Concat layer: query, key, their difference, and their element-wise product
att_input = tf.concat(
    [queries, key_list, queries - key_list, queries * key_list], axis=-1)
print("att_input: \n", att_input)
```
Result:
(3) Define the first Dense layer's weights and bias, and run the computation
```python
# Weights of the first Dense layer: 16 inputs -> 4 units
dense_16to4_weights = tf.convert_to_tensor([[1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 2., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 2., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 1., 1.],
                                            [1., 1., 2., 1.]], dtype=tf.double)
dense_4_bias = tf.convert_to_tensor([1., 2., 3., 10.], dtype=tf.double)
# Affine transform: att_input @ W + b
dense_16to4 = tf.nn.bias_add(
    tf.tensordot(att_input, dense_16to4_weights, axes=(-1, 0)),
    dense_4_bias)
print("dense_16to4: \n\n", dense_16to4)
```
Result:
- The 16-dimensional vectors are compressed to 4 dimensions;
(4) Define the last Dense layer's weights and bias, and run the computation
```python
# Output (Linear) layer: 4 inputs -> 1 unit
dense_4to1_weights = tf.convert_to_tensor([1., 2., 3., 10.], dtype=tf.double)
dense_1_bias = tf.constant([0.2], dtype=tf.double)
dense_4to1 = tf.tensordot(dense_16to4, dense_4to1_weights, axes=(-1, 0)) + dense_1_bias
print("dense_4to1: \n\n", dense_4to1)
```
Result:
- [601.2 576.2 439.2] are the attention weights for the embeddings of the three items the user purchased in the past;
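The first of these weights can be reproduced by hand from the document's own numbers, tracing one (query, key) pair through the two layers:

```python
import numpy as np

# First key [1, 1, 2, 4] vs query [1, 1, 1, 3]
q = np.array([1., 1., 1., 3.])
k = np.array([1., 1., 2., 4.])
x = np.concatenate([q, k, q - k, q * k])   # the 16-dim Concat-layer input

# First Dense layer: all-ones weights, except column 2 has weight 2 at rows 6, 11, 15
W = np.ones((16, 4))
W[[6, 11, 15], 2] = 2.
b = np.array([1., 2., 3., 10.])
h = x @ W + b
print(h)       # [29. 30. 44. 38.]

# Output layer: weights [1, 2, 3, 10], bias 0.2
logit = h @ np.array([1., 2., 3., 10.]) + 0.2
print(logit)   # 601.2 -- matches the first entry of dense_4to1
```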
The code below still applies a softmax to normalize the weights before pooling the keys; note, however, that the DIN paper actually abandons this step: "That is, normalization with softmax on the output of a(·) is abandoned."
```python
# Softmax-normalize the weights, then take the weighted sum of the keys
dense_4to1_softmax = tf.nn.softmax(dense_4to1)
attention_result = tf.matmul(dense_4to1_softmax, key_list)
print("dense_4to1_softmax: \n\n", dense_4to1_softmax)
print()
print("attention_result: \n\n", attention_result)
```
In other words, the user's final interest embedding is given by `attention_result`.
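As a sanity check on the last step: with logits as far apart as 601.2, 576.2, and 439.2, the softmax collapses almost all of the mass onto the first key, so `attention_result` is essentially the first key's embedding:

```python
import numpy as np

# Softmax over the three logits from the output layer
logits = np.array([601.2, 576.2, 439.2])
w = np.exp(logits - logits.max())
w /= w.sum()
print(w)          # first weight is ~1, the others are vanishingly small

keys = np.array([[1., 1., 2., 4.],
                 [4., 1., 1., 3.],
                 [1., 1., 2., 1.]])
print(w @ keys)   # ~[1. 1. 2. 4.] -- dominated by the first key
```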