The previous post covered the theory behind MIND, a multi-interest user network for recall in recommender systems; this post walks through the implementation.
1. Capsule Network vs. Traditional Neural Network
MIND borrows Hinton's Capsule Network and proposes a Multi-Interest Extractor Layer that performs soft clustering over the embeddings of a user's historical behaviors. Before introducing it, let us compare a capsule with a traditional neuron using the figure below.
The right side of the figure above shows a traditional neuron: it takes the scalar outputs of several neurons in the previous layer as input, computes a weighted sum, applies a nonlinearity such as Sigmoid or ReLU, and outputs a single scalar.
The left side shows the formulas from the Capsule paper. Comparing the two, a capsule's inputs u_i are vectors and its output v_j is also a vector, and the formulas in between describe the computation from u_i to v_j, which maps step by step onto the traditional neuron: (1) û_{j|i} = W_ij · u_i (Eq. 2) is an affine transformation, an operation the traditional neuron does not have; (2) s_j = Σ_i c_ij · û_{j|i} sums the transformed vectors over the i dimension under the weights c_ij, a vector version of the weighted sum; (3) v_j = squash(s_j), where the squashing function is nonlinear and the output v_j keeps the dimensionality of s_j, so it can be seen as a vector version of the activation function.
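As a concrete illustration (toy shapes, plain NumPy, not from the original post), one capsule forward pass looks like:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Vector activation: preserves direction, squashes the norm into [0, 1).
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# Toy sizes: 3 low-level capsules (u_i, dim 4) -> 2 high-level capsules (v_j, dim 6)
rng = np.random.default_rng(0)
u = rng.normal(size=(3, 4))              # inputs u_i (vectors, not scalars)
W = rng.normal(size=(2, 3, 6, 4))        # affine maps W_ij (absent in a classic neuron)
u_hat = np.einsum('jidk,ik->jid', W, u)  # u_hat_{j|i} = W_ij u_i
c = np.full((2, 3, 1), 1.0 / 3)          # coupling coefficients c_ij (softmax of logits)
s = np.sum(c * u_hat, axis=1)            # s_j = sum_i c_ij u_hat_{j|i}  (weighted sum)
v = squash(s)                            # v_j = squash(s_j)  (vector "activation")
print(v.shape)  # (2, 6)
```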
2. Capsule Layer Design
As described in the paper: the number of interests k_user is first computed dynamically per user, and the Capsule Layer then directly outputs those k_user interest capsules.
In practice: the layer outputs a fixed k_max interests per user, and the Label-aware Attention layer then adaptively selects k_user of them. (Note: at serving time a fixed k_max interests are usually used.)
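The adaptive interest count can be written as a one-line helper (mirroring the formula used in the Label-aware Attention code later in the post; values here are illustrative):

```python
import math

def k_user(hist_len, k_max=5):
    # Adaptive interest count: max(1, min(k_max, log2(1 + hist_len))),
    # truncated to an integer, as in the Label-aware Attention layer.
    return int(max(1.0, min(float(k_max), math.log1p(hist_len) / math.log(2.0))))

print(k_user(0), k_user(10), k_user(100))  # 1 3 5
```

Longer histories justify more interest capsules, capped at k_max; users with very short histories always keep at least one.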
3. Label-aware Attention Design
During training, the label to predict has a single embedding while the user has k embeddings, so a direct inner product cannot measure the match. MIND therefore proposes Label-aware Attention, which follows the same idea as DIN: use the label's embedding to compute a weight for each of the user's k embeddings (hence "label-aware"), then take the weighted sum of the k embeddings to obtain a single final user embedding.
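The idea can be sketched in a few lines of NumPy (shapes are illustrative; `p` mirrors the `pow_p` parameter in the code later in this post):

```python
import numpy as np

def label_aware_attention(interests, label, p=1.0):
    # interests: (k, d) user interest vectors; label: (d,) target item embedding.
    logits = interests @ label        # relevance of each interest to the label
    logits = np.power(logits, p)      # pow_p sharpens or softens the attention
    w = np.exp(logits - logits.max())
    w = w / w.sum()                   # softmax over the k interests
    return w @ interests              # single user vector of shape (d,)

u = label_aware_attention(np.ones((3, 4)), np.ones(4))
print(u.shape)  # (4,)
```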
4. Hands-on Practice
4.1 A look at the Capsule Layer code
import tensorflow as tf
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Layer


def squash(inputs):
    # Vector activation: preserves direction, squashes the norm into [0, 1).
    vec_squared_norm = tf.reduce_sum(tf.square(inputs), axis=-1, keepdims=True)
    scalar_factor = vec_squared_norm / (1 + vec_squared_norm) / tf.sqrt(vec_squared_norm + 1e-8)
    return scalar_factor * inputs


class CapsuleLayer(Layer):
    def __init__(self, input_units, out_units, max_len, k_max, iteration_times=3,
                 init_std=1.0, **kwargs):
        self.input_units = input_units
        self.out_units = out_units
        self.max_len = max_len
        self.k_max = k_max
        self.iteration_times = iteration_times
        self.init_std = init_std
        super(CapsuleLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Routing logits B are updated via assign_add, not by gradient descent.
        self.routing_logits = self.add_weight(shape=[1, self.k_max, self.max_len],
                                              initializer=RandomNormal(stddev=self.init_std),
                                              trainable=False, name="B", dtype=tf.float32)
        # Bilinear mapping matrix S, shared across all behaviors and capsules.
        self.bilinear_mapping_matrix = self.add_weight(shape=[self.input_units, self.out_units],
                                                       initializer=RandomNormal(stddev=self.init_std),
                                                       name="S", dtype=tf.float32)
        super(CapsuleLayer, self).build(input_shape)

    def call(self, inputs, **kwargs):
        behavior_embeddings, seq_len = inputs
        batch_size = tf.shape(behavior_embeddings)[0]
        seq_len_tile = tf.tile(seq_len, [1, self.k_max])

        for i in range(self.iteration_times):
            # Mask padded positions so they receive (near-)zero routing weight.
            mask = tf.sequence_mask(seq_len_tile, self.max_len)
            pad = tf.ones_like(mask, dtype=tf.float32) * (-2 ** 32 + 1)
            routing_logits_with_padding = tf.where(mask, tf.tile(self.routing_logits, [batch_size, 1, 1]), pad)
            weight = tf.nn.softmax(routing_logits_with_padding)
            behavior_embedding_mapping = tf.tensordot(behavior_embeddings, self.bilinear_mapping_matrix, axes=1)
            Z = tf.matmul(weight, behavior_embedding_mapping)
            interest_capsules = squash(Z)
            # Routing update: agreement between capsules and mapped behaviors.
            delta_routing_logits = tf.reduce_sum(
                tf.matmul(interest_capsules, tf.transpose(behavior_embedding_mapping, perm=[0, 2, 1])),
                axis=0, keepdims=True
            )
            self.routing_logits.assign_add(delta_routing_logits)

        interest_capsules = tf.reshape(interest_capsules, [-1, self.k_max, self.out_units])
        return interest_capsules

    def compute_output_shape(self, input_shape):
        return (None, self.k_max, self.out_units)

    def get_config(self, ):
        config = {'input_units': self.input_units, 'out_units': self.out_units, 'max_len': self.max_len,
                  'k_max': self.k_max, 'iteration_times': self.iteration_times, "init_std": self.init_std}
        base_config = super(CapsuleLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
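Stripped of Keras bookkeeping, batching, and masking, the routing loop above reduces to the following NumPy sketch for a single user (shapes and initialization are illustrative):

```python
import numpy as np

def squash(s, eps=1e-8):
    n2 = np.sum(s * s, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(behavior_emb, S, k_max=3, iterations=3, seed=0):
    # behavior_emb: (seq_len, d_in) one user's behavior embeddings
    # S: (d_in, d_out) shared bilinear mapping matrix
    low = behavior_emb @ S                             # mapped behaviors, (seq_len, d_out)
    b = np.random.default_rng(seed).normal(size=(k_max, behavior_emb.shape[0]))
    for _ in range(iterations):
        w = np.exp(b - b.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)             # softmax over the behaviors
        interests = squash(w @ low)                    # (k_max, d_out) interest capsules
        b = b + interests @ low.T                      # routing update by agreement
    return interests

caps = dynamic_routing(np.random.default_rng(1).normal(size=(6, 8)),
                       np.random.default_rng(2).normal(size=(8, 4)))
print(caps.shape)  # (3, 4)
```

Behaviors that agree with an interest capsule get larger routing logits on the next iteration, which is what gradually pulls similar behaviors into the same cluster.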
4.2 A look at the Label-aware Attention code
class LabelAwareAttention(Layer):
    def __init__(self, k_max, pow_p=1, **kwargs):
        self.k_max = k_max
        self.pow_p = pow_p
        super(LabelAwareAttention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.embedding_size = input_shape[0][-1]
        super(LabelAwareAttention, self).build(input_shape)

    def call(self, inputs, training=None, **kwargs):
        keys = inputs[0]   # user interest capsules, [batch, k_max, dim]
        query = inputs[1]  # label (target item) embedding, [batch, 1, dim]
        weight = tf.reduce_sum(keys * query, axis=-1, keepdims=True)
        weight = tf.pow(weight, self.pow_p)  # [batch, k_max, 1]

        if len(inputs) == 3:
            # Adaptive interest count: k_user = max(1, min(k_max, log2(1 + hist_len)))
            k_user = tf.cast(tf.maximum(
                1.,
                tf.minimum(
                    tf.cast(self.k_max, dtype="float32"),  # k_max
                    tf.math.log1p(tf.cast(inputs[2], dtype="float32")) / tf.math.log(2.)  # hist_len
                )
            ), dtype="int64")
            seq_mask = tf.transpose(tf.sequence_mask(k_user, self.k_max), [0, 2, 1])
            padding = tf.ones_like(seq_mask, dtype=tf.float32) * (-2 ** 32 + 1)  # [batch, k_max, 1]
            weight = tf.where(seq_mask, weight, padding)

        # Softmax over the k_max interests (axis=1); the default last axis has size 1.
        weight = tf.nn.softmax(weight, axis=1, name="weight")
        output = tf.reduce_sum(keys * weight, axis=1)
        return output

    def compute_output_shape(self, input_shape):
        return (None, self.embedding_size)

    def get_config(self, ):
        config = {'k_max': self.k_max, 'pow_p': self.pow_p}
        base_config = super(LabelAwareAttention, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
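The masking trick in the adaptive branch is worth isolating: invalid interest slots are filled with a huge negative number so that, after softmax, they get essentially zero weight. A small NumPy demonstration (toy values, not from the original post):

```python
import numpy as np

k_max = 4
k_user = np.array([2, 4])                           # adaptive interest counts for two users
mask = np.arange(k_max)[None, :] < k_user[:, None]  # True for the first k_user positions
weight = np.ones((2, k_max))                        # pretend attention logits
masked = np.where(mask, weight, -2.0 ** 32 + 1)     # pad invalid slots with a huge negative
e = np.exp(masked - masked.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)            # softmax: padded slots get ~0 weight
print(probs.round(2))
```

The first user keeps only two interests (0.5 each); the second keeps all four (0.25 each).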
4.3 MIND Model Design
(1) Data
This post uses the MovieLens-1M dataset, available at: http://files.grouplens.org/datasets/movielens/ml-1m.zip
The dataset is processed into the following format:
5030 2 2 2 2247 3412,2899,2776,2203,2309,2385,743,2958,2512,2485,2404,2675,2568,2555,217,2491,2566,2481,2503,2786,1051,2502,803,3030,1789,2,424,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 27 2558 1112,1284,1174,3100,1049,2137,2273,2651,340,2163
279 2 3 15 2831 2210,1456,453,1293,3210,2235,2284,1095,1487,3511,738,886,1926,3501,1023,150,1198,3413,156,909,1019,2848,260,2737,1096,2684,1887,107,1143,347,1107,1111,1151,1133,3113,3592,1119,3287,1203,1181,1121,852,1915,1247,3038,240,0,0,0,0 46 2212 820,1009,2076,529,3032,2503,2742,2345,965,366
1300 2 1 11 3282 692,3041,1234,519,1554,1258,3452,1509,1170,1252,2804,754,2866,1987,2416,596,1250,1824,1225,2323,2542,2647,2355,2267,1248,2543,1818,2512,1815,1167,1289,1241,1803,2974,3252,3127,3320,3061,3278,3075,3249,3322,2945,3179,65,1109,3091,1245,2311,3357 165 1880 1545,332,2754,2254,267,1532,1062,1450,1440,2467
323 2 5 13 1799 580,864,1060,2098,2824,1203,1213,1088,1185,2,1925,309,2427,1994,1176,1486,853,1161,29,254,1259,528,1179,1107,1567,4,427,3567,3130,1174,2129,575,347,1415,2786,2204,2487,21,1223,3032,2652,67,2198,1737,45,51,218,2400,1225,467 117 1295 1114,2758,435,318,2251,2111,3650,2510,3705,1111
695 1 2 2 233 2161,2235,700,2962,444,2489,2375,1849,3662,3582,3650,3225,3128,3060,3127,3581,3252,3510,3556,3076,3281,3302,3050,3384,3702,2969,3303,3551,3543,3178,3249,3670,3342,3652,3665,3378,3322,3073,3376,3075,3584,3179,3504,3511,3278,1289,2,467,107,994 190 2945 2456,2716,2635,990,3657,3403,2210,1602,3251,143
Field descriptions:
Column 1 | user_id | user ID |
Column 2 | gender | user gender |
Column 3 | age | user age |
Column 4 | occupation | user occupation |
Column 5 | zip | user ZIP code |
Column 6 | hist_movie_id | sequence of movies the user has watched |
Column 7 | hist_len | length of the watch history |
Column 8 | pos_movie_id | the movie the user watches next (positive sample) |
Column 9 | neg_movie_id | movies the user did not watch next (sampled negatives) |
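A small hypothetical parser for this format (column order follows the table; the sample line below is shortened for readability):

```python
def parse_line(line):
    # Nine space-separated fields; the two sequence fields are comma-separated.
    (user_id, gender, age, occupation, zip_code,
     hist_movie_id, hist_len, pos_movie_id, neg_movie_id) = line.strip().split(" ")
    return {
        "user_id": int(user_id),
        "gender": int(gender),
        "age": int(age),
        "occupation": int(occupation),
        "zip": int(zip_code),
        "hist_movie_id": [int(i) for i in hist_movie_id.split(",")],
        "hist_len": int(hist_len),
        "pos_movie_id": int(pos_movie_id),
        "neg_movie_id": [int(i) for i in neg_movie_id.split(",")],
    }

sample = "695 1 2 2 233 2161,2235,700 3 2945 2456,2716,2635"
row = parse_line(sample)
print(row["user_id"], row["hist_len"], len(row["neg_movie_id"]))  # 695 3 3
```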
The data-processing logic is here: https://github.com/wziji/deep_ctr/blob/master/mind/data.py
(2) MIND model code:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, concatenate, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

from CapsuleLayer import SequencePoolingLayer, LabelAwareAttention, CapsuleLayer


def tile_user_otherfeat(user_other_feature, k_max):
    return tf.tile(tf.expand_dims(user_other_feature, -2), [1, k_max, 1])


def mind(sparse_input_length=1,
         dense_input_length=1,
         sparse_seq_input_length=50,
         embedding_dim=64,
         neg_sample_num=10,
         user_hidden_unit_list=[128, 64],
         k_max=5,
         p=1,
         dynamic_k=True):

    # 1. Input layer
    user_id_input_layer = Input(shape=(sparse_input_length, ), name="user_id_input_layer")
    gender_input_layer = Input(shape=(sparse_input_length, ), name="gender_input_layer")
    age_input_layer = Input(shape=(sparse_input_length, ), name="age_input_layer")
    occupation_input_layer = Input(shape=(sparse_input_length, ), name="occupation_input_layer")
    zip_input_layer = Input(shape=(sparse_input_length, ), name="zip_input_layer")

    user_click_item_seq_input_layer = Input(shape=(sparse_seq_input_length, ), name="user_click_item_seq_input_layer")
    user_click_item_seq_length_input_layer = Input(shape=(sparse_input_length, ), name="user_click_item_seq_length_input_layer")

    pos_item_sample_input_layer = Input(shape=(sparse_input_length, ), name="pos_item_sample_input_layer")
    neg_item_sample_input_layer = Input(shape=(neg_sample_num, ), name="neg_item_sample_input_layer")

    # 2. Embedding layer
    user_id_embedding_layer = Embedding(6040+1, embedding_dim, mask_zero=True, name='user_id_embedding_layer')(user_id_input_layer)
    gender_embedding_layer = Embedding(2+1, embedding_dim, mask_zero=True, name='gender_embedding_layer')(gender_input_layer)
    age_embedding_layer = Embedding(7+1, embedding_dim, mask_zero=True, name='age_embedding_layer')(age_input_layer)
    occupation_embedding_layer = Embedding(21+1, embedding_dim, mask_zero=True, name='occupation_embedding_layer')(occupation_input_layer)
    zip_embedding_layer = Embedding(3439+1, embedding_dim, mask_zero=True, name='zip_embedding_layer')(zip_input_layer)

    # The item embedding table is shared by the positive sample, the negative
    # samples, and the user's behavior sequence.
    item_id_embedding_layer = Embedding(3706+1, embedding_dim, mask_zero=True, name='item_id_embedding_layer')
    pos_item_sample_embedding_layer = item_id_embedding_layer(pos_item_sample_input_layer)
    neg_item_sample_embedding_layer = item_id_embedding_layer(neg_item_sample_input_layer)
    user_click_item_seq_embedding_layer = item_id_embedding_layer(user_click_item_seq_input_layer)

    ### ********** ###
    # 3. user part
    ### ********** ###

    # 3.1 pooling layer
    user_click_item_seq_embedding_layer_pooling = SequencePoolingLayer()(
        [user_click_item_seq_embedding_layer, user_click_item_seq_length_input_layer])

    # 3.2 capsule layer
    high_capsule = CapsuleLayer(input_units=embedding_dim,
                                out_units=embedding_dim,
                                max_len=sparse_seq_input_length,
                                k_max=k_max)(
        [user_click_item_seq_embedding_layer, user_click_item_seq_length_input_layer])

    # 3.3 Concat "sparse" embedding & "sparse_seq" embedding, and tile embedding
    other_user_embedding_layer = concatenate([user_id_embedding_layer, gender_embedding_layer,
                                              age_embedding_layer, occupation_embedding_layer,
                                              zip_embedding_layer, user_click_item_seq_embedding_layer_pooling],
                                             axis=-1)
    other_user_embedding_layer = tf.tile(other_user_embedding_layer, [1, k_max, 1])

    # 3.4 user dnn part
    user_deep_input = concatenate([other_user_embedding_layer, high_capsule], axis=-1)

    for i, u in enumerate(user_hidden_unit_list):
        user_deep_input = Dense(u, activation="relu", name="FC_{0}".format(i+1))(user_deep_input)
        # user_deep_input = Dropout(0.3)(user_deep_input)

    if dynamic_k:
        user_embedding_final = LabelAwareAttention(k_max=k_max, pow_p=p)(
            [user_deep_input, pos_item_sample_embedding_layer,
             user_click_item_seq_length_input_layer])
    else:
        user_embedding_final = LabelAwareAttention(k_max=k_max, pow_p=p)(
            [user_deep_input, pos_item_sample_embedding_layer])

    user_embedding_final = tf.expand_dims(user_embedding_final, 1)

    ### ********** ###
    # 4. item part
    ### ********** ###

    item_embedding_layer = concatenate([pos_item_sample_embedding_layer,
                                        neg_item_sample_embedding_layer], axis=1)
    item_embedding_layer = tf.transpose(item_embedding_layer, [0, 2, 1])

    ### ********** ###
    # 5. Output
    ### ********** ###

    dot_output = tf.matmul(user_embedding_final, item_embedding_layer)
    # 11 outputs: index 0 is the positive sample, indices 1-10 are the negatives.
    dot_output = tf.nn.softmax(dot_output)

    user_inputs_list = [user_id_input_layer, gender_input_layer, age_input_layer,
                        occupation_input_layer, zip_input_layer,
                        user_click_item_seq_input_layer, user_click_item_seq_length_input_layer]
    item_inputs_list = [pos_item_sample_input_layer, neg_item_sample_input_layer]

    model = Model(inputs=user_inputs_list + item_inputs_list, outputs=dot_output)

    # print(model.summary())
    # tf.keras.utils.plot_model(model, to_file='MIND_model.png', show_shapes=True)

    # Expose the user/item towers for serving-time embedding extraction.
    model.__setattr__("user_input", user_inputs_list)
    model.__setattr__("user_embedding", user_deep_input)
    model.__setattr__("item_input", pos_item_sample_input_layer)
    model.__setattr__("item_embedding", pos_item_sample_embedding_layer)

    return model
(3) MIND model architecture diagram
Inputs: 7 feature fields plus 2 groups of labels (1 positive sample and 10 sampled negatives);
Outputs: a softmax probability distribution over the 11 samples;
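Since the positive item always occupies index 0 of the 11-way softmax output, the sparse label of every training example is simply class 0; a minimal sketch of the label construction (an assumption about the training generator, not code from the original post):

```python
import numpy as np

# The model scores [pos, neg_1, ..., neg_10] per example, so the correct
# class index is always 0: labels are just a column of zeros.
batch_size = 32
labels = np.zeros((batch_size, 1), dtype=np.int64)
print(labels.shape)  # (32, 1)
```

This is why sparse_categorical_crossentropy can be used directly, with no one-hot encoding.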
(4) Training the MIND model
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

from mind import mind

early_stopping_cb = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
callbacks = [early_stopping_cb]

model = mind()
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-3),
              metrics=['sparse_categorical_accuracy'])
# For how loss="sparse_categorical_crossentropy" is applied here, see:
# https://mp.weixin.qq.com/s/H4ET0bO_xPm8TNqltMt3Fg

history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=steps_per_epoch,
                    callbacks=callbacks,
                    validation_data=val_generator,
                    validation_steps=validation_steps,
                    shuffle=True)

model.save_weights('mind_model.h5')
The training output:
Train for 989 steps, validate for 7 steps
Epoch 1/2
989/989 [==============================] - 137s 139ms/step - loss: 1.6125 - sparse_categorical_accuracy: 0.4041 - val_loss: 1.5422 - val_sparse_categorical_accuracy: 0.4224
Epoch 2/2
989/989 [==============================] - 131s 133ms/step - loss: 1.3553 - sparse_categorical_accuracy: 0.4910 - val_loss: 1.4716 - val_sparse_categorical_accuracy: 0.4604
The full code for this post is here (feedback welcome): https://github.com/wziji/deep_ctr/tree/master/mind
References:
(1)https://github.com/shenweichen/DeepMatch/blob/master/deepmatch/models/mind.py
(2)https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py
Feel free to follow the "python科技园" account and contact the author to join the discussion group.