Practicing MIND, the Multi-Interest User Network Recall Model for Recommender Systems

The previous post covered the theory behind MIND, a multi-interest user network for recommender-system recall; this post walks through the practical side.

1. Capsule Network vs. Traditional Neural Network

MIND borrows from Hinton's Capsule Network and proposes a Multi-Interest Extractor Layer that performs soft clustering over the embeddings of a user's historical behaviors. Before introducing it, let's first compare a Capsule Network with a traditional neural network using the figure below.

The right side of the figure above shows the traditional neuron model: it takes the scalar outputs x_i of multiple neurons in the previous layer as input, computes a weighted sum over them, applies a non-linearity such as Sigmoid or ReLU, and finally outputs a single scalar value.

The left side of the figure shows the formulas from the Capsule paper. Comparing the two sides, a capsule takes vectors u_i as input and outputs a vector v_j, and the formulas in between describe the computation from u_i to v_j, which corresponds step by step to the traditional neuron. (1) u_i → û_{j|i} = W_ij u_i (Eq. 2) is an affine transformation, an operation the traditional neuron does not have; (2) û_{j|i} → s_j = Σ_i c_ij û_{j|i} sums over the i dimension under the coupling weights c_ij, which can be seen as a vector version of the weighted sum; (3) finally s_j → v_j = squash(s_j), where squash(s_j) = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||) is non-linear and v_j keeps the dimensionality of s_j, so it can be seen as a vector version of the activation function.
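To make the squash non-linearity in point (3) concrete, here is a minimal pure-Python sketch (framework-independent; the function name mirrors the paper's notation):

```python
import math

def squash(s):
    # squash(s) = (||s||^2 / (1 + ||s||^2)) * (s / ||s||):
    # keeps the direction of s, maps its norm into [0, 1)
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + 1e-9)
    return [scale * x for x in s]

v = squash([3.0, 4.0])  # input norm 5 -> output norm 25/26 ≈ 0.96, same direction
```

Note that a long vector is squashed to a norm just below 1, while a short one is shrunk toward 0, so the output norm can act as a confidence score.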

2. Capsule Layer Design

As described in the paper: the number of interests k_user is first computed dynamically for each user, and the Capsule Layer then directly outputs those k_user interest vectors.

In practice: each user first gets a fixed number k_max of interest vectors, and the Label-aware Attention layer then adaptively selects k_user of them. (Note: at serving time, a fixed k_max interests is generally used.)
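The adaptive interest count follows the paper's rule K'_u = max(1, min(K_max, log2(I_u + 1))), where I_u is the length of the user's behavior sequence; this is the same log1p / log(2) computation that appears later in the LabelAwareAttention code. A minimal sketch:

```python
import math

def dynamic_interest_num(hist_len, k_max):
    # K'_u = max(1, min(K_max, log2(I_u + 1))), with I_u the behavior length
    return int(max(1.0, min(float(k_max), math.log2(hist_len + 1))))
```

Users with short histories get few interest vectors; the count grows logarithmically and is capped at k_max.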

3. Label-aware Attention Design

During training, the label to predict has a single embedding while the user has k embeddings, so a direct inner product cannot measure the match. MIND therefore proposes Label-aware Attention, with the same idea as DIN: use the label's embedding to compute a weight for each of the user's k embeddings (hence "label-aware"), then take the weighted sum of the k embeddings to obtain one final user embedding.
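The mechanism can be illustrated with a small pure-Python sketch (toy vectors, pow_p = 1; the names here are illustrative, not taken from the model code):

```python
import math

def label_aware_attention(interests, label, pow_p=1):
    # score each interest vector against the label embedding, softmax the
    # scores, then take the weighted sum -> one user embedding
    scores = [sum(a * b for a, b in zip(k, label)) ** pow_p for k in interests]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(label)
    return [sum(w * k[d] for w, k in zip(weights, interests)) for d in range(dim)]

# two interests; the label aligns with the first, so the output leans toward it
user_vec = label_aware_attention([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
```

Raising pow_p sharpens the weight distribution: as pow_p grows, the output approaches the single interest most similar to the label.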

4. Learning by Doing

4.1 A First Look at the Capsule Layer Code

import tensorflow as tf
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Layer


def squash(inputs):
    # squashing non-linearity (Eq. 1 of the Capsule paper):
    # keeps the direction of each vector, maps its norm into [0, 1)
    vec_squared_norm = tf.reduce_sum(tf.square(inputs), axis=-1, keepdims=True)
    scalar_factor = vec_squared_norm / (1 + vec_squared_norm) / tf.sqrt(vec_squared_norm + 1e-8)
    return scalar_factor * inputs


class CapsuleLayer(Layer):
    def __init__(self, input_units, out_units, max_len, k_max, iteration_times=3,
           init_std=1.0, **kwargs):
        self.input_units = input_units
        self.out_units = out_units
        self.max_len = max_len
        self.k_max = k_max
        self.iteration_times = iteration_times
        self.init_std = init_std
        super(CapsuleLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.routing_logits = self.add_weight(shape=[1, self.k_max, self.max_len],
              initializer=RandomNormal(stddev=self.init_std),
              trainable=False, name="B", dtype=tf.float32)

        self.bilinear_mapping_matrix = self.add_weight(shape=[self.input_units, self.out_units],
              initializer=RandomNormal(stddev=self.init_std),
              name="S", dtype=tf.float32)

        super(CapsuleLayer, self).build(input_shape)

    def call(self, inputs, **kwargs):
        behavior_embeddings, seq_len = inputs
        batch_size = tf.shape(behavior_embeddings)[0]
        seq_len_tile = tf.tile(seq_len, [1, self.k_max])

        # the mask and the bilinear mapping do not change across routing iterations
        mask = tf.sequence_mask(seq_len_tile, self.max_len)
        pad = tf.ones_like(mask, dtype=tf.float32) * (-2 ** 32 + 1)
        behavior_embedding_mapping = tf.tensordot(behavior_embeddings, self.bilinear_mapping_matrix, axes=1)

        for i in range(self.iteration_times):
            # mask padded positions so they receive ~0 routing weight after softmax
            routing_logits_with_padding = tf.where(mask, tf.tile(self.routing_logits, [batch_size, 1, 1]), pad)
            weight = tf.nn.softmax(routing_logits_with_padding)
            Z = tf.matmul(weight, behavior_embedding_mapping)
            interest_capsules = squash(Z)

            # routing update: agreement between interest capsules and mapped behaviors
            delta_routing_logits = tf.reduce_sum(
                tf.matmul(interest_capsules, tf.transpose(behavior_embedding_mapping, perm=[0, 2, 1])),
                axis=0, keepdims=True
            )
            self.routing_logits.assign_add(delta_routing_logits)

        interest_capsules = tf.reshape(interest_capsules, [-1, self.k_max, self.out_units])
        return interest_capsules

    def compute_output_shape(self, input_shape):
        return (None, self.k_max, self.out_units)

    def get_config(self, ):
        config = {'input_units': self.input_units, 'out_units': self.out_units, 'max_len': self.max_len,
                  'k_max': self.k_max, 'iteration_times': self.iteration_times, "init_std": self.init_std}
        base_config = super(CapsuleLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
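The routing loop inside call() can be traced on toy data in pure Python (toy numbers, a single interest capsule; all names here are illustrative):

```python
import math

def squash_vec(s):
    # squash(s) = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    n2 = sum(x * x for x in s)
    scale = n2 / (1.0 + n2) / (math.sqrt(n2) + 1e-9)
    return [scale * x for x in s]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    t = sum(e)
    return [x / t for x in e]

# toy setup: three mapped behavior vectors (u_hat) and one interest capsule
u_hat = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
b = [0.0, 0.0, 0.0]  # routing logits

for _ in range(3):
    w = softmax(b)                                  # coupling coefficients
    z = [sum(w[i] * u_hat[i][d] for i in range(3))  # weighted sum of u_hat
         for d in range(2)]
    v = squash_vec(z)                               # interest capsule
    # agreement update: b_i += v . u_hat_i
    b = [b[i] + sum(v[d] * u_hat[i][d] for d in range(2)) for i in range(3)]
```

After a few iterations the logits of the two mutually similar behaviors grow faster than the outlier's, which is exactly the soft-clustering effect the Multi-Interest Extractor relies on.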

4.2 A Look at the Label-aware Attention Code

import tensorflow as tf
from tensorflow.keras.layers import Layer


class LabelAwareAttention(Layer):
    def __init__(self, k_max, pow_p=1, **kwargs):
        self.k_max = k_max
        self.pow_p = pow_p
        super(LabelAwareAttention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.embedding_size = input_shape[0][-1]
        super(LabelAwareAttention, self).build(input_shape)

    def call(self, inputs, training=None, **kwargs):
        keys = inputs[0]
        query = inputs[1]
        weight = tf.reduce_sum(keys * query, axis=-1, keepdims=True)
        weight = tf.pow(weight, self.pow_p) # [x,k_max,1]

        if len(inputs) == 3:
            # dynamic interest number: k_user = max(1, min(k_max, log2(hist_len + 1)))
            k_user = tf.cast(tf.maximum(
                1.,
                tf.minimum(
                    tf.cast(self.k_max, dtype="float32"),  # k_max
                    tf.math.log1p(tf.cast(inputs[2], dtype="float32")) / tf.math.log(2.)  # log2(hist_len + 1)
                )
            ), dtype="int64")
            
            seq_mask = tf.transpose(tf.sequence_mask(k_user, self.k_max), [0, 2, 1])
            padding = tf.ones_like(seq_mask, dtype=tf.float32) * (-2 ** 32 + 1) # [x,k_max,1]
            weight = tf.where(seq_mask, weight, padding)

        weight = tf.nn.softmax(weight, name="weight")
        output = tf.reduce_sum(keys * weight, axis=1)

        return output

    def compute_output_shape(self, input_shape):
        return (None, self.embedding_size)

    def get_config(self, ):
        config = {'k_max': self.k_max, 'pow_p': self.pow_p}
        base_config = super(LabelAwareAttention, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


4.3 MIND Model Design

(1) The Data

This post uses the MovieLens-1M dataset, available at: http://files.grouplens.org/datasets/movielens/ml-1m.zip

The dataset is processed into the following format:

5030  2  2  2  2247  3412,2899,2776,2203,2309,2385,743,2958,2512,2485,2404,2675,2568,2555,217,2491,2566,2481,2503,2786,1051,2502,803,3030,1789,2,424,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0  27  2558  1112,1284,1174,3100,1049,2137,2273,2651,340,2163
279  2  3  15  2831  2210,1456,453,1293,3210,2235,2284,1095,1487,3511,738,886,1926,3501,1023,150,1198,3413,156,909,1019,2848,260,2737,1096,2684,1887,107,1143,347,1107,1111,1151,1133,3113,3592,1119,3287,1203,1181,1121,852,1915,1247,3038,240,0,0,0,0  46  2212  820,1009,2076,529,3032,2503,2742,2345,965,366
1300  2  1  11  3282  692,3041,1234,519,1554,1258,3452,1509,1170,1252,2804,754,2866,1987,2416,596,1250,1824,1225,2323,2542,2647,2355,2267,1248,2543,1818,2512,1815,1167,1289,1241,1803,2974,3252,3127,3320,3061,3278,3075,3249,3322,2945,3179,65,1109,3091,1245,2311,3357  165  1880  1545,332,2754,2254,267,1532,1062,1450,1440,2467
323  2  5  13  1799  580,864,1060,2098,2824,1203,1213,1088,1185,2,1925,309,2427,1994,1176,1486,853,1161,29,254,1259,528,1179,1107,1567,4,427,3567,3130,1174,2129,575,347,1415,2786,2204,2487,21,1223,3032,2652,67,2198,1737,45,51,218,2400,1225,467  117  1295  1114,2758,435,318,2251,2111,3650,2510,3705,1111
695  1  2  2  233  2161,2235,700,2962,444,2489,2375,1849,3662,3582,3650,3225,3128,3060,3127,3581,3252,3510,3556,3076,3281,3302,3050,3384,3702,2969,3303,3551,3543,3178,3249,3670,3342,3652,3665,3378,3322,3073,3376,3075,3584,3179,3504,3511,3278,1289,2,467,107,994  190  2945  2456,2716,2635,990,3657,3403,2210,1602,3251,143

Field descriptions:

  Column 1: user_id, the user ID
  Column 2: gender, the user's gender
  Column 3: age, the user's age
  Column 4: occupation, the user's occupation
  Column 5: zip, the user's zip code
  Column 6: hist_movie_id, the user's historical watch sequence of movie IDs
  Column 7: hist_len, the length of the watch history
  Column 8: pos_movie_id, the movie the user watches next (the positive sample)
  Column 9: neg_movie_id, movies the user did not watch next (sampled as negatives)

The data-processing logic is here: https://github.com/wziji/deep_ctr/blob/master/mind/data.py
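A hypothetical helper (not part of the linked repo) for parsing one whitespace-separated row of the format above; hist_movie_id and neg_movie_id are comma-separated, with the history zero-padded to length 50:

```python
def parse_row(line):
    # split the 9 whitespace-separated columns of one sample row
    (user_id, gender, age, occupation, zip_code,
     hist_movie_id, hist_len, pos_movie_id, neg_movie_id) = line.split()
    return {
        "user_id": int(user_id),
        "gender": int(gender),
        "age": int(age),
        "occupation": int(occupation),
        "zip": int(zip_code),
        "hist_movie_id": [int(i) for i in hist_movie_id.split(",")],
        "hist_len": int(hist_len),
        "pos_movie_id": int(pos_movie_id),
        "neg_movie_id": [int(i) for i in neg_movie_id.split(",")],
    }
```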

(2) MIND Model Code:

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, concatenate, Dense, Dropout
from tensorflow.keras.models import Model
from CapsuleLayer import SequencePoolingLayer, LabelAwareAttention, CapsuleLayer

def tile_user_otherfeat(user_other_feature, k_max):
    return tf.tile(tf.expand_dims(user_other_feature, -2), [1, k_max, 1])


def mind(
    sparse_input_length=1,
    dense_input_length=1,
    sparse_seq_input_length=50,
    
    embedding_dim = 64,
    neg_sample_num = 10,
    user_hidden_unit_list = [128, 64],
    k_max = 5,
    p = 1,
    dynamic_k = True
    ):

    # 1. Input layer

    user_id_input_layer = Input(shape=(sparse_input_length, ), name="user_id_input_layer")
    gender_input_layer = Input(shape=(sparse_input_length, ), name="gender_input_layer")
    age_input_layer = Input(shape=(sparse_input_length, ), name="age_input_layer")
    occupation_input_layer = Input(shape=(sparse_input_length, ), name="occupation_input_layer")
    zip_input_layer = Input(shape=(sparse_input_length, ), name="zip_input_layer")

    user_click_item_seq_input_layer = Input(shape=(sparse_seq_input_length, ), name="user_click_item_seq_input_layer")
    user_click_item_seq_length_input_layer = Input(shape=(sparse_input_length, ), name="user_click_item_seq_length_input_layer")

    pos_item_sample_input_layer = Input(shape=(sparse_input_length, ), name="pos_item_sample_input_layer")
    neg_item_sample_input_layer = Input(shape=(neg_sample_num, ), name="neg_item_sample_input_layer")


    # 2. Embedding layer

    user_id_embedding_layer = Embedding(6040+1, embedding_dim, mask_zero=True, name='user_id_embedding_layer')(user_id_input_layer)
    gender_embedding_layer = Embedding(2+1, embedding_dim, mask_zero=True, name='gender_embedding_layer')(gender_input_layer)
    age_embedding_layer = Embedding(7+1, embedding_dim, mask_zero=True, name='age_embedding_layer')(age_input_layer)
    occupation_embedding_layer = Embedding(21+1, embedding_dim, mask_zero=True, name='occupation_embedding_layer')(occupation_input_layer)
    zip_embedding_layer = Embedding(3439+1, embedding_dim, mask_zero=True, name='zip_embedding_layer')(zip_input_layer)
    
    item_id_embedding_layer = Embedding(3706+1, embedding_dim, mask_zero=True, name='item_id_embedding_layer')
    pos_item_sample_embedding_layer = item_id_embedding_layer(pos_item_sample_input_layer)
    neg_item_sample_embedding_layer = item_id_embedding_layer(neg_item_sample_input_layer)
    
    user_click_item_seq_embedding_layer = item_id_embedding_layer(user_click_item_seq_input_layer)


    ### ********** ###
    # 3. user part
    ### ********** ###
    
    # 3.1 pooling layer

    user_click_item_seq_embedding_layer_pooling = SequencePoolingLayer()\
        ([user_click_item_seq_embedding_layer, user_click_item_seq_length_input_layer])
    
    print("user_click_item_seq_embedding_layer_pooling", user_click_item_seq_embedding_layer_pooling)
    
    
    # 3.2 capsule layer

    high_capsule = CapsuleLayer(input_units=embedding_dim,
            out_units=embedding_dim, max_len=sparse_seq_input_length,
            k_max=k_max)\
        ([user_click_item_seq_embedding_layer, user_click_item_seq_length_input_layer])
    
    print("high_capsule: ", high_capsule)
    

    # 3.3 Concat "sparse" embedding & "sparse_seq" embedding, and tile embedding

    other_user_embedding_layer = concatenate([user_id_embedding_layer, gender_embedding_layer, \
        age_embedding_layer, occupation_embedding_layer, \
        zip_embedding_layer, user_click_item_seq_embedding_layer_pooling], 
        axis=-1)


    other_user_embedding_layer = tf.tile(other_user_embedding_layer, [1, k_max, 1])

    print("other_user_embedding_layer: ", other_user_embedding_layer)


    # 3.4 user dnn part

    user_deep_input = concatenate([other_user_embedding_layer, high_capsule], axis=-1)
    print("user_deep_input: ", user_deep_input)

    
    for i, u in enumerate(user_hidden_unit_list):
        user_deep_input = Dense(u, activation="relu", name="FC_{0}".format(i+1))(user_deep_input)
        #user_deep_input = Dropout(0.3)(user_deep_input)

    print("user_deep_input: ", user_deep_input)
    

    if dynamic_k:
        user_embedding_final = LabelAwareAttention(k_max=k_max, pow_p=p, )(\
            [user_deep_input, pos_item_sample_embedding_layer, \
            user_click_item_seq_length_input_layer])
    else:
        user_embedding_final = LabelAwareAttention(k_max=k_max, pow_p=p, )(\
            [user_deep_input, pos_item_sample_embedding_layer])
    
    
    user_embedding_final = tf.expand_dims(user_embedding_final, 1)
    print("user_embedding_final: ", user_embedding_final)
    
    
    
    ### ********** ###
    # 4. item part
    ### ********** ###

    item_embedding_layer = concatenate([pos_item_sample_embedding_layer, \
        neg_item_sample_embedding_layer], \
        axis=1)
    
    item_embedding_layer = tf.transpose(item_embedding_layer, [0,2,1])
    
    print("item_embedding_layer: ", item_embedding_layer)


    ### ********** ###
    # 5. Output
    ### ********** ###
    
    dot_output = tf.matmul(user_embedding_final, item_embedding_layer)
    dot_output = tf.nn.softmax(dot_output) # 11 outputs: index 0 is the positive sample, indices 1-10 are the negatives
    print(dot_output)
    
    user_inputs_list = [user_id_input_layer, gender_input_layer, age_input_layer, \
          occupation_input_layer, zip_input_layer, \
             user_click_item_seq_input_layer, user_click_item_seq_length_input_layer]
    
    item_inputs_list = [pos_item_sample_input_layer, neg_item_sample_input_layer]

    model = Model(inputs = user_inputs_list + item_inputs_list,
           outputs = dot_output)
    
    
    #print(model.summary())
    #tf.keras.utils.plot_model(model, to_file='MIND_model.png', show_shapes=True)

    model.__setattr__("user_input", user_inputs_list)
    model.__setattr__("user_embedding", user_deep_input)
    
    model.__setattr__("item_input", pos_item_sample_input_layer)
    model.__setattr__("item_embedding", pos_item_sample_embedding_layer)
    
    return model

(3) MIND Model Structure

Input: 7 feature inputs plus 2 groups of labels (1 positive sample and 10 sampled negatives);

Output: the softmax probability distribution over the 11 samples;
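Since the positive sample always sits at index 0 of the 11-way softmax, the sparse label for every training example is simply 0, and the cross-entropy pushes up the probability at that index. A toy check of the loss (hypothetical logit values):

```python
import math

neg_sample_num = 10
# dot products of the user vector with [pos, neg_1, ..., neg_10]
logits = [2.0] + [0.1] * neg_sample_num
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]
loss = -math.log(probs[0])  # sparse_categorical_crossentropy with label 0
```

This is a sampled-softmax setup: instead of normalizing over all 3,706 movies, each example is normalized over the positive and 10 sampled negatives.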

(4) Training the MIND Model

import tensorflow as tf
from mind import mind
from tensorflow.keras.optimizers import Adam


early_stopping_cb = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
callbacks = [early_stopping_cb]


model = mind()

model.compile(loss='sparse_categorical_crossentropy', \
    optimizer=Adam(learning_rate=1e-3), \
    metrics=['sparse_categorical_accuracy'])

# For how to use loss='sparse_categorical_crossentropy', see: https://mp.weixin.qq.com/s/H4ET0bO_xPm8TNqltMt3Fg


history = model.fit(train_generator, \
    epochs=2, \
    steps_per_epoch = steps_per_epoch, \
    callbacks = callbacks, 
    validation_data = val_generator, \
    validation_steps = validation_steps, \
    shuffle=True
    )


model.save_weights('mind_model.h5')

The training output is shown below:

Train for 989 steps, validate for 7 steps
Epoch 1/2
989/989 [==============================] - 137s 139ms/step - loss: 1.6125 - sparse_categorical_accuracy: 0.4041 - val_loss: 1.5422 - val_sparse_categorical_accuracy: 0.4224
Epoch 2/2
989/989 [==============================] - 131s 133ms/step - loss: 1.3553 - sparse_categorical_accuracy: 0.4910 - val_loss: 1.4716 - val_sparse_categorical_accuracy: 0.4604
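At serving time, the item tower's embeddings are typically exported and queried by inner product for each of the user's k_max interest vectors (for example with an ANN library such as Faiss). A minimal pure-Python sketch of the retrieval step, with entirely hypothetical toy vectors:

```python
def top_n(user_vec, item_vecs, n=2):
    # score every item by inner product with one interest vector, return top-n ids
    scored = sorted(
        ((sum(u * x for u, x in zip(user_vec, vec)), idx)
         for idx, vec in enumerate(item_vecs)),
        reverse=True)
    return [idx for _, idx in scored[:n]]

items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
recalled = top_n([1.0, 0.1], items, n=2)  # items 0 and 2 score highest
```

With multiple interests, each interest vector retrieves its own candidate list and the lists are merged, which is how MIND surfaces items from several distinct interest clusters at once.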

The full code for this post is at (exchanges welcome): https://github.com/wziji/deep_ctr/tree/master/mind

References:

(1)https://github.com/shenweichen/DeepMatch/blob/master/deepmatch/models/mind.py

(2)https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py

Feel free to follow the "python科技园" account and add the editor to join the discussion group.
