[Code Walkthrough] Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction

Overview

  • This post annotates and walks through the code released by the authors of the following paper:
    Pi, Q., Bian, W., Zhou, G., Zhu, X., & Gai, K. (2019, July). Practice on long sequential user behavior modeling for click-through rate prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2671-2679).

  • The paper authors' code: https://github.com/UIC-Paper/MIMN

  • The annotated code from this post: https://gitee.com/ze_code/mimn_code_comment

How to Use

1. Prepare Data
sh prepare_amazon.sh
2. Run Base Model

  An example for DNN:

python script/train_book.py -p train --random_seed 19 --model_type DNN
python script/train_book.py -p test --random_seed 19 --model_type DNN
The following models are supported: 
- DNN 
- PNN 
- DIN
- GRU4REC
- ARNN
- RUM
- DIEN
- DIEN_with_neg
3. Run MIMN
  • MIMN Basic
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 0
  • MIMN with Memory Utilization Regularization
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 1
  • MIMN with Memory Utilization Regularization and Memory Induction Unit
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 1 --util_reg 1
  • MIMN with Auxiliary Loss
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN_with_neg --memory_size 4 --mem_induction 0 --util_reg 0

1. Data Preprocessing

1.1 process_data.py
  • Read the meta data, turn each record into ['asin', 'categories'] (item id, item category), and save the result to the file "item-info".
  • Read the review data, turn each record into ["reviewerID", "asin", "overall", "unixReviewTime"] (user id, item id, rating, review timestamp), and save the result to the file "reviews-info".
  • Read the file "reviews-info" and generate positive/negative samples in the format ["0 or 1" + "\t" + (user id, item id, rating, timestamp) + "\t" + item category]. Samples are grouped by user: all of user a's samples first, then user b's, and so on. For every positive sample, one negative sample is generated (a simplified sketch of this step is given after this list). The result is written to the file "jointed-new"; a concrete example:
    0, (user 1, item 6, rating, time 1, category)
    1, (user 1, item 3, rating, time 2, category)
    0, (user 1, item 2, rating, time 3, category)
    1, (user 1, item 8, rating, time 4, category)
    where times 1-4 are in increasing order.
    
  • Split the dataset: for each line of "jointed-new", add a training-set label ("20180118") or a test-set label ("20190119"). For each user, if the user has at most 2 samples, the user gets no training samples, only test samples. If the user has n > 2 samples, the first n-2 samples become training samples and the last 2 become test samples. The result is written to the file "jointed-new-split-info".
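The negative-sample generation above can be illustrated with a small sketch. This is a simplified illustration consistent with the description, not the author's exact code; the tab-separated file layouts and the helper name are assumptions:

import random

# Simplified sketch of the "jointed-new" step: for every positive review,
# also emit one negative sample with a randomly drawn item the user did not click.
def join_with_negatives(reviews_path="reviews-info", meta_path="item-info", out_path="jointed-new"):
    item_cat = {}                                   # item id -> category, from the meta file
    with open(meta_path) as f:
        for line in f:
            asin, cat = line.strip("\n").split("\t")
            item_cat[asin] = cat
    item_pool = list(item_cat.keys())

    with open(reviews_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            uid, asin, rating, ts = line.strip("\n").split("\t")
            neg = random.choice(item_pool)          # negative sample: a different, random item
            while neg == asin:
                neg = random.choice(item_pool)
            fout.write("0\t" + "\t".join([uid, neg, rating, ts]) + "\t" + item_cat[neg] + "\n")
            fout.write("1\t" + "\t".join([uid, asin, rating, ts]) + "\t" + item_cat[asin] + "\n")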
1.2 local_aggretor.py
According to the train/test labels in the file "jointed-new-split-info", write the samples into the training file "local_train" and the test file "local_test".
For example, suppose a user's samples in "jointed-new-split-info" are:
    Format: [train/test label, positive/negative label, user id, item id, rating, timestamp, item category]
    20180118, 0, (user 1, item 6, rating, time 1, category a)
    20180118, 1, (user 1, item 3, rating, time 2, category b)
    20180118, 0, (user 1, item 7, rating, time 3, category c)
    20180118, 1, (user 1, item 5, rating, time 4, category d)
    20180118, 0, (user 1, item 11, rating, time 5, category e)
    20180118, 1, (user 1, item 13, rating, time 6, category f)
    20190119, 0, (user 1, item 2, rating, time 7, category g)
    20190119, 1, (user 1, item 8, rating, time 8, category h)
    20190119, 1, (user 1, item 4, rating, time 9, category i)
The clicked-item history is appended to each sample, and a sample is written out if and only if it already has interaction history (only positive samples count as history); a simplified sketch follows the example. After this script runs, the result is:
    Training set (written to "local_train"):
        0, user 1, item 7, rating, time 3, category c, (item 3), (category b)
        1, user 1, item 5, rating, time 4, category d, (item 3), (category b)
        0, user 1, item 11, rating, time 5, category e, (items 3, 5), (categories b, d)
        1, user 1, item 13, rating, time 6, category f, (items 3, 5), (categories b, d)
    Test set (written to "local_test"):
        1, user 1, item 4, rating, time 9, category i, (item 8), (category h)
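A simplified sketch that reproduces the example above (the real local_aggretor.py may differ in details; the field order follows the format listed above):

def aggregate(in_path="jointed-new-split-info", train_path="local_train", test_path="local_test"):
    ftrain, ftest = open(train_path, "w"), open(test_path, "w")
    last_key, item_hist, cat_hist = None, [], []
    with open(in_path) as fin:
        for line in fin:
            ds, clk, uid, mid, rating, ts, cat = line.strip("\n").split("\t")
            if (uid, ds) != last_key:          # new user or new split: reset the history
                item_hist, cat_hist = [], []
            fo = ftrain if ds == "20180118" else ftest
            if item_hist:                      # only write samples that already have a history
                fo.write("\t".join([clk, uid, mid, cat,
                                    ",".join(item_hist), ",".join(cat_hist)]) + "\n")
            if clk == "1":                     # only clicked (positive) items enter the history
                item_hist.append(mid)
                cat_hist.append(cat)
            last_key = (uid, ds)
    ftrain.close(); ftest.close()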
1.3 split_by_user.py
From the file "local_test", randomly assign about 1/10 of the samples to the test file "local_test_splitByUser"
and the remaining 9/10 to the training file "local_train_splitByUser".
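A sketch of the idea (random assignment, so the 1/10 ratio is approximate; illustrative rather than the exact repository code):

import random

def split_by_user(in_path="local_test",
                  train_path="local_train_splitByUser",
                  test_path="local_test_splitByUser"):
    with open(in_path) as fin, open(train_path, "w") as ftrain, open(test_path, "w") as ftest:
        for line in fin:
            # roughly 1 in 10 samples go to the test file, the rest to the train file
            if random.randint(1, 10) == 2:
                ftest.write(line)
            else:
                ftrain.write(line)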
1.4 generate_voc.py
Based on the contents of "local_train_splitByUser", assign indices to the user ids, item ids and item categories (cat), numbered in descending order of occurrence count, producing:
uid_voc: dict, key=uid, value=index
mid_dict: dict, key=mid, value=index
cat_dict: dict, key=cat, value=index

The results are saved to the files "uid_voc.pkl", "mid_voc.pkl" and "cat_voc.pkl".
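A sketch of the frequency-sorted vocabulary construction (simplified; it assumes the tab-separated format produced above and also counts the ids that appear in the history columns):

import pickle

def build_voc(in_path="local_train_splitByUser"):
    uid_cnt, mid_cnt, cat_cnt = {}, {}, {}
    with open(in_path) as fin:
        for line in fin:
            clk, uid, mid, cat, mid_hist, cat_hist = line.strip("\n").split("\t")
            uid_cnt[uid] = uid_cnt.get(uid, 0) + 1
            mid_cnt[mid] = mid_cnt.get(mid, 0) + 1
            cat_cnt[cat] = cat_cnt.get(cat, 0) + 1
            for m in mid_hist.split(","):
                mid_cnt[m] = mid_cnt.get(m, 0) + 1
            for c in cat_hist.split(","):
                cat_cnt[c] = cat_cnt.get(c, 0) + 1

    def to_voc(cnt):
        # more frequent ids get smaller indices
        sorted_keys = sorted(cnt, key=cnt.get, reverse=True)
        return {k: i for i, k in enumerate(sorted_keys)}

    for name, cnt in [("uid_voc.pkl", uid_cnt), ("mid_voc.pkl", mid_cnt), ("cat_voc.pkl", cat_cnt)]:
        with open(name, "wb") as f:
            pickle.dump(to_voc(cnt), f)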
1.5 generate_voc.py
  • def generate_sample_list():
    What it does:
    Generates the following variables:
    train_sample_list is a list; each sample has the format:
        user index, item index, item category index, positive/negative label, sequence of previously interacted item indices, sequence of previously interacted item category indices
    test_sample_list:
        same format as train_sample_list
    feature_total_num:
        the total number of user ids, item ids and item category ids
    
    The indices are assigned consecutively:
        the index of the first item is the index of the last user plus 1, and the index of the first cat is the index of the last item plus 1
    
    This function keeps only samples whose interaction history is longer than 20, and truncates or pads the history to max_len=100,
    i.e. if the history is longer than max_len, only the last max_len interactions are kept.
    
  • def produce_neg_item_hist_with_cate(train_file, test_file):
    For each sample (in the training or test set), append a negatively sampled history, giving samples of the format:
    user index, item index, item category index, positive/negative label, sequence of interacted item indices, sequence of interacted item category indices, sequence of non-interacted item indices, sequence of non-interacted item category indices
    
    The non-interacted sequence is built by randomly drawing n items (n equals the length of the interaction history) from the whole item pool such that none of them appears in the interaction history. A sketch of this sampling step follows.
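A sketch of how such a non-clicked history can be sampled (illustrative only; all_items here is a hypothetical dict mapping every item index to its category index, not a name from the repository):

import random

def add_neg_history(sample, all_items):
    # sample = (uid, mid, cat, label, hist_items, hist_cats)
    uid, mid, cat, label, hist_items, hist_cats = sample
    item_pool = list(all_items.keys())
    clicked = set(hist_items)
    neg_items, neg_cats = [], []
    while len(neg_items) < len(hist_items):      # draw as many negatives as there are clicks
        cand = random.choice(item_pool)
        if cand not in clicked:                  # must not appear in the clicked history
            neg_items.append(cand)
            neg_cats.append(all_items[cand])
    return (uid, mid, cat, label, hist_items, hist_cats, neg_items, neg_cats)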
    

2. Model Definition

  1. First, a parent class class Model(object) is defined; it contains the functions shared by all the models.
  2. Then class Model_MIMN(Model), which inherits from Model, defines the concrete structure of the MIMN model.
  3. The custom cell used inside MIMN is implemented in class MIMNCell(tf.contrib.rnn.RNNCell).
  • class Model(object):
class Model(object):
    def __init__(self, 
                n_uid, # number of user ids
                n_mid, # number of item ids
                EMBEDDING_DIM, # dimension of the embedding vectors for user/item ids
                HIDDEN_SIZE, # dimension of the model's internal vectors
                BATCH_SIZE, 
                SEQ_LEN, # length of the behavior sequence
                use_negsample=False, # whether to use the negatively sampled behavior sequence
                Flag="DNN"):
        self.model_flag = Flag
        self.reg = False
        self.use_negsample= use_negsample
        with tf.name_scope('Inputs'):
            self.mid_his_batch_ph = tf.placeholder(tf.int32, [None, None], name='mid_his_batch_ph') # item behavior sequence, shape=[batch_size, SEQ_LEN]
            self.cate_his_batch_ph = tf.placeholder(tf.int32, [None, None], name='cate_his_batch_ph') # item-category behavior sequence, shape=[batch_size, SEQ_LEN]
            self.uid_batch_ph = tf.placeholder(tf.int32, [None, ], name='uid_batch_ph') # user id, shape=[batch_size,]
            self.mid_batch_ph = tf.placeholder(tf.int32, [None, ], name='mid_batch_ph') # item id, shape=[batch_size,]
            self.cate_batch_ph = tf.placeholder(tf.int32, [None, ], name='cate_batch_ph') # item category (of the target item), shape=[batch_size,]
            self.mask = tf.placeholder(tf.float32, [None, None], name='mask_batch_ph') # mask of the item behavior sequence, shape=[batch_size, SEQ_LEN]. Because the sequences are zero-padded or truncated during preprocessing, the mask marks whether a time step holds a real interaction
            self.target_ph = tf.placeholder(tf.float32, [None, 2], name='target_ph') # click label (each sample's label is [y, 1-y]), shape=[batch_size, 2]
            self.lr = tf.placeholder(tf.float64, []) # initial learning rate of the Adam optimizer

        # Embedding layer
        with tf.name_scope('Embedding_layer'):
            # embedding matrix for the lookups
            self.mid_embeddings_var = tf.get_variable("mid_embedding_var", [n_mid, EMBEDDING_DIM], trainable=True)
            
            # embed the target item and the item behavior sequence
            self.mid_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.mid_batch_ph) # shape=[None, EMBEDDING_DIM]
            self.mid_his_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.mid_his_batch_ph) # shape=[None, SEQ_LEN, EMBEDDING_DIM]

            # embed the target item's category and the item-category behavior sequence
            self.cate_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.cate_batch_ph) # shape=[None, EMBEDDING_DIM]
            self.cate_his_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.cate_his_batch_ph) # shape=[None, SEQ_LEN, EMBEDDING_DIM]


        with tf.name_scope('init_operation'):    
            self.mid_embedding_placeholder = tf.placeholder(tf.float32,[n_mid, EMBEDDING_DIM], name="mid_emb_ph")
            self.mid_embedding_init = self.mid_embeddings_var.assign(self.mid_embedding_placeholder)

        # if the negatively sampled behavior sequence is used, define the corresponding placeholders and embedding lookups
        if self.use_negsample:
            self.mid_neg_batch_ph = tf.placeholder(tf.int32, [None, None], name='neg_his_batch_ph') # negatively sampled item behavior sequence, shape=[batch_size, SEQ_LEN]
            self.cate_neg_batch_ph = tf.placeholder(tf.int32, [None, None], name='neg_cate_his_batch_ph') # negatively sampled item-category behavior sequence, shape=[batch_size, SEQ_LEN]
            self.neg_item_his_eb = tf.nn.embedding_lookup(self.mid_embeddings_var, self.mid_neg_batch_ph) # embed the negatively sampled item behavior sequence
            self.neg_cate_his_eb = tf.nn.embedding_lookup(self.mid_embeddings_var, self.cate_neg_batch_ph) # embed the negatively sampled item-category behavior sequence
            self.neg_his_eb = tf.concat([self.neg_item_his_eb,self.neg_cate_his_eb], axis=2) * tf.reshape(self.mask,(BATCH_SIZE, SEQ_LEN, 1)) # apply the mask to the negatively sampled behavior sequence
            
        # concatenate the item-id embedding and its category embedding as the target item feature
        self.item_eb = tf.concat([self.mid_batch_embedded, self.cate_batch_embedded], axis=1) # shape=[None, 2*EMBEDDING_DIM]

        # concatenate the item-id embeddings and category embeddings of the behavior sequence as the sequence feature
        self.item_his_eb = tf.concat([self.mid_his_batch_embedded,self.cate_his_batch_embedded], axis=2) * tf.reshape(self.mask,(BATCH_SIZE, SEQ_LEN, 1)) # shape=[None, SEQ_LEN, 2*EMBEDDING_DIM]
        
        # aggregate the sequence feature self.item_his_eb into a simple summary feature
        self.item_his_eb_sum = tf.reduce_sum(self.item_his_eb, 1) # shape=[None, 2*EMBEDDING_DIM]
        
    def build_fcn_net(self, inp, use_dice = False):
    """
        Args:
            inp, shape=[batch_size, k], where k can be any dimension. inp is the feature the model has extracted for each sample.
        What it does:
            Feeds the extracted feature inp into fully connected layers to get the predicted y, then computes the loss.
    """

    
    def auxiliary_loss(self, h_states, click_seq, noclick_seq, mask = None, stag = None):
    """
        Args:
            h_states, shape=[batch_size, SEQ_LEN-1, memory_vector_dim], the MIMNCell outputs at steps 0..t-1
            click_seq, shape=[batch_size, SEQ_LEN-1, memory_vector_dim], the behavior embeddings at steps 1..t
            noclick_seq, shape=[batch_size, SEQ_LEN-1, memory_vector_dim], the negatively sampled behavior embeddings at steps 1..t
            mask, shape=[batch_size, SEQ_LEN-1], flags for steps 1..t: 1 if there is a real interaction at step t', 0 otherwise
        What it does:
            Computes the probability of clicking the positive behavior click_seq and the probability of clicking the negative behavior
            noclick_seq (if the model predicts that the user clicked the negative item at step t, the user certainly did not click the
            positive item at step t, so the probability of clicking the negative item is the probability of not clicking the positive one).
            It then computes the binary click / no-click classification loss.
    """
    
    
    def auxiliary_net(self, in_, stag='auxiliary_net'):
    """
        Args:
            in_, shape=[batch_size, SEQ_LEN-1, k], where k can be any dimension and SEQ_LEN is the length of each sample's behavior sequence
            y_hat, shape=[batch_size, SEQ_LEN-1, 2], binary classification probabilities
        What it does:
            Maps in_ through fully connected layers to the binary click probability y_hat.
            The fully connected stack is dense(100) -> dense(50) -> dense(2).
    """
  • class Model_MIMN(Model):
class Model_MIMN(Model):
        def __init__(self, 
        n_uid, 
        n_mid, 
        EMBEDDING_DIM, 
        HIDDEN_SIZE, 
        BATCH_SIZE, 
        MEMORY_SIZE, 
        SEQ_LEN=400, 
        Mem_Induction=0, 
        Util_Reg=0, 
        use_negsample=False, 
        mask_flag=False):
        
        super(Model_MIMN, self).__init__(
            n_uid, 
            n_mid, 
            EMBEDDING_DIM, 
            HIDDEN_SIZE, 
            BATCH_SIZE, 
            SEQ_LEN, 
            use_negsample, 
            Flag="MIMN")
            
        self.reg = Util_Reg

        """
        def clear_mask_state(state, begin_state, begin_channel_rnn_state, mask, cell, t):
            参数:
                state, mimn.MIMNCell处理完样本t时刻交互的embedding之后的变量state
                begin_state, mimn.MIMNCell的变量state的初始化值
                mask, 例如该数据集预处理后每个样本的历史交互序列的长度都为5(SEQ_LEN),现在有一个序列的只有3个历史交互,则进行补0填充处理后,该序列的mask为00111
                begin_channel_rnn_state,  mimn.MIMNCell的变量channel_rnn_state的初始化值
            功能:
                例如有2个样本序列,序列长度固定为5,一个只有2个历史交互,另一个只有3个历史交互,因此这个样本的历史交互序列的mask为:00011、00111
                当使用mimn.MIMNCell处理第t=0时刻时,两个样本的t=0时刻的mask都为0,因此mimn.MIMNCell的state都不更新
                当使用mimn.MIMNCell处理第t=2时刻时,两个样本的t=2时刻的mask分别为0和1,因此mimn.MIMNCell的state一个需要更新,而另一个样本的state不需更新
        """

        
        # create the mimn.MIMNCell
        cell = mimn.MIMNCell(controller_units=HIDDEN_SIZE, memory_size=MEMORY_SIZE, memory_vector_dim=2*EMBEDDING_DIM,read_head_num=1, write_head_num=1,
            reuse=False, output_dim=HIDDEN_SIZE, clip_value=20, batch_size=BATCH_SIZE, mem_induction=Mem_Induction, util_reg=Util_Reg)

        # get the initial state of mimn.MIMNCell
        state = cell.zero_state(BATCH_SIZE, tf.float32)

        # get the initial channel_rnn_output of mimn.MIMNCell
        if Mem_Induction > 0:
            begin_channel_rnn_output = cell.channel_rnn_output # cell.channel_rnn_output is a list with memory_size elements, each of shape [batch_size, self.memory_vector_dim]
        else:
            begin_channel_rnn_output = 0.0
        
        begin_state = state
        self.state_list = [state] # the first element of self.state_list is the initial state of mimn.MIMNCell
        self.mimn_o = [] 
        for t in range(SEQ_LEN): # process the feature at step t of self.item_his_eb with mimn.MIMNCell
            output, state, temp_output_list = cell(self.item_his_eb[:, t, :], state) # shape(self.item_his_eb[:, t, :])=[None, 2*EMBEDDING_DIM]
            if mask_flag:
                state = clear_mask_state(state, begin_state, begin_channel_rnn_output, self.mask, cell, t)
            self.mimn_o.append(output) # mimn_o is a list with SEQ_LEN elements; the output at step t has shape [batch_size, memory_vector_dim]
            self.state_list.append(state) # state_list is a list of states, one per step
                
        self.mimn_o = tf.stack(self.mimn_o, axis=1) # shape=[batch_size, SEQ_LEN, memory_vector_dim], the outputs over each sample's behavior sequence
        self.state_list.append(state)
        mean_memory = tf.reduce_mean(state['sum_aggre'], axis=-2) # shape=[batch_size, self.memory_vector_dim], the mean of each sample's self.memory_size memory slot vectors

        before_aggre = state['w_aggre']
        read_out, _, _ = cell(self.item_eb, state)
        
        if use_negsample: # compute the loss that incorporates the negative-sample information
            aux_loss_1 = self.auxiliary_loss(self.mimn_o[:, :-1, :], self.item_his_eb[:, 1:, :],
                                             self.neg_his_eb[:, 1:, :], self.mask[:, 1:], stag = "bigru_0") 
            self.aux_loss = aux_loss_1  

        if self.reg: # memory utilization regularization, equation (8) in the paper
            self.reg_loss = cell.capacity_loss(before_aggre)
        else:
            self.reg_loss = tf.zeros(1)

        if Mem_Induction == 1:
            channel_memory_tensor = tf.concat(temp_output_list, 1) # shape=[batch_size, self.memory_size, self.memory_vector_dim]; each row is the output of the self.memory_size channel RNNs for the user's last interaction embedding
            multi_channel_hist = din_attention(self.item_eb, channel_memory_tensor, HIDDEN_SIZE, None, stag='pal') # attention-weighted sum over the channels of channel_memory_tensor, shape=[batch_size, 1, self.memory_vector_dim]
            inp = tf.concat([self.item_eb, self.item_his_eb_sum, read_out, tf.squeeze(multi_channel_hist), mean_memory*self.item_eb], 1) # concatenate all features
        else:
            inp = tf.concat([self.item_eb, self.item_his_eb_sum, read_out, mean_memory*self.item_eb], 1) # concatenate all features

        # use the sample feature inp to predict the click and compute the loss
        self.build_fcn_net(inp, use_dice=False) 
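A hypothetical usage sketch showing how Model_MIMN ties together with the placeholders defined in class Model (the hyper-parameter values and the zero-filled feed are illustrative assumptions, not taken from the repository):

import numpy as np
import tensorflow as tf

EMBEDDING_DIM, HIDDEN_SIZE, BATCH_SIZE, MEMORY_SIZE, SEQ_LEN = 16, 32, 128, 4, 100

model = Model_MIMN(n_uid=1000, n_mid=50000, EMBEDDING_DIM=EMBEDDING_DIM,
                   HIDDEN_SIZE=HIDDEN_SIZE, BATCH_SIZE=BATCH_SIZE,
                   MEMORY_SIZE=MEMORY_SIZE, SEQ_LEN=SEQ_LEN,
                   Mem_Induction=0, Util_Reg=0, use_negsample=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {
        model.uid_batch_ph:      np.zeros([BATCH_SIZE], dtype=np.int32),
        model.mid_batch_ph:      np.zeros([BATCH_SIZE], dtype=np.int32),
        model.cate_batch_ph:     np.zeros([BATCH_SIZE], dtype=np.int32),
        model.mid_his_batch_ph:  np.zeros([BATCH_SIZE, SEQ_LEN], dtype=np.int32),
        model.cate_his_batch_ph: np.zeros([BATCH_SIZE, SEQ_LEN], dtype=np.int32),
        model.mask:              np.ones([BATCH_SIZE, SEQ_LEN], dtype=np.float32),
    }
    # mimn_o collects the MIMNCell outputs over the sequence: [BATCH_SIZE, SEQ_LEN, memory_vector_dim]
    print(sess.run(model.mimn_o, feed_dict=feed).shape)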
  • class MIMNCell(tf.contrib.rnn.RNNCell):
class MIMNCell(tf.contrib.rnn.RNNCell):
    def __init__(self, controller_units, memory_size,
            memory_vector_dim, read_head_num, write_head_num, 
            reuse=False, output_dim=None, clip_value=20, shift_range=1,
            batch_size=128, mem_induction=0, util_reg=0, sharp_value=2.):
            
        self.controller_units = controller_units
        self.memory_size = memory_size
        self.memory_vector_dim = memory_vector_dim
        self.read_head_num = read_head_num
        self.write_head_num = write_head_num
        self.mem_induction = mem_induction
        self.util_reg = util_reg
        self.reuse = reuse
        self.clip_value = clip_value
        self.sharp_value = sharp_value
        self.shift_range = shift_range
        self.batch_size = batch_size

        def single_cell(num_units): # a single call only maps the input at step t to the output at step t
            return tf.nn.rnn_cell.GRUCell(num_units)

        if self.mem_induction > 0:
            self.channel_rnn = single_cell(self.memory_vector_dim)
            self.channel_rnn_state = [self.channel_rnn.zero_state(batch_size, tf.float32) for i in range(memory_size)] # a list with memory_size elements, each of shape [batch_size, self.memory_vector_dim]; zero_state is the initial state of the RNN cell
            self.channel_rnn_output = [tf.zeros(((batch_size, self.memory_vector_dim))) for i in range(memory_size)]  # a list with memory_size elements, each of shape [batch_size, self.memory_vector_dim]

        self.controller = single_cell(self.controller_units) # controller_units is HIDDEN_SIZE
        self.step = 0
        self.output_dim = output_dim

        self.o2p_initializer = create_linear_initializer(self.controller_units) # initializer scaled by the fan-in (HIDDEN_SIZE)
        self.o2o_initializer = create_linear_initializer(self.controller_units + self.memory_vector_dim * self.read_head_num)
    
    '''
        x, shape(x)=[batch_size, 2*EMBEDDING_DIM], each row is the embedding of the item a user interacted with at step t in the batch
    '''
    def __call__(self, x, prev_state):
        # use the GRU cell: given the input x at step t and the cell state at step t-1 (prev_state), compute the output and the cell state at step t
        prev_read_vector_list = prev_state["read_vector_list"]
        '''
            prev_read_vector_list is a list with self.read_head_num elements, each of shape [batch_size, self.memory_vector_dim]
            x, shape(x)=[batch_size, 2*EMBEDDING_DIM], each row is the embedding of the item a user interacted with at step t in the batch
            [x] wraps x in a list
            [x] + prev_read_vector_list appends the elements of prev_read_vector_list to the list [x], i.e. it yields the list [x, *prev_read_vector_list]
            where self.memory_vector_dim = 2*EMBEDDING_DIM
            tf.concat([x] + prev_read_vector_list, axis=1): every element in the list has shape [batch_size, 2*EMBEDDING_DIM] and the concatenation is along the second dimension, so shape(controller_input)=[batch_size, (1+self.read_head_num)*self.memory_vector_dim]
        '''
        controller_input = tf.concat([x] + prev_read_vector_list, axis=1) # shape(controller_input)=[batch_size,(1+self.read_head_num)*self.memory_vector_dim]
        with tf.variable_scope('controller', reuse=self.reuse):
            controller_output, controller_state = self.controller(controller_input, prev_state["controller_state"]) # run controller_input through the GRU once. shape(controller_output)=[batch_size,self.controller_units], shape(controller_state)=[batch_size,self.controller_units]
        
        if self.util_reg:
            max_q = 400.0
            prev_w_aggre = prev_state["w_aggre"] / max_q  # shape=[batch_size, self.memory_size]
            controller_par = tf.concat([controller_output, tf.stop_gradient(prev_w_aggre)], axis=1) # shape=[batch_size, self.controller_units + self.memory_size]
        else:
            controller_par = controller_output # shape(controller_par)=[batch_size,self.controller_units]

        '''
        Use a fully connected layer to map controller_par to the parameters the model needs to learn (the read-memory key, the write-memory key, etc.)
        '''
        num_parameters_per_head = self.memory_vector_dim + 1 + 1 + (self.shift_range * 2 + 1) + 1 
        num_heads = self.read_head_num + self.write_head_num
        total_parameter_num = num_parameters_per_head * num_heads + self.memory_vector_dim * 2 * self.write_head_num   
        with tf.variable_scope("o2p", reuse=(self.step > 0) or self.reuse):
            parameters = tf.contrib.layers.fully_connected(
                controller_par, 
                total_parameter_num, 
                activation_fn=None,
                weights_initializer=self.o2p_initializer) # shape(parameters)=[batch_size,total_parameter_num]
            parameters = tf.clip_by_value(parameters, -self.clip_value, self.clip_value) # clip the values of parameters to the range [-self.clip_value, self.clip_value]

        '''
        Split parameters into two parts
        '''
        head_parameter_list = tf.split(parameters[:, :num_parameters_per_head * num_heads], num_heads, axis=1) # a list with num_heads elements, each of shape [batch_size, num_parameters_per_head]
        erase_add_list = tf.split(parameters[:, num_parameters_per_head * num_heads:], 2 * self.write_head_num, axis=1) # a list with 2*self.write_head_num elements, each of shape [batch_size, self.memory_vector_dim]
            
        # prev_w_list = prev_state["w_list"]
        prev_M = prev_state["M"] # shape=[batch_size, self.memory_size, self.memory_vector_dim]; each batch row is one user sample, each of the self.memory_size entries is a memory slot, and each memory slot is represented by a vector of length self.memory_vector_dim
        key_M = prev_state["key_M"] # shape=[batch_size, self.memory_size, self.memory_vector_dim]; key_M is the learned key memory used for addressing (see addressing and zero_state)
        w_list = []
        write_weight = []
        w_list = []
        write_weight = []

        '''
        Compute the weights w used to take weighted sums over the memory slots (the read / write weight vectors)
        '''
        for i, head_parameter in enumerate(head_parameter_list): # iterate over each head's parameters. shape(head_parameter)=[batch_size,num_parameters_per_head]
            k = tf.tanh(head_parameter[:, 0:self.memory_vector_dim]) # k is the key for reading or writing memory, corresponding to equation (1) in the paper. shape(k)=[batch_size,self.memory_vector_dim]
            beta = (tf.nn.softplus(head_parameter[:, self.memory_vector_dim]) + 1)*self.sharp_value # softplus(x)=ln(1+e^x), shape(beta)=[batch_size,], beta is a vector       
            with tf.variable_scope('addressing_head_%d' % i):     
                w = self.addressing(k, beta, key_M, prev_M) # shape(w)=[batch_size,self.memory_size], read/write weight vector, corresponds to w^r_t/w^w_t in the paper
                if self.util_reg and i==1: # re-balance w
                    s = tf.nn.softmax(
                        head_parameter[:, self.memory_vector_dim + 2:self.memory_vector_dim + 2 + (self.shift_range * 2 + 1)]
                    ) # shape(s)=[batch_size,(self.shift_range * 2 + 1)]
                    gamma = 2*(tf.nn.softplus(head_parameter[:, -1]) + 1)*self.sharp_value # shape(gamma)=[batch_size,], a vector
                    w = self.capacity_overflow(w, s, gamma) # shape(w)=[batch_size,self.memory_size]
                    write_weight.append(self.capacity_overflow(tf.stop_gradient(w), s, gamma)) # write_weight is a list of elements with shape [batch_size,self.memory_size]
            w_list.append(w) # w_list is a list with num_heads elements, each of shape [batch_size,self.memory_size]

        '''
        equation (3) in the paper: use the weights (read_w_list[i]) to take a weighted sum over the memory slots of prev_M, producing the read vector r_t
        '''
        read_w_list = w_list[:self.read_head_num] # read memory weight vector w^r_t
        read_vector_list = []
        for i in range(self.read_head_num):
            '''
               shape(read_w_list[i])=[batch_size,self.memory_size]
               shape(prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]
               shape(tf.expand_dims(read_w_list[i], dim=2) * prev_M)= [batch_size, self.memory_size, self.memory_vector_dim]
               shape(read_vector) = [batch_size, self.memory_vector_dim]
            '''
            read_vector = tf.reduce_sum(tf.expand_dims(read_w_list[i], dim=2) * prev_M, axis=1) # weighted sum over the memory slots of prev_M with weights read_w_list[i], giving read_vector (r_t in the paper)
            read_vector_list.append(read_vector)

        # write memory weight vector w^w_t
        write_w_list = w_list[self.read_head_num:]
            
        '''
            Across different user samples, the sequence formed by the same memory slot can be viewed as a channel.
            For example, shape(prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]: each batch row is one user sample,
            each of the self.memory_size entries is a memory slot, and each memory slot is represented by a vector of length self.memory_vector_dim.
            The channel formed by the i-th memory slot is then prev_M[:, i] with shape [batch_size, self.memory_vector_dim].

            Following the paper, this model uses self.read_head_num = self.write_head_num = 1 (a single read weight vector w^r_t),
            so the vector read_w_list[0] (the read weight vector w^r_t) can be viewed as the weights over the self.memory_size memory channels.
        '''
        channel_weight = read_w_list[0] # shape=[batch_size,self.memory_size]

        if self.mem_induction == 0:
            output_list = []

        elif self.mem_induction == 1:
            _, ind = tf.nn.top_k(channel_weight, k=1) # indices of the largest k entries of each row of channel_weight (the channel each user is most interested in), shape(ind)=[batch_size,1]
            mask_weight = tf.reduce_sum(tf.one_hot(ind, depth=self.memory_size), axis=-2) # shape(mask_weight)=[batch_size,self.memory_size]; shape(tf.one_hot(ind, depth=self.memory_size))=[batch_size,1,self.memory_size]
            output_list = []
            for i in range(self.memory_size):
                '''
                    shape(prev_M[:,i])=[batch_size,self.memory_vector_dim]: each row is the i-th channel's memory slot of the sample at the previous step (t-1); each memory slot can be viewed as a user interest
                    shape(tf.expand_dims(mask_weight[:,i], axis=1))=[batch_size,1]: each row indicates (0 or 1) whether the sample's i-th channel memory slot is kept at the current step t
                    shape(tf.stop_gradient(prev_M[:,i]))=[batch_size, self.memory_vector_dim]
                    shape(x)=[batch_size,self.memory_vector_dim]
                    shape(tf.concat([x, tf.stop_gradient(prev_M[:,i]) * tf.expand_dims(mask_weight[:,i], axis=1)],axis=1))=[batch_size,2*self.memory_vector_dim]: only the memory slot the user was most interested in at step t-1 takes part in the forward pass at step t, and a selected t-1 memory slot only takes part in the forward pass, not in the gradient update
                    shape(temp_output)=[batch_size, self.memory_vector_dim]: each row is the RNN output for the sample's i-th channel memory slot
                    shape(temp_new_state)=[batch_size, self.memory_vector_dim]: each row is the RNN hidden state h for the sample's i-th channel memory slot
                '''
                temp_output, temp_new_state = self.channel_rnn(
                    tf.concat([x, tf.stop_gradient(prev_M[:,i]) * tf.expand_dims(mask_weight[:,i], axis=1)], axis=1), # only the memory slot the user was most interested in at step t-1 takes part in the forward pass at step t; it participates in the forward pass only, not in the gradient update
                    self.channel_rnn_state[i]
                )
                
                '''
                    shape(self.channel_rnn_state[i])=[batch_size,self.memory_vector_dim]: each row is the RNN hidden state h of the sample's i-th channel
                    For each row, self.channel_rnn_state[i] stores the memory slot the user was most interested in within channel i at some past step; that slot may have been produced at the current step t or at an earlier step t-n
                '''
                self.channel_rnn_state[i] = temp_new_state * tf.expand_dims(mask_weight[:,i], axis=1) + self.channel_rnn_state[i]*(1- tf.expand_dims(mask_weight[:,i], axis=1))
                
                '''
                    shape(temp_output)=[batch_size, self.memory_vector_dim]: each row stores the output of channel i's RNN, i.e. RNN(the fusion of the memory slot the user was most interested in within channel i at some step t' and the input x at that step t')
                '''
                temp_output = temp_output * tf.expand_dims(mask_weight[:,i], axis=1) + self.channel_rnn_output[i]*(1- tf.expand_dims(mask_weight[:,i], axis=1))
                output_list.append(tf.expand_dims(temp_output,axis=1)) # output_list is a list with self.memory_size elements, each of shape [batch_size,1,self.memory_vector_dim]

        M = prev_M
        sum_aggre = prev_state["sum_aggre"]

        '''
        Memory write, equation (4) in the paper
        '''
        for i in range(self.write_head_num): 
            w = tf.expand_dims(write_w_list[i], axis=2) # shape(w)=[batch_size,self.memory_size,1]
            erase_vector = tf.expand_dims(tf.sigmoid(erase_add_list[i * 2]), axis=1) # shape(erase_vector)=[batch_size,1,self.memory_vector_dim]
            add_vector = tf.expand_dims(tf.tanh(erase_add_list[i * 2 + 1]), axis=1) # shape(add_vector)=[batch_size,1,self.memory_vector_dim]
            M = M * (tf.ones(M.get_shape()) - tf.matmul(w, erase_vector)) + tf.matmul(w, add_vector)
            sum_aggre += tf.matmul(tf.stop_gradient(w), add_vector)

        w_aggre = prev_state["w_aggre"]
        if self.util_reg:
            w_aggre += tf.add_n(write_weight)
        else:
            w_aggre += tf.add_n(write_w_list)


        if not self.output_dim:
            output_dim = x.get_shape()[1] # output_dim = memory_vector_dim = 2*EMBEDDING_DIM
        else:
            output_dim = self.output_dim
        with tf.variable_scope("o2o", reuse=(self.step > 0) or self.reuse):
            read_output = tf.contrib.layers.fully_connected(
                tf.concat([controller_output] + read_vector_list, axis=1), # shape=[batch_size, self.controller_units + self.read_head_num*self.memory_size]
                output_dim, 
                activation_fn=None,
                weights_initializer=self.o2o_initializer)
            read_output = tf.clip_by_value(read_output, -self.clip_value, self.clip_value) # shape(read_output)=[batch_size,memory_vector_dim]

        self.step += 1
        return read_output, {
                "controller_state" : controller_state, # shape(controller_state)=[batch_size,self.controller_units]
                "read_vector_list" : read_vector_list, # a list with self.read_head_num elements, each of shape [batch_size,self.memory_vector_dim]
                "w_list" : w_list,  # a list with num_heads elements, each of shape [batch_size,self.memory_size]
                "M" : M,  # shape(M)=[batch_size, self.memory_size, self.memory_vector_dim]
                "key_M": key_M, # shape(key_M)=[batch_size, self.memory_size, self.memory_vector_dim]
                "w_aggre": w_aggre, # shape(w_aggre)=[batch_size, self.memory_size]
                "sum_aggre": sum_aggre # shape(sum_aggre)=[batch_size, self.memory_size, self.memory_vector_dim]
            }, output_list

    '''
    def addressing(self, k, beta, key_M, prev_M):
        shape(key_M)=shape(prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]
        key_M, a parameter to be learned; its gradient comes only from the addressing function
        prev_M, the memory that the Neural Turing Machine learns; its gradient comes from the addressing function as well as the model's other modules
        output: w_c, shape(w_c)=[batch_size,self.memory_size], the weights for reading from or writing to memory
    '''
    def addressing(self, k, beta, key_M, prev_M):
        # Cosine similarity
        def cosine_similarity(key, M):
            key = tf.expand_dims(key, axis=2) # shape=[batch_size,self.memory_vector_dim,1]
            inner_product = tf.matmul(M, key) # shape=[batch_size,self.memory_size,1]
            k_norm = tf.sqrt(tf.reduce_sum(tf.square(key), axis=1, keep_dims=True)) # shape=[batch_size,1,1]
            M_norm = tf.sqrt(tf.reduce_sum(tf.square(M), axis=2, keep_dims=True)) # shape=[batch_size,self.memory_size,1]
            norm_product = M_norm * k_norm # broadcast element-wise multiplication. shape(norm_product)=[batch_size,self.memory_size,1]
            K = tf.squeeze(inner_product / (norm_product + 1e-8)) # squeeze out all dimensions of size 1. shape(K)=[batch_size,self.memory_size]
            return K

        K = 0.5*(cosine_similarity(k,key_M) + cosine_similarity(k,prev_M)) # shape(K)=[batch_size,self.memory_size]
        K_amplified = tf.exp(tf.expand_dims(beta, axis=1) * K) # shape(tf.expand_dims(beta, axis=1))=[batch_size,1]. shape(K_amplified)=[batch_size,self.memory_size]
        w_c = K_amplified / tf.reduce_sum(K_amplified, axis=1, keep_dims=True)  # shape(w_c)=[batch_size,self.memory_size]

        return w_c
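    # For reference, the addressing above computes a softmax over sharpened cosine similarities:
    #   K_i   = 0.5 * ( cos(k, key_M_i) + cos(k, prev_M_i) )
    #   w_c_i = exp(beta * K_i) / sum_j exp(beta * K_j)
    # i.e. the content-based read/write addressing over the memory slots described in the paper.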


    '''
        # shape(s)=[batch_size,(self.shift_range * 2 + 1)], 
        # shape(w_g)=[batch_size,self.memory_size], read/write weight vector, corresponds to w^r_t in the paper
        # shape(gamma)=[batch_size,], a vector
    '''
    def capacity_overflow(self, w_g, s, gamma):
        s = tf.concat(
                        [s[:, :self.shift_range + 1], # shape=[batch_size, self.shift_range+1]
                         tf.zeros([s.get_shape()[0], self.memory_size - (self.shift_range * 2 + 1)]), # shape=[batch_size, self.memory_size-(self.shift_range * 2 + 1)]
                         s[:, -self.shift_range:]], # shape=[batch_size, self.shift_range]
                        axis=1
                    ) # shape=[batch_size, self.memory_size]
        t = tf.concat([tf.reverse(s, axis=[1]), tf.reverse(s, axis=[1])], axis=1) # shape=[batch_size, 2*self.memory_size]
        s_matrix = tf.stack(
            [t[:, self.memory_size - i - 1:self.memory_size * 2 - i - 1] for i in range(self.memory_size)],
            axis=1
        ) # shape=[batch_size, self.memory_size, self.memory_size]
        w_ = tf.reduce_sum(tf.expand_dims(w_g, axis=1) * s_matrix, axis=2) # shape=[batch_size, self.memory_size]
        w_sharpen = tf.pow(w_, tf.expand_dims(gamma, axis=1)) # shape=[batch_size, self.memory_size]
        w = w_sharpen / tf.reduce_sum(w_sharpen, axis=1, keep_dims=True) # shape=[batch_size, self.memory_size]

        return w

    '''
    def capacity_loss(self, w_aggre):
        equation(8) in the paper
    '''
    def capacity_loss(self, w_aggre):
        loss = 0.001 * tf.reduce_mean((w_aggre - tf.reduce_mean(w_aggre, axis=-1, keep_dims=True))**2 / self.memory_size / self.batch_size)
        return loss

    '''
    def zero_state(self, batch_size, dtype):   
        What it does:
            Declares the variables the model needs to train (the learned initial state).
    '''
    def zero_state(self, batch_size, dtype):
        with tf.variable_scope('init', reuse=self.reuse):
            read_vector_list = [expand(tf.tanh(learned_init(self.memory_vector_dim)), dim=0, N=batch_size) for i in range(self.read_head_num)] # a list with self.read_head_num elements, each of shape [batch_size,self.memory_vector_dim]

            w_list = [expand(tf.nn.softmax(learned_init(self.memory_size)), dim=0, N=batch_size) for i in range(self.read_head_num + self.write_head_num)] # a list with self.read_head_num + self.write_head_num elements, each of shape [batch_size,self.memory_size]

            controller_init_state = self.controller.zero_state(batch_size, dtype)  # the initial hidden state h of the RNN, shape=[batch_size,self.controller_units]         

            M = expand(
                    tf.tanh(tf.get_variable('init_M', [self.memory_size, self.memory_vector_dim], initializer=tf.random_normal_initializer(mean=0.0, stddev=1e-5), trainable=False)),
                    dim=0,
                    N=batch_size
                ) # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            
            key_M = expand(
                    tf.tanh(tf.get_variable('key_M', [self.memory_size, self.memory_vector_dim], initializer=tf.random_normal_initializer(mean=0.0, stddev=0.5))),
                    dim=0,
                    N=batch_size
                ) # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            
            sum_aggre = tf.constant(np.zeros([batch_size, self.memory_size, self.memory_vector_dim]), dtype=tf.float32)
            zero_vector = np.zeros([batch_size, self.memory_size])
            zero_weight_vector = tf.constant(zero_vector, dtype=tf.float32)

            state = {
                "controller_state" : controller_init_state, # the initial hidden state h of the RNN, shape=[batch_size,self.controller_units] 
                "read_vector_list" : read_vector_list, # a list with self.read_head_num elements, each of shape [batch_size,self.memory_vector_dim]
                "w_list" : w_list, # a list with self.read_head_num + self.write_head_num elements, each of shape [batch_size,self.memory_size]
                "M" : M, # shape=[batch_size, self.memory_size, self.memory_vector_dim]
                "w_aggre" : zero_weight_vector, # shape=[batch_size, self.memory_size]
                "key_M" : key_M, # shape=[batch_size, self.memory_size, self.memory_vector_dim]
                "sum_aggre" : sum_aggre # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            }
            return state

3. Model Training

train_book.py
def eval(sess, test_data, model, model_path, batch_size):
    Args:
        sess, the TensorFlow session
        test_data, the test data
        model_path, the path where the model is saved
        batch_size, the number of samples the model processes at a time
    What it does:
        Evaluates the model on test_data, computing the accuracy, loss and AUC,
        and saves the model with the best AUC so far to model_path.


def train(
        train_file = "./data/book_data/book_train.txt",
        test_file = "./data/book_data/book_test.txt",
        feature_file = "./data/book_data/book_feature.pkl",
        batch_size = 128,
        maxlen = 100,
        test_iter = 50,
        save_iter = 100,
        model_type = 'DNN',
        Memory_Size = 4,
        Mem_Induction = 0, 
        Util_Reg = 0
):
    Args:
        test_iter, evaluate the model every test_iter training iterations; if the model's AUC at that point is the best so far, save the model
        save_iter, save the model every save_iter training iterations
        model_type, the selected model
        Memory_Size, the number of memory channels (slots) that each sample's raw features are mapped into
        Mem_Induction, 0/1, whether to use the multi-channel memory (memory induction) features
        Util_Reg, 0/1, whether to enable memory utilization regularization
    What it does:
        Trains and evaluates the selected model on the training and test data.
        Every test_iter iterations, prints the training and test results and saves the model with the best test AUC.
        Every save_iter iterations, saves the model.


def test(
        train_file = "./data/book_data/book_train.txt",
        test_file = "./data/book_data/book_test.txt",
        feature_file = "./data/book_data/book_feature.pkl",
        batch_size = 128,
        maxlen = 100,
        test_iter = 100,
        save_iter = 100,
        model_type = 'DNN',
        Memory_Size = 4,
        Mem_Induction = 0, 
        Util_Reg = 0
):
    What it does:
        Loads the model from model_path and computes its AUC, loss, accuracy and aux_loss on the test data test_data.
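
A rough sketch of the train/eval flow described above (the data iterator, the prepare_data helper, the path variables and the model.train/model.save signatures are assumptions for illustration, not the repository's exact API):

iter_cnt = 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for src, tgt in train_data:                        # one mini-batch of size batch_size
        loss, acc, aux_loss = model.train(sess, prepare_data(src, tgt) + [lr])   # hypothetical signature
        iter_cnt += 1
        if iter_cnt % test_iter == 0:
            # eval() computes the test loss / accuracy / AUC and saves the model whenever the AUC improves
            eval(sess, test_data, model, best_model_path, batch_size)
        if iter_cnt % save_iter == 0:
            model.save(sess, model_path + "--" + str(iter_cnt))   # periodic checkpoint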


