Overview
This post annotates and walks through the code released by the authors of the following paper:
Pi, Q., Bian, W., Zhou, G., Zhu, X., & Gai, K. (2019, July). Practice on long sequential user behavior modeling for click-through rate prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2671-2679).
Authors' code: https://github.com/UIC-Paper/MIMN
Annotated code from this post: https://gitee.com/ze_code/mimn_code_comment
How to Use
1. Prepare Data
sh prepare_amazon.sh
2. Run Base Model
Example commands for DNN:
python script/train_book.py -p train --random_seed 19 --model_type DNN
python script/train_book.py -p test --random_seed 19 --model_type DNN
The following models are supported:
- DNN
- PNN
- DIN
- GRU4REC
- ARNN
- RUM
- DIEN
- DIEN_with_neg
3. Run MIMN
- MIMN Basic
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 0
- MIMN with Memory Utilization Regularization
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 1
- MIMN with Memory Utilization Regularization and Memory Induction Unit
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 1 --util_reg 1
- MIMN with Auxiliary Loss
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN_with_neg --memory_size 4 --mem_induction 0 --util_reg 0
1. Data preprocessing
1.1 process_data.py
- Reads the meta data, maps each record to [asin, categories] (item id, item category), and writes the result to the file "item-info".
- Reads the review data, maps each record to [reviewerID, asin, overall, unixReviewTime] (user id, item id, rating, review timestamp), and writes the result to "reviews-info".
- Reads "reviews-info" and generates positive and negative samples in the format ["0 or 1" + "\t" + (user id, item id, rating, timestamp) + "\t" + item category]. Samples are grouped by user: all of user a's samples come first, then all of user b's, and so on. One negative sample is generated for every positive sample. The processed data is written to the file "jointed-new". A concrete example:
0,(user1, item6, rating, time1, category) 1,(user1, item3, rating, time2, category) 0,(user1, item2, rating, time3, category) 1,(user1, item8, rating, time4, category), where time1~time4 are in increasing order.
- Splits the dataset: each line of "jointed-new" is tagged with a training-set label ("20180118") or a test-set label ("20190119"). For each user, if the user has at most 2 samples, the user gets no training samples, only test samples. If the user has n > 2 samples, the first n-2 become training samples and the last 2 become test samples. The result is written to the file "jointed-new-split-info".
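The per-user split rule above can be sketched as follows (a minimal illustration with a hypothetical helper name, not the repo's exact code):

```python
def split_user_samples(samples):
    """samples: one user's interactions, already sorted by timestamp."""
    n = len(samples)
    if n <= 2:
        return [], samples                     # too few samples: test set only
    return samples[:n - 2], samples[-2:]       # last 2 samples become the test set
```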
1.2 local_aggretor.py
Based on the train/test labels in "jointed-new-split-info", samples are written to the training file "local_train" and the test file "local_test".
For example, suppose a user's samples in "jointed-new-split-info" are:
Format: [train/test label, positive/negative label, user id, item id, rating, timestamp, item category]
20180118,0,(user1, item6, rating, time1, category a)
20180118,1,(user1, item3, rating, time2, category b)
20180118,0,(user1, item7, rating, time3, category c)
20180118,1,(user1, item5, rating, time4, category d)
20180118,0,(user1, item11, rating, time5, category e)
20180118,1,(user1, item13, rating, time6, category f)
20190119,0,(user1, item2, rating, time7, category g)
20190119,1,(user1, item8, rating, time8, category h)
20190119,1,(user1, item4, rating, time9, category i)
A history sequence is appended to each sample, and a sample is written out if and only if it already has a history of interactions (only positive samples count as history). After this script runs, the result is:
Training set (written to "local_train"):
0,user1,item7,rating,time3,category c,(item3),(category b)
1,user1,item5,rating,time4,category d,(item3),(category b)
0,user1,item11,rating,time5,category e,(item3,5),(category b,d)
1,user1,item13,rating,time6,category f,(item3,5),(category b,d)
Test set (written to "local_test"):
1,user1,item4,rating,time9,category i,(item8),(category h)
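The accumulation rule above (only positive samples extend the history, and a sample is emitted only once a non-empty history exists) can be sketched like this; `aggregate` is a hypothetical helper name, not the repo's code:

```python
def aggregate(samples):
    """samples: list of (label, item_id) for one user, in time order.
    Returns rows of (label, item_id, history_of_clicked_items)."""
    out, hist = [], []
    for label, item in samples:
        if hist:                          # emit only when a history already exists
            out.append((label, item, tuple(hist)))
        if label == 1:                    # only clicks (positives) enter the history
            hist.append(item)
    return out
```

Running it on the training portion of the example above reproduces the four emitted rows.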
1.3 split_by_user.py
Randomly assigns 1/10 of the samples in "local_test" to the test file "local_test_splitByUser",
and the remaining 9/10 to the training file "local_train_splitByUser".
1.4 generate_voc.py
Based on the contents of "local_train_splitByUser", numbers the user ids, item ids, and item categories in descending order of their occurrence counts, producing:
uid_voc: dict, key=uid, value=index
mid_voc: dict, key=mid, value=index
cat_voc: dict, key=cat, value=index
These are saved to the files "uid_voc.pkl", "mid_voc.pkl", and "cat_voc.pkl" respectively.
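The frequency-ordered numbering above can be sketched as follows (a simplified illustration; the real script may reserve special indices, e.g. for unknown tokens):

```python
from collections import Counter

def build_voc(tokens):
    """Map each token to an index, most frequent token first."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda t: -counts[t])   # descending frequency
    return {tok: i for i, tok in enumerate(ordered)}
```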
1.5 generate_voc.py
- def generate_sample_list():
Purpose: builds the following variables.
train_sample_list: a list whose entries have the format: user index, item index, category index, positive/negative label, sequence of historically clicked item indices, sequence of the corresponding category indices.
test_sample_list: same format as train_sample_list.
feature_total_num: the total count of user ids, item ids, and category ids.
The index ranges are contiguous: the first item index is the last user index + 1, and the first category index is the last item index + 1.
The function keeps only samples whose click history is longer than 20, and truncates or pads histories to max_len=100: if a history is longer than max_len, only the last max_len interactions are kept.
- def produce_neg_item_hist_with_cate(train_file, test_file):
Processes each sample of the training or test set by appending a negatively sampled history, giving the format: user index, item index, category index, positive/negative label, clicked item index sequence, clicked category index sequence, non-clicked item index sequence, non-clicked category index sequence.
The non-clicked sequence is built by randomly drawing n items (n = length of the click history) from the full item set, excluding anything already in the click history.
2. Model definition
- The parent class class Model(object) implements the functions shared by all models.
- class Model_MIMN(Model) inherits from Model and defines the concrete structure of MIMN.
- The custom layer used inside MIMN is implemented in class MIMNCell(tf.contrib.rnn.RNNCell).
- class Model(object):
class Model(object):
    def __init__(self,
                 n_uid,                # number of user ids
                 n_mid,                # number of item ids
                 EMBEDDING_DIM,        # dimension of the user/item id embeddings
                 HIDDEN_SIZE,          # dimension of the model's internal vectors
                 BATCH_SIZE,
                 SEQ_LEN,              # length of the interaction sequence
                 use_negsample=False,  # whether to use the negatively sampled history
                 Flag="DNN"):
        self.model_flag = Flag
        self.reg = False
        self.use_negsample = use_negsample
        with tf.name_scope('Inputs'):
            self.mid_his_batch_ph = tf.placeholder(tf.int32, [None, None], name='mid_his_batch_ph')    # item interaction history, shape=[batch_size, SEQ_LEN]
            self.cate_his_batch_ph = tf.placeholder(tf.int32, [None, None], name='cate_his_batch_ph')  # category interaction history, shape=[batch_size, SEQ_LEN]
            self.uid_batch_ph = tf.placeholder(tf.int32, [None, ], name='uid_batch_ph')    # user id, shape=[batch_size,]
            self.mid_batch_ph = tf.placeholder(tf.int32, [None, ], name='mid_batch_ph')    # item id, shape=[batch_size,]
            self.cate_batch_ph = tf.placeholder(tf.int32, [None, ], name='cate_batch_ph')  # item category (matching the item id), shape=[batch_size,]
            self.mask = tf.placeholder(tf.float32, [None, None], name='mask_batch_ph')     # mask over the item interaction history, shape=[batch_size, SEQ_LEN]. Because histories are zero-padded or truncated during preprocessing, the mask marks whether a real interaction occurred at each step
            self.target_ph = tf.placeholder(tf.float32, [None, 2], name='target_ph')       # click label (each sample's label is [y, 1-y]), shape=[batch_size, 2]
            self.lr = tf.placeholder(tf.float64, [])  # initial learning rate for the Adam optimizer
        # Embedding layer
        with tf.name_scope('Embedding_layer'):
            # embedding lookup table
            self.mid_embeddings_var = tf.get_variable("mid_embedding_var", [n_mid, EMBEDDING_DIM], trainable=True)
            # embed the item id and the item interaction history
            self.mid_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.mid_batch_ph)          # shape=[None, EMBEDDING_DIM]
            self.mid_his_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.mid_his_batch_ph)  # shape=[None, SEQ_LEN, EMBEDDING_DIM]
            # embed the category and the category interaction history
            self.cate_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.cate_batch_ph)          # shape=[None, EMBEDDING_DIM]
            self.cate_his_batch_embedded = tf.nn.embedding_lookup(self.mid_embeddings_var, self.cate_his_batch_ph)  # shape=[None, SEQ_LEN, EMBEDDING_DIM]
            with tf.name_scope('init_operation'):
                self.mid_embedding_placeholder = tf.placeholder(tf.float32, [n_mid, EMBEDDING_DIM], name="mid_emb_ph")
                self.mid_embedding_init = self.mid_embeddings_var.assign(self.mid_embedding_placeholder)
        # if the negatively sampled history is used, define the corresponding placeholders and embedding lookups
        if self.use_negsample:
            self.mid_neg_batch_ph = tf.placeholder(tf.int32, [None, None], name='neg_his_batch_ph')        # negatively sampled item history, shape=[batch_size, SEQ_LEN]
            self.cate_neg_batch_ph = tf.placeholder(tf.int32, [None, None], name='neg_cate_his_batch_ph')  # negatively sampled category history, shape=[batch_size, SEQ_LEN]
            self.neg_item_his_eb = tf.nn.embedding_lookup(self.mid_embeddings_var, self.mid_neg_batch_ph)   # embed the negatively sampled item history
            self.neg_cate_his_eb = tf.nn.embedding_lookup(self.mid_embeddings_var, self.cate_neg_batch_ph)  # embed the negatively sampled category history
            self.neg_his_eb = tf.concat([self.neg_item_his_eb, self.neg_cate_his_eb], axis=2) * tf.reshape(self.mask, (BATCH_SIZE, SEQ_LEN, 1))  # apply the mask to the negatively sampled history
        # concatenate the item id embedding with its category embedding to form the item feature
        self.item_eb = tf.concat([self.mid_batch_embedded, self.cate_batch_embedded], axis=1)  # shape=[None, 2*EMBEDDING_DIM]
        # concatenate the history item embeddings with their category embeddings to form the history sequence feature
        self.item_his_eb = tf.concat([self.mid_his_batch_embedded, self.cate_his_batch_embedded], axis=2) * tf.reshape(self.mask, (BATCH_SIZE, SEQ_LEN, 1))  # shape=[None, SEQ_LEN, 2*EMBEDDING_DIM]
        # pool the history sequence feature self.item_his_eb into a simple composite feature
        self.item_his_eb_sum = tf.reduce_sum(self.item_his_eb, 1)  # shape=[None, 2*EMBEDDING_DIM]
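The masked concatenation and sum pooling at the end of __init__ can be reproduced in NumPy (a shape-level sketch, not the TF graph itself):

```python
import numpy as np

def masked_sum_pool(item_his_eb, mask):
    """item_his_eb: [batch, SEQ_LEN, 2*EMBEDDING_DIM]; mask: [batch, SEQ_LEN].
    Zero out padded steps, then sum over the sequence axis."""
    masked = item_his_eb * mask[:, :, None]   # broadcast the mask over the feature dim
    return masked.sum(axis=1)                 # [batch, 2*EMBEDDING_DIM]
```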
    def build_fcn_net(self, inp, use_dice=False):
        """
        Args:
            inp, shape=[batch_size, k], where k can be any dimension; inp is the feature the model extracted for each sample.
        Purpose:
            Feeds the extracted sample feature inp through fully connected layers to predict y, then computes the loss.
        """
    def auxiliary_loss(self, h_states, click_seq, noclick_seq, mask=None, stag=None):
        """
        Args:
            h_states, shape=[batch_size, SEQ_LEN-1, memory_vector_dim], outputs of MIMN.cell at steps 0~(t-1)
            click_seq, shape=[batch_size, SEQ_LEN-1, memory_vector_dim], interaction embeddings at steps 1~t
            noclick_seq, shape=[batch_size, SEQ_LEN-1, memory_vector_dim], negative-sample embeddings at steps 1~t
            mask, shape=[batch_size, SEQ_LEN-1], flags for the embeddings at steps 1~t: 1 if an interaction happened at step t', else 0
        Purpose:
            Computes the probability of clicking the positive history item click_seq and the probability of clicking the negative item noclick_seq (if the model decides the user clicked the negative item at step t, the user certainly did not click the positive item at step t, so the negative-click probability is the probability of not clicking the positive item). Finally, the model computes the loss of this binary click/no-click classification task.
        """
    def auxiliary_net(self, in_, stag='auxiliary_net'):
        """
        Args:
            in_, shape=[batch_size, SEQ_LEN-1, k], where k can be any dimension and SEQ_LEN is the length of each sample's history
            y_hat, shape=[batch_size, SEQ_LEN-1, 2], binary classification probabilities
        Purpose:
            Maps in_ through fully connected layers to the binary click probability y_hat.
            The fully connected structure is dense(100)->dense(50)->dense(2).
        """
- class Model_MIMN(Model):
class Model_MIMN(Model):
    def __init__(self,
                 n_uid,
                 n_mid,
                 EMBEDDING_DIM,
                 HIDDEN_SIZE,
                 BATCH_SIZE,
                 MEMORY_SIZE,
                 SEQ_LEN=400,
                 Mem_Induction=0,
                 Util_Reg=0,
                 use_negsample=False,
                 mask_flag=False):
        super(Model_MIMN, self).__init__(
            n_uid,
            n_mid,
            EMBEDDING_DIM,
            HIDDEN_SIZE,
            BATCH_SIZE,
            SEQ_LEN,
            use_negsample,
            Flag="MIMN")
        self.reg = Util_Reg
        """
        def clear_mask_state(state, begin_state, begin_channel_rnn_state, mask, cell, t):
        Args:
            state, the state of mimn.MIMNCell after it processed the interaction embedding at step t
            begin_state, the initial value of the MIMNCell state
            mask, e.g. if preprocessing pads every history to length 5 (SEQ_LEN) and a sequence has only 3 real interactions, its mask after zero-padding is 00111
            begin_channel_rnn_state, the initial value of the MIMNCell variable channel_rnn_state
        Purpose:
            Suppose two samples with fixed sequence length 5 have 2 and 3 real interactions, so their masks are 00011 and 00111.
            When the MIMNCell processes step t=0, both samples' masks are 0, so neither state is updated.
            When the MIMNCell processes step t=2, the masks are 0 and 1 respectively, so one sample's state is updated while the other's is not.
        """
        # create the mimn.MIMNCell
        cell = mimn.MIMNCell(controller_units=HIDDEN_SIZE, memory_size=MEMORY_SIZE, memory_vector_dim=2*EMBEDDING_DIM, read_head_num=1, write_head_num=1,
                             reuse=False, output_dim=HIDDEN_SIZE, clip_value=20, batch_size=BATCH_SIZE, mem_induction=Mem_Induction, util_reg=Util_Reg)
        # get the initial state of the MIMNCell
        state = cell.zero_state(BATCH_SIZE, tf.float32)
        # get the initial channel_rnn_output of the MIMNCell
        if Mem_Induction > 0:
            begin_channel_rnn_output = cell.channel_rnn_output  # a list of memory_size tensors, each of shape [batch_size, self.memory_vector_dim]
        else:
            begin_channel_rnn_output = 0.0
        begin_state = state
        self.state_list = [state]  # the first element of self.state_list is the initial MIMNCell state
        self.mimn_o = []
        for t in range(SEQ_LEN):  # process the feature of self.item_his_eb at step t with the MIMNCell
            output, state, temp_output_list = cell(self.item_his_eb[:, t, :], state)  # shape(self.item_his_eb[:, t, :])=[None, 2*EMBEDDING_DIM]
            if mask_flag:
                state = clear_mask_state(state, begin_state, begin_channel_rnn_output, self.mask, cell, t)
            self.mimn_o.append(output)     # mimn_o is a list with SEQ_LEN elements; the output of step t has shape [batch_size, memory_vector_dim]
            self.state_list.append(state)  # state_list is a list of states, one appended per step
        self.mimn_o = tf.stack(self.mimn_o, axis=1)  # shape=[batch_size, SEQ_LEN, memory_vector_dim], the processed outputs of each sample's history
        self.state_list.append(state)
        mean_memory = tf.reduce_mean(state['sum_aggre'], axis=-2)  # shape=[batch_size, self.memory_vector_dim], the mean of each sample's self.memory_size memory slot vectors
        before_aggre = state['w_aggre']
        read_out, _, _ = cell(self.item_eb, state)
        if use_negsample:  # compute the loss that incorporates the negative-sample information
            aux_loss_1 = self.auxiliary_loss(self.mimn_o[:, :-1, :], self.item_his_eb[:, 1:, :],
                                             self.neg_his_eb[:, 1:, :], self.mask[:, 1:], stag="bigru_0")
            self.aux_loss = aux_loss_1
        if self.reg:  # memory utilization regularization, equation (8) in the paper
            self.reg_loss = cell.capacity_loss(before_aggre)
        else:
            self.reg_loss = tf.zeros(1)
        if Mem_Induction == 1:
            channel_memory_tensor = tf.concat(temp_output_list, 1)  # shape=[batch_size, self.memory_size, self.memory_vector_dim]; each row is the output of the self.memory_size channel RNNs after processing the embedding of the user's last interaction
            multi_channel_hist = din_attention(self.item_eb, channel_memory_tensor, HIDDEN_SIZE, None, stag='pal')  # attention-weighted sum over the different memory channels of channel_memory_tensor; shape=[batch_size, 1, self.memory_vector_dim]
            inp = tf.concat([self.item_eb, self.item_his_eb_sum, read_out, tf.squeeze(multi_channel_hist), mean_memory*self.item_eb], 1)  # concatenate all features
        else:
            inp = tf.concat([self.item_eb, self.item_his_eb_sum, read_out, mean_memory*self.item_eb], 1)  # concatenate all features
        # predict the click with the sample feature inp, and compute the loss
        self.build_fcn_net(inp, use_dice=False)
- class MIMNCell(tf.contrib.rnn.RNNCell):
class MIMNCell(tf.contrib.rnn.RNNCell):
    def __init__(self, controller_units, memory_size,
                 memory_vector_dim, read_head_num, write_head_num,
                 reuse=False, output_dim=None, clip_value=20, shift_range=1,
                 batch_size=128, mem_induction=0, util_reg=0, sharp_value=2.):
        self.controller_units = controller_units
        self.memory_size = memory_size
        self.memory_vector_dim = memory_vector_dim
        self.read_head_num = read_head_num
        self.write_head_num = write_head_num
        self.mem_induction = mem_induction
        self.util_reg = util_reg
        self.reuse = reuse
        self.clip_value = clip_value
        self.sharp_value = sharp_value
        self.shift_range = shift_range
        self.batch_size = batch_size
        def single_cell(num_units):  # a single call only maps the input at step t to the output at step t
            return tf.nn.rnn_cell.GRUCell(num_units)
        if self.mem_induction > 0:
            self.channel_rnn = single_cell(self.memory_vector_dim)
            self.channel_rnn_state = [self.channel_rnn.zero_state(batch_size, tf.float32) for i in range(memory_size)]  # list of memory_size tensors, each of shape [batch_size, self.memory_vector_dim]; zero_state is the RNN cell's initial state
            self.channel_rnn_output = [tf.zeros(((batch_size, self.memory_vector_dim))) for i in range(memory_size)]   # list of memory_size tensors, each of shape [batch_size, self.memory_vector_dim]
        self.controller = single_cell(self.controller_units)  # controller_units equals HIDDEN_SIZE
        self.step = 0
        self.output_dim = output_dim
        self.o2p_initializer = create_linear_initializer(self.controller_units)  # weight initializer with fan-in HIDDEN_SIZE
        self.o2o_initializer = create_linear_initializer(self.controller_units + self.memory_vector_dim * self.read_head_num)
    '''
    x, shape=[batch_size, 2*EMBEDDING_DIM]; each row is the embedding of the item a user in the batch interacted with at step t
    '''
    def __call__(self, x, prev_state):
        # GRU cell: compute the output and cell state at step t from the input x at step t and the cell state prev_state at step t-1
        prev_read_vector_list = prev_state["read_vector_list"]
        '''
        prev_read_vector_list is a list of self.read_head_num tensors, each of shape [batch_size, self.memory_vector_dim].
        x, shape=[batch_size, 2*EMBEDDING_DIM]; each row is the embedding of the item a user in the batch interacted with at step t.
        [x] turns x into a list, and [x] + prev_read_vector_list appends the elements of prev_read_vector_list to it, i.e. [x] + prev_read_vector_list = [x, *prev_read_vector_list].
        Note that self.memory_vector_dim = 2*EMBEDDING_DIM.
        tf.concat([x] + prev_read_vector_list, axis=1): each element of the list has shape [batch_size, 2*EMBEDDING_DIM] and the concatenation runs along the second dimension, so shape(controller_input)=[batch_size, (1+self.read_head_num)*self.memory_vector_dim].
        '''
        controller_input = tf.concat([x] + prev_read_vector_list, axis=1)  # shape=[batch_size, (1+self.read_head_num)*self.memory_vector_dim]
        with tf.variable_scope('controller', reuse=self.reuse):
            controller_output, controller_state = self.controller(controller_input, prev_state["controller_state"])  # run controller_input through the GRU once; shape(controller_output)=shape(controller_state)=[batch_size, self.controller_units]
        if self.util_reg:
            max_q = 400.0
            prev_w_aggre = prev_state["w_aggre"] / max_q  # shape=[batch_size, self.memory_size]
            controller_par = tf.concat([controller_output, tf.stop_gradient(prev_w_aggre)], axis=1)  # shape=[batch_size, self.controller_units + self.memory_size]
        else:
            controller_par = controller_output  # shape=[batch_size, self.controller_units]
        '''
        A fully connected layer maps controller_par to the parameters the model needs to learn (memory read key, memory write key, etc.)
        '''
        num_parameters_per_head = self.memory_vector_dim + 1 + 1 + (self.shift_range * 2 + 1) + 1
        num_heads = self.read_head_num + self.write_head_num
        total_parameter_num = num_parameters_per_head * num_heads + self.memory_vector_dim * 2 * self.write_head_num
        with tf.variable_scope("o2p", reuse=(self.step > 0) or self.reuse):
            parameters = tf.contrib.layers.fully_connected(
                controller_par,
                total_parameter_num,
                activation_fn=None,
                weights_initializer=self.o2p_initializer)  # shape=[batch_size, total_parameter_num]
            parameters = tf.clip_by_value(parameters, -self.clip_value, self.clip_value)  # clamp the values of parameters to [-self.clip_value, self.clip_value]
        '''
        Split parameters into two parts
        '''
        head_parameter_list = tf.split(parameters[:, :num_parameters_per_head * num_heads], num_heads, axis=1)  # list of num_heads tensors, each of shape [batch_size, num_parameters_per_head]
        erase_add_list = tf.split(parameters[:, num_parameters_per_head * num_heads:], 2 * self.write_head_num, axis=1)  # list of 2*self.write_head_num tensors, each of shape [batch_size, self.memory_vector_dim]
        # prev_w_list = prev_state["w_list"]
        prev_M = prev_state["M"]  # shape=[batch_size, self.memory_size, self.memory_vector_dim]; each batch row is a user sample, each of the self.memory_size entries is a memory slot, and each slot is represented by a vector of length self.memory_vector_dim
        key_M = prev_state["key_M"]  # shape=[batch_size, self.memory_size, self.memory_vector_dim]; key_M holds the learned per-slot addressing keys
        w_list = []
        write_weight = []
        '''
        Compute the weights w used to combine the memory slots (the read/write weight vectors)
        '''
        for i, head_parameter in enumerate(head_parameter_list):  # iterate over the heads; shape(head_parameter)=[batch_size, num_parameters_per_head]
            k = tf.tanh(head_parameter[:, 0:self.memory_vector_dim])  # k is the memory read key or memory write key, equation (1) in the paper; shape=[batch_size, self.memory_vector_dim]
            beta = (tf.nn.softplus(head_parameter[:, self.memory_vector_dim]) + 1) * self.sharp_value  # softplus(x)=ln(1+e^x); beta is a vector of shape [batch_size,]
            with tf.variable_scope('addressing_head_%d' % i):
                w = self.addressing(k, beta, key_M, prev_M)  # shape=[batch_size, self.memory_size], read/write weight vector, corresponds to w^r_t/w^w_t in the paper
            if self.util_reg and i == 1:  # re-adjust w
                s = tf.nn.softmax(
                    head_parameter[:, self.memory_vector_dim + 2:self.memory_vector_dim + 2 + (self.shift_range * 2 + 1)]
                )  # shape=[batch_size, self.shift_range * 2 + 1]
                gamma = 2 * (tf.nn.softplus(head_parameter[:, -1]) + 1) * self.sharp_value  # vector of shape [batch_size,]
                w = self.capacity_overflow(w, s, gamma)  # shape=[batch_size, self.memory_size]
                write_weight.append(self.capacity_overflow(tf.stop_gradient(w), s, gamma))  # write_weight is a list of tensors, each of shape [batch_size, self.memory_size]
            w_list.append(w)  # list of num_heads tensors, each of shape [batch_size, self.memory_size]
        '''
        Equation (3) in the paper: weight the memory slots of prev_M by read_w_list[i] and sum them to obtain the read vector r_t
        '''
        read_w_list = w_list[:self.read_head_num]  # read memory weight vector w^r_t
        read_vector_list = []
        for i in range(self.read_head_num):
            '''
            shape(read_w_list[i])=[batch_size, self.memory_size]
            shape(prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]
            shape(tf.expand_dims(read_w_list[i], dim=2) * prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]
            shape(read_vector)=[batch_size, self.memory_vector_dim]
            '''
            read_vector = tf.reduce_sum(tf.expand_dims(read_w_list[i], dim=2) * prev_M, axis=1)  # weighted sum of prev_M's memory slots with weights read_w_list[i], giving the read vector (r_t in the paper)
            read_vector_list.append(read_vector)
        # write memory weight vector w^w_t
        write_w_list = w_list[self.read_head_num:]
        '''
        Across user samples, the sequence formed by the same memory slot can be viewed as a channel.
        E.g. shape(prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]: each batch row is a user sample,
        each of the self.memory_size entries is a memory slot, and each slot is represented by a vector of length self.memory_vector_dim.
        The channel formed by the i-th memory slot is prev_M[:, i] of shape [batch_size, self.memory_vector_dim].
        Per the paper, self.read_head_num = self.write_head_num = 1 (there is a single read weight vector w^r_t),
        so the vector read_w_list[0] (the read weight vector w^r_t) can be viewed as the weights over the self.memory_size memory channels.
        '''
        channel_weight = read_w_list[0]  # shape=[batch_size, self.memory_size]
        if self.mem_induction == 0:
            output_list = []
        elif self.mem_induction == 1:
            _, ind = tf.nn.top_k(channel_weight, k=1)  # indices of the k largest entries per row of channel_weight (the channel each user is most interested in); shape(ind)=[batch_size, 1]
            mask_weight = tf.reduce_sum(tf.one_hot(ind, depth=self.memory_size), axis=-2)  # shape=[batch_size, self.memory_size]; shape(tf.one_hot(ind, depth=self.memory_size))=[batch_size, 1, self.memory_size]
            output_list = []
            for i in range(self.memory_size):
                '''
                shape(prev_M[:, i])=[batch_size, self.memory_vector_dim]; each row is the i-th channel's memory slot of the sample at the previous step (t-1), and each memory slot can be viewed as a user interest
                shape(tf.expand_dims(mask_weight[:, i], axis=1))=[batch_size, 1]; each row is 0 or 1, marking whether the i-th channel's memory slot of the sample is kept at the current step t
                shape(tf.stop_gradient(prev_M[:, i]))=[batch_size, self.memory_vector_dim]
                shape(x)=[batch_size, self.memory_vector_dim]
                shape(tf.concat([x, tf.stop_gradient(prev_M[:, i]) * tf.expand_dims(mask_weight[:, i], axis=1)], axis=1))=[batch_size, 2*self.memory_vector_dim]; only the memory slot the user was most interested in at step t-1 takes part in the forward pass at step t. Any selected step t-1 memory slot only participates in the forward pass, not in the gradient update.
                shape(temp_output)=[batch_size, self.memory_vector_dim]; each row is the RNN output for the i-th channel's memory slot of the sample
                shape(temp_new_state)=[batch_size, self.memory_vector_dim]; each row is the RNN hidden vector h for the i-th channel's memory slot of the sample
                '''
                temp_output, temp_new_state = self.channel_rnn(
                    tf.concat([x, tf.stop_gradient(prev_M[:, i]) * tf.expand_dims(mask_weight[:, i], axis=1)], axis=1),  # only the slot the user was most interested in at step t-1 joins the forward pass at step t, and it is excluded from the gradient update
                    self.channel_rnn_state[i]
                )
                '''
                shape(self.channel_rnn_state[i])=[batch_size, self.memory_vector_dim]; each row is the channel RNN hidden vector h for the i-th channel's memory slot of the sample
                self.channel_rnn_state[i] stores, per row, the i-th channel's memory slot the user was most interested in at some past step; that slot may have been produced at the current step t or at an earlier step t-n
                '''
                self.channel_rnn_state[i] = temp_new_state * tf.expand_dims(mask_weight[:, i], axis=1) + self.channel_rnn_state[i] * (1 - tf.expand_dims(mask_weight[:, i], axis=1))
                '''
                shape(temp_output)=[batch_size, self.memory_vector_dim]; per row, stores the output of the i-th channel RNN, i.e. RNN(the slot the user's i-th memory was most interested in at some step t', fused with the input x of that step t')
                '''
                temp_output = temp_output * tf.expand_dims(mask_weight[:, i], axis=1) + self.channel_rnn_output[i] * (1 - tf.expand_dims(mask_weight[:, i], axis=1))
                output_list.append(tf.expand_dims(temp_output, axis=1))  # list of self.memory_size tensors, each of shape [batch_size, 1, self.memory_vector_dim]
        M = prev_M
        sum_aggre = prev_state["sum_aggre"]
        '''
        Memory write, equation (4) in the paper
        '''
        for i in range(self.write_head_num):
            w = tf.expand_dims(write_w_list[i], axis=2)  # shape=[batch_size, self.memory_size, 1]
            erase_vector = tf.expand_dims(tf.sigmoid(erase_add_list[i * 2]), axis=1)  # shape=[batch_size, 1, self.memory_vector_dim]
            add_vector = tf.expand_dims(tf.tanh(erase_add_list[i * 2 + 1]), axis=1)   # shape=[batch_size, 1, self.memory_vector_dim]
            M = M * (tf.ones(M.get_shape()) - tf.matmul(w, erase_vector)) + tf.matmul(w, add_vector)
            sum_aggre += tf.matmul(tf.stop_gradient(w), add_vector)
        w_aggre = prev_state["w_aggre"]
        if self.util_reg:
            w_aggre += tf.add_n(write_weight)
        else:
            w_aggre += tf.add_n(write_w_list)
        if not self.output_dim:
            output_dim = x.get_shape()[1]  # output_dim = memory_vector_dim = 2*EMBEDDING_DIM
        else:
            output_dim = self.output_dim
        with tf.variable_scope("o2o", reuse=(self.step > 0) or self.reuse):
            read_output = tf.contrib.layers.fully_connected(
                tf.concat([controller_output] + read_vector_list, axis=1),  # shape=[batch_size, self.controller_units + self.read_head_num*self.memory_vector_dim]
                output_dim,
                activation_fn=None,
                weights_initializer=self.o2o_initializer)
            read_output = tf.clip_by_value(read_output, -self.clip_value, self.clip_value)  # shape=[batch_size, memory_vector_dim]
        self.step += 1
        return read_output, {
            "controller_state": controller_state,  # shape=[batch_size, self.controller_units]
            "read_vector_list": read_vector_list,  # list of self.read_head_num tensors, each of shape [batch_size, self.memory_vector_dim]
            "w_list": w_list,                      # list of num_heads tensors, each of shape [batch_size, self.memory_size]
            "M": M,                                # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            "key_M": key_M,                        # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            "w_aggre": w_aggre,                    # shape=[batch_size, self.memory_size]
            "sum_aggre": sum_aggre                 # shape=[batch_size, self.memory_size, self.memory_vector_dim]
        }, output_list
    '''
    def addressing(self, k, beta, key_M, prev_M):
    shape(key_M)=shape(prev_M)=[batch_size, self.memory_size, self.memory_vector_dim]
    key_M: a learned parameter whose gradient only flows through the addressing function
    prev_M: the memory of the Neural-Turing-Machine-style model; its gradient flows through the addressing function as well as the other modules of the model
    output: w_c, shape=[batch_size, self.memory_size], the weights for reading from or writing to the memory
    '''
    def addressing(self, k, beta, key_M, prev_M):
        # Cosine similarity
        def cosine_similarity(key, M):
            key = tf.expand_dims(key, axis=2)  # shape=[batch_size, self.memory_vector_dim, 1]
            inner_product = tf.matmul(M, key)  # shape=[batch_size, self.memory_size, 1]
            k_norm = tf.sqrt(tf.reduce_sum(tf.square(key), axis=1, keep_dims=True))  # shape=[batch_size, 1, 1]
            M_norm = tf.sqrt(tf.reduce_sum(tf.square(M), axis=2, keep_dims=True))    # shape=[batch_size, self.memory_size, 1]
            norm_product = M_norm * k_norm  # broadcast elementwise product; shape=[batch_size, self.memory_size, 1]
            K = tf.squeeze(inner_product / (norm_product + 1e-8))  # drop all size-1 dimensions; shape=[batch_size, self.memory_size]
            return K
        K = 0.5 * (cosine_similarity(k, key_M) + cosine_similarity(k, prev_M))  # shape=[batch_size, self.memory_size]
        K_amplified = tf.exp(tf.expand_dims(beta, axis=1) * K)  # shape(tf.expand_dims(beta, axis=1))=[batch_size, 1]; shape(K_amplified)=[batch_size, self.memory_size]
        w_c = K_amplified / tf.reduce_sum(K_amplified, axis=1, keep_dims=True)  # shape=[batch_size, self.memory_size]
        return w_c
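The content-addressing step can be checked in NumPy. This sketch computes cosine similarity between the key k and each memory slot, sharpens it with beta, and normalizes with a softmax (equations (1)-(2) in the paper); a single memory M stands in for the averaged key_M/prev_M similarity of the real code:

```python
import numpy as np

def address(k, beta, M, eps=1e-8):
    """k: [batch, dim]; beta: [batch]; M: [batch, slots, dim] -> w: [batch, slots]."""
    inner = np.einsum('bsd,bd->bs', M, k)                      # dot product per slot
    norms = np.linalg.norm(M, axis=2) * np.linalg.norm(k, axis=1, keepdims=True)
    K = inner / (norms + eps)                                  # cosine similarity
    e = np.exp(beta[:, None] * K)                              # sharpen with beta
    return e / e.sum(axis=1, keepdims=True)                    # softmax over slots
```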
    '''
    # shape(s)=[batch_size, self.shift_range * 2 + 1]
    # shape(w_g)=[batch_size, self.memory_size], read/write weight vector, corresponds to w^r_t in the paper
    # shape(gamma)=[batch_size,], a vector
    '''
    def capacity_overflow(self, w_g, s, gamma):
        s = tf.concat(
            [s[:, :self.shift_range + 1],  # shape=[batch_size, self.shift_range+1]
             tf.zeros([s.get_shape()[0], self.memory_size - (self.shift_range * 2 + 1)]),  # shape=[batch_size, self.memory_size-(self.shift_range*2+1)]
             s[:, -self.shift_range:]],  # shape=[batch_size, self.shift_range]
            axis=1
        )  # shape=[batch_size, self.memory_size]
        t = tf.concat([tf.reverse(s, axis=[1]), tf.reverse(s, axis=[1])], axis=1)  # shape=[batch_size, 2*self.memory_size]
        s_matrix = tf.stack(
            [t[:, self.memory_size - i - 1:self.memory_size * 2 - i - 1] for i in range(self.memory_size)],
            axis=1
        )  # shape=[batch_size, self.memory_size, self.memory_size]
        w_ = tf.reduce_sum(tf.expand_dims(w_g, axis=1) * s_matrix, axis=2)  # circular convolution; shape=[batch_size, self.memory_size]
        w_sharpen = tf.pow(w_, tf.expand_dims(gamma, axis=1))  # shape=[batch_size, self.memory_size]
        w = w_sharpen / tf.reduce_sum(w_sharpen, axis=1, keep_dims=True)  # shape=[batch_size, self.memory_size]
        return w
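What the s_matrix machinery above implements is an NTM-style circular convolution of the weight vector with the shift distribution, followed by sharpening. A direct (slow but transparent) NumPy sketch of that computation, assuming s has already been expanded to the full memory_size layout:

```python
import numpy as np

def shift_sharpen(w_g, s_full, gamma):
    """w_g, s_full: [batch, memory_size]; gamma: [batch].
    w~(i) = sum_j w_g(j) * s((i - j) mod m), then sharpen and renormalize."""
    batch, m = w_g.shape
    w = np.empty_like(w_g)
    for b in range(batch):
        conv = np.array([sum(w_g[b, j] * s_full[b, (i - j) % m] for j in range(m))
                         for i in range(m)])
        sharp = conv ** gamma[b]          # sharpening with exponent gamma
        w[b] = sharp / sharp.sum()
    return w
```

With a one-hot shift distribution at offset 1, the weights rotate by one slot.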
    '''
    def capacity_loss(self, w_aggre):
    equation (8) in the paper
    '''
    def capacity_loss(self, w_aggre):
        loss = 0.001 * tf.reduce_mean((w_aggre - tf.reduce_mean(w_aggre, axis=-1, keep_dims=True))**2 / self.memory_size / self.batch_size)
        return loss
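The regularizer penalizes the variance of the accumulated write weights w_aggre across memory slots, which pushes writes to spread evenly over the slots. A NumPy sketch of the same computation:

```python
import numpy as np

def capacity_loss(w_aggre, memory_size, batch_size, coef=0.001):
    """w_aggre: [batch, memory_size], accumulated write weights."""
    centered = w_aggre - w_aggre.mean(axis=-1, keepdims=True)  # deviation from the per-sample mean
    return coef * (centered ** 2).mean() / memory_size / batch_size
```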
    '''
    def zero_state(self, batch_size, dtype):
    Purpose:
        Declares the trainable variables of the model's initial state
    '''
    def zero_state(self, batch_size, dtype):
        with tf.variable_scope('init', reuse=self.reuse):
            read_vector_list = [expand(tf.tanh(learned_init(self.memory_vector_dim)), dim=0, N=batch_size) for i in range(self.read_head_num)]  # list of self.read_head_num tensors, each of shape [batch_size, self.memory_vector_dim]
            w_list = [expand(tf.nn.softmax(learned_init(self.memory_size)), dim=0, N=batch_size) for i in range(self.read_head_num + self.write_head_num)]  # list of self.read_head_num + self.write_head_num tensors, each of shape [batch_size, self.memory_size]
            controller_init_state = self.controller.zero_state(batch_size, dtype)  # the RNN's initial h, shape=[batch_size, self.controller_units]
            M = expand(
                tf.tanh(tf.get_variable('init_M', [self.memory_size, self.memory_vector_dim], initializer=tf.random_normal_initializer(mean=0.0, stddev=1e-5), trainable=False)),
                dim=0,
                N=batch_size
            )  # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            key_M = expand(
                tf.tanh(tf.get_variable('key_M', [self.memory_size, self.memory_vector_dim], initializer=tf.random_normal_initializer(mean=0.0, stddev=0.5))),
                dim=0,
                N=batch_size
            )  # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            sum_aggre = tf.constant(np.zeros([batch_size, self.memory_size, self.memory_vector_dim]), dtype=tf.float32)
            zero_vector = np.zeros([batch_size, self.memory_size])
            zero_weight_vector = tf.constant(zero_vector, dtype=tf.float32)
            state = {
                "controller_state": controller_init_state,  # the RNN's initial h, shape=[batch_size, self.controller_units]
                "read_vector_list": read_vector_list,  # list of self.read_head_num tensors, each of shape [batch_size, self.memory_vector_dim]
                "w_list": w_list,  # list of self.read_head_num + self.write_head_num tensors, each of shape [batch_size, self.memory_size]
                "M": M,  # shape=[batch_size, self.memory_size, self.memory_vector_dim]
                "w_aggre": zero_weight_vector,  # shape=[batch_size, self.memory_size]
                "key_M": key_M,  # shape=[batch_size, self.memory_size, self.memory_vector_dim]
                "sum_aggre": sum_aggre  # shape=[batch_size, self.memory_size, self.memory_vector_dim]
            }
            return state
3. Model training
train_book.py
def eval(sess, test_data, model, model_path, batch_size):
Args:
    sess, the TensorFlow session
    test_data, the test data
    model_path, the path where the model is saved
    batch_size, the number of samples the model processes per step
Purpose:
    Evaluates the model on test_data, computing accuracy, loss, and AUC,
    and saves the model with the best AUC so far to model_path.
def train(
train_file = "./data/book_data/book_train.txt",
test_file = "./data/book_data/book_test.txt",
feature_file = "./data/book_data/book_feature.pkl",
batch_size = 128,
maxlen = 100,
test_iter = 50,
save_iter = 100,
model_type = 'DNN',
Memory_Size = 4,
Mem_Induction = 0,
Util_Reg = 0
):
Args:
    test_iter, evaluate the model every test_iter training steps; if the model's AUC at that point is the best so far, save the model
    save_iter, save the model every save_iter training steps
    model_type, the selected model
    Memory_Size, the number of memory channels each sample's raw features are mapped to
    Mem_Induction, 0/1, whether to use the multi-channel memory features
    Util_Reg, 0/1, whether to enable memory utilization regularization
Purpose:
    Trains and evaluates the selected model on the training and test data.
    Every test_iter steps, prints the training and evaluation results and saves the model with the best AUC so far.
    Every save_iter steps, saves the model unconditionally.
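The two checkpointing policies described above can be sketched as a small predicate (hypothetical helper name, not the repo's code):

```python
def should_save(step, auc, best_auc, test_iter, save_iter):
    """Returns (save_because_best_auc, save_periodically)."""
    by_auc = step % test_iter == 0 and auc > best_auc  # evaluated and beats the best AUC
    periodic = step % save_iter == 0                   # unconditional periodic save
    return by_auc, periodic
```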
def test(
train_file = "./data/book_data/book_train.txt",
test_file = "./data/book_data/book_test.txt",
feature_file = "./data/book_data/book_feature.pkl",
batch_size = 128,
maxlen = 100,
test_iter = 100,
save_iter = 100,
model_type = 'DNN',
Memory_Size = 4,
Mem_Induction = 0,
Util_Reg = 0
):
Purpose:
    Loads the model from model_path and computes AUC, loss, accuracy, and aux_loss on the test data test_data