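This post is about a meta-learning paper, one of the two recommender-learning papers accepted at AAAI 2022. It is quite representative and well suited to implementing online recommendation: at an acceptable computational cost, it updates the model parameters on streaming data and returns results the user is interested in. I will adopt this solution for the online recommendation task in my company's project.
The main techniques involved are GAT, embeddings and MLP. The code that implements the paper is rather messy (it has not been engineered properly), but the idea is clear. Please be gentle, and feel free to leave a comment to discuss.
My implementation differs from the paper in a few places, which I describe in detail below. The main differences are the choice of the baseline model and the replacement of the GAT. Once the overall framework is complete, I will revise the details of these two parts.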
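Step 1: Build the user feedback history. This is done from two sides:
1. Get the items a given user has interacted with in the past.
2. Get the users who have responded to a given item in the past.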
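The two lookups above can be sketched with plain dictionaries. This is a minimal illustration, not the paper's code: the function name `build_feedback_history` and the toy interactions are made up for the example, and each interaction is assumed to be a `(user_id, item_id, score)` triple.

```python
from collections import defaultdict


def build_feedback_history(interactions):
    """Group (user_id, item_id, score) triples into per-user and per-item histories."""
    user_items = defaultdict(list)  # user_id -> [(item_id, score), ...]
    item_users = defaultdict(list)  # item_id -> [(user_id, score), ...]
    for user_id, item_id, score in interactions:
        user_items[user_id].append((item_id, score))
        item_users[item_id].append((user_id, score))
    return dict(user_items), dict(item_users)


# Toy interactions, invented for illustration only.
toy = [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)]
user_items, item_users = build_feedback_history(toy)
print(user_items[1])   # items user 1 interacted with
print(item_users[10])  # users who responded to item 10
```

A single pass over the interactions is enough to fill both dictionaries, which matters once the interaction log grows.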
The key information is as follows:
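Step 2: Extend the representation vectors. In practice, users and items are first turned into vectors with a lookup table, and a linear mapping then produces the extended user and item representations. These representations are based on the interactions. The formula is as follows: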
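In this part, we use the rating score in place of the learned attention weight. Treated this way, the graph is effectively undirected: the weight of the link between two nodes is the same in both directions. Strictly speaking, in a GAT the two weights are not equal, because the weight between a node and one of its neighbors is affected by the link weights of all of that node's neighbors. This can be seen from the summation in the attention formula.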
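The asymmetry can be checked numerically. The sketch below assumes the attention weights are obtained by a softmax over a node's neighbors, which is the usual GAT normalization; the raw scores are hypothetical numbers chosen for the example.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax of a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


# Hypothetical raw compatibility scores between users (rows) and items (columns).
# The raw score of the pair (user 0, item 0) is the same seen from either side.
raw = np.array([[2.0, 1.0, 0.0],
                [2.0, 0.0, 0.0]])

# Attention of user 0 over its items: normalized over row 0.
alpha_user0 = softmax(raw[0])
# Attention of item 0 over its users: normalized over column 0.
alpha_item0 = softmax(raw[:, 0])

print(alpha_user0[0])  # weight user 0 -> item 0
print(alpha_item0[0])  # weight item 0 -> user 0
```

The two printed weights differ because each is normalized over a different neighborhood, whereas a raw rating score used directly as the weight is identical in both directions.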
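With the method above we obtain the extended embeddings of users and items. The representation of a single feedback record (one interaction) can then be computed with Eq. (10).
The relevant code is as follows: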
```python
import numpy as np
import pandas as pd
import tensorflow as tf

'''
@name: kenny adelaide
@email: kenny13141314@163.com
@time: 2022/3/25
@description: this is an online RS demo with the SVD algorithm. The main idea comes from
    "Meta-Learning for Online Update of Recommender Systems".
'''

'''
The steps of the whole process, as pseudo code.
README:
#==========================================================================================
# pseudo code for GAT
# INPUT PARAMETERS: USER-ITEM INTERACTIONS;
#==========================================================================================
# via definition 1 to construct the GAT, separately user side and item side.
# 1. construct a user's item set from interactions.
# 2. construct an item's user set from interactions.
#    for i in data:
#        for j in data:
#            find set_user{items}
#            find set_item{users}
# 3. generate the embedding of a user and an item.
# 4. via the GAT side, extend these embeddings.
# 5. calculate the score weight of a user-item interaction from the graph (GAT).
#==========================================================================================
'''


class Loss(object):
    @staticmethod
    def loss(y_true, y_pred, eps=1e-15):
        '''
        Negative log-likelihood (binary cross-entropy) of the network output.
        :param y_true: true output.
        :param y_pred: predicted output.
        :param eps: extra parameter that keeps the probabilities away from 0 and 1,
            where the logarithm would be undefined.
        :return: loss
        '''
        p = tf.clip_by_value(y_pred, eps, 1 - eps)
        _loss = -tf.reduce_sum(y_true * tf.math.log(p) + (1.0 - y_true) * tf.math.log(1 - p), axis=1)
        return _loss / len(y_true)


class ExtendEmbedding(tf.keras.layers.Layer):
    '''
    For the bipartite graph, the users in Huser(x) constitute the user side, and the items in
    Hitem(x) constitute the item side. Here, each user (or item) node is represented by the
    user (or item) embedding used in the recommender model.
    An edge is created for each of the previous user-item interactions, and its weight is
    determined by the attention score between them. Then, a user (or item) embedding is
    extended using the connections to the other side on the bipartite graph, as specified
    in Definition 2.
    '''

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.b = tf.Variable(initial_value=tf.random.truncated_normal(shape=[1, 15], dtype=tf.float32),
                             dtype=tf.float32, name='EXTEND_b', trainable=True)
        self.W = tf.Variable(initial_value=tf.random.truncated_normal(shape=[input_shape[1], 15], dtype=tf.float32),
                             dtype=tf.float32, name='EXTEND_W', trainable=True)

    def call(self, inputs):
        return tf.nn.relu(tf.matmul(inputs, self.W) + self.b)


class EmbeddingLayer(tf.keras.layers.Layer):
    '''
    Embedding layer that maps a one-hot id to a dense vector.
    '''

    def __init__(self, input_shape=None, **kwargs):
        '''
        :param input_shape: [number of user ids, number of item ids].
        '''
        super().__init__(**kwargs)
        self.embedding_weight_users = tf.Variable(
            initial_value=tf.random.truncated_normal(shape=[input_shape[0], 10], dtype=tf.float32),
            dtype=tf.float32, name='user_embedding', trainable=False)
        self.embedding_weight_items = tf.Variable(
            initial_value=tf.random.truncated_normal(shape=[input_shape[1], 10], dtype=tf.float32),
            dtype=tf.float32, name='item_embedding', trainable=False)

    def call(self, row_index, flag):
        '''
        :param row_index: id of the user or item to look up.
        :param flag: 1 returns a user embedding, 0 returns an item embedding.
        :return: embedding of shape [1, 10].
        '''
        if flag == 1:
            return tf.nn.embedding_lookup(self.embedding_weight_users, [row_index])
        if flag == 0:
            return tf.nn.embedding_lookup(self.embedding_weight_items, [row_index])
        raise ValueError('flag must be 0 (item) or 1 (user)')


class GAT(tf.keras.layers.Layer):
    '''
    Network meant to compute the attention score of user-item interactions on the graph;
    its result feeds the embedding extension.
    NOTE: this class is not used below, because the rating score replaces the attention
    weight. call() refers to self.W and self.b, which build() never creates, so it would
    fail if invoked as written.
    '''

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.au = tf.Variable(initial_value=tf.random.truncated_normal(shape=input_shape, dtype=tf.float32),
                              dtype=tf.float32, name='GAT_au', trainable=True)

    def call(self, inputs):
        return tf.nn.relu(tf.matmul(self.W, tf.transpose(inputs)) + self.b)


class InteractionPresentation(tf.keras.layers.Layer):
    '''
    (INTERACTION REPRESENTATION) Given a user-item interaction x = (t, u, i), let ~e_u and
    ~e_i be the extended embeddings of u and i, respectively. This layer computes the
    interaction representation of x, h_x.
    '''

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.b = tf.Variable(initial_value=tf.random.truncated_normal(shape=[1, 20], dtype=tf.float32),
                             dtype=tf.float32, name='InteractionPresentation_b', trainable=True)
        self.W = tf.Variable(initial_value=tf.random.truncated_normal(shape=[input_shape[1], 20], dtype=tf.float32),
                             dtype=tf.float32, name='InteractionPresentation_W', trainable=True)

    def call(self, inputs):
        return tf.nn.relu(tf.matmul(inputs, self.W) + self.b)


class DataGenerator(object):
    '''
    Generator that turns the original data into the format we need.
    '''

    def __init__(self, interaction_file_path):
        data = pd.read_csv(interaction_file_path)[['userid', 'videoid', 'score']]
        lines = np.array(data)
        self.userid_max = np.max(data['userid']) + 1
        self.itemid_max = np.max(data['videoid']) + 1
        self.data = lines
        self.user_items = {}  # user id -> [[item id, score], ...]
        self.item_users = {}  # item id -> [[user id, score], ...]
        self.user_items_embeddings = []
        self.item_users_embeddings = []
        self.scores = tf.Variable(np.array(data[['score']]), dtype=tf.float32)
        # One pass over the interactions builds both histories.
        for line in lines:
            user_id, item_id, score = int(line[0]), int(line[1]), float(line[2])
            self.user_items.setdefault(user_id, []).append([item_id, score])
            self.item_users.setdefault(item_id, []).append([user_id, score])
        self.extend_embedding_input_process()

    def extend_embedding_input_process(self):
        embedding = EmbeddingLayer(input_shape=[self.userid_max, self.itemid_max])
        # User side: score-weighted sum of the item embeddings, concatenated with the user embedding.
        for user_id, items in self.user_items.items():
            user_embedding = embedding(row_index=user_id, flag=1)
            sum_item_embedding = 0.0
            for item_id, score in items:
                sum_item_embedding = sum_item_embedding + embedding(row_index=item_id, flag=0) * score
            self.user_items_embeddings.append(np.concatenate([sum_item_embedding, user_embedding], axis=1))
        self.user_items_embeddings = np.concatenate(self.user_items_embeddings, axis=0)
        # Item side: score-weighted sum of the user embeddings, concatenated with the item embedding.
        for item_id, users in self.item_users.items():
            item_embedding = embedding(row_index=item_id, flag=0)
            sum_user_embedding = 0.0
            for user_id, score in users:
                sum_user_embedding = sum_user_embedding + embedding(row_index=user_id, flag=1) * score
            self.item_users_embeddings.append(np.concatenate([sum_user_embedding, item_embedding], axis=1))
        self.item_users_embeddings = np.concatenate(self.item_users_embeddings, axis=0)

    def user_item_index_process(self, users_embedding, item_embedding):
        # Map each user / item id to its extended embedding (row order follows the dicts above).
        self.users = {}
        self.items = {}
        for idx, user in enumerate(self.user_items):
            self.users[user] = tf.Variable(np.array(users_embedding[idx, :]).reshape(1, -1))
        for idx, item in enumerate(self.item_users):
            self.items[item] = tf.Variable(np.array(item_embedding[idx, :]).reshape(1, -1))


if __name__ == '__main__':
    data_obj = DataGenerator('data/00000005.csv')
    user_gat = ExtendEmbedding()
    item_gat = ExtendEmbedding()
    users_embeddings = user_gat(data_obj.user_items_embeddings).numpy()
    items_embeddings = item_gat(data_obj.item_users_embeddings).numpy()
    data_obj.user_item_index_process(users_embeddings, items_embeddings)
    interaction = [94477, 99]  # 94477 is a user id, 99 is an item id.
    user_representative = data_obj.users[interaction[0]]
    item_representative = data_obj.items[interaction[1]]
    interaction_input = np.concatenate([user_representative, item_representative], axis=1)
    interaction_embedding = InteractionPresentation()
    hx = interaction_embedding(interaction_input)
```
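Step 3: The paper makes several important points explicit, including a new definition of what a parameter is (in the meta-learning setting). The quantities involved are the loss, the gradient, and the current parameter value:
The goal is to define the role a parameter plays in the model (its meaning with respect to a user interaction). The basic starting point is the following assumption:
From the above we obtain the final result hx, that is, the representation of one interaction record.
The GAT part of the figure above is now fully implemented and yields the embedding of an interaction record. The input of the MLP in the upper part comes from the model parameters. I summarize that process as follows: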
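For the representation above we use the Funk SVD model as the baseline prediction model; the original paper uses other baselines. With this choice,
a parameter is an element of one of the two factor matrices (P and Q in the code). The learning-rate matrix therefore has the same size as the factor matrices. I will continue implementing the following part of the paper tomorrow.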
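Once the interaction representation is available, the baseline SVD algorithm gives us the current SVD parameters, the loss, and the gradient of each iteration,
from which h(t,m) is finally computed. The paper states that Eq. (11) is a multi-layer perceptron (MLP), whose purpose is to pass these inputs through several layers of linear mappings.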
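A small NumPy sketch of this step, under my own assumptions: each entry of a user's latent vector is treated as one parameter, its feature row is `[current value, loss, gradient]`, and a plain ReLU MLP maps that row to a hidden vector. The names `parameter_features` and `mlp`, the layer widths and the random weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4                      # latent dimension (hypothetical)
p_u = rng.normal(size=k)   # latent vector of one user
q_i = rng.normal(size=k)   # latent vector of one item
rating = 3.0               # observed score of the new interaction


def parameter_features(p_u, q_i, rating):
    """Return one row [value, loss, gradient] per entry of p_u."""
    error = rating - p_u @ q_i   # prediction error of Funk SVD
    loss = error ** 2            # squared-error loss
    grad = -2.0 * error * q_i    # d loss / d p_u
    return np.stack([p_u, np.full_like(p_u, loss), grad], axis=1)


def mlp(x, weights):
    """Plain ReLU MLP; weights is a list of (W, b) pairs."""
    for W, b in weights:
        x = np.maximum(x @ W + b, 0.0)
    return x


sizes = [3, 8, 10]  # hypothetical layer widths
weights = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
           for m, n in zip(sizes[:-1], sizes[1:])]

features = parameter_features(p_u, q_i, rating)  # shape (k, 3)
h_tm = mlp(features, weights)                    # shape (k, 10)
print(features.shape, h_tm.shape)
```

Stacking value, loss and gradient per parameter keeps the MLP input size fixed at three, regardless of how many parameters the baseline has.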
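Next, let us list the parameters that need to be updated according to the paper:
1. The learning rate:
2. The parameters of the extended user representation:
3. The parameter used to compute the user-item attention; this parameter is a vector.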
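Finally, these quantities are updated at every iteration of the baseline SVD algorithm. Note that the data pipeline is streaming, so every user interaction that enters the model corresponds to one iteration. The initial parameters can be obtained by training the baseline with the basic Funk SVD algorithm. The subsequent streaming updates touch only a small part of the parameters, so their computational cost stays within an acceptable range.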
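The streaming loop can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `stream_update` is made up, and the per-parameter learning rates `lr_P` and `lr_Q` are fixed arrays here, whereas in the paper's setting a meta-model would produce them for each interaction.

```python
import numpy as np


def stream_update(P, Q, stream, lr_P, lr_Q, reg=0.005):
    """Apply one SGD step per incoming (user, item, rating) interaction.

    lr_P and lr_Q hold one learning rate per parameter, so they have the
    same shapes as P and Q.
    """
    for u, i, r in stream:
        error = r - P[u] @ Q[:, i]
        grad_p = -2.0 * error * Q[:, i] + 2.0 * reg * P[u]
        grad_q = -2.0 * error * P[u] + 2.0 * reg * Q[:, i]
        # Only the row of user u and the column of item i are touched.
        P[u] -= lr_P[u] * grad_p
        Q[:, i] -= lr_Q[:, i] * grad_q
    return P, Q


# Toy setup: 3 users, 4 items, 2 latent factors (all values hypothetical).
P = np.full((3, 2), 0.1)
Q = np.full((2, 4), 0.1)
P_before = P.copy()
lr_P = np.full(P.shape, 0.05)
lr_Q = np.full(Q.shape, 0.05)

stream = [(0, 1, 4.0)] * 200  # the same interaction arriving repeatedly
P, Q = stream_update(P, Q, stream, lr_P, lr_Q)
print(P[0] @ Q[:, 1])  # prediction for (user 0, item 1) after the stream
```

Each interaction changes one row of `P` and one column of `Q`, which is why the cost of an online step stays small compared with retraining.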
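The current pipeline leaves one open question: how much better is the parameter update computed by the replacement algorithm? The authors analyze the advantage of this procedure theoretically. The key quantity is the rank of the learning-rate matrix: if the rank is high, the matrix supports more flexible parameter updates for new interactions, because a higher rank means the matrix spans a space of higher dimension. (My own understanding: the higher the rank, the lower the probability that the matrix is equivalent, after elementary transformations, to a low-rank one. I phrase this as a probability because, as the parameters keep being updated, we cannot control whether a high-rank matrix turns into a low-rank one. A matrix can be reduced to its simplest form by linear transformations; if such an equivalent form exists, the matrix can in fact be represented by a lower-dimensional matrix, with the remaining entries equal to zero, so it is sparser.) In earlier strategies the learning rate is a constant, which corresponds to rank 1. With the new strategy the learning rate is a matrix, and every user interaction gets its own learning rates. The authors prove that, when the optimal learning-rate matrix W∗ has high rank, the earlier strategies can have a large optimality gap.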
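The authors define the gap as a Euclidean distance:
When expressing the learning-rate gap, W has rank 1, which is equivalent to a constant (the fixed learning rate is taken as the reference),
where the gradient enters the expression; it can be understood in terms of the difference between the true value and the predicted value.
A lower bound on the gap can be obtained from the singular values.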
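The link between rank and singular values can be illustrated with the Eckart-Young theorem, which I use here as a stand-in for the paper's own bound: in Frobenius norm, no rank-1 matrix can be closer to a matrix than the square root of the sum of its squared singular values from the second one onward. The matrix `W_star` below is random and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W_star = rng.uniform(0.0, 0.1, size=(5, 4))  # hypothetical "optimal" learning rates

# A single constant learning rate corresponds to a constant matrix, which has rank 1.
W_const = np.full_like(W_star, W_star.mean())
rank_const = np.linalg.matrix_rank(W_const)
rank_star = np.linalg.matrix_rank(W_star)
print(rank_const, rank_star)

# Eckart-Young: every rank-1 matrix is at least this far from W_star (Frobenius norm).
sigma = np.linalg.svd(W_star, compute_uv=False)
lower_bound = np.sqrt(np.sum(sigma[1:] ** 2))

gap_const = np.linalg.norm(W_star - W_const)  # Frobenius norm by default
print(lower_bound, gap_const)
```

The constant-rate gap can never fall below `lower_bound`, and the bound grows with the trailing singular values, that is, with how far `W_star` is from rank 1.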
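Summary:
This is a good paper, and the method is a sound strategy for online recommendation. Under heavy data load, whether to choose it depends on the complexity of the baseline algorithm. I implemented it with the SVD algorithm, but as the numbers of users and items grow without bound, the demands on hardware become high, so in that situation I do not recommend a memory-based baseline of this kind.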
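A back-of-the-envelope calculation shows where the memory goes. The supplementary listing materializes the full score matrix with `np.matmul(self.P, self.Q)`; the helper names and the user and item counts below are hypothetical, and 4 bytes per value assumes float32.

```python
def dense_matrix_gib(n_users, n_items, bytes_per_value=4):
    """Memory needed for a dense n_users x n_items matrix, in GiB."""
    return n_users * n_items * bytes_per_value / 2**30


def factor_matrices_gib(n_users, n_items, k, bytes_per_value=4):
    """Memory needed for the two factor matrices with k latent factors, in GiB."""
    return (n_users + n_items) * k * bytes_per_value / 2**30


# Hypothetical catalogue: one million users, one hundred thousand items, 10 factors.
dense = dense_matrix_gib(1_000_000, 100_000)
factors = factor_matrices_gib(1_000_000, 100_000, 10)
print(dense, factors)
```

Keeping only the factor matrices and computing single predictions on demand avoids the dense matrix, at the price of one dot product per lookup.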
Supplementary code
```python
import numpy as np
import tensorflow as tf

# DataGenerator, ExtendEmbedding and InteractionPresentation come from the listing above.
# onloaddata() and build_score_matrix_R() are helper functions that are not shown in this post.

'''
=================================================funck svd==============================================================
'''


def init_P_Q_matrix(user_disms=[3, 3], item_disms=[3, 3], init_method='quadrature'):
    '''
    Create the two matrices for SGD training, drawn from a standard normal distribution.
    Args:
        user_disms: user matrix shape.
        item_disms: item matrix shape.
        init_method: approach used to generate the matrices.
    Returns: [P, Q], or None for an unknown init_method.
    '''
    if str(init_method) == 'quadrature':
        P = np.random.randn(user_disms[0], user_disms[1])
        Q = np.random.randn(item_disms[1], item_disms[0])
        return [P, Q]
    return


def calculate_error(P_matrix, Q_matrix, y_matrix):
    '''
    Calculate the errors between the ratings and the product of the two matrices.
    Entries of y_matrix that are not None are treated as observed ratings.
    Returns: one error per observed rating.
    '''
    rows, cols = np.nonzero(y_matrix != None)
    errors = y_matrix[rows, cols] - np.sum(P_matrix[rows] * Q_matrix.T[cols], axis=1)
    return errors


def gradient(P_matrix, Q_matrix, rows, cols, a, index, error):
    # Gradient of the regularized squared error for one observed rating; a is the L2 weight.
    or_row, or_col = rows[index], cols[index]
    P_gradient = -2 * error * Q_matrix[:, or_col] + 2 * a * P_matrix[or_row, :]
    Q_gradient = -2 * error * P_matrix[or_row, :] + 2 * a * Q_matrix[:, or_col]
    return [Q_gradient, P_gradient]


def updateParameters(Q_gradient, P_gradient, P, Q, learning_rate, index, rows, cols):
    or_row, or_col = rows[index], cols[index]
    P[or_row, :] -= learning_rate * P_gradient
    Q[:, or_col] -= learning_rate * Q_gradient
    return [P, Q]


def funck_svd():
    '''
    Train the Funk SVD baseline: two matrices are fitted to the original rating matrix.
    Returns: cost per iteration and the number of iterations run.
    '''
    dic = dict()
    [data, userno, videono] = onloaddata()
    learning_rate = 0.001
    iters = 500
    a = 0.005
    [P, Q] = init_P_Q_matrix(user_disms=[userno, 10], item_disms=[videono, 10], init_method='quadrature')
    y_matirx = build_score_matrix_R(data, userno, videono)
    if not isinstance(P, np.ndarray):
        P = np.around(np.array(P), decimals=4)
    if not isinstance(Q, np.ndarray):
        Q = np.around(np.array(Q), decimals=4)
    if not isinstance(y_matirx, np.ndarray):
        y_matirx = np.around(np.array(y_matirx), decimals=4)
    rows, cols = np.nonzero(y_matirx != None)
    cost_arr = []
    count = 0
    for i in range(iters):
        # Errors are computed once per pass and reused for every SGD step of that pass.
        errors_matrix = calculate_error(P, Q, y_matirx)
        cost = np.sum(np.square(errors_matrix))
        if cost <= 0.00001:
            break
        for index in range(len(rows)):
            [Q_gradient, P_gradient] = gradient(P, Q, rows, cols, a, index, errors_matrix[index])
            [P, Q] = updateParameters(Q_gradient, P_gradient, P, Q, learning_rate, index, rows, cols)
        cost_arr.append(cost)
        count += 1
        print('{}:{}'.format(i, cost))
    dic['svd_lr'] = learning_rate
    dic['svd_iters'] = iters
    dic['svd_a'] = a
    dic['svd_p'] = P
    dic['svd_q'] = Q
    np.save('./data/00000005.npy', dic)
    return cost_arr, count


class BASELINE_MLP(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = tf.keras.layers.Dense(3, activation='relu', name='hidden1')
        self.hidden2 = tf.keras.layers.Dense(8, activation='relu', name='hidden2')
        self.hidden3 = tf.keras.layers.Dense(15, activation='relu', name='hidden3')
        self.hidden4 = tf.keras.layers.Dense(10, activation='relu', name='hidden4')

    def call(self, inputs, training=None, mask=None):
        output1 = self.hidden1(inputs)
        output2 = self.hidden2(output1)
        output3 = self.hidden3(output2)
        output4 = self.hidden4(output3)
        return output4


class Updata_Svd_Parameters():
    '''
    First we collect the gradient, the current value and the loss, which are used to
    construct h(t,m). A pre-defined learning-rate matrix then serves as the weight of a
    linear mapping of [hx, h(t,m)], and the result is meant to update the P and Q matrices.
    '''

    def __init__(self, baseline_path_parameters):
        baseline_model = np.load(baseline_path_parameters, allow_pickle=True).item()
        self.Q = baseline_model['svd_q']
        self.P = baseline_model['svd_p']
        self.score_matrix = np.matmul(self.P, self.Q)
        # Learning-rate weight and bias used to update the current values of the baseline model.
        self.W_lr = tf.Variable(
            initial_value=tf.random.truncated_normal(shape=self.score_matrix.shape, dtype=tf.float32),
            dtype=tf.float32, name='W_lr', trainable=True)
        self.b_lr = tf.Variable(
            initial_value=tf.random.truncated_normal(shape=self.score_matrix.shape, dtype=tf.float32),
            dtype=tf.float32, name='b_lr', trainable=True)

    def update_baseling_parameters(self, hx, userid, itemid, score):
        '''
        interaction: (userid, itemid, score)
        :param hx: interaction representation.
        :param userid: user id of the interaction.
        :param itemid: item id of the interaction.
        :param score: observed score.
        :return: the new learning rates.
        '''
        current_value = self.score_matrix[userid, itemid]
        error = score - current_value
        _gradient = self.gradient(error, current_value)
        _loss = self.loss(error)
        features = np.array([current_value, _loss, _gradient]).reshape(1, 3)
        baseline_model_BASELINE_MLP = BASELINE_MLP()
        ht_m = baseline_model_BASELINE_MLP(tf.Variable(features, dtype=tf.float32))
        # NOTE: W_lr has the shape of the score matrix while [hx, ht_m] is 1 x 30, so this
        # product only broadcasts when the number of items is 30. This step is unfinished
        # and the shapes still need to be reconciled.
        new_learning_rate = tf.nn.softmax(self.W_lr * np.concatenate([hx, ht_m], axis=1) + self.b_lr)
        return new_learning_rate

    def loss(self, error):
        return np.square(error)

    def gradient(self, error, value):
        return -2 * error * value


if __name__ == '__main__':
    data_obj = DataGenerator('data/00000005.csv')
    user_gat = ExtendEmbedding()
    item_gat = ExtendEmbedding()
    users_embeddings = user_gat(data_obj.user_items_embeddings).numpy()
    items_embeddings = item_gat(data_obj.item_users_embeddings).numpy()
    data_obj.user_item_index_process(users_embeddings, items_embeddings)
    interaction = [94477, 99, 3]  # 94477 is a user id, 99 is an item id, 3 is the score.
    user_representative = data_obj.users[interaction[0]]
    item_representative = data_obj.items[interaction[1]]
    interaction_input = np.concatenate([user_representative, item_representative], axis=1)
    interaction_embedding = InteractionPresentation()
    hx = interaction_embedding(interaction_input)
    # This is a Funk SVD algorithm as a baseline demo.
    # funck_svd()
    update = Updata_Svd_Parameters('./data/00000005.npy')
    update.update_baseling_parameters(hx, interaction[0], interaction[1], interaction[2])
```
Parts of the code may be duplicated. I have not checked it carefully, so take this as a heads-up!