Reading Notes on Recommender-System Papers

Latest recommended article published on 2023-11-27 17:58:19

czlm爱你的笑

Category: paper reading

Article link: https://blog.csdn.net/lianwaiyuwusheng/article/details/109537547

Recommendation

2001-Item-based collaborative filtering recommendation algorithms

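Citations: 9183. Venue: Proceedings of the 10th international conference on World Wide Web

Problem: the number of visitors keeps growing. The workload of traditional user-based CF grows with the number of users, so recommending by searching for similar users inevitably hits a scalability bottleneck.
CF models take the entire $m\times n$ user-item rating matrix as input and fall into two groups: memory-based (user-based) and model-based (item-based). The former searches the matrix for similar users; the latter builds a model of the users' ratings.
This paper studies the relationships between items. Because item-item relationships are relatively static, they can be computed ahead of time, and recommending items similar to those a user already likes speeds up recommendation.
Method: two main steps, 1. compute item similarities, 2. compute the predicted rating.
1. Select the users who rated both items i and j, then compute the similarity of i and j:
  - cosine-based similarity: treat i and j as vectors of user ratings; the similarity is the cosine of the angle between them; $sim(i,j)=\cos(\vec i,\vec j)=\frac{\vec i\cdot \vec j}{||\vec i||_2\,||\vec j||_2}$
  - correlation-based similarity: Pearson-r correlation. Let U be the set of users who rated both i and j; then $sim(i,j)=\frac{\sum_{u\in U}(R_{u,i}-\bar R_i)(R_{u,j}-\bar R_j)}{\sqrt{\sum_{u\in U}(R_{u,i}-\bar R_i)^2}\sqrt{\sum_{u\in U}(R_{u,j}-\bar R_j)^2}}$, where $\bar R_i$ is the average rating of item i.
  - adjusted-cosine similarity: accounts for the different rating scales of different users by subtracting each user's average rating $\bar R_u$; $sim(i,j)=\frac{\sum_{u\in U}(R_{u,i}-\bar R_u)(R_{u,j}-\bar R_u)}{\sqrt{\sum_{u\in U}(R_{u,i}-\bar R_u)^2}\sqrt{\sum_{u\in U}(R_{u,j}-\bar R_u)^2}}$.
2. Compute the predicted rating of an item:
  - Weighted Sum: infer the rating of i from the user's ratings on items similar to i; $P_{u,i}=\frac{\sum_{all\ similar\ items,N}(s_{i,N}*R_{u,N})}{\sum_{all\ similar\ items,N}(|s_{i,N}|)}$
  - Regression: let $R_N$ be the rating vector of a similar item and $R_i$ that of item i. Instead of the raw ratings of the similar item, the weighted sum uses values approximated by a linear regression: $\bar R'_N=\alpha \bar R_i+\beta+\epsilon$.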
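The two steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy ratings, the function names, the restriction to neighbours with positive similarity, and the fallback to the user's mean are choices made for this sketch, not details taken from the paper.

```python
from math import sqrt

# Toy data (made up for illustration): user -> {item: rating}.
RATINGS = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 2, "C": 4},
}


def user_mean(ratings, u):
    """Average of all ratings given by user u."""
    values = ratings[u].values()
    return sum(values) / len(values)


def adjusted_cosine(ratings, i, j):
    """Adjusted-cosine similarity of items i and j over their co-rating users."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    dev_i = [ratings[u][i] - user_mean(ratings, u) for u in common]
    dev_j = [ratings[u][j] - user_mean(ratings, u) for u in common]
    norm_i = sqrt(sum(d * d for d in dev_i))
    norm_j = sqrt(sum(d * d for d in dev_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev_i, dev_j)) / (norm_i * norm_j)


def predict(ratings, u, i, k=2):
    """Weighted-sum prediction of user u's rating for item i."""
    sims = [(adjusted_cosine(ratings, i, n), n) for n in ratings[u] if n != i]
    # Assumption of this sketch: keep the k most similar items with positive similarity.
    top = sorted((s, n) for s, n in sims if s > 0)[-k:]
    denom = sum(abs(s) for s, _ in top)
    if denom == 0:
        return user_mean(ratings, u)  # fallback chosen for this sketch
    return sum(s * ratings[u][n] for s, n in top) / denom
```

With the toy data, `predict(RATINGS, "u4", "B")` relies only on item A, because item C is negatively correlated with B and is filtered out.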
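Dataset: movie ratings from MovieLens. Only users who rated more than 20 movies are kept, giving about 1 million ratings, which are then split into a training set and a test set.
Evaluation metric: the paper uses MAE over $N$ pairs of predicted and true ratings $(p_i,q_i)$, $MAE=\frac{\sum_{i=1}^N|p_i-q_i|}{N}$.
The effect of different parameters (neighborhood size, train/test ratio, similarity measure) is tested, the best setting is chosen, and the parameters are then fixed. This tuning uses 10-fold cross-validation on the training set only.
Experimental results:
- Similarity measure: with weighted-sum prediction, adjusted cosine similarity performs best.
- Training/test ratio x: for both prediction methods, MAE decreases as x grows; x = 0.8 is chosen, the point where the curves of the two methods cross.
- Neighborhood size: for the weighted sum, MAE decreases as the neighborhood grows and then levels off; for the regression method, MAE increases as the neighborhood grows. A neighborhood size of 30 is chosen.
[Figures from the paper]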

2009-Matrix factorization techniques for recommender systems

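Citations: 7437. Venue: Computer

The Netflix Prize competition showed that matrix factorization outperforms nearest-neighbor methods, and that it can incorporate additional information (implicit feedback, temporal effects, etc.).
Main approaches:
- Content filtering (content-based): build a profile that characterizes each user or item, then use the profiles to associate users with matching items.
- Collaborative filtering (based on past user behavior): analyze the relationships between users and the interdependencies among items, then use them to identify user-item associations. It splits into neighborhood methods and latent factor models.
Input of a recommender system: with explicit feedback, the input is usually a matrix with users on one axis and items on the other, whose entries are the ratings users gave to items; the matrix is sparse.
Methods:
- Basic matrix factorization model
  - Users and items are mapped to a joint latent factor space of dimension $f$, and user-item interactions are modeled as inner products in that space. An item is represented as $q_i\in R^f$ and a user as $p_u\in R^f$; each dimension measures how strongly the item possesses that factor (or how much the user likes it). $q_i^Tp_u$ captures the interaction (interest) between user u and item i and approximates the user's rating. Once the mapping is learned, $\hat{r}_{ui}=q_i^Tp_u$ predicts the rating of user u on item i.
  - This is closely related to singular value decomposition (SVD). Because the rating matrix is sparse, conventional SVD cannot be applied directly to CF. One remedy is to impute the missing data; the other is to model only the observed ratings and regularize to avoid overfitting, i.e.:
    $\min_{q^*,p^*}\sum_{(u,i)\in K}(r_{ui}-q_i^Tp_u)^2+\lambda(||q_i||^2+||p_u||^2) \tag{1}$
    where $K$ is the set of $(u,i)$ pairs with a known rating $r_{ui}$.
- Learning methods:
  - stochastic gradient descent (SGD)
  - alternating least squares (ALS): fix one set of parameters at a time and solve a least-squares problem for the other, alternating between the two; this can be parallelized.
- Adding biases
  - Some rating variation comes from the user or the item alone, e.g. some users tend to give higher ratings than others.
  - A first-order approximation of the bias is $b_{ui}=\mu+b_u+b_i$: the overall average, the user bias, and the item bias. The rating then becomes:
    $\hat{r}_{ui}=\mu+b_u+b_i+q_i^Tp_u \tag{2}$
- Adding extra inputs to address the cold start problem, where users provide few ratings.
  - Let $N (u)$ be the set of items for which user u expressed an implicit preference (e.g. purchase or browsing history). The user's preference over this set is represented as $\sum_{i\in N(u)}x_i$, $x_i\in R^f$.
  - Let $A (u)$ be the set of attributes of user u (e.g. gender, age). The user's attribute profile is represented as $\sum_{a\in A(u)}y_a$, $y_a\in R^f$.
  - The rating is then modeled as follows (item attributes could be added in the same way):
    $\hat{r}_{ui}=\mu+b_u+b_i+q_i^T[p_u+|N(u)|^{-0.5}\sum_{i\in N(u)}x_i+\sum_{a\in A(u)}y_a] \tag{3}$
- Temporal dynamics: preferences change over time
  - User biases, item biases, and user preferences can all drift over time, so the rating becomes:
    $\hat{r}_{ui}(t)=\mu+b_u(t)+b_i(t)+q_i^Tp_u(t) \tag{4}$
- Adding confidence levels: not all observations deserve the same confidence, e.g. how many times a video was watched.
  - With confidence weights the problem becomes:
    $\min_{q^*,p^*}\sum_{(u,i)\in K}c_{ui}(r_{ui}-\mu -b_u-b_i-q_i^Tp_u)^2+\lambda(||q_i||^2+||p_u||^2+b_u^2+b_i^2) \tag{5}$
    $c_{ui}$ is the confidence in the observation of user u on item i.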
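As a rough illustration of how the biased model $\hat r_{ui}=\mu+b_u+b_i+q_i^Tp_u$ can be fitted with stochastic gradient descent on the regularized squared error, here is a minimal pure-Python sketch. The toy ratings, the hyperparameter values, and all function names are assumptions made for this example.

```python
import math
import random


def train_biased_mf(data, n_users, n_items, f=2, lr=0.05, reg=0.02,
                    epochs=300, seed=0):
    """SGD on the regularized squared error of mu + b_u + b_i + q_i . p_u."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in data) / len(data)
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    p = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, i, r in samples:
            pred = mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(q[i], p[u]))
            err = r - pred
            # Move each parameter against the gradient of its regularized error.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for k in range(f):
                pu, qi = p[u][k], q[i][k]
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return {"mu": mu, "b_u": b_u, "b_i": b_i, "p": p, "q": q}


def predict(model, u, i):
    """Predicted rating of user u on item i."""
    dot = sum(a * b for a, b in zip(model["q"][i], model["p"][u]))
    return model["mu"] + model["b_u"][u] + model["b_i"][i] + dot


def rmse(model, data):
    """Root-mean-square error of the model on (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in data) / len(data))


# Toy (user, item, rating) triples, made up for illustration.
DATA = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
        (2, 1, 2), (2, 2, 5), (3, 0, 3), (3, 2, 4)]
```

Calling `train_biased_mf(DATA, n_users=4, n_items=3)` returns the learned parameters; on this toy data the training RMSE falls below that of always predicting the global mean.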
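Dataset: Netflix (approximate figures): the training set has about 100 million ratings from roughly 0.5 million users on about 17,000 movies; the test set requires predicting about 3 million ratings.
Evaluation metric: root-mean-square error (RMSE), $RMSE=\sqrt{\frac{\sum_{i=1}^N(p_i-q_i)^2}{N}}$.
Results: going from model (1) towards (5), the models have more parameters and accuracy improves.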

2010-Factorization machines

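Citations: 1217. Venue: 2010 IEEE International Conference on Data Mining

Problem: many factorization models (e.g. MF, parallel factor analysis, SVD++, PITF, FPMC) are not applicable to general prediction problems; each works only for a specific kind of input (task).
FM combines the advantages of SVMs and factorization models: it is a general predictor, its model equation can be computed in linear time, and it can model all interactions between variables, even under very sparse data.
Method:
- Encode the input as a feature vector: input $x\in R^n$, output $y\in R$.
- Model equation for degree d=2:
  - A conventional SVM-style model, e.g. $\hat y(x):=w_0+\sum_{i=1}^n w_ix_i+\sum_{i=1}^n \sum_{j=i+1}^nw_{i,j}x_ix_j$. The interaction weights $w_{i,j}$ are treated as independent parameters, so with maximum-margin training any interaction that appears in the test data but never in the training data gets $w_{i,j}=0$.
  - This paper:
    $\hat y(x):=w_0+\sum_{i=1}^n w_ix_i+\sum_{i=1}^n \sum_{j=i+1}^n\langle\mathtt v_i,\mathtt v_j\rangle x_ix_j \tag{eq1}$
    $w_{i,j}:=\langle\mathtt v_i,\mathtt v_j\rangle$ models the interaction between the i-th and j-th variables, with $\mathtt v_i\in R^k$.
- Computed directly, eq1 costs $O(kn^2)$. After the reformulation $\sum_{i=1}^n \sum_{j=i+1}^n\langle\mathtt v_i,\mathtt v_j\rangle x_ix_j=\frac{1}{2}\sum_{f=1}^k((\sum_{i=1}^nv_{i,f}x_i)^2-\sum_{i=1}^nv_{i,f}^2x_i^2)$ the cost is $O (k n)$. Summing only over the non-zero elements gives $O(k\bar m_D)$, where $\bar m_D$ is the average number of non-zero elements in an input vector.
- FM can mimic other methods:
  - MF: use only the user and item indicators; $\hat y(\mathtt x)=w_0+w_i+w_u+\langle\mathtt v_u,\mathtt v_i\rangle$.
  - SVD++: use the first three groups of input features; $\hat y(\mathtt x)=w_0+w_i+w_u+\langle\mathtt v_u,\mathtt v_i\rangle+\frac{1}{\sqrt{|N_u|}}\sum_{l\in N_u}\langle\mathtt v_i,\mathtt v_l\rangle$.
  - PITF (tag recommendation): $\hat y(\mathtt x)=w_0+w_u+w_i+w_t+\langle\mathtt v_u,\mathtt v_i\rangle+\langle\mathtt v_u,\mathtt v_t\rangle+\langle\mathtt v_i,\mathtt v_t\rangle$. With pairwise ranking between two tags of the same (u, i), the terms that do not depend on the tag cancel, leaving $\hat y(\mathtt x)=w_t+\langle\mathtt v_u,\mathtt v_t\rangle+\langle\mathtt v_i,\mathtt v_t\rangle$.
  - Factorized Personalized Markov Chains (FPMC): ranks products based on the user's last purchase; $\hat y(x)=w_i+\langle\mathtt v_u,\mathtt v_i\rangle+\frac{1}{|B_{t-1}^u|}\sum_{l\in B_{t-1}^u}\langle\mathtt v_i,\mathtt v_l\rangle$, where $B_{t-1}^u$ is the set of items user u bought at time t-1.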
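The linear-time reformulation of the pairwise term can be checked numerically. The sketch below compares the direct double sum with the reformulated sum in plain Python; the function names and the list-of-lists layout of the factor matrix `V` (one row of length k per feature) are assumptions of this example.

```python
def fm_pairwise_naive(x, V):
    """Direct O(k n^2) evaluation of sum_{i<j} <v_i, v_j> x_i x_j."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(V[i], V[j]))
            total += inner * x[i] * x[j]
    return total


def fm_pairwise_linear(x, V):
    """O(k n) evaluation: 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = 0.0   # sum_i v_if * x_i
        s2 = 0.0  # sum_i (v_if * x_i)^2
        for i, xi in enumerate(x):
            if xi == 0:
                continue  # zero features contribute nothing (sparse input)
            t = V[i][f] * xi
            s += t
            s2 += t * t
        total += s * s - s2
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Degree-2 FM model equation using the linear-time pairwise term."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_linear(x, V)
```

Both pairwise functions return the same value up to floating-point error; only the second skips zero entries, which is where the $O(k\bar m_D)$ cost for sparse inputs comes from.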
Datasets: ECML, Netflix.

2012-BPR: Bayesian personalized ranking from implicit feedback

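Citations: 2728. Venue: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (2009)

Problem: many methods (MF, kNN) build personalized recommendations from implicit feedback (e.g. purchase or browsing history), but none of them is optimized directly for ranking. To optimize ranking, the paper proposes a generic optimization criterion, BPR-OPT, and a generic learning algorithm, LearnBPR.
Method:
- In an implicit-feedback system only the positive examples $S$ (what the user liked) are observed; the remaining data is a mixture of real negatives and missing values.
  - Earlier approaches label the observed data as positive (1) and everything else as negative (0), and fit a model to this dataset. The model therefore learns $p(i)=\begin{cases} 1 & i \in S\\ 0 & i\notin S \end{cases}$, so the items that should be predicted in the future are presented as negatives during training. Such models can still predict only because regularization prevents them from overfitting.
  - This paper uses item pairs as training data, assuming that a user prefers items they have viewed over items they have not. The training set is $D_S:=\{(u,i,j)|i\in I_u^+ \land j\in I\setminus I_u^+\}$, where $(u, i, j)$ means that user u prefers i over j.
- BPR Optimization Criterion:
  - The Bayesian formulation of finding the correct personalized ranking over all items is to maximize the posterior probability of the model parameters, $p(\Theta |>_u)\propto p(>_u|\Theta)p(\Theta)$, where $\Theta$ denotes the model parameters and $>_u$ is the desired item ranking of user u. This requires the likelihood $p(>_u|\Theta)$ and the prior $p(\Theta)$.
  - Two assumptions:
    - Users act independently of each other.
    - For a given user, the ordering of an item pair (i, j) is independent of the ordering of every other pair.
  - Likelihood over all users: $\prod_{u\in U}p(>_u|\Theta)=\prod_{(u,i,j)\in U\times I\times I}p(i>_uj|\Theta)^{\delta((u,i,j)\in D_S)}\cdot (1-p(i>_uj|\Theta))^{\delta((u,j,i)\notin D_S)}$, where $\delta$ is the indicator function; this simplifies to $\prod_{u\in U}p(>_u|\Theta)=\prod_{(u,i,j)\in D_S}p(i>_uj|\Theta)$.
  - The individual probability that the user prefers i over j is defined as $p(i>_uj|\Theta):=\sigma(\hat x_{uij}(\Theta))$, where $\sigma$ is the sigmoid function and $\hat x_{uij}$ is a real-valued function of the model parameters $\Theta$ that captures the relationship between u, i, and j.
  - For the prior $p(\Theta)$ a normal distribution is used, $p(\Theta)\sim N(0,\Sigma_{\Theta})$, with covariance matrix $\Sigma_{\Theta}$. To reduce the number of hyperparameters, set $\Sigma_{\Theta}=\lambda_{\Theta}I$; $\lambda_{\Theta}$ then acts as the regularization parameter.
  - The generic optimization criterion is:
    $\begin{aligned} \text{BPR-OPT} &:=\ln p(\Theta|>_u) \\ &=\ln p(>_u|\Theta)\,p(\Theta) \\ &=\ln \prod_{(u,i,j)\in D_S}\sigma(\hat x_{uij})\,p(\Theta) \\ &=\sum_{(u,i,j)\in D_S}\ln\sigma(\hat x_{uij})+\ln p(\Theta) \\ &=\sum_{(u,i,j)\in D_S}\ln\sigma(\hat x_{uij})-\lambda_{\Theta}||\Theta||^2 \end{aligned}$
- BPR Learning Algorithm:
  - Gradient (BPR-OPT is maximized): $\frac{\partial \text{BPR-OPT}}{\partial \Theta} \propto \sum_{(u,i,j)\in D_S}\frac{e^{-\hat x_{uij}}}{1+e^{-\hat x_{uij}}}\cdot\frac{\partial}{\partial\Theta}\hat x_{uij}-\lambda_{\Theta}\Theta$;
  - Full gradient descent computes the gradient over the whole training set before each update and converges slowly.
  - Stochastic gradient descent updates the parameters on one training sample at a time, but is sensitive to the order of the training samples.
  - The paper therefore draws triples at random (uniformly) with bootstrap sampling, so that consecutive updates do not keep hitting the same user-item combination.
- Learning models with BPR:
  - Existing models predict a value $\hat x_{ui}$ for a pair (u,i), so the estimator $\hat x_{uij}$ is decomposed as $\hat x_{uij}:=\hat x_{ui}-\hat x_{uj}$.
  - Matrix Factorization: the matrix $X\in R^{|U|\times |I|}$ is factorized into $W\in R^{|U|\times k}$ and $H\in R^{|I|\times k}$; the prediction is $\hat x_{ui}=\langle w_u,h_i\rangle=\sum_{f=1}^k w_{uf}\cdot h_{if}$.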
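A minimal sketch of LearnBPR with the matrix factorization model, written in plain Python. It is a simplification under stated assumptions: users are indexed 0..|U|-1, every user has at least one positive and one unobserved item, the triple is drawn by first picking a user uniformly (not exactly uniform over $D_S$), and the hyperparameters and function names are chosen for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def learn_bpr_mf(pos, n_items, k=4, lr=0.05, reg=0.01, steps=20000, seed=0):
    """pos: {user index: set of item indices with positive feedback}."""
    rng = random.Random(seed)
    n_users = len(pos)
    W = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    H = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    pos_list = {u: sorted(items) for u, items in pos.items()}
    users = sorted(pos_list)
    for _ in range(steps):
        # Bootstrap sampling of a triple (u, i, j): i observed, j not observed.
        u = rng.choice(users)
        i = rng.choice(pos_list[u])
        j = rng.randrange(n_items)
        while j in pos[u]:
            j = rng.randrange(n_items)
        x_uij = dot(W[u], H[i]) - dot(W[u], H[j])
        g = 1.0 / (1.0 + math.exp(x_uij))  # equals e^{-x} / (1 + e^{-x})
        # Gradient ascent on ln sigma(x_uij) minus the L2 penalty.
        for f in range(k):
            wu, hi, hj = W[u][f], H[i][f], H[j][f]
            W[u][f] += lr * (g * (hi - hj) - reg * wu)
            H[i][f] += lr * (g * wu - reg * hi)
            H[j][f] += lr * (-g * wu - reg * hj)
    return W, H


def score(W, H, u, i):
    """Predicted preference x_ui = <w_u, h_i>."""
    return dot(W[u], H[i])
```

For example, with `pos = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}` and `n_items=4`, the learned scores rank each user's observed items above the unobserved ones.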
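Datasets: Rossmann: purchase records from an online shop; Netflix: movie DVD rental records including ratings (in the experiments every rating is treated as 1, i.e. as implicit feedback).
Evaluation metric: $AUC=\frac{1}{|U|}\sum_u\frac{1}{|E(u)|}\sum_{(i,j)\in E(u)}\delta(\hat x_{ui}>\hat x_{uj})$, where $E (u)$ is the set of evaluation pairs (i,j) of user u, with i the user's item in the test set and j an item the user has not interacted with.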
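The AUC above can be computed directly from per-user score lists. The sketch below assumes one held-out test item per user (as in leave-one-out evaluation); the data layout and the function name are assumptions of this example.

```python
def auc(scores, test_item, train_items):
    """Average per-user AUC.

    scores:      {user: list of predicted scores, indexed by item}
    test_item:   {user: the single held-out item of that user}
    train_items: {user: set of items the user interacted with in training}
    """
    total = 0.0
    for u, i in test_item.items():
        # E(u): pairs (i, j) where j was never interacted with by u.
        negatives = [j for j in range(len(scores[u]))
                     if j != i and j not in train_items[u]]
        hits = sum(1 for j in negatives if scores[u][i] > scores[u][j])
        total += hits / len(negatives)
    return total / len(test_item)
```

An AUC of 1 means every held-out item is scored above all of that user's unobserved items; random scores give about 0.5.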
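Experimental protocol: leave-one-out. One interacted item is removed at random from each user's history to form the test set, and the rest is used for training. The evaluation is repeated for 10 rounds; the hyperparameters are determined in the first round and kept fixed afterwards.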


2018-Adversarial personalized ranking for recommendation

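Citations: 115. Venue: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. Code available.

Problem: an MF recommender optimized with BPR is not robust to adversarial perturbations of its model parameters.
To improve robustness and generalization, the paper proposes a new optimization framework, APR (BPR enhanced with adversarial training). APR is instantiated on MF by adding perturbations to the user and item embedding vectors.
Showing that MF-BPR is vulnerable to adversarial noise: methods for generating adversarial images cannot be applied directly to the input $(u, i, j)$, because perturbing it changes the semantics of the input and changes the output drastically. The paper instead perturbs the model parameters (the MF embedding vectors). The assumption is that a small change in the parameters should not change the output drastically; if some perturbation is more effective than random perturbation, the model is sensitive to that kind of perturbation.
- The adversarial perturbation is defined as the perturbation that maximizes the BPR objective: $\Delta_{adv}=\arg\max_{\Delta,||\Delta||\le\epsilon}L_{BPR}(D|\hat\Theta+\Delta)$, where $\hat\Theta$ denotes the current parameters, treated as constants.
APR model: design a new objective that is suitable for personalized ranking and robust to adversarial perturbations, and minimize it:
$L_{APR}(D|\Theta)=L_{BPR}(D|\Theta)+\lambda L_{BPR}(D|\Theta+\Delta_{adv}) \tag{1}$

Generic SGD training procedure:
1. Construct the adversarial perturbation: sample a training instance $(u, i, j)$ at random and maximize $l_{adv}((u,i,j)|\Delta)=-\lambda\ln\sigma(\hat y_{uij}(\hat\Theta+\Delta))$, which makes items i and j hard to tell apart for user u. The objective is approximated by a linear function and the fast gradient method is used, i.e. the perturbation moves in the direction of the gradient,
  $T:=\frac{\partial l_{adv}((u,i,j)|\Delta)}{\partial \Delta}=-\lambda(1-\sigma(\hat y_{uij}(\hat\Theta+\Delta)))\frac{\partial\hat y_{uij}(\hat\Theta+\Delta)}{\partial \Delta} \tag{2}$
  so that under the max-norm constraint, $\Delta_{adv}=\epsilon\frac{T}{||T||}$.
2. Learn the model parameters: from (1),
  $l_{APR}((u,i,j)|\Theta)=-\ln\sigma(\hat y_{uij}(\Theta))+\lambda_{\Theta}||\Theta||^2-\lambda\ln\sigma(\hat y_{uij}(\Theta+\Delta_{adv})) \tag{3}$
  and the parameters are updated with SGD: $\Theta=\Theta-\eta\frac{\partial l_{APR}}{\partial\Theta}$.
3. Repeat steps 1 and 2 in turn. Note: the model parameters $\Theta$ are initialized by training with BPR, because adding perturbations is only meaningful once the model has started to overfit.
APR on MF: the parameters of MF are the user and item embedding vectors, and the perturbed model is defined as $\hat y_{ui}(\Theta+\Delta)=(p_u+\Delta_u)^T(q_i+\Delta_i)$. Training is done in mini-batches.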
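For a single triple $(u,i,j)$ with MF embeddings, the fast-gradient perturbation and the per-instance APR loss can be sketched as follows. This is an illustration under assumptions: the gradient $T$ is evaluated at $\Delta=0$, the norm is the L2 norm over the concatenated perturbations of $p_u$, $q_i$, $q_j$, the regularizer covers only these three vectors, and all names and default values are made up for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adversarial_perturbation(p_u, q_i, q_j, eps=0.5, lam=1.0):
    """Delta_adv = eps * T / ||T|| for one triple, with T taken at Delta = 0."""
    y_uij = dot(p_u, q_i) - dot(p_u, q_j)
    coef = -lam * (1.0 - sigmoid(y_uij))
    # d y_uij / d Delta_u = q_i - q_j, d/d Delta_i = p_u, d/d Delta_j = -p_u.
    t_u = [coef * (a - b) for a, b in zip(q_i, q_j)]
    t_i = [coef * a for a in p_u]
    t_j = [-coef * a for a in p_u]
    norm = math.sqrt(sum(v * v for v in t_u + t_i + t_j))
    if norm == 0:
        return [0.0] * len(p_u), [0.0] * len(q_i), [0.0] * len(q_j)
    scale = eps / norm
    return ([scale * v for v in t_u], [scale * v for v in t_i],
            [scale * v for v in t_j])


def perturbed_score(p_u, q_i, q_j, d_u, d_i, d_j):
    """y_uij with perturbed embeddings: (p_u+d_u).(q_i+d_i) - (p_u+d_u).(q_j+d_j)."""
    p = [a + b for a, b in zip(p_u, d_u)]
    qi = [a + b for a, b in zip(q_i, d_i)]
    qj = [a + b for a, b in zip(q_j, d_j)]
    return dot(p, qi) - dot(p, qj)


def apr_loss(p_u, q_i, q_j, eps=0.5, lam=1.0, reg=0.01):
    """Per-instance APR loss: BPR loss + L2 penalty + adversarial BPR loss."""
    d_u, d_i, d_j = adversarial_perturbation(p_u, q_i, q_j, eps, lam)
    clean = dot(p_u, q_i) - dot(p_u, q_j)
    adv = perturbed_score(p_u, q_i, q_j, d_u, d_i, d_j)
    l2 = sum(v * v for v in p_u + q_i + q_j)
    return -math.log(sigmoid(clean)) + reg * l2 - lam * math.log(sigmoid(adv))
```

The perturbation pushes $\hat y_{uij}$ down, so the adversarial term penalizes parameters whose ranking of i over j flips under a small change of the embeddings.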
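Datasets: Yelp, Pinterest, Gowalla.
Baselines: ItemPop (ranks items by popularity), MF-BPR, CDAE (based on a denoising auto-encoder), NeuMF (combines MF and MLP), IRGAN (adversarial training of a generator and a discriminator).
Evaluation metrics: HR (hit ratio, recall-based), NDCG (position-sensitive).
Experimental results: [figure from the paper]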

2018-Self-attentive sequential recommendation

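Citations: 138. Venue: 2018 IEEE International Conference on Data Mining (ICDM)

Sequential dynamics aim to capture the context of a user's activities from their past actions. The two main families are Markov Chains (MCs) and RNNs: the former rely on the most recent actions and suit sparse data; the latter can exploit long-term semantics and suit denser data.
The paper aims to balance the two with a self-attention based model that captures long-term semantics and, through the attention mechanism, makes predictions based on relatively few actions.
Temporal recommendation models the timestamps of user actions; sequential recommendation models the order of user actions.
Sequential recommendation models: FPMC, Caser, GRU4Rec
Method: given a user's action sequence $S^u=(S_1^u,\dots,S_{|S^u|}^u)$, the input is $(S_1^u,\dots,S_{|S^u|-1}^u)$ and the expected output is the same sequence shifted by one position, $(S_2^u,\dots,S_{|S^u|}^u)$.
- Embedding layer: the training sequence is turned into a fixed-length sequence $s=(s_1,\dots,s_n)$: if it is longer than n, the most recent n actions are kept; if it is shorter, it is padded with 0 on the left. With the item embedding matrix $M\in R^{|I|\times d}$, the input sequence becomes $E\in R^{n\times d}$. A positional embedding $P\in R^{n\times d}$ is added, so the input is $\hat E=E+P$.
- Self-Attention:
  $S=SA(\hat E)=Attention(\hat EW^Q,\hat EW^K,\hat EW^V)=softmax\left(\frac{\hat EW^Q(\hat EW^K)^T}{\sqrt d}\right)\hat EW^V \tag{1}$
  When predicting item t+1 only the first t items may be considered, but plain self-attention also looks at later items, so all connections from a position to later positions must be forbidden.
- Point-Wise Feed-Forward Network: to model interactions between the latent dimensions and to give the model nonlinearity, a two-layer feed-forward network (with shared parameters) is applied to every $S_i$:
  $F_i=FFN(S_i)=ReLU(S_iW^{(1)}+b^{(1)})W^{(2)}+b^{(2)} \tag{2}$
- Stacking Self-Attention Blocks: a self-attention layer and a feed-forward network form one basic block, and blocks are stacked to learn more complex item transitions. Deeper networks bring overfitting, unstable training, and longer training time, so the following operations are applied to both layers: residual connections, layer normalization, and dropout, where g denotes the self-attention layer or the feed-forward network. Dropout is also applied to the embedding layer.
  $g(x)=x+Dropout(g(LayerNorm(x))) \tag{3}$
  where $LayerNorm(x)=\alpha\odot\frac{x-\mu}{\sqrt{\sigma^2+\epsilon}}+\beta$, $\mu$ and $\sigma^2$ are the mean and variance of x, and $\alpha,\beta$ are learned parameters.
- Prediction Layer: the output $F_t^{(b)}$ of the last block is used to predict the next item. An MF layer predicts the relevance of each item, with item embedding matrix $N\in R^{|I|\times d}$ (in practice 100 negative items and the one correct item are scored):
  $r_{i,t}=F_t^{(b)}N_i^T \tag{4}$
  The relevance of all candidate items is computed and the items are ranked to produce the prediction list. To reduce the number of parameters and improve performance, a shared item embedding is used, i.e. N=M. Explicit user modeling did not improve the results.
- Network Training: the input is a fixed-length sequence s and the output is also a fixed-length sequence, $o_t=\begin{cases}\langle pad\rangle & s_t\text{ is a padding item}\\s_{t+1}& 1\le t<n\\S_{|S^u|}^u & t=n\end{cases}$. The objective is the binary cross-entropy loss:
  $-\sum_{S^u\in S}\sum_{t\in[1,\dots,n]}[\log(\sigma(r_{o_t,t}))+\sum_{j\notin S^u}\log(1-\sigma(r_{j,t}))] \tag{5}$
  The model is optimized with Adam; in each epoch, one negative item j is sampled at random for every time step of every sequence.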
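The causality constraint on self-attention is easy to see in code. Below is a single-head sketch in plain Python in which position t attends only to positions up to t; it omits padding, multiple blocks, the feed-forward network, layer normalization, and dropout, and its function names and list-of-lists matrix layout are assumptions of this example.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(E, Wq, Wk, Wv):
    """Scaled dot-product self-attention where step t sees only steps 0..t."""
    Q, K, V = matmul(E, Wq), matmul(E, Wk), matmul(E, Wv)
    d = len(Q[0])
    out = []
    for t in range(len(E)):
        # Connections from position t to later positions are simply left out.
        logits = [sum(q * k for q, k in zip(Q[t], K[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(logits)
        out.append([sum(weights[s] * V[s][c] for s in range(t + 1))
                    for c in range(d)])
    return out
```

Because of the mask, changing the embedding of a later item leaves the outputs at earlier positions unchanged, which is what allows one pass over the sequence to train all next-item predictions at once.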
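Datasets: Amazon, Steam, MovieLens-1M; reviews or ratings are treated as implicit feedback. Users and items with fewer than 5 actions are discarded; for each user the last action is used for testing, the second-to-last for validation, and the rest for training.
Evaluation metrics: Hit Rate@10 and NDCG@10.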
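With a single held-out item per user, both metrics depend only on the rank of that item among the scored candidates. A minimal sketch, assuming the scores are given as a dictionary from candidate item to score (the function name and layout are made up for this example):

```python
import math


def hit_and_ndcg_at_k(scores, positive, k=10):
    """Hit@k and NDCG@k for one user with a single ground-truth item.

    scores:   {item: predicted score} for the candidates, including `positive`
    positive: the held-out ground-truth item
    """
    # 0-based rank = number of candidates scored strictly above the positive.
    rank = sum(1 for item, s in scores.items()
               if item != positive and s > scores[positive])
    if rank < k:
        return 1.0, 1.0 / math.log2(rank + 2)
    return 0.0, 0.0
```

The reported Hit Rate@10 and NDCG@10 are the averages of these per-user values over all test users.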


2019-On the difficulty of evaluating baselines: A study on recommender systems

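Citations: 13. Venue: arXiv preprint arXiv

Problem: running baselines properly is difficult. The baselines used in earlier comparisons on Movielens 10M were suboptimal, and tuning them gives better results. A study is therefore only meaningful when it compares against well-tuned baselines.
On Movielens 10M, carefully tuned baselines improve and even outperform the previously reported methods.
Movielens 10M is split 9:1 and evaluated with RMSE.
- On the baselines Biased MF, RSVD, ALS-WR, BPMF:
  - Biased MF and RSVD are essentially the same method; they differ only in the setup (hyperparameters, ordering of the training data, implementation).
  - ALS-WR learns the same model as Biased MF and RSVD with a different algorithm.
  - BPMF shares the model with RSVD and ALS-WR but learns it with a Gibbs sampler.
- Rerunning the baselines:
  - MF learned with SGD (similar to Biased MF, RSVD) and Bayesian MF trained with Gibbs sampling (similar to BPMF) achieve better results.
  - Using implicit feedback and time effects, e.g. adding time to Bayesian MF, gives even better baselines.
The Netflix Prize likewise shows that running baselines properly is difficult. Its data consists of a training, a validation, and a test set split by user and time, e.g. the 6 most recent ratings of a user are held out as validation set (the first 3) and test set (the last 3).
- Standard MF gives different results depending on the learning algorithm.
- Combining multiple models works better.
- Models that do well on the Netflix Prize also do well on Movielens 10M, while the converse does not necessarily hold.
Insufficient indicators for experimental reliability: statistical significance, reproducibility, or hyperparameter search, because none of them shows that a method was run with its best configuration.
Improving experimental quality: this requires a community effort, standardized baselines, and incentives for improving baselines.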


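Appendix:
- Dataset: Movielens 10M with 10-fold cross-validation (90:10), using the libFM library
- Information considered: u user, i movie, t time, iu implicit user information (the set of movies the user watched), ii implicit item information (the set of users who watched the movie).
- Bayesian learning, hyperparameters considered: the number of sampling steps and the embedding dimension. Parameters are initialized from a Gaussian distribution (standard deviation 0.1).
- Stochastic gradient descent: learning rate 0.003 and regularization 0.04, determined on 5% of the training set held out as a validation set, with 64 dimensions.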

2019-KGAT: Knowledge graph attention network for recommendation (code)

Citations: 93. Venue: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Problem: existing methods treat each interaction as an independent instance with side information encoded; they overlook the relations among instances or items and do not fully explore high-order connectivity.
Method: take the graph of item side information, i.e. the knowledge graph, into account, using recursive embedding propagation and attention-based aggregation.
Implementation: the side information is organized as a knowledge graph $G_2$, represented as $\{(h,r,t)|h,t\in E,r\in R\}$, e.g. (Hugh Jackman, ActorOf, Logan). The user-item graph can be seamlessly integrated with the KG into a unified graph $G$. Task: given G, output the embeddings of users and items and the interaction probability.
- Embedding Layer: employs TransR, which learns an embedding for each entity and relation by optimizing the translation principle $e_h^r+e_r\approx e_t^r$.
- Attentive Embedding Propagation Layers: information propagation, knowledge-aware attention, information aggregation.
- High-order Propagation: the representations from each propagation step are concatenated into a single vector.
- Model Prediction: the inner product of the user and item representations.
Experiments: BPR loss; datasets: Amazon-book, Last-FM, Yelp2018.


2019-Neural graph collaborative filtering (code)

Citations: 146. Venue: Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval.

Two key components of learnable CF models: embedding and interaction modeling. Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems.

Problem: the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process.
Method: integrate the user-item interactions, more specifically the bipartite graph structure, into the embedding process; the model exploits the user-item graph structure by propagating embeddings on it.
Implementation:
- Embedding Layer: builds a parameter matrix that serves as an embedding look-up table.
- Embedding Propagation Layers: follow the message-passing architecture of GNNs and perform embedding propagation between connected users and items. Components: message construction, message aggregation, high-order propagation, and the propagation rule in matrix form.
- Model Prediction: concatenate the representations learned by the different layers, then take the inner product.
- Optimization: optimize the pairwise BPR loss, with message dropout and node dropout.
Discussion: NGCF generalizes SVD++.
Datasets: Gowalla, Yelp2018, Amazon-book.


2020-LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation (code: PyTorch, TensorFlow)

Citations: 22. Venue: SIGIR '20

Problem: the reasons for GCN's effectiveness in recommendation are not well understood, and thorough ablation analyses of GCN are lacking.
Finding: the two most common designs in GCNs, feature transformation and nonlinear activation, contribute little to the performance of collaborative filtering. Even worse, including them makes training harder and degrades recommendation performance. Reason: GCN was originally proposed for node classification on attributed graphs, whereas in recommendation a node is described only by a one-hot ID.
Idea: simplify the design of GCN, keeping only its most essential component, neighborhood aggregation.
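A propagation step that keeps only neighborhood aggregation can be sketched in plain Python. The notes above do not spell out the formula; the sketch assumes the symmetric normalization $1/\sqrt{|N_u||N_i|}$ and a uniform average over the layer outputs (including layer 0), as described in the LightGCN paper, and the data layout and function name are made up for this example.

```python
import math


def lightgcn(interactions, user_emb, item_emb, n_layers=2):
    """Final user/item embeddings after n_layers of light graph convolution.

    interactions: list of (user, item) edges
    user_emb, item_emb: {node id: list of floats}, the layer-0 embeddings
    """
    user_nb = {u: [] for u in user_emb}
    item_nb = {i: [] for i in item_emb}
    for u, i in interactions:
        user_nb[u].append(i)
        item_nb[i].append(u)
    dim = len(next(iter(user_emb.values())))

    def propagate(target_nb, source_nb, source_emb):
        # No feature transformation, no nonlinearity: a normalized neighbour sum.
        out = {}
        for node, neighbours in target_nb.items():
            agg = [0.0] * dim
            for nb in neighbours:
                coef = 1.0 / math.sqrt(len(neighbours) * len(source_nb[nb]))
                for k in range(dim):
                    agg[k] += coef * source_emb[nb][k]
            out[node] = agg
        return out

    users, items = [user_emb], [item_emb]
    for _ in range(n_layers):
        new_users = propagate(user_nb, item_nb, items[-1])
        new_items = propagate(item_nb, user_nb, users[-1])
        users.append(new_users)
        items.append(new_items)

    def mean_over_layers(layers):
        return {n: [sum(layer[n][k] for layer in layers) / len(layers)
                    for k in range(dim)]
                for n in layers[0]}

    return mean_over_layers(users), mean_over_layers(items)
```

The only trainable parameters in this scheme are the layer-0 ID embeddings; a user-item score would then be the inner product of the two final embeddings.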


2020-Disentangled Graph Collaborative Filtering

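Citations: 2. Venue: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Problem: existing models neglect the diversity of user intents behind adopting items.
Method: disentangle these factors and yield disentangled representations by modeling a distribution over intents for each user-item interaction.
Detailed method:
1. Split the ID embedding into K chunks, each chunk representing one intent; the embedding of each chunk is randomly initialized separately.
2. Define a score matrix for each chunk, whose entries describe the user-item interactions. Each interaction (u,i) therefore has a K-dimensional score vector, initialized to all ones; the score matrix of each chunk can be viewed as the adjacency matrix of a graph.
3. Each intent thus comprises a set of chunk embeddings (of users and items) and a graph structure (the score matrix). Within each intent, to distinguish the contribution of each interaction, the neighbor routing and embedding propagation mechanisms are used to define a graph disentangling layer, which gathers neighborhood information relevant to that intent's graph.
4. The score vector of each interaction (u,i) is normalized with a softmax, indicating which intents deserve attention when explaining the interaction. This yields a new score matrix (graph) for each chunk.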


2020-JIT2R: A Joint Framework for Item Tagging and Tag-based Recommendation

Citations: 1. Venue: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Two tasks: predicting tags for a given item, and leveraging tags to assist item recommendation.

Previous studies mostly focus on only one of them, although the two tasks are inherently correlated with each other.
Method: tagging function G: $\hat r_i=h(G(c_i))$, predictive function F: $\hat y_{ui}=F(u,i,t_i)$
- Input the item features into a framework G and output the probability of each tag through a softmax layer.
- For untagged items, assign $t_i$ the value predicted by G. User-item interaction signals are back-propagated through $\hat r_i$ to supervise model G.
- The Bootstrapping Technique: iteratively label the likely item-tag pairs as training data, using a pre-defined confidence level f on $\hat r_i$. (The initial training epochs should be excluded from s's updating process; previously labeled items are allowed to be re-labeled later in the optimization process.)
Experiments: CiteULike dataset; 7288 authors' 160272 citations on 8212 papers


2020-"Click" Is Not Equal to "Like": Counterfactual Recommendation for Mitigating Clickbait Issue

Citations: 0; venue not given.

There is a significant gap between clicks and user satisfaction.

The paper builds a causal graph that reflects the cause-effect factors in recommendation, and a counterfactual world in which each item has only exposure features (the features the user can see before making a click decision).
Recommendation formulation: $Y_{u,i}=s_{\theta}(u,i)$, $\bar D=\{(u,i,\bar y_{u,i})\}$, $\bar \theta=\arg\min L(\bar D|\theta)=\arg\min\sum_{\bar D}l(s_{\theta}(u,i),\bar y_{u,i})$.
Idea: distinguish the effects of exposure features (pre-click) and content features (post-click) on the prediction. Estimate the direct effect of the exposure features on the prediction score in a counterfactual world; during inference, remove this direct effect from the prediction in the factual world. The causal effect of X on Y is the magnitude by which Y is changed by a unit change in X.
$\begin{aligned} \text{factual world}&: Y_{u,i,e}=Y(U=u,I=i,E=e),\ i=I(E=e,T=t) \\ \text{total effect}&: TE=Y_{i,e}(u)-Y_{i^*,e^*}(u) \\ \text{natural direct effect}&: NDE=Y_{i^*,e}(u)-Y_{i^*,e^*}(u) \\ \text{total indirect effect}&: TIE=TE-NDE=Y_{i,e}(u)-Y_{i^*,e}(u) \end{aligned}$
Ranking items according to the TIE removes the direct effect of the exposure features.
Implementation: to keep generality and leverage the advantages of existing models, the scoring function is implemented in a late-fusion manner.
- Fusion strategy: $Y_{u,i,e}=Y(U=u,I=i,E=e)=f(Y_{u,i},Y_{u,e})=Y_{u,i}*\sigma(Y_{u,e})$.
- Recommender training: $L=\sum_{\bar D}l(Y_{u,i,e},\bar y_{u,i})+\alpha*l(Y_{u,e},\bar y_{u,i})$.
- Inference via TIE: $Y_{i^*,e}(u)=f(c_{u,i},Y_{u,e})$, $c_{u,i}=\frac{1}{|I|}\sum Y_{u,i}$. $TIE=(Y_{u,i}-c_{u,i})*\sigma(Y_{u,e})$.
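The effect of ranking by TIE instead of the factual score can be illustrated with a few made-up numbers. In the sketch below, `y_ui` holds the scores from the full item features and `y_ue` the scores from the exposure features alone; the function names and the toy values are assumptions of this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def factual_scores(y_ui, y_ue):
    """Late fusion used in training: Y_{u,i,e} = Y_{u,i} * sigma(Y_{u,e})."""
    return [yi * sigmoid(ye) for yi, ye in zip(y_ui, y_ue)]


def tie_scores(y_ui, y_ue):
    """Inference score: TIE = (Y_{u,i} - c_{u,i}) * sigma(Y_{u,e})."""
    c = sum(y_ui) / len(y_ui)  # reference value: mean of Y_{u,i} over the items
    return [(yi - c) * sigmoid(ye) for yi, ye in zip(y_ui, y_ue)]


def argmax(values):
    return max(range(len(values)), key=lambda idx: values[idx])


# Item 0 has an attractive cover (high exposure score) but mediocre content;
# item 1 has a plain cover and strong content. Values are made up.
Y_UI = [0.5, 0.9, 0.4]
Y_UE = [4.0, 0.0, 0.0]
```

With these numbers the factual score ranks the clickbait-like item 0 first, while the TIE score ranks item 1 first, because an item whose $Y_{u,i}$ is below the reference value no longer benefits from a strong exposure score.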
Compared methods: MMGCN; NT: trained with clicks; CFT: trained with content features; IPW: subtracts item popularity; CT: trained with likes; NR: gives different weights to un-clicked and disliked items; RR: re-ranks the results of NT by like/click. Only clicks with positive post-click feedback such as thumbs-up, favorite, and finishing are used for testing.


2020-Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback

Venue: Proceedings of the 28th ACM International Conference on Multimedia (citation count not given).

Problem: implicit feedback contains observed interactions with items the user is not really interested in. Solution: adaptively refine the structure of the interaction graph to discover and prune potential false-positive edges, using a graph refining layer.
Method:
- Graph Refining Layer: assumes that the content of an item belonging to a false-positive interaction is far from the user's preference. A prototypical network learns the user's preference over the content information, and noisy edges are pruned according to the confidence that an edge is a false-positive interaction.
  - Prototypical Network: the content signal of an item is projected into a metric space to distill the informative features related to the user preference.
  - Neighbor routing mechanism: given a user, iterative routing operations adjust her/his representation by jointly analyzing her/his similarities to the neighbors.
- Pruning Operations: score the affinity between user preference and item content; integrate the scores of each edge across the modalities to yield a weight and assign it to the edge; calculate the relative distances between them in two directions.
- Graph Convolutional Layer: treats the graph convolutional operations as message passing and aggregation.
- Prediction Layer: users have varying preferences in different modalities; the multimodal features and the enriched ID embedding are concatenated, and the prediction is the inner product between user and item representations.
- Optimization: pair-wise ranking with BPR.
Experiments: datasets: Movielens, Tiktok, Kwai

2020-Personalized Item Recommendation for Second-hand Trading Platform

Venue: Proceedings of the 28th ACM International Conference on Multimedia (citation count not given).

Characteristics: sufficient interactions per user but rare interactions per item; items on a second-hand trading platform are usually unique.
Solution: coarse-grained and fine-grained features with a multi-task learning strategy; category hierarchy information helps to learn more robust visual representations of items.
Method: what s/he actually needs determines the coarse-grained features of the items (e.g., type, etc.) s/he will interact with. Meanwhile, the fine-grained characteristics (e.g., appearance, condition, etc.) of those items influence which specific items of the same type s/he will prefer.
- Item latent representation learning: learn the relationships between adjacent hierarchical categories
- User latent representation learning: learn an embedding vector to model each user's preference on each hierarchical category
- Multi-task Learning for Item Recommendation:
