Stability of Recommendation Algorithms

巨兔12306

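These are the notes I took while reading the paper Stability of Recommendation Algorithms. Nothing here is original work; it is only a collection of scattered points from the paper. Corrections and comments from readers are very welcome!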

（1） coverage, diversity, novelty

（2） perception of personalization competence (that is, the user's perception of the system's personalization competence)

consistent predictions

（3） Stability is defined to measure the extent to which a recommendation algorithm provides predictions that are consistent with each other. Specifically, for a stable algorithm, adding some of the algorithm's own predictions to the algorithm's training data (for example, if these predictions were confirmed as accurate by users) would not invalidate or change the other predictions.
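
The definition above can be sketched as a small measurement loop: train, predict, feed some predictions back as if users had confirmed them, retrain, and compare. This is only a minimal sketch under stated assumptions. RMSS is read here as "root mean squared shift" (an assumption about the acronym used later in these notes), and the names `rmss`, `fit_user_average`, and `fit_blend` are hypothetical helpers, not code from the paper. `fit_blend` in particular is an invented toy predictor, included only to show a nonzero shift.

```python
import math

def fit_user_average(ratings):
    """Toy predictor: every prediction for a user is that user's mean rating."""
    totals, counts = {}, {}
    for (user, _item), r in ratings.items():
        totals[user] = totals.get(user, 0.0) + r
        counts[user] = counts.get(user, 0) + 1
    means = {u: totals[u] / counts[u] for u in totals}
    return lambda user, item: means[user]

def fit_blend(ratings):
    """Hypothetical predictor (not from the paper): mean of user and item averages."""
    global_mean = sum(ratings.values()) / len(ratings)
    by_user, by_item = {}, {}
    for (user, item), r in ratings.items():
        by_user.setdefault(user, []).append(r)
        by_item.setdefault(item, []).append(r)

    def predict(user, item):
        u = by_user.get(user)
        i = by_item.get(item)
        u_mean = sum(u) / len(u) if u else global_mean
        i_mean = sum(i) / len(i) if i else global_mean
        return 0.5 * (u_mean + i_mean)

    return predict

def rmss(train, extension_pairs, eval_pairs, fit):
    """Shift in predictions on eval_pairs after adding the model's own
    predictions for extension_pairs to the training data."""
    before = fit(train)
    extended = dict(train)
    for user, item in extension_pairs:
        extended[(user, item)] = before(user, item)  # treated as confirmed ratings
    after = fit(extended)
    squared = [(after(u, i) - before(u, i)) ** 2 for u, i in eval_pairs]
    return math.sqrt(sum(squared) / len(squared))

# Tiny made-up dataset: (user, item) -> rating
train = {("u1", "a"): 4.0, ("u1", "b"): 2.0, ("u2", "a"): 5.0}
extension = [("u1", "c"), ("u2", "b")]
evaluation = [("u1", "d"), ("u2", "c")]

print(rmss(train, extension, evaluation, fit_user_average))
print(rmss(train, extension, evaluation, fit_blend))
```

On this toy data the user-average predictor should show no shift, while the blended predictor moves, which is the kind of difference the stability measure is meant to expose.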

（4） inherent property

（5） the notion of stability

（6） extension of dataset

（7） aggregate difference

（8） According to the definition, the level of instability may depend on several factors, including the type of predictive technique T, characteristics of the initial rating dataset D, characteristics of the extension dataset D′ (i.e., a given dataset can have a huge number of possible extension datasets), and the characteristics of the evaluation dataset U.

（9） a set of hypothetical (simulated) incoming ratings

（8） Experimental Setup

（9） Matrix Factorization

（10） simple averaging techniques, user- and item-based variations of neighborhood-based collaborative filtering approaches, and the model-based matrix factorization method

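（11） research literature

As often noted in the research literature, it is useful to normalize the rating data by removing user and item effects before applying any prediction technique.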

（12） For example, some users may systematically tend to give higher ratings than others, and some universally liked items might receive higher ratings than others. Without normalization, such user and item effects could bias the system's predictions.

（13） One common practice of normalization suggested in the literature is to estimate and remove three effects in sequence, that is, overall mean, main effect for items, and main effect for users, and then make predictions based on the residuals.

（14） A baseline estimate for each known rating, denoted by $b_{ui}$, is computed to account for global effects:

$b_{ui} = \mu + b_u + b_i$

where $\mu$ is the overall average rating, $b_u$ is the observed deviation of user u, and $b_i$ is the observed deviation of item i.
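
The sequential normalization described in (13) and (14) can be sketched as follows. This is only an illustration under assumptions: the deviations are taken here as plain unregularized averages of residuals (the paper's exact estimators for $b_u$ and $b_i$ are not reproduced in these notes and may include shrinkage), and the function names `baseline_estimates` and `baseline` are hypothetical.

```python
from collections import defaultdict

def baseline_estimates(ratings):
    """Estimate mu, then item deviations, then user deviations, in sequence.

    ratings: dict mapping (user, item) -> rating.
    Assumption: plain averages of residuals, with no regularization.
    """
    mu = sum(ratings.values()) / len(ratings)

    # Main effect for items: average residual after removing the overall mean.
    item_res = defaultdict(list)
    for (_user, item), r in ratings.items():
        item_res[item].append(r - mu)
    b_i = {i: sum(v) / len(v) for i, v in item_res.items()}

    # Main effect for users: average residual after removing mu and b_i.
    user_res = defaultdict(list)
    for (user, item), r in ratings.items():
        user_res[user].append(r - mu - b_i[item])
    b_u = {u: sum(v) / len(v) for u, v in user_res.items()}

    return mu, b_u, b_i

def baseline(mu, b_u, b_i, user, item):
    """b_ui = mu + b_u + b_i; unseen users or items contribute no deviation."""
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

# Made-up ratings on a 1-5 scale.
ratings = {("u1", "a"): 5.0, ("u1", "b"): 3.0, ("u2", "a"): 4.0, ("u2", "b"): 2.0}
mu, b_u, b_i = baseline_estimates(ratings)
residuals = {k: r - baseline(mu, b_u, b_i, *k) for k, r in ratings.items()}
print(mu, b_u, b_i, residuals)
```

A downstream predictor would then be trained on `residuals` rather than on the raw ratings, with the baseline added back at prediction time.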

（15） cross validation

（16） In other words, user-based and item-based averages are perfectly stable due to their sum-of-squares-minimizing nature, as discussed in Section 2 of the paper.
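
A short worked derivation may help show why the shift is exactly zero. It assumes that the user-average predictor returns the plain arithmetic mean of a user's known ratings; the symbols $n$, $k$, and $\bar{r}_u$ are introduced here for illustration and are not the paper's notation. Suppose user u has $n$ known ratings with mean $\bar{r}_u$, and $k$ of the algorithm's own predictions (each equal to $\bar{r}_u$) are added as new ratings:

```latex
\bar{r}_u = \frac{1}{n}\sum_{j=1}^{n} r_{uj}
\quad\Longrightarrow\quad
\bar{r}_u' = \frac{\sum_{j=1}^{n} r_{uj} + k\,\bar{r}_u}{n + k}
           = \frac{n\,\bar{r}_u + k\,\bar{r}_u}{n + k}
           = \bar{r}_u
```

Equivalently, the mean minimizes the sum of squared deviations, and each added rating sits exactly at that minimizer, so it adds zero to the objective there and cannot move the minimum. The same argument applies to item averages.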

（17） This is especially true for stability, suggesting that the stability performance of different recommendation techniques can be largely attributed to inherent characteristics of the algorithms rather than data variation in the random samples.

（18） As seen from the experimental results, neighborhood-based recommendation algorithms demonstrate the highest instability among the techniques that were tested.

（19） number of latent variables

be susceptible to

（20） Robustness Checks for Stability Calculation

（21） random deviation amounts

standard deviation

positive or negative

（22） Because of the random perturbations to the newly added ratings, the instability increase can be observed across all techniques and for all datasets. However, the inherent stability differences among different techniques remain, as evidenced by the same relative stability rankings of different techniques, that is, with user and item averages demonstrating highest stability, followed by the model-based approaches (SVD and baseline), while user-based and item-based neighborhood approaches typically exhibit the lowest levels of stability among all the techniques.
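
The robustness check can be pictured as adding a random positive or negative deviation to each prediction before it is fed back as a new rating. The sketch below is an assumption-heavy illustration: it uses zero-mean Gaussian noise with a chosen standard deviation and clips the result to the rating scale, neither of which is confirmed by these notes as the paper's exact procedure. The function name `perturb_predictions` is hypothetical.

```python
import random

def perturb_predictions(predictions, sigma, scale=(1.0, 5.0), seed=0):
    """Add zero-mean Gaussian noise (std = sigma) to each predicted rating,
    then clip to the rating scale. Assumed procedure, for illustration only.

    predictions: dict mapping (user, item) -> predicted rating.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    low, high = scale
    return {
        pair: min(high, max(low, value + rng.gauss(0.0, sigma)))
        for pair, value in predictions.items()
    }

# Made-up predictions on a 1-5 scale.
predictions = {("u1", "c"): 3.2, ("u2", "b"): 4.9, ("u3", "a"): 1.1}
noisy = perturb_predictions(predictions, sigma=0.5)
print(noisy)
```

The perturbed values would then replace the exact predictions in the extension dataset before retraining, so that the measured shift reflects noisy rather than perfectly confirmed ratings.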

（23） In this section we perform an empirical investigation on whether data sparsity (or density) has an impact on the stability as well, by observing the stability and accuracy of recommendation algorithms at different data density levels.

（24） data sparsity

（25） inadequate recommender systems accuracy

empirical investigation

density levels

（26） Our results are consistent with prior literature in that the recommendation algorithms typically demonstrate higher predictive accuracy on denser rating datasets. More importantly, graphs show that the density level of the rating data also has a significant influence on recommendation stability for the neighborhood-based CF techniques (CF Item and CF User): RMSS increases as the rating data becomes sparser. In other words, as data becomes sparser, these techniques are increasingly more sensitive to additions of new ratings (even though they are in complete agreement with the algorithms' own predictions). In particular, in our sparsest samples (0.351% rating density, sampled from Netflix), both CF User and CF Item demonstrate a very significant 0.57 RMSS (on the rating scale from 1 to 5). Meanwhile, the stability performance of the model-based matrix factorization (SVD) as well as the baseline (User Item Avg) approaches is much more advantageous and more consistent across different density levels, consistently staying in the 0.1 to 0.15 RMSS range for the movie rating datasets (with the 1 to 5 rating scale) and in the 0.3 to 0.4 RMSS range for the Jester dataset (with the scale from −10 to +10).

In summary, the experimental results show that the stability of the model-based approaches (SVD, User Item Avg, User Avg, Item Avg) is robust to data density changes, in contrast to the neighborhood-based collaborative filtering heuristics.
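
For readers who want to reproduce density levels like the 0.351% figure, rating density can be computed as the share of user-item cells that hold a rating. The sketch below assumes that definition and a simple uniform random subsample to create sparser datasets; the paper's actual sampling procedure is not described in these notes, and the names `rating_density` and `subsample` are hypothetical.

```python
import random

def rating_density(ratings):
    """Fraction of the user x item matrix that is filled.

    Assumption: users and items are counted from the ratings themselves.
    """
    users = {user for user, _item in ratings}
    items = {item for _user, item in ratings}
    return len(ratings) / (len(users) * len(items))

def subsample(ratings, fraction, seed=0):
    """Keep a uniform random fraction of the ratings (assumed procedure)."""
    rng = random.Random(seed)
    keys = sorted(ratings)  # sorted for reproducibility across runs
    keep = max(1, int(len(keys) * fraction))
    return {k: ratings[k] for k in rng.sample(keys, keep)}

# Made-up data: 2 users, 3 items, 3 known ratings.
ratings = {("u1", "a"): 4.0, ("u1", "b"): 2.0, ("u2", "c"): 5.0}
print(rating_density(ratings))
sparser = subsample(ratings, fraction=0.67)
print(len(sparser))
```

Note that a subsample can lose whole users or items, which changes the denominator; a more careful experiment would fix the user and item sets before measuring density.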

（27） dynamics of recommendation stability

（28） In particular, as expected, we find that the prediction shift for the simple user- and item-average approaches is always zero, regardless of how many predictions were added to the rating matrix. For all other techniques, including the matrix factorization (SVD), user- and item-based collaborative filtering (CF User and CF Item), and baseline estimates using user and item averages (User Item Avg), the prediction shift curves generally indicate a convex shape. In particular, with only very few newly introduced ratings the prediction shift is small, but the instability rises very rapidly until the number of newly introduced ratings reaches about 20% of the original data, at which point the rise of the prediction shift slows down and later starts exhibiting a slow continuous decrease.

（29） Our experiments suggest that the stability of memory-based neighborhood techniques is more sensitive to the number of new incoming ratings than the stability of model-based techniques that are based on global optimization approaches (including both matrix factorization and simpler average-based models). When only a very small number of new ratings is added, the original rating patterns persist in the data and the neighborhood of each user (or item) does not change dramatically, resulting in similar predictions for the same items (i.e., higher stability). In contrast, when more and more new ratings are made available, the neighborhoods can be affected very dramatically. This is supported by the observed more rapid decrease of stability in neighborhood-based techniques as compared to the model-based approaches, which are based on global optimization and, thus, are less malleable to additions of new ratings (that are in agreement with what the algorithm had predicted earlier).

（30） In addition, it is important to note that, after the initial increase in prediction shift, its subsequent slow decrease (for all recommendation algorithms) can be attributed to the sheer numbers of additional new ratings that are introduced, all of which are in agreement with the initial recommendation model. Whether the recommendation algorithm is more computationally sophisticated (e.g., SVD) or less (e.g., CF User), providing it with increasingly more data that is in consistent agreement with some specific model will make this algorithm represent this model better and make more stable recommendations.

（31） Random strategy draws a sample of predictions from each user at random to be added as new incoming ratings to the set of original ratings; High strategy sorts all predictions for each user and only chooses those with highest predicted rating values; HighHalf sorts predictions for each user and then draws a random sample of predictions with values greater than the median prediction; Low sorts predictions for each user and only adds lowest predictions; and LowHalf sorts predictions and draws a random sample from ratings whose values are lower than the median prediction. In summary, we wanted to test whether adding skewed rating samples will bias the original rating distribution and, as a result, decrease recommendation algorithm stability, as compared to the random samples of new predictions.
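
The five sampling strategies can be sketched for a single user as below. This is a rough illustration, not the paper's code: the function name `sample_predictions`, the parameter `k` (how many predictions to add per user), and the tie handling around the median are all assumptions made here.

```python
import random
import statistics

def sample_predictions(preds, strategy, k, seed=0):
    """Pick k items whose predictions will be added as new ratings.

    preds: dict mapping item -> predicted rating for one user.
    strategy: "Random", "High", "HighHalf", "Low", or "LowHalf".
    """
    rng = random.Random(seed)
    ranked = sorted(preds, key=preds.get)  # items in ascending predicted rating
    median = statistics.median(preds.values())

    if strategy == "High":      # only the k highest predictions
        return ranked[max(0, len(ranked) - k):]
    if strategy == "Low":       # only the k lowest predictions
        return ranked[:k]
    if strategy == "Random":    # any prediction is eligible
        pool = ranked
    elif strategy == "HighHalf":  # random draw from above-median predictions
        pool = [item for item in ranked if preds[item] > median]
    elif strategy == "LowHalf":   # random draw from below-median predictions
        pool = [item for item in ranked if preds[item] < median]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.sample(pool, min(k, len(pool)))

# Made-up predictions for one user.
preds = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}
for name in ("Random", "High", "HighHalf", "Low", "LowHalf"):
    print(name, sample_predictions(preds, name, k=2))
```

Running each strategy over every user and feeding the chosen predictions back into the training data gives five extension datasets whose stability can then be compared.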

（32） As can be seen from the figure, the distribution of new incoming ratings significantly influences the stability of recommendation algorithms, except for simple user- and item-based average techniques, which are always perfectly stable, as discussed earlier. In particular, among the five sampling strategies, Random strategy demonstrated the highest stability for nearly all recommendation algorithms. Moreover, all recommendation algorithms exhibited an increase in instability with High or Low sampling strategies, as compared to the Random strategy, and this difference was especially substantial for memory-based neighborhood CF techniques. Furthermore, adding new ratings selected by HighHalf and LowHalf strategies led to moderate prediction shift for all algorithms. In summary, Random rating samples that are in complete agreement with previous predictions have more favorable impact on stability than samples with skewed distribution.

（33） More specifically, the more skewed the distribution of new incoming ratings is, the less stable the recommendation algorithms become.

（34） Our results also suggest that the impact of adding skewed samples of new ratings on recommendation stability can be asymmetric for some algorithms on some datasets.

（35） Impact of Data Normalization

（36） The results suggest that normalizing rating data in general can improve predictive accuracy for all recommendation algorithms. Meanwhile, normalization also impacts the stability of these algorithms in different ways.

However, normalization by removing just the item average or user average dramatically improved both accuracy and stability of neighborhood-based approaches as compared to their nonnormalized versions.

On top of this, removing all the global effects resulted in a comparable accuracy and only very slight stability improvement, as compared to removing only one main effect. Moreover, the impacts of normalization on stability and accuracy are consistent across all datasets.

（37） Impact of Evaluation Data Distribution

The main finding is that the traditionally more stable approaches demonstrate the same or very similar stability levels across the entire evaluation data distribution.

In the dynamical systems subfield

Subsample-based stability of classification models

（38） In the machine learning literature, the stability of a predictive algorithm is the degree to which it generates repeatable results, given different subsamples of the entire dataset.

（39） Turney describes that "[t]he instability of the algorithm is the sensitivity of the algorithm to noise in the data. Instability is closely related to our intuitive notion of complexity. Complex models tend to be unstable and simple models tend to be stable."

（40） Attack detection in recommender systems

Experimental Setup

Because recommender systems depend heavily on input from users, they are subject to manipulations and attacks.

（41） Conclusion:

A. The results of our experiments show that model-based techniques (e.g., matrix factorization, user average, item average, and baseline estimates using combined user and item averages) are regularly more stable, that is, more consistent in their predictions, than memory-based collaborative filtering heuristics in a wide variety of settings. We also find that normalizing rating data before applying any algorithms not only improves accuracy for all recommendation algorithms, but also plays a critical role in improving their stability.

B. We also empirically show that stability of a recommendation algorithm does not necessarily correlate with its predictive accuracy. Recommendation algorithms can be relatively accurate but not stable, or stable but not accurate, or both, or neither.

C. Developing stability-aware or stability-maximizing recommendation techniques, as well as performing user studies that measure the impact of instability on users' trust and acceptance, represent yet other important directions for future work.

（42） Factors affecting the stability of recommendation algorithms

Data Sparsity

Number of New Ratings Added

New Rating Distribution

Data Normalization

Evaluation Data Distribution