Stability of Recommendation Algorithms

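These are the notes I took after reading the paper Stability of Recommendation Algorithms. They are by no means original work, just a collection of scattered points. If anything here is wrong, I welcome corrections and suggestions from fellow bloggers!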



(1)   coverage, diversity, novelty

(2)   perception of the system’s personalization competence

consistent predictions

(3)   Stability is defined to measure the extent to which a recommendation algorithm provides predictions that are consistent with each other. Specifically, for a stable algorithm, adding some of the algorithm’s own predictions to the algorithm’s training data (for example, if these predictions were confirmed as accurate by users) would not invalidate or change the other predictions.
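The definition above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names `user_average` and `prediction_shift`, the toy ratings, and the choice of a plain user-average predictor are all hypothetical and not taken from the paper. The shift is measured as a root mean squared difference between predictions made before and after the algorithm's own predictions are added to the training data.

```python
import numpy as np

def user_average(ratings, n_users, n_items):
    """Hypothetical stand-in predictor: predict every cell of a user's row
    as that user's mean rating (global mean for users with no ratings)."""
    sums = np.zeros(n_users)
    counts = np.zeros(n_users)
    for u, _, r in ratings:
        sums[u] += r
        counts[u] += 1
    global_mean = sums.sum() / counts.sum()
    means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.repeat(means[:, None], n_items, axis=1)

def prediction_shift(predictor, ratings, n_users, n_items, add_cells, eval_cells):
    """Root mean squared shift between predictions made before and after
    the predictor's own predictions for add_cells join the training data."""
    before = predictor(ratings, n_users, n_items)
    # Extend the training data with the algorithm's own predictions.
    extended = list(ratings) + [(u, i, before[u, i]) for u, i in add_cells]
    after = predictor(extended, n_users, n_items)
    diffs = np.array([after[u, i] - before[u, i] for u, i in eval_cells])
    return float(np.sqrt(np.mean(diffs ** 2)))

# Toy data (made up): (user, item, rating) triples on a 3 x 4 matrix.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 2.0),
           (1, 2, 4.0), (2, 1, 1.0), (2, 3, 2.0)]
shift = prediction_shift(user_average, ratings, 3, 4,
                         add_cells=[(0, 2), (1, 1), (2, 0)],
                         eval_cells=[(0, 3), (1, 3), (2, 2)])
print(shift)
```

Any other predictor with the same signature could be passed in place of `user_average` to compare how much its remaining predictions move.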



(4)   inherent property

(5)   the notion of stability

(6)   extension of dataset

(7)   aggregate difference

(8) According to the definition, the level of instability may depend on several factors, including the type of predictive technique T, characteristics of the initial rating dataset D, characteristics of the extension dataset D′ (i.e., a given dataset can have a huge number of possible extension datasets), and the characteristics of the evaluation dataset U.


(9) a set of hypothetical (simulated) incoming ratings



(9)   Matrix Factorization

(10)  simple averaging techniques, user- and item-based variations of neighborhood-based collaborative filtering approaches, and the model-based matrix factorization method


(11)  research literature


(12)  some users may systematically tend to give higher ratings than others, and some universally liked items might receive higher ratings than others. Without normalization, such user and item effects could bias system’s predictions.



(13)  One common practice of normalization suggested in the literature is to estimate and remove three effects in sequence, that is, overall mean, main effect for items, and main effect for users, and then make predictions based on the residuals.



(14)  A baseline estimate for each known rating, denoted by $b_{ui}$, is computed to account for global effects:

$b_{ui} = \mu + b_u + b_i$,

where $\mu$ is the overall average rating, $b_u$ is the observed deviation of user u, and $b_i$ is the observed deviation of item i.
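As a rough illustration of the baseline estimate, the sketch below removes the overall mean, then the item effect, then the user effect, in that order. It assumes that each deviation is a plain average of the remaining residuals and that every user and item has at least one known rating; the notes do not preserve the paper's exact formulas for the deviations, so this is one plausible reading rather than the authors' definition. The function name `baseline_estimates` and the toy matrix are made up.

```python
import numpy as np

def baseline_estimates(R):
    """R is a 2-D float array (users x items); np.nan marks unknown ratings.
    Returns the matrix of baseline estimates b_ui = mu + b_u + b_i."""
    mu = np.nanmean(R)                        # overall average rating
    b_i = np.nanmean(R - mu, axis=0)          # item deviation from the overall mean
    b_u = np.nanmean(R - mu - b_i, axis=1)    # user deviation after removing item effect
    return mu + b_u[:, None] + b_i[None, :]

# Toy 3 x 3 rating matrix (assumed data, 1-5 scale).
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 1.0, 3.0]])
b = baseline_estimates(R)
residuals = R - b   # what a downstream algorithm would then model
print(b)
```

The residuals stay `nan` wherever the rating is unknown, so they can be fed to a later step without extra masking.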

(15)  cross validation


(16) In other words, user-based and item-based averages are perfectly stable due to their sum-of-squares-minimizing nature.
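A short worked derivation makes this concrete for the user average (the item average is symmetric). The notation is assumed for illustration: $I_u$ is the set of items rated by user u, $n_u = |I_u|$, and k is the number of the algorithm's own predictions added for that user, each equal to the current mean.

```latex
\bar{r}_u = \frac{1}{n_u} \sum_{i \in I_u} r_{ui}
          = \arg\min_{c} \sum_{i \in I_u} (r_{ui} - c)^2 ,
\qquad
\bar{r}_u' = \frac{n_u \bar{r}_u + k\,\bar{r}_u}{n_u + k} = \bar{r}_u .
```

Each added rating sits exactly at the minimizer, so it contributes zero squared error there and the minimizer does not move; every later prediction for that user is unchanged.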


(17) This is especially true for stability, suggesting that the stability performance of different recommendation techniques can be largely attributed to inherent characteristics of the algorithms rather than data variation in the random samples.


(18) As seen from the experimental results, neighborhood-based recommendation algorithms demonstrate the highest instability among the techniques that were tested.


(19) number of latent variables

   be susceptible to


(20) Robustness Checks for Stability Calculation


(21) random deviation amounts

      standard deviation

      positive or negative

(22) Because of the random perturbations to the newly added ratings, the instability increase can be observed across all techniques and for all datasets. However, the inherent stability differences among different techniques remain, as evidenced by the same relative stability rankings of different techniques, that is, with user and item averages demonstrating highest stability, followed by the model-based approaches (SVD and baseline), while user-based and item-based neighborhood approaches typically exhibit the lowest levels of stability among all the techniques.
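The perturbation step in this robustness check might be sketched as follows. The notes only record that random deviation amounts with some standard deviation, positive or negative, were applied to the newly added ratings; the zero-mean Gaussian noise, the clipping to the rating scale, and the name `perturb_predictions` are assumptions made for this illustration.

```python
import numpy as np

def perturb_predictions(predictions, std, low=1.0, high=5.0, seed=0):
    """Add zero-mean Gaussian noise (assumed noise model) to predictions
    before they are added back as new ratings, clipped to the rating scale."""
    rng = np.random.default_rng(seed)
    predictions = np.asarray(predictions, dtype=float)
    noise = rng.normal(loc=0.0, scale=std, size=predictions.shape)
    return np.clip(predictions + noise, low, high)

# Toy predictions on an assumed 1-5 scale.
clean = [3.2, 4.8, 1.1, 2.5]
noisy = perturb_predictions(clean, std=0.5)
print(noisy)
```

With `std=0` the function returns the predictions unchanged, which corresponds to the unperturbed stability calculation.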


(23) In this section we perform an empirical investigation on whether data sparsity (or density) has an impact on the stability as well, by observing the stability and accuracy of recommendation algorithms at different data density levels.
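To make "density level" concrete, here is a minimal sketch that computes the density of a rating matrix and thins it to a target density by dropping known ratings uniformly at random. The uniform sampling scheme and the names `density` and `subsample_to_density` are assumptions for illustration; the notes do not say how the paper drew its samples.

```python
import numpy as np

def density(R):
    """Fraction of cells in R (np.nan = unknown) that hold a known rating."""
    return float(np.count_nonzero(~np.isnan(R))) / R.size

def subsample_to_density(R, target, seed=0):
    """Keep a uniformly random subset of known ratings so that the result
    has (approximately) the target density; all other cells become nan."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(R))
    keep = int(round(target * R.size))
    if keep > len(known):
        raise ValueError("target density exceeds the density of R")
    chosen = known[rng.choice(len(known), size=keep, replace=False)]
    out = np.full(R.shape, np.nan)
    out[chosen[:, 0], chosen[:, 1]] = R[chosen[:, 0], chosen[:, 1]]
    return out

# Toy fully known 4 x 5 matrix (made-up values), thinned to 50% density.
full = np.arange(20, dtype=float).reshape(4, 5)
sparse = subsample_to_density(full, target=0.5)
print(density(full), density(sparse))
```

Running a stability measurement on several such thinned copies is one way to trace how stability changes as the data becomes sparser.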

(24) data sparsity

(25) inadequate recommender systems accuracy


empirical investigation

density levels

(26) Our results are consistent with prior literature in that the recommendation algorithms typically demonstrate higher predictive accuracy on denser rating datasets. More importantly, graphs show that the density level of the rating data also has a significant influence on recommendation stability for the neighborhood-based CF techniques (CF Item and CF User): RMSS increases as the rating data becomes sparser. In other words, as data becomes sparser, these techniques are increasingly more sensitive to additions of new ratings (even though they are in complete agreement with the algorithms’ own predictions). In particular, in our sparsest samples (0.351% rating density, sampled from Netflix), both CF User and CF Item demonstrate a very significant 0.57 RMSS (on the rating scale from 1 to 5). Meanwhile, the stability performance of the model-based matrix factorization (SVD) as well as the baseline (User Item Avg) approaches is much more advantageous and more consistent across different density levels, consistently staying in the 0.1−0.15 RMSS range for the movie rating datasets (with the 1−5 rating scale) and in the 0.3−0.4 RMSS range for the Jester dataset (with the scale from −10 to +10).

In summary, the experimental results show that the stability of the model-based approaches (SVD, User Item Avg, User Avg, Item Avg) is robust to data density changes, in contrast to the neighborhood-based collaborative filtering heuristics.


(27) dynamics of recommendation stability


(28) In particular, as expected, we find that the prediction shift for the simple user- and item-average approaches is always zero, regardless of how many predictions were added to rating matrix. For all other techniques, including the matrix factorization (SVD), user- and item-based collaborative filtering (CF User and CF Item), and baseline estimates using user and item averages (User Item Avg), the prediction shift curves generally indicate a convex shape. In particular, with only very few newly introduced ratings the prediction shift is small, but the instability rises very rapidly until the number of newly introduced ratings reaches about 20% of the original data, at which point the rise of the prediction shift slows down and later starts exhibiting a slow continuous decrease.


(29) Our experiments suggest that the stability of memory-based neighborhood techniques is more sensitive to the number of new incoming ratings than the stability of model-based techniques that are based on global optimization approaches (including both matrix factorization and simpler average-based models). When only a very small number of new ratings is added, the original rating patterns persist in the data and the neighborhood of each user (or item) does not change dramatically, resulting in similar predictions for the same items (i.e., higher stability). In contrast, when more and more new ratings are made available, the neighborhoods can be affected very dramatically. This is supported by the observed more rapid decrease of stability in neighborhood-based techniques as compared to the model-based approaches, which are based on global optimization and, thus, are less malleable to additions of new ratings (that are in agreement with what the algorithm had predicted earlier).


(30) In addition, it is important to note that, after the initial increase in prediction shift, its subsequent slow decrease (for all recommendation algorithms) can be attributed to the sheer numbers of additional new ratings that are introduced, all of which are in agreement with the initial recommendation model. Whether the recommendation algorithm is more computationally sophisticated (e.g., SVD) or less (e.g., CF User), providing it with increasingly more data that is in consistent agreement with some specific model will make this algorithm represent this model better and make more stable recommendations.



(31) Random strategy draws a sample of predictions from each user at random to be added as new incoming ratings to the set of original ratings; High strategy sorts all predictions for each user and only chooses those with highest predicted rating values; HighHalf sorts predictions for each user and then draws a random sample of predictions with values greater than the median prediction; Low sorts predictions for each user and only adds lowest predictions; and LowHalf sorts predictions and draws a random sample from ratings whose values are lower than the median prediction. In summary, we wanted to test whether adding skewed rating samples will bias the original rating distribution and, as a result, decrease recommendation algorithm stability, as compared to the random samples of new predictions.
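The five sampling strategies can be sketched for a single user as below. This is an illustrative reading of the descriptions above, not the paper's code: the function name `sample_new_ratings`, the parameter `k` (how many predictions to add), and the handling of small pools are assumptions.

```python
import numpy as np

def sample_new_ratings(preds, k, strategy, seed=0):
    """Return indices of k of one user's predictions to add as new ratings.
    preds: 1-D array of predicted ratings for the user's unrated items."""
    rng = np.random.default_rng(seed)
    preds = np.asarray(preds, dtype=float)
    order = np.argsort(preds)            # indices from lowest to highest prediction
    median = np.median(preds)
    if strategy == "High":               # the k highest predictions
        return order[::-1][:k]
    if strategy == "Low":                # the k lowest predictions
        return order[:k]
    if strategy == "Random":             # any prediction, uniformly at random
        pool = np.arange(len(preds))
    elif strategy == "HighHalf":         # random among predictions above the median
        pool = np.flatnonzero(preds > median)
    elif strategy == "LowHalf":          # random among predictions below the median
        pool = np.flatnonzero(preds < median)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.choice(pool, size=min(k, len(pool)), replace=False)

# Toy predictions for one user (made-up values).
preds = [1.0, 4.5, 3.0, 2.0, 5.0, 3.5]
for name in ["Random", "High", "HighHalf", "Low", "LowHalf"]:
    print(name, sample_new_ratings(preds, k=2, strategy=name))
```

The returned indices identify which predictions would be appended to the original ratings before retraining, so the same stability measurement can be repeated once per strategy.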


(32) As can be seen from the figure, the distribution of new incoming ratings significantly influences the stability of recommendation algorithms, except for simple user- and item-based average techniques, which are always perfectly stable, as discussed earlier. In particular, among the five sampling strategies, Random strategy demonstrated the highest stability for nearly all recommendation algorithms. Moreover, all recommendation algorithms exhibited an increase in instability with High or Low sampling strategies, as compared to the Random strategy, and this difference was especially substantial for memory-based neighborhood CF techniques. Furthermore, adding new ratings selected by HighHalf and LowHalf strategies led to moderate prediction shift for all algorithms. In summary, Random rating samples that are in complete agreement with previous predictions have more favorable impact on stability than samples with skewed distribution.


(33) More specifically, the more skewed the distribution of new incoming ratings is, the less stable the recommendation algorithms become.


(34) Our results also suggest that the impact of adding skewed samples of new ratings on recommendation stability can be asymmetric for some algorithms on some datasets.



(35) Impact of Data Normalization

(36) The results suggest that normalizing rating data in general can improve predictive accuracy for all recommendation algorithms. Meanwhile, normalization also impacts the stability of these algorithms in different ways.


However, normalization by removing just the item average or user average dramatically improved both accuracy and stability of neighborhood-based approaches as compared to their nonnormalized versions.


On top of this, removing all the global effects resulted in a comparable accuracy and only very slight stability improvement, as compared to removing only one main effect. Moreover, the impacts of normalization on stability and accuracy are consistent across all datasets.


(37) Impact of Evaluation Data Distribution


The main finding is that the traditionally more stable approaches demonstrate the same or very similar stability levels across the entire evaluation data distribution.


In the dynamical systems subfield

Subsample-based stability of classification models


(38) In the machine learning literature, the stability of a predictive algorithm is the degree to which it generates repeatable results, given different subsamples of the entire dataset.


(39) Turney describes that “[t]he instability of the algorithm is the sensitivity of the algorithm to noise in the data. Instability is closely related to our intuitive notion of complexity. Complex models tend to be unstable and simple models tend to be stable.”


(40) Attack detection in recommender systems



Because recommender systems depend heavily on input from users, they are subject to manipulations and attacks.



A. The results of our experiments show that model-based techniques (e.g., matrix factorization, user average, item average, and baseline estimates using combined user and item averages) are regularly more stable, that is, more consistent in their predictions, than memory-based collaborative filtering heuristics in a wide variety of settings. We also find that normalizing rating data before applying any algorithms not only improves accuracy for all recommendation algorithms, but also plays a critical role in improving their stability.


B. We also empirically show that stability of a recommendation algorithm does not necessarily correlate with its predictive accuracy. Recommendation algorithms can be relatively accurate but not stable, or stable but not accurate, or both, or neither.


C. Developing stability-aware or stability-maximizing recommendation techniques as well as performing user studies that measure the impact of instability on users’ trust and acceptance represents yet other important directions for future work.



Factors affecting the stability of recommendation algorithms

Data Sparsity

Number of New Ratings Added

New Rating Distribution

Data Normalization

Evaluation Data Distribution







