2022
Some papers and the metrics they use
"Deep Collaborative Filtering with Multi-Aspect Information in Heterogeneous Networks", Chuan Shi (石川) et al.: HR@K, NDCG@K
"Modeling User Exposure in Recommendation": Recall@K, MAP@K, NDCG@K
"Unbiased offline recommender evaluation for missing-not-at-random implicit feedback": AUC, Recall@K, NDCG@K
"Recommendations as Treatments: Debiasing Learning and Evaluation": MAE, MSE, CG, DCG, Precision@K
"Deep Session Interest Network for Click-Through Rate Prediction": AUC
I. Rating prediction
1. RMSE (Root Mean Squared Error), MAE (Mean Absolute Error)
![image.png](https://s2.loli.net/2022/02/10/Lk65EFSq1AeyMZu.png)
![image.png](https://s2.loli.net/2022/02/10/ouiTJOVfskenx2Y.png)
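As a minimal sketch of the two rating metrics, assuming the standard definitions (RMSE is the square root of the mean squared error, MAE is the mean absolute error over paired true and predicted ratings). The function names and the sample ratings are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired true/predicted ratings."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error over paired true/predicted ratings."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical ratings, for illustration only.
y_true = [4.0, 3.0, 5.0, 1.0]
y_pred = [3.5, 3.0, 4.0, 2.0]
print(round(rmse(y_true, y_pred), 4), round(mae(y_true, y_pred), 4))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does.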
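II. Recommendation lists
1. Precision and Recall
Recommend K items to user u (denoted R(u)), and let T(u) be the set of items that user u likes in the test set. The accuracy of the recommendation algorithm can then be measured with precision and recall: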
$$
\text{Precision@}K = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |R(u)|}
$$

$$
\text{Recall@}K = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |T(u)|}
$$

$$
F = \frac{2 \cdot \text{Precision@}K \cdot \text{Recall@}K}{\text{Precision@}K + \text{Recall@}K}
$$
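The three formulas above can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one: the function name is hypothetical, and the sums run over all users before dividing, exactly as in the formulas (this is not the same as averaging per-user precision). The sample lists are the R and T used in this post's worked example.

```python
def precision_recall_f_at_k(recommended, relevant, k):
    """Aggregate Precision@K, Recall@K and F over all users.

    recommended: one ranked item list per user, R(u)
    relevant:    one list of liked test-set items per user, T(u)
    """
    hits = rec_total = rel_total = 0
    for rec, rel in zip(recommended, relevant):
        top_k = rec[:k]
        hits += len(set(top_k) & set(rel))   # |R(u) ∩ T(u)|
        rec_total += len(top_k)              # |R(u)|
        rel_total += len(rel)                # |T(u)|
    precision = hits / rec_total if rec_total else 0.0
    recall = hits / rel_total if rel_total else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f


R = [[2, 5, 1, 3, 9], [6, 2, 0, 12, 8], [1, 6, 7, 11, 2]]
T = [[3, 10, 7, 21], [15, 0, 5, 2, 13], [19]]
p, r, f = precision_recall_f_at_k(R, T, k=5)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.2 0.3 0.24
```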
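2. Hit Ratio (HR)
Similar to recall. Interpretation: it asks whether the items the user actually wants were recommended at all, so it emphasizes the "accuracy" of the prediction.
"Deep Collaborative Filtering with Multi-Aspect Information in Heterogeneous Networks", Chuan Shi (石川) et al.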
![image.png](https://s2.loli.net/2022/02/10/JCiSHyxXzTdtGvU.png)
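A minimal sketch of HR@K, assuming the per-user convention used in this post's worked example: a user counts as one hit if at least one test item appears in the top-K list, and HR is the fraction of such users. The function name is hypothetical; other papers may define the denominator differently.

```python
def hit_ratio_at_k(recommended, relevant, k):
    """Fraction of users with at least one relevant item in their top-K list."""
    hits = 0
    for rec, rel in zip(recommended, relevant):
        if set(rec[:k]) & set(rel):
            hits += 1
    return hits / len(recommended)


R = [[2, 5, 1, 3, 9], [6, 2, 0, 12, 8], [1, 6, 7, 11, 2]]
T = [[3, 10, 7, 21], [15, 0, 5, 2, 13], [19]]
print(round(hit_ratio_at_k(R, T, k=5), 3))  # 0.667
```

With a leave-one-out test set (one held-out item per user), this reduces to checking whether that single item is in the list.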
III. Ranking
1. Normalized Discounted Cumulative Gain (NDCG)
NDCG: "Modeling User Exposure in Recommendation"
![image.png](https://s2.loli.net/2022/02/10/iwnAMzcEIxoaNg3.png)
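Interpretation: it asks whether the relevant items that were found are placed in positions more visible to the user, i.e. it emphasizes "ordering".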
$$
CG = \sum_{j=1}^{K} rel_j
$$

$$
DCG_u = \sum_{j=1}^{K} \frac{2^{rel_j} - 1}{\log_2(j+1)}
$$

$$
NDCG_u = \frac{DCG_u}{IDCG_u}
$$

$$
NDCG@K = \frac{1}{N} \sum_{u=1}^{N} NDCG_u
$$
Top-K with implicit feedback:
![image.png](https://s2.loli.net/2022/02/10/DlyhfWXQ7ZL1qCS.png)
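The DCG and NDCG formulas can be sketched as follows for binary (implicit-feedback) relevance, where rel_j is 1 if the item at position j is in T(u) and 0 otherwise. One assumption to flag: IDCG is computed here by re-sorting the relevance labels of the recommended list itself, which is the convention that reproduces this post's worked example; another common convention builds the ideal list from min(K, |T(u)|) relevant items. Function names are hypothetical.

```python
import math


def dcg(rels):
    """DCG of a list of relevance labels, positions counted from 1."""
    return sum((2 ** r - 1) / math.log2(j + 1)
               for j, r in enumerate(rels, start=1))


def ndcg_at_k(recommended, relevant, k):
    """Mean over users of DCG_u / IDCG_u with binary relevance."""
    total = 0.0
    for rec, rel in zip(recommended, relevant):
        rel_set = set(rel)
        rels = [1 if item in rel_set else 0 for item in rec[:k]]
        # Assumption: the ideal ranking reorders the hits of this same list.
        ideal = dcg(sorted(rels, reverse=True))
        total += dcg(rels) / ideal if ideal > 0 else 0.0
    return total / len(recommended)


R = [[2, 5, 1, 3, 9], [6, 2, 0, 12, 8], [1, 6, 7, 11, 2]]
T = [[3, 10, 7, 21], [15, 0, 5, 2, 13], [19]]
print(round(ndcg_at_k(R, T, k=5), 4))  # 0.3747
```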
Simplified version: "Deep Collaborative Filtering with Multi-Aspect Information in Heterogeneous Networks", Chuan Shi (石川) et al. Each user's test set contains only one record.
$$
NDCG = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\log_2(p_i + 1)}
$$
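$N$: the total number of users
$p_i$: the position in the recommendation list of the $i$-th user's ground-truth item (the single held-out test record); if the list does not contain that item, $p_i \to \infty$, so the term contributes 0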
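A short sketch of this simplified, leave-one-out NDCG. The function name and the held-out items below are made up for illustration; a missing item is treated as $p_i \to \infty$, i.e. it adds nothing to the sum.

```python
import math


def loo_ndcg_at_k(recommended, held_out, k):
    """Leave-one-out NDCG: mean of 1 / log2(p_i + 1) over users.

    held_out[i] is the single test item of user i; p_i is its 1-based
    position in the top-K list, and absent items contribute 0.
    """
    total = 0.0
    for rec, target in zip(recommended, held_out):
        top_k = rec[:k]
        if target in top_k:
            p = top_k.index(target) + 1
            total += 1.0 / math.log2(p + 1)
    return total / len(recommended)


# Hypothetical held-out items: found at position 4, at position 1, not found.
recs = [[2, 5, 1, 3, 9], [6, 2, 0, 12, 8], [1, 6, 7, 11, 2]]
held_out = [3, 6, 19]
print(round(loo_ndcg_at_k(recs, held_out, k=5), 4))
```

Since there is exactly one relevant item per user, the ideal DCG is 1 and no separate normalization step is needed.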
2. Mean Reciprocal Rank (MRR)
The reciprocal of the rank of the first correct item in the top-K recommendation list, averaged over all users.
$$
MRR = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p_i}
$$
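A minimal sketch of MRR@K, where $p_i$ is the 1-based rank of the first relevant item in user i's top-K list and users with no hit contribute 0. The function name is hypothetical; the data is the R and T from this post's worked example.

```python
def mrr_at_k(recommended, relevant, k):
    """Mean over users of 1 / (rank of the first relevant item in the top-K)."""
    total = 0.0
    for rec, rel in zip(recommended, relevant):
        rel_set = set(rel)
        for pos, item in enumerate(rec[:k], start=1):
            if item in rel_set:
                total += 1.0 / pos
                break  # only the first correct item counts
    return total / len(recommended)


R = [[2, 5, 1, 3, 9], [6, 2, 0, 12, 8], [1, 6, 7, 11, 2]]
T = [[3, 10, 7, 21], [15, 0, 5, 2, 13], [19]]
print(mrr_at_k(R, T, k=5))  # 0.25
```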
3. MAP (Mean Average Precision)
It can be understood as a recall that takes ranking order into account.
First compute the AP (Average Precision) of each user:
![image.png](https://s2.loli.net/2022/02/10/Al4xS9GNwdJjC1u.png)
MAP is the mean of the AP values over all users.
![image.png](https://s2.loli.net/2022/02/10/ZvaNBolwW4RTKLe.png)
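A sketch of AP and MAP under one explicit assumption: AP is normalized by |T(u)|, the size of the user's test set, which is the convention that reproduces this post's worked example (some implementations divide by min(K, |T(u)|) or by the number of hits instead). Function names are hypothetical.

```python
def average_precision_at_k(rec, rel, k):
    """AP for one user: sum of precision at each hit position, over |T(u)|."""
    rel_set = set(rel)
    if not rel_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for pos, item in enumerate(rec[:k], start=1):
        if item in rel_set:
            hits += 1
            precision_sum += hits / pos  # precision at this cut-off
    return precision_sum / len(rel_set)  # assumption: normalize by |T(u)|


def map_at_k(recommended, relevant, k):
    """Mean of the per-user AP values."""
    aps = [average_precision_at_k(rec, rel, k)
           for rec, rel in zip(recommended, relevant)]
    return sum(aps) / len(aps)


R = [[2, 5, 1, 3, 9], [6, 2, 0, 12, 8], [1, 6, 7, 11, 2]]
T = [[3, 10, 7, 21], [15, 0, 5, 2, 13], [19]]
print(round(map_at_k(R, T, k=5), 4))  # 0.0986
```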
4. AUC (Area Under the ROC Curve)
Used in scenarios such as click-through rate (CTR) prediction.
A naive algorithm:
![image.png](https://s2.loli.net/2022/02/10/Ca7vB9KPq2x6bos.png)
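Other methods reduce the time complexity. For example, first sort the samples by score in ascending order, then relate each positive sample's position to the number of negative samples it outscores. This readily gives the AUC of a single user:

$$
AUC_u = \frac{\sum_{i=1}^{n_1} rank(i) - \frac{n_1(n_1+1)}{2}}{n_0 \, n_1}
$$

where $n_1$ is the number of positive samples, $n_0$ is the number of negative samples, and $rank(i)$ is the rank of the $i$-th positive sample (1-based, in the ascending order).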
Then it only remains to compute the AUC values of all users.
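To make the two approaches concrete, here is a sketch for a single user. The pairwise version assumes the usual definition of AUC as the probability that a positive sample outscores a negative one (ties counted as 0.5); the rank-based version sorts by score in ascending order and uses the sum of positive ranks, with tied scores given their average rank. Function names and the sample scores are made up, and both functions assume at least one positive and one negative sample.

```python
def auc_pairwise(scores, labels):
    """Naive O(n1 * n0) AUC: compare every positive with every negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_rank(scores, labels):
    """O(n log n) AUC from the 1-based ascending ranks of the positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n1 * (n1 + 1) / 2) / (n0 * n1)


# Hypothetical scores and click labels for one user.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 0, 1, 1, 0, 0]
print(round(auc_pairwise(scores, labels), 4), round(auc_rank(scores, labels), 4))
```

Both functions return the same value on the same input; the rank-based one avoids the quadratic loop over positive-negative pairs.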
Worked example, with K=5:
R=[[2,5,1,3,9], [6,2,0,12,8], [1,6,7,11,2]]
T=[[3,10,7,21], [15,0,5,2,13], [19]]
$$
\text{Precision@}5 = \frac{1+2+0}{5+5+5} = \frac{3}{15} = 0.2
$$

$$
\text{Recall@}5 = \frac{1+2+0}{4+5+1} = \frac{3}{10} = 0.3
$$

$$
F1@5 = \frac{2 \times 0.2 \times 0.3}{0.2 + 0.3} = \frac{0.12}{0.5} = 0.24
$$
R=[[2,5,1,3,9], [6,2,0,12,8], [1,6,7,11,2]]
T=[[3,10,7,21], [15,0,5,2,13], [19]]
$$
HR@5 = \frac{1+1+0}{3} = \frac{2}{3} \approx 0.667
$$

$$
MRR@5 = \frac{\frac{1}{4} + \frac{1}{2} + 0}{3} = \frac{1}{4} = 0.25
$$

$$
MAP@5 = \frac{1}{3}\left(\frac{\frac{1}{4}}{4} + \frac{\frac{1}{2}+\frac{2}{3}}{5} + \frac{0}{1}\right) \approx 0.10
$$

$$
NDCG@5 = \frac{1}{3}\left(\frac{\frac{1}{\log_2(4+1)}}{\frac{1}{\log_2(1+1)}} + \frac{\frac{1}{\log_2(2+1)}+\frac{1}{\log_2(3+1)}}{\frac{1}{\log_2(1+1)}+\frac{1}{\log_2(2+1)}} + 0\right) = \frac{1}{3}\left(\frac{\log_2 2}{\log_2 5} + \frac{\log_2 3 + 2}{2(1+\log_2 3)} + 0\right) \approx 0.37470
$$
For the rating metrics, smaller is better. The recommendation-list and ranking metrics range from 0 to 1, and the closer to 1 the better.