Common Evaluation Metrics for Recommender Systems and Their Implementation

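Definitions

0 Notation

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $K$, $k$ | The K in Top-K recommendation | e.g. Top-5 means recommending 5 items to each user |
| $U$ | Total number of users | |
| $I$ | Total number of items | |
| $u$ | A single user | |
| $i$ | A single item | |
| $\mathcal{R}(u)$ | The list of items recommended to user $u$ | |
| $\mathcal{T}(u)$ | The list of items user $u$ actually interacted with | |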

1 Rating metrics

1.1 Mean Absolute Error (MAE)
1.2 Mean Squared Error (MSE)
1.3 Root Mean Squared Error (RMSE)
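
The three rating metrics are listed here without formulas. As a minimal sketch under their standard definitions (mean of absolute errors, mean of squared errors, and the square root of the latter), they could be computed as follows. The function name `rating_errors` and the sample ratings are illustrative assumptions, not part of the original post.

```python
import math


def rating_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    return mae, mse, math.sqrt(mse)                   # RMSE = sqrt(MSE)


# Hypothetical true ratings and predictions for four (user, item) pairs.
mae, mse, rmse = rating_errors([4, 3, 5, 1], [3, 3, 4, 3])
print(mae, mse, rmse)
```

Because the errors are squared before averaging, MSE and RMSE penalize the single large miss (predicting 3 for a true rating of 1) more heavily than MAE does.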

2 Accuracy metrics

2.1 Recall

Recall is the fraction of each user's true interactions that appear in the recommendation list, averaged over all users.
$$\mathrm{Recall}@K=\frac{1}{U}\sum_{u=1}^{U}\frac{\mid\mathcal{R}(u)\cap\mathcal{T}(u)\mid}{\mid\mathcal{T}(u)\mid}$$

2.2 Precision

Precision is the fraction of the recommendation list that is correct. Since every list contains $K$ items, $\mid\mathcal{R}(u)\mid=K$.
$$\mathrm{Precision}@K=\frac{1}{U}\sum_{u=1}^{U}\frac{\mid\mathcal{R}(u)\cap\mathcal{T}(u)\mid}{\mid\mathcal{R}(u)\mid}=\frac{1}{U}\sum_{u=1}^{U}\frac{\mid\mathcal{R}(u)\cap\mathcal{T}(u)\mid}{K}$$
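
To make the two definitions concrete, here is a small per-user sketch. The helper name `recall_precision_at_k` and the item ids are made up for illustration; averaging the per-user values over all users gives Recall@K and Precision@K.

```python
def recall_precision_at_k(recommended, relevant, k):
    """Per-user recall and precision for the first k recommended items."""
    top_k = list(recommended)[:k]
    hits = len(set(top_k) & set(relevant))      # |R(u) ∩ T(u)|
    return hits / len(relevant), hits / k       # divide by |T(u)| and by K


# Hypothetical user: 5 recommendations, 3 items actually interacted with.
recall, precision = recall_precision_at_k([10, 20, 30, 40, 50], [20, 50, 70], k=5)
print(recall, precision)
```

Two of the three true items (20 and 50) are recommended, so recall is 2/3, while only two of the five recommended slots are hits, so precision is 2/5.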

2.3 F-score

The F-score balances Recall and Precision and reflects both in a single number. With $\beta=1$ it reduces to F1, the harmonic mean of the two.
$$\mathrm{F}_{\beta}=\frac{(1+\beta^2)\times\mathrm{Precision}@K\times\mathrm{Recall}@K}{\beta^2\times\mathrm{Precision}@K+\mathrm{Recall}@K}$$
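
The F-score formula can be sketched directly in code. The function name `f_beta` and the input values are illustrative assumptions; the zero-denominator guard is an added convention rather than something the formula itself specifies.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score from a precision and a recall value."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # convention chosen here: no hits at all gives a score of 0
    return (1 + beta ** 2) * precision * recall / denom


# Hypothetical values: Precision@K = 0.4, Recall@K = 2/3.
f1 = f_beta(0.4, 2 / 3)
f2 = f_beta(0.4, 2 / 3, beta=2.0)
print(f1, f2)
```

With $\beta>1$ the score leans toward recall, which is why `f2` comes out higher than `f1` in this example, where recall exceeds precision.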

3 Ranking metrics

3.1 Hit Ratio (HR)

HR is the fraction of users whose recommendation list contains at least one hit.

$$\mathrm{HR}@K=\frac{1}{U}\sum_{u=1}^{U}\mathrm{hr}(u)$$
$$\mathrm{hr}(u)=\begin{cases}1,&\mathcal{R}(u)\cap\mathcal{T}(u)\neq\varnothing\\0,&\mathcal{R}(u)\cap\mathcal{T}(u)=\varnothing\end{cases}$$
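
A minimal sketch of HR@K over several users follows. The dictionary layout (user id mapped to a list of item ids), the function name `hit_ratio_at_k`, and the data are assumptions made for this example.

```python
def hit_ratio_at_k(recommendations, ground_truth, k):
    """Fraction of users with at least one relevant item in their top-k list."""
    hits = 0
    for user, relevant in ground_truth.items():
        top_k = recommendations[user][:k]
        if set(top_k) & set(relevant):   # hr(u) = 1 when the intersection is non-empty
            hits += 1
    return hits / len(ground_truth)


# Hypothetical recommendations and true interactions for three users.
recs = {1: [10, 20, 30], 2: [40, 50, 60], 3: [70, 80, 90]}
truth = {1: [30], 2: [11, 12], 3: [70, 90]}
hr = hit_ratio_at_k(recs, truth, k=3)
print(hr)
```

Users 1 and 3 each have at least one hit and user 2 has none, so HR@3 is 2/3. Note that user 3's two hits count only once.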

3.2 Mean Reciprocal Rank (MRR)

$$\mathrm{MRR}@K=\frac{1}{U}\sum_{u=1}^{U}\frac{1}{\mathrm{rank}(u)}$$

$\mathrm{rank}(u)$ is the position, within the recommendation list $\mathcal{R}(u)$, of the first item that is a hit for user $u$. If there is no hit, $\mathrm{rank}(u)\to\infty$, so that user contributes 0 to the sum.
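
As a worked sketch of the reciprocal rank, the snippet below scores three made-up users and averages the results. The helper name `reciprocal_rank` is illustrative, and a miss is mapped to 0, matching the limit $1/\mathrm{rank}(u)\to 0$.

```python
def reciprocal_rank(recommended, relevant):
    """1 / position of the first hit (1-based), or 0.0 when nothing hits."""
    relevant = set(relevant)
    for position, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical users: first hit at position 3, no hit, first hit at position 1.
rr = [
    reciprocal_rank([10, 20, 30], [30]),
    reciprocal_rank([40, 50, 60], [11, 12]),
    reciprocal_rank([70, 80, 90], [70, 90]),
]
mrr = sum(rr) / len(rr)
print(rr, mrr)
```

Only the first hit matters: the third user's second hit at position 3 does not change the score.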

[Figure: MRR calculation example]

3.3 Mean Average Precision (MAP)

$$\mathrm{MAP}@K=\frac{1}{U}\sum_{u=1}^{U}{\mathrm{AP}@K}_{u}$$
$${\mathrm{AP}@K}_{u}=\frac{1}{\mid\mathcal{R}(u)\cap\mathcal{T}(u)\mid}\sum_{k=1}^{K}\mathrm{Precision}(k)\times\mathrm{rel}(k)$$

$\mathrm{Precision}(k)$: the value of $\mathrm{Precision}@k$ computed over the first $k$ positions of user $u$'s recommendation list.

$\mathrm{rel}(k)$: $\mathrm{rel}(k)=1$ if the $k$-th item in user $u$'s recommendation list is a hit, otherwise $\mathrm{rel}(k)=0$. When the list contains no hit at all, ${\mathrm{AP}@K}_{u}$ is taken to be 0.
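
The AP@K definition can be traced step by step with a short sketch. The function name `average_precision_at_k` and the item ids are illustrative. The sum is normalized by the number of hits in the list, as in the formula above; other write-ups normalize by $\mid\mathcal{T}(u)\mid$ or $\min(K,\mid\mathcal{T}(u)\mid)$ instead, which gives different values.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user, normalized by the number of hits in the top-k list."""
    relevant = set(relevant)
    hits, precision_sum = 0, 0.0
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:                  # rel(k) = 1
            hits += 1
            precision_sum += hits / position  # Precision(k) at this hit
    return precision_sum / hits if hits else 0.0


# Hypothetical user with hits at positions 1, 3 and 5.
ap = average_precision_at_k([10, 20, 30, 40, 50], [10, 30, 50, 99], k=5)
print(ap)
```

The hits contribute precisions of 1/1, 2/3 and 3/5, and their mean is 34/45, roughly 0.756. MAP@K is the mean of this value over all users.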

[Figure: Precision at K example]

[Figures: two Average precision examples]

3.4 Normalized Discounted Cumulative Gain (NDCG)

$$\mathrm{NDCG}@K=\frac{1}{U}\sum_{u=1}^{U}\frac{\mathrm{DCG}@K_{u}}{\mathrm{IDCG}@K}$$
$$\mathrm{DCG}@K_{u}=\sum_{i=1}^{K}\frac{\mathrm{rel}(i)}{\log_{2}(i+1)}$$
$$\mathrm{IDCG}@K=\sum_{i=1}^{K}\frac{1}{\log_{2}(i+1)}$$

$\mathrm{rel}(i)$: $\mathrm{rel}(i)=1$ if the $i$-th item in user $u$'s recommendation list is a hit, otherwise $\mathrm{rel}(i)=0$.
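
A per-user sketch of the NDCG@K formulas follows, with binary relevance. The name `ndcg_at_k` and the data are illustrative. The IDCG here follows the definition above, which assumes all $K$ positions are hits; a common alternative truncates the ideal list at $\min(K,\mid\mathcal{T}(u)\mid)$, and under that variant a user with fewer than $K$ true items can still reach 1.

```python
import math


def ndcg_at_k(recommended, relevant, k):
    """NDCG@K for one user with binary relevance and an all-hits ideal list."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, k + 1))
    return dcg / idcg


# Hypothetical user with hits at positions 1 and 3 of a top-3 list.
ndcg = ndcg_at_k([10, 20, 30], [10, 30], k=3)
print(ndcg)
```

Here DCG is 1/log2(2) + 1/log2(4) = 1.5, while IDCG additionally includes the 1/log2(3) term for position 2, so the score is below 1 even though both true items were recommended.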

4 Other metrics

4.1 Diversity
4.2 Novelty

Novelty evaluates how unusual the recommended items are for the user: it measures how much the recommended items differ from popular ones.

Novelty can be divided into popularity-based item novelty and distance-based item novelty.

Popularity-based item novelty can be written as:
$$\mathrm{Novelty}@K=\frac{1}{U}\sum_{u=1}^{U}\frac{\sum_{i\in\mathcal{R}(u)}-\log_{2}\mathrm{p}(i)}{K}$$
$$\mathrm{p}(i)=\frac{\mid\{u\in\{1,\dots,U\}: r_{u,i}\neq\varnothing\}\mid}{U}$$
$r_{u,i}$: the rating given by user $u$ to item $i$, so $\mathrm{p}(i)$ is the fraction of users who have rated item $i$.
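
The two novelty formulas can be sketched as a pair of small helpers. The names `item_popularity` and `novelty_at_k`, the `(user, item)` pair layout, and the `eps` floor for unseen items are all assumptions made for this example.

```python
import math
from collections import defaultdict


def item_popularity(interactions, num_users):
    """p(i): fraction of users who interacted with item i."""
    users_per_item = defaultdict(set)
    for user, item in interactions:
        users_per_item[item].add(user)   # a set ignores repeated (user, item) pairs
    return {item: len(users) / num_users for item, users in users_per_item.items()}


def novelty_at_k(recommended, popularity, k, eps=1e-8):
    """Mean self-information -log2 p(i) over the top-k items of one user."""
    total = sum(-math.log2(max(popularity.get(item, 0.0), eps)) for item in recommended[:k])
    return total / k


# Hypothetical log: item 'a' seen by all 4 users, 'b' by 2, 'c' by 1.
interactions = [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (1, 'b'), (2, 'b'), (3, 'c')]
pop = item_popularity(interactions, num_users=4)
nov = novelty_at_k(['a', 'b', 'c'], pop, k=3)
print(pop, nov)
```

An item everyone has seen contributes 0 bits, while halving the popularity adds one bit, so the list above scores (0 + 1 + 2) / 3 = 1.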

4.3 Serendipity
4.4 Trust
4.5 Timeliness
4.6 Robustness

Implementation

import torch
import numpy as np
from tqdm import tqdm


class Evaluator(object):
    def __init__(self, method, model, test_data, num_items, batch_size, top_k, device):
        self.method = method
        self.model = model
        self.num_users = test_data['user'].max()
        self.num_items = num_items
        self.batch_size = batch_size
        self.test_data = test_data
        self.top_k = top_k
        self.device = device
        # IDCG@K for the ideal case in which all K positions are hits
        idcg = 0
        for i in range(self.top_k):
            idcg += 1 / np.log2(i + 2)  # i is 0-based, so position i + 1 gives log2(i + 2)
        self.idcg = idcg

    def evaluate(self):
        Recall, Precision, HR, RR, AP, NDCG = [], [], [], [], [], []
        test_users = self.test_data['user'].unique()
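        # ceiling division, so no empty trailing batch is produced when the user count divides evenly
        num_user_batchs = (len(test_users) + self.batch_size - 1) // self.batch_size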
        all_items = np.array(range(1, self.num_items + 1))  # all items in dataset
        self.model.eval()
        for batch_id in tqdm(range(num_user_batchs)):
            user_batch = test_users[batch_id * self.batch_size: (batch_id + 1) * self.batch_size]  # get a batch of users
            user_ids = torch.from_numpy(user_batch).long().to(self.device)
            item_ids = torch.from_numpy(all_items).long().to(self.device)

            # get top-k predictions:
            prediction_batch = self.model.predict(user_ids, item_ids).detach().cpu()
            _, top_k_indices_sorted = torch.topk(prediction_batch, k=self.top_k, dim=1)
            top_k_indices_sorted = top_k_indices_sorted.numpy() + 1

            # get ground truth
            test_items = []
            for user in user_batch:
                test_items.append(self.test_data.loc[self.test_data['user'] == user, 'item'].values.reshape(-1))

            # metrics
            for t, r in zip(test_items, top_k_indices_sorted):
                # t: true list, ground truth
                # r: recommendation list, predictions
                Recall.append(self.get_Recall(t, r))
                Precision.append(self.get_Precision(t, r))
                HR.append(self.get_HR(t, r))
                RR.append(self.get_RR(t, r))
                AP.append(self.get_AP(t, r))
                NDCG.append(self.get_NDCG(t, r))
        # return: Recall, Precision, HR, MRR, MAP, NDCG
        return np.mean(Recall), np.mean(Precision), np.mean(HR), np.mean(RR), np.mean(AP), np.mean(NDCG)
    
    def get_Recall(self, t, r):
        return len(np.intersect1d(t, r)) / len(t)
    
    def get_Precision(self, t, r):
        return len(np.intersect1d(t, r)) / self.top_k
    
    def get_HR(self, t, r):
        return 0 if len(np.intersect1d(t, r)) == 0 else 1

    def get_RR(self, t, r):
        for index, item in enumerate(r):
            if item in t:
                return 1 / (index + 1)
        return 0

    def get_AP(self, t, r):
        hits, sum_precision = 0, 0
        for index, item in enumerate(r):
            if item in t:
                hits += 1
                sum_precision += hits / (index + 1)
        if hits > 0:
            return sum_precision / hits
        else:
            return 0

    def get_NDCG(self, t, r):
        dcg = 0
        for index, item in enumerate(r):
            if item in t:
                dcg += 1 / np.log2(index + 2)
        return dcg / self.idcg
        
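    def get_Novelty(self, r):
        # self.popularity is not set in __init__; it is assumed to be assigned by the caller
        # as an array where popularity[i - 1] = p(i) for the 1-indexed item id i.
        popularity = getattr(self, 'popularity', None)
        if popularity is None:
            raise ValueError('assign self.popularity before calling get_Novelty')
        sum_log = 0
        for i in r:
            sum_log += -np.log2(max(popularity[i - 1], 1e-8))  # avoid log(0)
        return sum_log / self.top_k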
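
`get_Novelty` reads `self.popularity`, which the class above never computes. One possible way to build such an array is sketched below, assuming 1-indexed item ids (as in `evaluate`) and parallel arrays of user and item ids from the interaction log. The function name `build_popularity` and the sample data are hypothetical.

```python
import numpy as np


def build_popularity(users, items, num_items):
    """Return an array with popularity[i - 1] = p(i) for 1-indexed item ids."""
    users = np.asarray(users)
    items = np.asarray(items)
    num_users = len(np.unique(users))
    # Drop repeated (user, item) rows so each user counts once per item.
    pairs = np.unique(np.stack([users, items], axis=1), axis=0)
    counts = np.bincount(pairs[:, 1] - 1, minlength=num_items)
    return counts / num_users


# Hypothetical log: 4 users, 4 items, item 4 never interacted with.
popularity = build_popularity(
    users=[1, 1, 2, 3, 4, 4],
    items=[1, 2, 1, 1, 1, 3],
    num_items=4,
)
print(popularity)
```

The result could then be assigned to an evaluator instance (for example `evaluator.popularity = popularity`) before calling `get_Novelty`. Items with zero popularity are handled by the `1e-8` floor inside that method.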
