[Recommender Systems] Recall Metric: NDCG

sdbhewfoqi

Last modified 2022-07-01 00:50:45

Column: Recommender Systems · Tags: recommendation algorithms

First published 2022-06-29 14:47:43

Original link: https://blog.csdn.net/weixin_31866177/article/details/125521922

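This post is copied in full from the Zhihu article by 真中合欢, "推荐系统评价指标nDCG到底如何实现" (How is the recommender-system evaluation metric nDCG actually implemented?). I saved a copy in case the original disappears.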

Key points:

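When actually computing nDCG, you use the ground-truth relevance labels from the dataset (0 or 1), not the scores predicted by the model (e.g. 0.0872).

=> DCG combines the true labels with the model's ranking order.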
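To make the split between "scores decide the order, labels decide the gain" concrete, here is a minimal sketch. The predicted scores, the clicked set and `k` are hypothetical values chosen for illustration only.

```python
import numpy as np

# Hypothetical predicted scores for candidate items 0..5 (illustrative values only)
pred_scores = np.array([0.0872, 0.91, 0.15, 0.60, 0.33, 0.05])
# Hypothetical ground truth from the dataset: the user clicked items 0 and 3
clicked = {0, 3}
k = 5

# The predicted scores are used only to rank: keep the k highest-scoring items
top_k = np.argsort(-pred_scores)[:k]
# The gains are the true 0/1 labels of those items, never the predicted scores
rels = np.array([1 if int(item) in clicked else 0 for item in top_k])
print(top_k.tolist(), rels.tolist())
```

Item 0 has a low predicted score (0.0872) but still contributes a gain of 1 at rank 5, because the user actually clicked it.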

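Numerator: DCG is computed in the order of the model's top-k list.
Denominator: iDCG is computed in the ideal order, i.e. the same labels sorted from high to low.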

Computation steps:

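1. Cumulative gain (CG): the sum of the true scores of the top-k results. Each score is 0 or 1: 1 means the user clicked the item, 0 means they did not.
2. Discounted cumulative gain (DCG): each recommended item's score (its true 0/1 label in the dataset) is divided by log2(position + 1), where position is the item's rank in the model's top-k list.
The point: the further down the model ranks an item, the more heavily its gain is discounted.
3. Normalized discounted cumulative gain: nDCG@k = DCG@k / iDCG@k, where iDCG@k is the maximum (ideal) DCG@k.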
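The three steps written as formulas, as a sketch using the common base-2 logarithm. Here rel_i is the true 0/1 label of the item at position i of the model's top-k list, and rel_i^* is the same labels sorted in descending order. The code in this post uses np.log (natural log) instead; that only rescales DCG and iDCG by the same constant factor 1/ln 2, so nDCG is unchanged.

```latex
\mathrm{CG@}k = \sum_{i=1}^{k} rel_i, \qquad
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}, \qquad
\mathrm{iDCG@}k = \sum_{i=1}^{k} \frac{rel_i^{*}}{\log_2(i+1)}, \qquad
\mathrm{nDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{iDCG@}k}
```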

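Summary:

The overall nDCG computation is:
1. The model ranks candidate items by their similarity to the user and returns the k most similar items as the recommendations.
2. Keeping the model's order, label each item with its score in the original dataset and compute DCG.
3. Sort those labels (the ground truth) from high to low and compute DCG again; this second DCG is iDCG.
4. Dividing the first by the second gives nDCG.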
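The four steps map onto a few lines of Python. This is a minimal single-user sketch; `dcg` and `ndcg_for_user` are hypothetical helper names, and it uses log2 rather than the natural log used later in the post (nDCG is the same either way).

```python
import numpy as np

def dcg(rels):
    """DCG of a relevance list, discounting 1-based position i by log2(i + 1)."""
    rels = np.asarray(rels, dtype=float)
    positions = np.arange(1, len(rels) + 1)
    return float(np.sum(rels / np.log2(positions + 1)))

def ndcg_for_user(ranked_items, clicked_items):
    """nDCG for one user; iDCG re-sorts the labels of the top-k list itself."""
    # Step 2: keep the model's order and label each item with its true 0/1 score
    rels = [1 if item in clicked_items else 0 for item in ranked_items]
    actual = dcg(rels)
    # Step 3: sort the same labels from high to low to get the ideal DCG
    ideal = dcg(sorted(rels, reverse=True))
    # Step 4: divide; report 0 when the list contains no hits at all
    return actual / ideal if ideal > 0 else 0.0

print(ndcg_for_user([0, 9, 5, 6, 7, 50, 8, 31, 21, 1], {0, 21, 31, 41, 49}))
```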

Example:

For user A, the ground-truth items they actually clicked are:

test_set = [
    [0, 21, 31, 41, 49]
]

and the model's top-10 recommendations are:

rec_items = np.array([
    [0,  9,  5,  6, 7, 50, 8, 31, 21, 1]
])

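So 3 of the items in the model's top 10 are true hits (0, 21 and 31).

DCG is computed from the model's top-k list. Importantly, what counts is each hit's position in the top-k list, not its position in the ground-truth list. So item 0 is at position 1, item 31 at position 8, and item 21 at position 9.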
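The positions can also be read off programmatically. A small sketch reusing the example lists (variable names are illustrative):

```python
import math

ranked = [0, 9, 5, 6, 7, 50, 8, 31, 21, 1]   # model's top-10 list for user A
clicked = {0, 21, 31, 41, 49}                 # user A's ground-truth clicks

# 1-based rank of each hit inside the model's list (not inside the ground-truth list)
hit_positions = {item: pos for pos, item in enumerate(ranked, start=1) if item in clicked}
# Each hit contributes 1 / log(pos + 1), using the natural log as in this example
gains = {item: 1 / math.log(pos + 1) for item, pos in hit_positions.items()}
print(hit_positions)
```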

dcg=1/np.log(2)+1/np.log(9)+1/np.log(10)

iDCG is the maximum possible DCG, i.e. the ideal case in which all the hits (label 1) are ranked first. The ideal top-k order is [0, 21, 31, 9, 5, 6, 7, 50, 8, 1], so

idcg=1/np.log(2)+1/np.log(3)+1/np.log(4)

Dividing the two:

ndcg=(1/np.log(2)+1/np.log(9)+1/np.log(10))/(1/np.log(2)+1/np.log(3)+1/np.log(4))
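A quick numeric check that the choice of logarithm base affects DCG and iDCG but not nDCG. This sketch recomputes the example with both the natural log used above and the more common log2; `dcg_from_positions` is a hypothetical helper.

```python
import math

hit_positions = [1, 8, 9]     # ranks of items 0, 31, 21 in the model's list
ideal_positions = [1, 2, 3]   # the same 3 hits moved to the top

def dcg_from_positions(positions, log):
    # Each hit (label 1) contributes 1 / log(position + 1)
    return sum(1 / log(p + 1) for p in positions)

results = {}
for name, log in [("natural log", math.log), ("log2", math.log2)]:
    dcg = dcg_from_positions(hit_positions, log)
    idcg = dcg_from_positions(ideal_positions, log)
    results[name] = (dcg, idcg, dcg / idcg)
    print(f"{name}: dcg={dcg:.4f} idcg={idcg:.4f} ndcg={dcg / idcg:.4f}")
```

Both lines report nDCG of about 0.7586, while DCG and iDCG differ by the constant factor 1/ln 2.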

Code

import numpy as np
np.random.seed(2021)

class Model:
    def __init__(self, k):
        self.k = k
        self.item_size = 50

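    def __call__(self, users):
        # Simulate recommendations: randomly return k distinct items per user
        # (sampling without replacement, so no item is recommended twice)
        res = [np.random.choice(self.item_size, self.k, replace=False) for _ in range(users.shape[0])]
        return np.array(res)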


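def get_implict_matrix(rec_items, test_set):
    # Build the 0/1 relevance matrix: rel_matrix[u][i] = 1 if the i-th
    # recommended item for user u appears in u's test set, else 0
    rel_matrix = [[0] * rec_items.shape[1] for _ in range(rec_items.shape[0])]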
    for user in range(len(test_set)):
        for index, item in enumerate(rec_items[user]):
            if item in test_set[user]:
                rel_matrix[user][index] = 1
    return np.array(rel_matrix)


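def DCG(items):
    # Position i (1-based) is discounted by log(i + 1). np.log is the natural log;
    # the usual definition uses log2, but the base rescales DCG and iDCG by the
    # same factor, so nDCG is unchanged.
    return np.sum(items / np.log(np.arange(2, len(items) + 2)))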


def nDCG(rec_items, test_set):
    assert rec_items.shape[0] == len(test_set)
    # Get the implicit-feedback relevance (0/1) matrix
    rel_matrix = get_implict_matrix(rec_items, test_set)
    ndcgs = []
    for user in range(len(test_set)):
        rels = rel_matrix[user]
        dcg = DCG(rels)
        # print('rels', rels)
        # print(sorted(rels, reverse=True)) # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
        # iDCG: the same labels sorted in descending order (ideal ranking)
        idcg = DCG(sorted(rels, reverse=True))
        print('dcg&idcg',dcg,idcg)
        ndcg = dcg / idcg if idcg != 0 else 0
        ndcgs.append(ndcg)
    return ndcgs


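# Assume top-10 recommendation for a single user on an implicit-feedback dataset
# (the simulated Model draws item ids from 0..49).
users = np.array([0])
# test_set[u] lists the items user u interacted with in the test set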
test_set = [
    [0, 21, 31, 41, 49]
]
rec_items=np.array([
    [0,  9,  5,  6, 7, 50, 8, 31, 21, 1]
])
# model = Model(20)
# rec_items = model(users)
print("truth click", test_set)
print("rec_items", rec_items)
ndcgs = nDCG(rec_items, test_set)
print(ndcgs)

print('-'*10)

dcg=1/np.log(2)+1/np.log(9)+1/np.log(10)
idcg=1/np.log(2)+1/np.log(3)+1/np.log(4)
ndcg=(1/np.log(2)+1/np.log(9)+1/np.log(10))/(1/np.log(2)+1/np.log(3)+1/np.log(4))
print(dcg,idcg,ndcg)
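Note that the code above builds iDCG only from the hits that appear in the top-k list, so a list that misses some relevant items can still score highly. A commonly used alternative builds iDCG from all of the user's ground-truth items, capped at k. The sketch below shows that variant; it is an assumption for comparison, not the method of this post, and `ndcg_all_truth` is a hypothetical name.

```python
import numpy as np

def ndcg_all_truth(ranked_items, truth_items, k):
    """Variant: iDCG assumes min(|truth|, k) relevant items ranked at the top."""
    ranked_items = list(ranked_items)[:k]
    truth = set(truth_items)
    # Discount 1 / log2(i + 1) for positions i = 1..k
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    rels = np.array([1.0 if item in truth else 0.0 for item in ranked_items])
    dcg = float(np.sum(rels * discounts[:len(rels)]))
    # Ideal list: every ground-truth item (up to k of them) placed first
    n_ideal = min(len(truth), k)
    idcg = float(np.sum(discounts[:n_ideal]))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_all_truth([0, 9, 5, 6, 7, 50, 8, 31, 21, 1], [0, 21, 31, 41, 49], 10))
```

For user A this variant gives about 0.548 instead of about 0.759, because two of the five clicked items (41 and 49) never appear in the top-10 list.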