Plotting the Precision@top-R Curve for Information Retrieval

Copyright notice: this is an original article by the blog author; please credit the source when reposting. https://blog.csdn.net/HackerTom/article/details/89576824

Notes

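Three evaluation metrics are commonly used in multimodal retrieval; this post covers one of them, the Precision@top-R curve.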

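According to a senior labmate, it is enough to change the R in the P-R curve from Recall to the R in top-R (that is, the R-th rank position). The code is adapted directly from P-R curve plotting code, and I cross-checked it against my labmate's implementation on sample cases: the results are identical.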
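
To make the definition concrete, here is a minimal sketch for a single query, assuming the ranked results are already reduced to a 0/1 relevance list. The function name `precision_at_top_r` is hypothetical and is not part of the code in this post.

```python
import numpy as np


def precision_at_top_r(relevance):
    """Return P@1 .. P@n for one ranked 0/1 relevance list (hypothetical helper)."""
    relevance = np.asarray(relevance, dtype=np.float64)
    ranks = np.arange(1, relevance.size + 1)
    # number of relevant items within the top R, divided by R
    return np.cumsum(relevance) / ranks


# ranked results for one query: 1 = relevant, 0 = not relevant
rel = [1, 0, 1, 1, 0]
print(precision_at_top_r(rel))
```

For this list the precision at R = 1..5 is 1, 1/2, 2/3, 3/4 and 3/5; the x-axis of the curve is simply R, not recall.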

Code

python

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import normalize

# cosine similarity
def cos_sim(f1, f2):
    """cosine similarity (np.ndarray)
    Args:
        f1/f2: feature matrix
    Return:
        sim: similarity matrix
    """
    f1 = normalize(f1, norm='l2', axis=1)
    f2 = normalize(f2, norm='l2', axis=1)
    sim = np.dot(f1, f2.T)

    return 0.5 + 0.5 * sim
    # return sim


# cosine distance
def cos_dis(f1, f2):
    """cosine distance = 1. - cosine similarity"""
    return 1. - cos_sim(f1, f2)


def euclidean_dis(f1, f2):
    """Euclidean distance along the last axis of f1 - f2 (row-wise, not an all-pairs matrix)"""
    return np.linalg.norm(f1 - f2, axis=-1)


# Hamming distance
def hamming_dis(B1, B2):
    """Hamming distance between hash codes with entries in {-1, +1}"""
    q = B2.shape[1]
    distH = 0.5 * (q - np.dot(B1, B2.transpose()))
    return distH


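# Plot the Precision@top-R curve
def p_at_topR(qF, rF, qL, rL, what=0, topK=-1):
    """Plot the mean precision over the top-R results for R = 1..topK.
    Args:
        qF/rF: query / retrieval feature (or hash code) matrices
        qL/rL: query / retrieval label matrices
        what: 0 ranks by cosine distance, anything else by Hamming distance
        topK: largest R to plot; -1 means the whole retrieval set
    """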
    n_query = qF.shape[0]
    if topK == -1 or topK > rF.shape[0]:
        topK = rF.shape[0]
    P, R = [], []
    Gnd = (np.dot(qL, rL.transpose()) > 0).astype(np.float32)
    if what == 0:
        Rank = np.argsort(cos_dis(qF, rF))
    else:
        Rank = np.argsort(hamming_dis(qF, rF))

    for k in range(1, topK+1):
        # ground-truth: 1 vs all
        p = np.zeros(n_query)
        # r = np.zeros(n_query)
        for it in range(n_query):
            gnd = Gnd[it]
            gnd_all = np.sum(gnd)
            if gnd_all == 0:
                continue
            # the id of sorted dis
            # (but left dis as it is)
            asc_id = Rank[it][:k]

            gnd = gnd[asc_id]
            gnd_r = np.sum(gnd)

            p[it] = gnd_r / k
            # r[it] = gnd_r / gnd_all

        P.append(np.mean(p))
        # R.append(np.mean(r))
        R.append(k)

    plt.figure(figsize=(5, 5))
    plt.plot(R, P, label='Precision@top-R')
    plt.grid(True)
    # plt.ylim(0, 1)
    plt.xlabel('R')  # the x-axis is the rank position R, not recall
    plt.ylabel('precision')
    plt.legend()
    plt.show()
    # return R, P
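
The `hamming_dis` helper above relies on the identity d_H = (q - <b1, b2>) / 2, which holds only when the codes take values in {-1, +1}. The following sketch checks that identity against a brute-force count on random codes; the random data and the 0/1 conversion step are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np


def hamming_dis(B1, B2):
    """Hamming distance for codes in {-1, +1} (same formula as above)."""
    q = B2.shape[1]
    return 0.5 * (q - np.dot(B1, B2.T))


rng = np.random.default_rng(0)
B1 = rng.choice([-1, 1], size=(4, 16))   # 4 query codes, 16 bits each
B2 = rng.choice([-1, 1], size=(6, 16))   # 6 retrieval codes

# brute force: count differing bits for every (query, retrieval) pair
brute = (B1[:, None, :] != B2[None, :, :]).sum(axis=-1)
assert np.array_equal(hamming_dis(B1, B2), brute)

# codes stored as 0/1 must be mapped to -1/+1 first (assumed convention)
B01 = (B1 > 0).astype(np.int64)
assert np.array_equal(hamming_dis(2 * B01 - 1, B2), brute)
```

If the hash codes are stored as 0/1, passing them in directly would give wrong distances, so the `2 * B - 1` mapping is worth applying before ranking.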

matlab

  • The code my labmate gave me appears to come from CCQ; see reference [2].
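function precision = precision_at_k(ids, Lbase, Lquery)
% ids:    ranked database indices, one column per query
% Lbase:  database labels; Lquery: query labels (one row per sample)

nquery = size(ids, 2);
K = 1000;  % number of top positions to evaluate
P = zeros(K, nquery);

for i = 1 : nquery
    label = Lquery(i, :);
    label(label == 0) = -1;  % zeros in the query label never count as a match
    idx = ids(:, i);
    % a retrieved item is relevant if it shares at least one label with the query
    imatch = sum(bsxfun(@eq, Lbase(idx(1:K), :), label), 2) > 0;
    Lk = cumsum(imatch);     % relevant items within the top k
    P(:, i) = Lk ./ (1:K)';  % precision at each k
end
precision = mean(P, 2);      % average over queries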

end
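
For comparison with the Python version, here is a rough NumPy sketch of the same cumulative-sum approach. It is not the CCQ authors' code: the name `precision_at_k_np` is hypothetical, indices are assumed to be 0-based, labels are assumed to be 0/1 multi-hot, and `K` is clipped to the database size (the MATLAB function fixes `K = 1000`).

```python
import numpy as np


def precision_at_k_np(ids, Lbase, Lquery, K=1000):
    """Mean precision at ranks 1..K (hypothetical port, assumptions above).

    ids:    (n_base, n_query) int array; column i holds the 0-based database
            indices ranked for query i
    Lbase:  (n_base, n_class) 0/1 labels; Lquery: (n_query, n_class) 0/1 labels
    """
    ids = np.asarray(ids)
    Lbase = np.asarray(Lbase)
    Lquery = np.asarray(Lquery)
    K = min(K, ids.shape[0])             # assumption: clip K to the database size
    top = ids[:K].T                      # (n_query, K) top-K indices per query
    # relevant if the retrieved item shares at least one positive label
    shared = np.einsum('qkc,qc->qk', Lbase[top], Lquery)
    match = (shared > 0).astype(np.float64)
    prec = np.cumsum(match, axis=1) / np.arange(1, K + 1)
    return prec.mean(axis=0)             # average over queries, shape (K,)


Lbase = np.array([[1, 0], [0, 1], [1, 1]])
Lquery = np.array([[1, 0], [0, 1]])
ids = np.array([[0, 1], [2, 2], [1, 0]])  # column 0 ranks query 0, column 1 query 1
print(precision_at_k_np(ids, Lbase, Lquery))
```

Because the cumulative sum yields every P@k in one pass, this avoids recomputing the top-k hits for each k as the nested loops in `p_at_topR` do.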

References

  1. Evaluation of Information Retrieval Systems
  2. Composite Correlation Quantization for Efficient Multimodal Retrieval