Evaluation Metrics for Algorithms with Numeric Outputs

颜妮儿

Last modified 2022-11-07 09:31:16


First published 2022-10-17 15:22:41

Article link: https://blog.csdn.net/Z__XY_/article/details/127363573


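For the motivation, see my teacher's blog post "Rank-based evaluation metrics (especially for recommender systems and multi-label learning)" [基于序的评价指标 (特别针对推荐系统和多标签学习)].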

Peak-$F_1$

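Because the algorithm outputs numeric scores rather than class labels, the choice of threshold affects its measured performance.
Starting from the confusion matrix of a binary classification problem, $F_1$ is computed at every possible threshold, and the maximum value it reaches for the algorithm is called Peak-$F_1$.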
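
In symbols, using notation that is mine rather than the original post's ($n$ samples ranked by descending score, $TP_k$ the number of true positives among the top $k$, and $TP + FN$ the total number of positives), the quantities can be written roughly as:

$$
P_k = \frac{TP_k}{k}, \qquad R_k = \frac{TP_k}{TP + FN}, \qquad F_1(k) = \frac{2 P_k R_k}{P_k + R_k}, \qquad \text{Peak-}F_1 = \max_{1 \le k \le n} F_1(k),
$$

with $F_1(k)$ taken as 0 when $P_k + R_k = 0$.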
Python implementation:

    import numpy as np

    def compute_peak_f1(self):
        # Flatten the predicted scores and the ground-truth labels (assumed to be 0/1).
        temp_predict_vector = self.predict_prob_matrix.reshape(-1)
        temp_test_target_vector = self.test_target.reshape(-1)
        # Reorder the labels by descending predicted score.
        temp_predict_sort_index = np.argsort(-temp_predict_vector)
        temp_test_target_sort = temp_test_target_vector[temp_predict_sort_index]

        # Total number of positives (TP + FN).
        TP_FN = np.sum(temp_test_target_vector == 1)
        temp_f1_list = []
        for i in range(temp_predict_sort_index.size):
            # Treat the top i + 1 ranked items as predicted positive.
            TP = np.sum(temp_test_target_sort[0:i + 1] == 1)
            P = TP / (i + 1)
            R = TP / TP_FN
            temp_f1 = 0
            if (P + R) != 0:
                temp_f1 = 2.0 * P * R / (P + R)
            temp_f1_list.append(temp_f1)

        temp_f1_list = np.array(temp_f1_list)
        temp_max_f1_index = np.argmax(temp_f1_list)
        peak_f1 = temp_f1_list[temp_max_f1_index]
        # temp_max_f1_index is a position in the sorted order, so map it back
        # through the sort index to get the score at which the peak is reached.
        self.threshold_value = temp_predict_vector[temp_predict_sort_index[temp_max_f1_index]]
        print("compute_peak_f1:", peak_f1)
        return peak_f1

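This is pasted directly from my algorithm's code.
predict_prob_matrix holds the model's predicted scores, and test_target holds the corresponding ground-truth values of the test set.
MATLAB code: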

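function [score] = F1(label_prob, label_target)
% Peak-F1: label_prob holds predicted scores, label_target the 0/1 labels.
[~,temp] = sort(-label_prob);          % indices in descending score order
allLabelSort = label_target(temp);     % labels reordered by predicted score
tempF1 = zeros(1, numel(temp));
allTP = sum(label_target == 1);        % total number of positives (TP + FN)

for i = 1: numel(temp)
    sliceArray = allLabelSort(1:i);    % top-i items are predicted positive
    TP = sum(sliceArray == 1);
    P = TP / (i);
    R = TP / allTP;
    if(P + R == 0)
        tempF1(i) = 0;
    else
        tempF1(i) = (2.0 * P * R) / (P + R);
    end
end
score = max(tempF1);
end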
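
As a cross-check on the loops above, here is a minimal standalone sketch that keeps a running true-positive count instead of re-summing the prefix at every cutoff. The function name `peak_f1`, its plain-list arguments, and the choice to return 0 when there are no positives are my own assumptions for illustration, not part of the original code.

```python
def peak_f1(scores, labels):
    """Return (peak F1, score at the peak) for 0/1 labels. Hypothetical helper."""
    # Rank the items by descending predicted score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        # F1 is undefined without positives; returning 0 is a convention chosen here.
        return 0.0, None

    best_f1, best_threshold = 0.0, None
    tp = 0  # running count of true positives among the top-k items
    for k, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            tp += 1
        precision = tp / k
        recall = tp / total_pos
        f1 = 0.0 if tp == 0 else 2.0 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_threshold = f1, scores[idx]
    return best_f1, best_threshold


scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
f1, threshold = peak_f1(scores, labels)
print(f1, threshold)
```

In this toy example the best cutoff keeps the top four items (3 true positives, 1 false positive, 0 false negatives), so the peak is 6/7 and the returned threshold is the fourth-ranked score, 0.4.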

AUC:

Version 1:

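    import numpy as np

    def compute_auc(self):
        temp_predict_vector = self.predict_prob_matrix.reshape(-1)
        temp_test_target_vector = self.test_target.reshape(-1)
        # Ascending sort: the sample at position i of this index has rank i + 1.
        temp_predict_sort_index = np.argsort(temp_predict_vector)

        # M = number of positives, N = number of negatives.
        M, N = 0, 0
        for i in range(temp_predict_vector.size):
            if temp_test_target_vector[i] == 1:
                M += 1
            else:
                N += 1

        # Sum of the ranks of the positive samples.
        sigma = 0
        for i in range(temp_predict_vector.size - 1, -1, -1):
            if temp_test_target_vector[temp_predict_sort_index[i]] == 1:
                sigma += i + 1
        # Rank-sum formula. Tied scores are not given average ranks here, so the
        # result can differ from roc_auc_score when scores repeat.
        auc = (sigma - (M + 1) * M / 2) / (M * N)
        print("compute_auc:", auc)
        return auc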

Version 2:

    from sklearn import metrics

    def computeAUC(self):
        # Flatten scores and labels, then delegate to scikit-learn.
        tempProbVector = self.predict_prob_matrix.reshape(-1)
        tempTargetVector = self.test_target.reshape(-1)

        auc = metrics.roc_auc_score(tempTargetVector, tempProbVector)
        print("computeAUC:", auc)
        return auc

MATLAB implementation:

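function [result] = AUC(output, test_targets)
% Rank-sum AUC: output holds predicted scores, test_targets the 0/1 labels.
[~,I]=sort(output);    % ascending order: I(i) is the sample with rank i
M=0;N=0;               % M = number of positives, N = number of negatives
for i=1:length(output)
    if(test_targets(i)==1)
        M=M+1;
    else
        N=N+1;
    end
end
sigma=0;               % sum of the ranks of the positive samples
for i=M+N:-1:1
    if(test_targets(I(i))==1)
        sigma=sigma+i;
    end
end
result=(sigma-(M+1)*M/2)/(M*N);
end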
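
The rank-sum formula used above is equivalent to counting how many (positive, negative) pairs are ordered correctly. The sketch below is my own illustration of that equivalence, not part of the original code: `auc_pairwise` and `auc_rank_sum` are hypothetical names, and giving tied scores their average rank (and counting tied pairs as 0.5) is an assumption I add so that the two functions agree when scores repeat.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_rank_sum(scores, labels):
    """Rank-sum formula, with average ranks assigned to tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run order[i..j] of samples sharing the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    m = sum(1 for y in labels if y == 1)  # positives
    n = len(labels) - m                   # negatives
    if m == 0 or n == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    sigma = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (sigma - m * (m + 1) / 2.0) / (m * n)


scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(auc_pairwise(scores, labels), auc_rank_sum(scores, labels))
```

In the toy example, 4 of the 6 positive/negative pairs are ordered correctly, and the positives hold ranks 2, 3 and 5, so both functions give 4/6.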

NDCG:

Python implementation
Version 1:

    import math
    import numpy as np

    def compute_ndgc(self):
        temp_predict_vector = self.predict_prob_matrix.reshape(-1)
        temp_test_target_vector = self.test_target.reshape(-1)

        # Labels reordered by descending predicted score.
        temp_predict_sort_index = np.argsort(-temp_predict_vector)
        temp_predict_target_sort = temp_test_target_vector[temp_predict_sort_index]

        # Ideal ordering: labels sorted in descending order.
        temp_target_sort = np.sort(temp_test_target_vector)
        temp_target_sort = np.flipud(temp_target_sort)

        # DCG of the predicted ordering; i is 0-based, so the discount is log2(i + 2).
        dcg = 0
        for i in range(temp_predict_vector.size):
            rel = temp_predict_target_sort[i]
            denominator = math.log2(i + 2)
            dcg += rel / denominator

        # DCG of the ideal ordering.
        idcg = 0
        for i in range(temp_predict_vector.size):
            rel = temp_target_sort[i]
            denominator = math.log2(i + 2)
            idcg += rel / denominator
        ndcg = dcg / idcg
        print("compute_ndgc: ", ndcg)
        return ndcg

Version 2:

    import numpy as np

    def computeNDCG(self):

        # Get the predicted score vector and the original target vector
        tempProbVector = self.predict_prob_matrix.reshape(-1)
        tempTargetVector = self.test_target.reshape(-1)

        # Reorder the original 1/0 labels by descending predicted score
        temp = np.argsort(-tempProbVector)
        allLabelSort = tempTargetVector[temp]

        # Get the ideal sequence: 1111...10000...0
        sortedTargetVector = np.sort(tempTargetVector)[::-1]

        # Compute DCG using the predicted order; rel is the true label at each rank,
        # which in practice looks like 111110111101110000001000100
        DCG = 0
        for i in range(temp.size):
            rel = allLabelSort[i]
            denominator = np.log2(i + 2)
            DCG += (rel / denominator)

        # Compute iDCG using the ideal order: 11111111110000000000
        iDCG = 0
        for i in range(temp.size):
            rel = sortedTargetVector[i]
            denominator = np.log2(i + 2)
            iDCG += (rel / denominator)

        ndcg = DCG / iDCG
        print("computeNDCG: ", ndcg)
        return ndcg

MATLAB implementation:

function [ndcg] = NDCG(label_prob, label_target)
[~,temp] = sort(-label_prob);  % sort by predicted value, descending
allLabelSort = label_target(temp); % labels reordered by the sorted predictions
% Sort the labels in descending order. Using 'descend' works for both row and
% column vectors, whereas fliplr leaves a column vector unchanged.
sortedTargetVector = sort(label_target, 'descend');

dcg = 0;
for i = 1: numel(temp)
    rel = allLabelSort(i);
    denominator = log2(i + 1);
    dcg = dcg + (rel / denominator);
end

idcg = 0;    % the ideal DCG is obtained by ordering items by their target values
for i = 1: numel(temp) 
    rel = sortedTargetVector(i);
    denominator = log2(i + 1);
    idcg = idcg + (rel / denominator);
end 

ndcg = dcg / idcg;
end
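
Note that the Python versions discount by `log2(i + 2)` with a 0-based `i`, while the MATLAB version uses `log2(i + 1)` with a 1-based `i`; both apply the same discount to the same position. A compact standalone sketch that makes this explicit follows. The helper names `dcg` and `ndcg`, the plain-list arguments, and returning 0 when every label is 0 are my own assumptions, not part of the original code.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain; positions are 1-based and discounted by log2(pos + 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg(scores, labels):
    """NDCG of the ranking induced by `scores` against the true `labels`. Hypothetical helper."""
    # Labels in the order the model ranks them (descending score).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    # Best achievable ordering: all relevant items first.
    ideal = sorted(labels, reverse=True)
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        # No relevant items at all; returning 0 is a convention chosen here.
        return 0.0
    return dcg(ranked) / ideal_dcg


scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(ndcg(scores, labels))
```

For this toy input the predicted order has relevant items at positions 1, 3 and 4, so DCG is 1 + 1/2 + 1/log2(5), while the ideal order places them at positions 1, 2 and 3, giving 1 + 1/log2(3) + 1/2; NDCG is the ratio of the two.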

From now on I can just copy and paste from here, which lays a solid foundation for being lazy.
