Evaluation metrics for algorithms with numeric output

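For the rationale, see my teacher's blog post: Rank-based evaluation metrics (especially for recommender systems and multi-label learning)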

Peak-$F_1$

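Because the algorithm outputs numeric scores rather than class labels, the choice of threshold affects the measured performance.
Extending the confusion matrix of binary classification, the maximum value of $F_1$ that the algorithm attains over all thresholds is called Peak-$F_1$.
Python implementation: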
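Written out, one reading of this definition that agrees with the code in this post is the following. Sort all entries by predicted score in descending order and, for each cut-off $k$, treat the top $k$ entries as predicted positives. The symbols $k$, $n$ and $TP_k$ are notation introduced here for illustration; they do not appear in the original post.

```latex
P_k = \frac{TP_k}{k}, \qquad
R_k = \frac{TP_k}{TP + FN}, \qquad
F_1(k) = \frac{2 P_k R_k}{P_k + R_k}, \qquad
\text{Peak-}F_1 = \max_{1 \le k \le n} F_1(k)
```

Here $TP_k$ is the number of true positives among the top $k$ entries, $TP + FN$ is the total number of positives, $n$ is the number of entries, and $F_1(k)$ is taken to be 0 when $P_k + R_k = 0$.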

    # Method of the evaluator class; assumes `import numpy as np` at module level.
    def compute_peak_f1(self):
        temp_predict_vector = self.predict_prob_matrix.reshape(-1)
        # Indices that sort the predictions in descending order.
        temp_predict_sort_index = np.argsort(-temp_predict_vector)
        temp_test_target_vector = self.test_target.reshape(-1)
        temp_test_target_sort = temp_test_target_vector[temp_predict_sort_index]
        temp_f1_list = []
        # Total number of positives (TP + FN), counted the same way as TP below.
        TP_FN = np.sum(self.test_target == 1)
        for i in range(temp_predict_sort_index.size):
            # Treat the top i + 1 predictions as positive.
            TP = np.sum(temp_test_target_sort[0:i + 1] == 1)
            P = TP / (i + 1)
            R = TP / TP_FN
            temp_f1 = 0
            if (P + R) != 0:
                temp_f1 = 2.0 * P * R / (P + R)
            temp_f1_list.append(temp_f1)

        temp_f1_list = np.array(temp_f1_list)
        temp_max_f1_index = np.argmax(temp_f1_list)
        peak_f1 = np.max(temp_f1_list)
        # temp_max_f1_index is a position in the sorted order, so map it back
        # through the sort index to get the score at the best cut-off.
        threshold_value = temp_predict_vector[temp_predict_sort_index[temp_max_f1_index]]
        self.threshold_value = threshold_value
        print("compute_peak_f1:", peak_f1)
        return peak_f1

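This is pasted straight from my algorithm's code.
predict_prob_matrix holds the model's predicted scores, and test_target holds the corresponding ground-truth values of the test set.
MATLAB code: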

function [score] = F1(label_prob, label_target)
[sortArray,temp] = sort(-label_prob);
allLabelSort = label_target(temp);
tempF1 = zeros(1, numel(temp));
allTP = sum(label_target == 1);

for i = 1: numel(temp)
    sliceArray = allLabelSort(1:i); 
    TP = sum(sliceArray == 1);
    P = TP / (i);
    R = TP / allTP;
    if(P + R == 0)
        tempF1(i) = 0;
    else
        tempF1(i) = (2.0 * P * R) / (P + R);
    end
end
score = max(tempF1);
end
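For checking either implementation by hand, here is a minimal dependency-free sketch of the same sweep. It assumes binary labels in {0, 1}; the function name `peak_f1` and the choice to also return the score at the best cut-off are illustrative, not part of the original code. It keeps a running true-positive count, so it makes one pass instead of re-summing the prefix at every cut-off.

```python
def peak_f1(scores, labels):
    """Return (peak F1, score at the best cut-off) for labels in {0, 1}."""
    # Sort entry indices by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(1 for y in labels if y == 1)  # TP + FN
    best_f1, best_threshold = 0.0, None
    tp = 0
    for k, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            tp += 1  # running TP among the top k entries
        if tp == 0:
            continue  # P = R = 0 here, so F1 is taken as 0
        precision = tp / k
        recall = tp / total_pos
        f1 = 2.0 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_threshold = f1, scores[idx]
    return best_f1, best_threshold


# Top-4 cut-off: TP = 3, P = 3/4, R = 1, so F1 = 6/7.
print(peak_f1([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0]))
```

If there are no positive labels at all, this sketch returns `(0.0, None)` rather than dividing by zero; that convention is an assumption of the sketch.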

AUC:

Version 1:

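    # Method of the evaluator class; assumes `import numpy as np` at module level.
    def compute_auc(self):
        temp_predict_vector = self.predict_prob_matrix.reshape(-1)
        temp_test_target_vector = self.test_target.reshape(-1)
        # Ascending sort: position i in this order corresponds to rank i + 1.
        temp_predict_sort_index = np.argsort(temp_predict_vector)

        # M = number of positives, N = number of negatives.
        M, N = 0, 0
        for i in range(temp_predict_vector.size):
            if temp_test_target_vector[i] == 1:
                M += 1
            else:
                N += 1

        # sigma = sum of the ranks of all positive samples.
        sigma = 0
        for i in range(temp_predict_vector.size - 1, -1, -1):
            if temp_test_target_vector[temp_predict_sort_index[i]] == 1:
                sigma += i + 1
        # Rank-sum formula; tied scores are ranked in argsort order, not averaged.
        auc = (sigma - (M + 1) * M / 2) / (M * N)
        print("compute_auc:", auc)
        return auc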

Version 2:

    # Assumes `from sklearn import metrics` at module level.
    def computeAUC(self):
        tempProbVector = self.predict_prob_matrix.reshape(-1)
        tempTargetVector = self.test_target.reshape(-1)

        auc = metrics.roc_auc_score(tempTargetVector, tempProbVector)
        print("computeAUC:", auc)
        return auc

MATLAB implementation:

function [result] = AUC(output, test_targets)
[A,I]=sort(output);
M=0;N=0;
for i=1:length(output)
    if(test_targets(i)==1)
        M=M+1;
    else
        N=N+1;
    end
end
sigma=0;
for i=M+N:-1:1
    if(test_targets(I(i))==1)
        sigma=sigma+i;
    end
end
result=(sigma-(M+1)*M/2)/(M*N);
end
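The loop-based versions above compute the rank-sum form of AUC, (sum of the ranks of the positives - M(M+1)/2) / (M·N), and give tied scores whatever ranks the sort happens to assign. The sketch below is a dependency-free variant that instead gives tied scores their average rank, which is the usual Mann-Whitney convention. The function name `rank_auc` and the decision to raise an error when one class is missing are assumptions of this sketch, not part of the original code.

```python
def rank_auc(scores, labels):
    """AUC from the rank-sum formula; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run order[i..j] of entries that share the same score.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    m = sum(1 for y in labels if y == 1)  # number of positives (M)
    neg = n - m                           # number of negatives (N)
    if m == 0 or neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    sigma = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (sigma - m * (m + 1) / 2.0) / (m * neg)


# Positives get ranks 2 and 4, so sigma = 6 and AUC = (6 - 3) / 4.
print(rank_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

With two entries that have the same score and opposite labels, this variant returns 0.5, whereas a version without tie handling returns 0 or 1 depending on the sort order.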

NDCG:

Python implementation
Version 1:

    # Assumes `import math` and `import numpy as np` at module level.
    def compute_ndgc(self):
        temp_predict_vector = self.predict_prob_matrix.reshape(-1)
        temp_test_target_vector = self.test_target.reshape(-1)

        temp_predict_sort_index = np.argsort(-temp_predict_vector)
        temp_predict_target_sort = temp_test_target_vector[temp_predict_sort_index]

        temp_target_sort = np.sort(temp_test_target_vector)
        temp_target_sort = np.flipud(temp_target_sort)

        dcg = 0
        for i in range(temp_predict_vector.size):
            rel = temp_predict_target_sort[i]
            denominator = math.log2(i + 2)
            dcg += rel / denominator

        idcg = 0
        for i in range(temp_predict_vector.size):
            rel = temp_target_sort[i]
            denominator = math.log2(i + 2)
            idcg += rel / denominator
        ndcg = dcg / idcg
        print("compute_ndgc: ", ndcg)
        return ndcg

Version 2:

    def computeNDCG(self):

        # Flatten the predicted scores and the ground-truth targets
        tempProbVector = self.predict_prob_matrix.reshape(-1)
        tempTargetVector = self.test_target.reshape(-1)

        # Reorder the 0/1 targets by descending predicted score
        temp = np.argsort(-tempProbVector)
        allLabelSort = tempTargetVector[temp]

        # Ideal ordering: 1111...10000...0
        sortedTargetVector = np.sort(tempTargetVector)[::-1]

        # Compute DCG in the predicted order; rel is the true label at each position, e.g. 111110111101110000001000100
        DCG = 0
        for i in range(temp.size):
            rel = allLabelSort[i]
            denominator = np.log2(i + 2)
            DCG += (rel / denominator)

        # Compute iDCG in the ideal order: 11111111110000000000
        iDCG = 0
        for i in range(temp.size):
            rel = sortedTargetVector[i]
            denominator = np.log2(i + 2)
            iDCG += (rel / denominator)

        ndcg = DCG / iDCG
        print("computeNDCG: ", ndcg)
        return ndcg

MATLAB implementation:

function [ndcg] = NDCG(label_prob, label_target)
[sortArray,temp] = sort(-label_prob);  % sort by predicted score, highest first
allLabelSort = label_target(temp); % true labels reordered by predicted score
% Ideal ordering: labels in descending order. Using 'descend' works for both
% row and column vectors, whereas sort followed by fliplr only reverses row vectors.
sortedTargetVector = sort(label_target, 'descend');

dcg = 0;
for i = 1: numel(temp)
    rel = allLabelSort(i);
    denominator = log2(i + 1);
    dcg = dcg + (rel / denominator);
end

idcg = 0;    % the ideal DCG is obtained by ordering by the true labels
for i = 1: numel(temp) 
    rel = sortedTargetVector(i);
    denominator = log2(i + 1);
    idcg = idcg + (rel / denominator);
end 

ndcg = dcg / idcg;
end
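As a small worked check of the implementations above, here is a dependency-free sketch of the same computation: DCG sums rel / log2(position + 1) over 1-based positions in the predicted order, and iDCG does the same over the labels sorted in descending order. The function name `ndcg` and the convention of returning 0.0 when there are no relevant items are assumptions of this sketch, not part of the original code.

```python
import math


def ndcg(scores, labels):
    """NDCG over the full ranking, with a log2(position + 1) discount."""
    # Positions are 0-based here, so the discount is log2(pos + 2).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(labels[idx] / math.log2(pos + 2) for pos, idx in enumerate(order))
    ideal = sorted(labels, reverse=True)  # best possible ordering of the labels
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    # Assumed convention: with no relevant items, report 0.0 instead of dividing by zero.
    return dcg / idcg if idcg > 0 else 0.0


# DCG = 1/log2(2) + 0 + 1/log2(4) = 1.5; iDCG = 1 + 1/log2(3).
print(ndcg([0.9, 0.8, 0.7], [1, 0, 1]))
```

A ranking that already places every relevant item first gives DCG equal to iDCG, so the result is exactly 1.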

From now on I will just copy from here, laying the groundwork for being lazy.
