[Thinking][Simulation] Scholomance Academy: 45th ICPC Regional Contest, Shenyang Site, Problem K


Problem Description

As a student of the Scholomance Academy, you are studying a course called Machine Learning. You are currently working on your course project: training a binary classifier.

A binary classifier is an algorithm that predicts the classes of instances, which may be positive $(+)$ or negative $(-)$. A typical binary classifier consists of a scoring function $S$ that gives a score for every instance and a threshold $\theta$ that determines the category. Specifically, if the score of an instance satisfies $S(x) \geq \theta$, then the instance $x$ is classified as positive; otherwise, it is classified as negative. Clearly, choosing different thresholds may yield different classifiers.

Of course, a binary classifier may misclassify: it could either classify a positive instance as negative (false negative) or classify a negative instance as positive (false positive).

Given a dataset and a classifier, we may define the true positive rate ($TPR$) and the false positive rate ($FPR$) as follows:

$$TPR = \frac{\#TP}{\#TP + \#FN}, \quad FPR = \frac{\#FP}{\#TN + \#FP}$$

where $\#TP$ is the number of true positives in the dataset; $\#FP$, $\#TN$, $\#FN$ are defined likewise.

Now you have trained a scoring function, and you want to evaluate the performance of your classifier. The classifier may exhibit different TPR and FPR if we change the threshold $\theta$. Let $TPR(\theta)$ and $FPR(\theta)$ be the $TPR$ and $FPR$ when the threshold is $\theta$, and define the area under curve ($AUC$) as

$$AUC = \int_{0}^{1} \max_{\theta \in \mathbb{R}} \{TPR(\theta) \mid FPR(\theta) \leq r\} \, dr$$

where the integrand, called the receiver operating characteristic (ROC), is the maximum possible $TPR$ given that $FPR \leq r$.


Given the actual classes and predicted scores of the instances in a dataset, can you compute the $AUC$ of your classifier?

For example, consider the third test data. If we set the threshold $\theta = 30$, there are 3 true positives, 2 false positives, 2 true negatives, and 1 false negative; hence, $TPR(30) = 0.75$ and $FPR(30) = 0.5$. Also, as $\theta$ varies, we may plot the ROC curve and compute the AUC accordingly, as shown in Figure 1.

Input description:

The first line contains a single integer $n$ $(2 \leq n \leq 10^6)$, the number of instances in the dataset. Then follow $n$ lines, each line containing a character $c \in \{+, -\}$ and an integer $s$ $(1 \leq s \leq 10^9)$, denoting the actual class and the predicted score of an instance.

It is guaranteed that there is at least one instance of either class.

Output description:

Print the $AUC$ of your classifier within an absolute error of no more than $10^{-9}$.

Example 1

Input

3
+ 2
- 3
- 1

Output

0.5

Example 2

Input

6
+ 7
- 2
- 5
+ 4
- 2
+ 6

Output

0.888888888888889

Example 3

Input

8
+ 34
+ 33
+ 26
- 34
- 38
+ 39
- 7
- 27

Output

0.5625

Note

Figure 1: ROC and $AUC$ of the third sample data.

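Problem summary: The statement is extremely long and really tests one's patience. There is a classifier that, given a threshold θ, labels every instance as + or -: an instance whose score is at least θ is classified as +, and one whose score is less than θ is classified as -. We are given the scores of n instances together with their true classes. Let FPR be the number of instances whose true class is - but which the classifier labels +, divided by the number of instances whose true class is -. Let TPR be the number of instances whose true class is + and which the classifier labels +, divided by the number of instances whose true class is +. Clearly FPR and TPR are functions of θ. Letting θ range over all real numbers yields a set of pairs (FPR(θ), TPR(θ)), that is, scatter points in the plane whose axes are FPR and TPR. Define f(r) as the maximum TPR among the points with FPR ≤ r, and compute the integral of f over [0, 1].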

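Analysis: The counts of true and false positives only change when θ crosses one of the scores, so letting θ take each instance's score yields every scatter point except possibly (0, 0), which arises when θ exceeds every score and adds no area. Enumerating the scores is therefore enough. By the definition of f, it is a piecewise function whose pieces are all horizontal line segments, and f is non-decreasing. Computing the integral thus reduces to summing rectangle areas: loop over the breakpoints and accumulate.

The code is as follows: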
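
The rectangle sum can be written out explicitly. The derivation below is a sketch; the symbols $n_+$, $n_-$ (the numbers of positive and negative instances) and $T_k$ are notation introduced here and do not appear in the original statement. Since FPR only takes the values $k / n_-$, the integrand is constant on each interval $[k / n_-, (k+1) / n_-)$, which has width $1 / n_-$.

```latex
\[
\mathrm{AUC}
  = \sum_{k=0}^{n_- - 1} \frac{1}{n_-} \cdot \frac{T_k}{n_+}
  = \frac{1}{n_+ n_-} \sum_{k=0}^{n_- - 1} T_k ,
\qquad
T_k = \max_{\theta \in \mathbb{R}} \{\#TP(\theta) \mid \#FP(\theta) \leq k\}.
\]
% Third sample: n_+ = n_- = 4 and (T_0, T_1, T_2, T_3) = (1, 1, 3, 4)
\[
\mathrm{AUC} = \frac{1 + 1 + 3 + 4}{4 \cdot 4} = \frac{9}{16} = 0.5625 .
\]
```

The result agrees with the expected output of the third example.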

#include<cstdio>
#include<algorithm>
#include<map>
#define int long long
#define double long double
using namespace std;
const int N = 1e6+10;
// mp[x]: the largest number of true positives among thresholds that give exactly x false positives
map<int,int> mp;
// p[]: scores of the positive instances, ne[]: scores of the negative instances
int p[N],ne[N],cnt1,cnt2;
signed main()
{
	int n;
	scanf("%lld", &n);
	char c[2];
	for(int i = 1; i <= n; i++)
	{
		int s;
		scanf("%1s%lld", c, &s);
		if(c[0] == '+')
			p[cnt1++] = s;
		else 
			ne[cnt2++] = s;
	} 
	sort(p,p+cnt1);
	sort(ne,ne+cnt2);
	// Try every score as the threshold theta: x = #FP and t = #TP are the counts of scores >= theta.
	// The input guarantees cnt1 > 0 and cnt2 > 0.
	for(int i=0;i<cnt2;i++){
		int x = cnt2 - (lower_bound(ne,ne+cnt2,ne[i]) - ne);
		int t = cnt1 - (lower_bound(p,p+cnt1,ne[i]) - p);
		mp[x] = max(mp[x],t);
	}
	for(int i=0;i<cnt1;i++){
		int x = cnt2 - (lower_bound(ne,ne+cnt2,p[i]) - ne);
		int t = cnt1 - (lower_bound(p,p+cnt1,p[i]) - p);
		mp[x] = max(mp[x],t);
	}
	// Sum the rectangles of the step function: on [xl, xr) the best #TP is y.
	// mp[0] defaults to 0, which covers a threshold above every score.
	int xl = 0,y = mp[0],ans = 0;
	for(map<int,int>::iterator it = mp.begin();it != mp.end();it++){
		int xr = it->first;
		ans += (xr-xl)*y;
		y = it->second;
		xl = xr;
	}
	// ans is measured in units of 1/(cnt1*cnt2), so divide to turn counts into rates.
	printf("%.9Lf\n",(double)ans/cnt1/cnt2);
	return 0;
}
