GINI, CUMULATIVE ACCURACY PROFILE, AUC

Article link: https://blog.csdn.net/sinat_23971513/article/details/107838723

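This article explains how to compute the Cumulative Accuracy Profile (CAP), the accuracy ratio, and the Gini coefficient, and discusses how they relate to one another. Example data is used to walk through the calculation, including sorting the data by predicted probability in descending order, counting the bad customers in each quantile, and computing the AUC. It also describes the relationship between AUC and the Gini coefficient, noting that the two are equivalent under different coordinate axes.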

Importance of these methods

These methods measure the discriminatory power of a predictive model, that is, whether the model can distinguish between events (the desired outcome) and non-events. In credit risk modeling, they evaluate whether a probability of default (PD) model can separate good customers from bad ones. The Cumulative Accuracy Profile and the Gini coefficient are more common in credit risk analytics than in other domains.

Cumulative Accuracy Profile (CAP)

The Cumulative Accuracy Profile (CAP) of a credit rating model plots the cumulative percentage of all borrowers (debtors) on the x-axis against the cumulative percentage of defaulters (bad customers) on the y-axis. In marketing analytics it is called a gain chart; in some other domains it is called a power curve.

Interpretation

With CAP, you can compare the curve of your current model against the curve of an 'ideal or perfect' model and against the curve of a random model. 'Perfect model' refers to the ideal state in which all the bad customers (the desired outcome) are captured first, before any good customer. 'Random model' refers to the state in which bad customers are spread evenly across the population, so the ranking carries no information. 'Current model' refers to your probability of default model (or any other model you are working on). The goal is always to build a model whose curve lies as close as possible to the curve of the perfect model. The current model's curve can be read as '% of bad customers covered at a given decile level'. For example, 89% of bad customers are captured by selecting just the top 30% of debtors ranked by the model.

Steps to create Cumulative Accuracy Profile curve

Sort borrowers by estimated probability of default in descending order and split them into 10 parts (deciles). The riskiest borrowers, with the highest PD, should fall in the top decile and the safest borrowers in the bottom decile. Splitting into 10 parts is not a hard rule; you can use rating grades instead.
Calculate the number of borrowers (observations) in each decile.
Calculate the number of bad customers in each decile.
Calculate the cumulative number of bad customers in each decile.
Calculate the percentage of bad customers in each decile.
Calculate the cumulative percentage of bad customers in each decile.

So far, the calculations have been based on the PD model (recall that the first step uses the probabilities obtained from the PD model).
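The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `cap_table`, the column names, and the toy data are all made up for this example, and the sketch assumes at least one bad customer in the data (otherwise the percentages are undefined).

```python
def cap_table(pd_scores, defaulted, n_bins=10):
    """Build the CAP table, one row per bin, riskiest bin first."""
    # Step 1: sort borrowers by estimated PD, highest first.
    ranked = sorted(zip(pd_scores, defaulted), key=lambda r: r[0], reverse=True)
    n = len(ranked)
    total_bads = sum(flag for _, flag in ranked)
    rows, cum_bads = [], 0
    for b in range(n_bins):
        # Integer arithmetic yields bins whose sizes differ by at most one.
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        bads = sum(flag for _, flag in ranked[start:stop])
        cum_bads += bads
        rows.append({
            "decile": b + 1,
            "borrowers": stop - start,               # observations in the bin
            "bads": bads,                            # bad customers in the bin
            "cum_bads": cum_bads,                    # cumulative bad customers
            "pct_bads": bads / total_bads,           # share of all bads in the bin
            "cum_pct_bads": cum_bads / total_bads,   # cumulative share of bads
        })
    return rows


# Hypothetical toy data: 20 borrowers, 4 of whom defaulted (flag = 1).
toy_pd = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
          0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]
toy_default = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
table = cap_table(toy_pd, toy_default)
for row in table:
    print(row)
```

To use rating grades instead of deciles, the bin boundaries would come from the grade assignment rather than from equal-sized slices.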

Next step: What should the number of bad customers in each decile be under the perfect model?

In the perfect model, the first decile should capture all the bad customers, since it corresponds to the worst rating grade, that is, the borrowers most likely to default. In our case, the first decile cannot capture all the bad customers because the number of borrowers in the first decile is smaller than the total number of bad customers. The remaining bad customers therefore fill the following deciles until all of them are accounted for.
Calculate the cumulative number of bad customers in each decile based on the perfect model.
Calculate the cumulative % of bad customers in each decile based on the perfect model.
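One way to express the perfect-model calculation: after each decile, the cumulative number of bad customers is the smaller of the cumulative number of borrowers and the total number of bad customers. The sketch below assumes this reading; the function name and the figures (10 deciles of 100 borrowers, 250 bad customers) are hypothetical.

```python
def perfect_cum_bads(bin_sizes, total_bads):
    """Cumulative bad customers per bin if every bad customer is ranked first."""
    cum_borrowers, out = 0, []
    for size in bin_sizes:
        cum_borrowers += size
        # A bin cannot hold more bads than borrowers, and the running
        # total cannot exceed the number of bads that exist.
        out.append(min(cum_borrowers, total_bads))
    return out


# Hypothetical portfolio: 1,000 borrowers in 10 equal deciles, 250 bads.
sizes = [100] * 10
cum_perfect = perfect_cum_bads(sizes, total_bads=250)
cum_pct_perfect = [c / 250 for c in cum_perfect]
print(cum_perfect)      # 100, 200, then 250 for every remaining decile
print(cum_pct_perfect)  # 0.4, 0.8, then 1.0 for every remaining decile
```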

Next step: Calculate the cumulative percentage of bad customers in each decile based on the random model. In the random model, each decile should contain 10% of the bad customers. The cumulative percentage is therefore 10% in decile 1, 20% in decile 2, and so on up to 100% in decile 10.
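Put differently, under the random model the cumulative share of bad customers equals the cumulative share of borrowers. A small sketch, with a hypothetical function name and bin sizes, that also covers bins of unequal size (for example rating grades):

```python
def random_cum_pct(bin_sizes):
    """Cumulative share of bads per bin when bads are spread evenly."""
    total = sum(bin_sizes)
    cum, out = 0, []
    for size in bin_sizes:
        cum += size
        # Each bin captures bads in proportion to its share of borrowers.
        out.append(cum / total)
    return out


cum_pct_random = random_cum_pct([100] * 10)
print(cum_pct_random)  # rises in steps of 10% up to 100%
```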

Next step: Create a plot of the cumulative % of bad customers for the current, random, and perfect models. The x-axis shows the percentage of borrowers (observations) and the y-axis shows the percentage of bad customers.

Accuracy Ratio

For the CAP (Cumulative Accuracy Profile), the accuracy ratio is the ratio of the area between your current predictive model's curve and the diagonal line to the area between the perfect model's curve and the diagonal line. In other words, it is the ratio of the current model's improvement over the random model to the perfect model's improvement over the random model.
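The definition can be written as a formula. The symbols are introduced here only for illustration and do not appear in the original text: a_R is the area between the current model's curve and the diagonal, a_P is the area between the perfect model's curve and the diagonal, and A_model and A_perfect are the full areas under the two curves, with 0.5 being the area under the diagonal on axes that run from 0 to 1.

```latex
\[
\mathrm{AR} \;=\; \frac{a_R}{a_P}
            \;=\; \frac{A_{\text{model}} - 0.5}{A_{\text{perfect}} - 0.5}
\]
```

An accuracy ratio of 0 corresponds to a model no better than random, and a ratio of 1 to the perfect model.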

How to calculate Accuracy Ratio

The first step is to calculate the area between the current model's curve and the diagonal line. The area below the current model's curve (including the area below the diagonal line) can be computed by numerical integration with the trapezoidal rule. The area of a single trapezoid is

(x_{i+1} - x_i) * (y_i + y_{i+1}) * 0.5

Here (x_{i+1} - x_i) is the width of the subinterval and (y_i + y_{i+1}) * 0.5 is the average height. The area under the curve is the sum of these trapezoids over all subintervals.

In this case, x is the cumulative proportion of borrowers at each decile level and y is the cumulative proportion of bad customers at each decile level. The values of x₀ and y₀ are both 0.
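The trapezoidal sum can be sketched in a few lines of Python. The function name is hypothetical; the inputs are the cumulative proportions per decile, and the origin (x₀ = y₀ = 0) is prepended inside the function.

```python
def trapezoid_area(x, y):
    """Area under the curve through (0, 0) and the points (x[i], y[i])."""
    xs = [0.0] + list(x)  # x_0 = 0
    ys = [0.0] + list(y)  # y_0 = 0
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) * 0.5  # width * average height
        for i in range(len(xs) - 1)
    )


# Sanity check on the random model, where y equals x at every decile:
deciles = [i / 10 for i in range(1, 11)]
area_random = trapezoid_area(deciles, deciles)
print(area_random)  # approximately 0.5, the area under the diagonal
```

Passing the current model's cumulative proportions of bad customers as `y` gives the area below the current model's curve.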

Once the above step is complete, the next step is to subtract 0.5 from the area it returned. You may be wondering about the relevance of 0.5. It is the area below