What is the coefficient of determination (R^2)?

What is $R^2$?

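   $R^2$ (R-squared), also called the coefficient of determination, is a common metric for evaluating regression models. It measures how much of the variance in the dependent variable is explained by the independent variables in the model, and is usually described as ranging from 0 to 1 (or 0% to 100%). That range holds for an ordinary least-squares linear regression with an intercept, evaluated on the data it was fitted to. On new data, or for other kinds of models, $R^2$ can be negative, which means the model predicts worse than simply using the mean.
   Still a bit abstract? The formulas below make it concrete.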

Formulas

  First, two definitions: RSS and TSS.

1. Residual Sum of Squares (RSS, Sum of Squared Residuals)

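  The residual sum of squares goes by different names in different places (RSS, SSR, SS_res, and so on). I'll call it RSS throughout. [If you remember "Sum of Squared Residuals", you should be able to recognize it under any of these names. Be careful with SSR, though: some textbooks use SSR for the regression (explained) sum of squares instead.]
  Let $y_i$ denote the true value and $\hat{y}_i$ the model's prediction. RSS measures the gap between the true values and the predictions:
$$RSS=\sum_{i=1}^{n}(\hat{y}_i-y_i)^2$$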
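To make the formula concrete, here is a minimal plain-Python sketch of RSS. The helper name `rss` is hypothetical (not a library function), and the sample data is borrowed from the scikit-learn example in the Code section.

```python
def rss(y_true, y_pred):
    """Residual sum of squares: sum of (y_i - y_hat_i)^2 over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Squaring makes the sign of each residual irrelevant
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))


y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]
print(rss(y_actual, y_predicted))  # 1.5
```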

2. Total Sum of Squares (TSS)

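  As before, $y_i$ is the true value and $\hat{y}_i$ the prediction; $\bar{y}$ is the mean of the true values. The mean is the crudest estimate you can make without any model: just predict the average every time (it is a baseline prediction). TSS measures how far the true values lie from their mean, i.e. how much error you would get with no model at all, using only the mean as the prediction:
$$TSS=\sum_{i=1}^{n}(y_i-\bar{y})^2$$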
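A matching plain-Python sketch for TSS, again with a hypothetical helper name (`tss`). Note that TSS uses only the true values: it does not depend on the model at all.

```python
def tss(y_true):
    """Total sum of squares: sum of (y_i - y_bar)^2, where y_bar is the mean."""
    y_bar = sum(y_true) / len(y_true)  # the baseline "no model" prediction
    return sum((y - y_bar) ** 2 for y in y_true)


y_actual = [3, -0.5, 2, 7]
print(tss(y_actual))  # 29.1875 (the mean is 2.875)
```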

3. $R^2$

   $R^2=1-\dfrac{RSS}{TSS}$. RSS measures the gap between the predictions and the true values: the smaller it is relative to TSS, the closer the predictions are to the true values, and the larger $R^2$ becomes. A model that does no better than predicting the mean has RSS = TSS, so $R^2 = 0$.
  Another way to read it:
$$R^2=1-\frac{\text{Unexplained Variation}}{\text{Total Variation}}$$
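Putting the two sums together, here is a minimal from-scratch sketch of $R^2$ (hypothetical helper `r_squared`, plain Python, no libraries). The three calls compare a close fit, a "model" that always predicts the mean, and one that does worse than the mean, which illustrates why $R^2$ is not strictly confined to 0 to 1.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    if tss == 0:
        # Assumption for this sketch: refuse rather than pick a value
        raise ValueError("R^2 is undefined when all true values are equal (TSS = 0)")
    return 1 - rss / tss


y_actual = [3, -0.5, 2, 7]
print(round(r_squared(y_actual, [2.5, 0.0, 2, 8]), 4))  # 0.9486, a close fit
print(r_squared(y_actual, [2.875] * 4))                  # 0.0, always predicting the mean
print(round(r_squared(y_actual, [7, 7, 7, 7]), 2))       # -2.33, worse than the mean
```

Raising an error when TSS = 0 is a simplification made here; libraries may handle that edge case differently.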

4. Summary

  1. RSS (Residual Sum of Squares, or Sum of Squared Residuals) = total squared differences between actual and predicted values.
$$RSS=\sum_{i=1}^{n}(\hat{y}_i-y_i)^2$$
  2. TSS (Total Sum of Squares) = total squared differences between actual values and their mean.
$$TSS=\sum_{i=1}^{n}(y_i-\bar{y})^2$$
  3. R-Squared is the coefficient of determination: it tells us how much of the variance in the dependent variable is explained by the independent variables in the model. It typically ranges from 0 to 1 (or 0% to 100%), but can be negative when a model fits worse than the mean.
$$R^2=1-\frac{RSS}{TSS}$$

Question: does a larger $R^2$ always mean a better model?

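  Nope.
  1) R-Squared is a good indicator of overall model fit, but it should not be used on its own.
  2) A high R² does not mean the model is good; it could be overfitting.
  3) R² is most meaningful for linear regression, where it reads as the proportion of variance explained. In multiple regression, adding a predictor never lowers the in-sample R² of a least-squares fit, even when that predictor is pure noise, so use Adjusted R², which accounts for the number of predictors: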
$$R_{adj}^2=1-\frac{(1-R^2)(n-1)}{n-p-1}$$

   n = number of observations (sample size)
   p = number of predictors (independent variables)
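A small sketch of the adjusted $R^2$ formula, using a hypothetical helper `adjusted_r_squared`. The inputs are made-up numbers chosen only to show the penalty at work: with the same $R^2$ and sample size, more predictors yield a lower adjusted $R^2$.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1).

    r2: plain R^2, n: number of observations, p: number of predictors.
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1 (n > p + 1)")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R^2 and sample size, different numbers of predictors (illustrative values)
print(round(adjusted_r_squared(0.80, n=50, p=5), 4))   # 0.7773
print(round(adjusted_r_squared(0.80, n=50, p=10), 4))  # 0.7487
```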

Code

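   Use the metrics module of the machine-learning library scikit-learn (sklearn). Model evaluation metrics generally live in metrics, e.g. R^2 and cross-entropy (log_loss in sklearn; the topic of the next post).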

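from sklearn.metrics import r2_score

# True values and model predictions
y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]

# r2_score(y_true, y_pred): the true values come first
r2 = r2_score(y_actual, y_predicted)
print(f"R-Squared: {r2:.2f}")  # prints: R-Squared: 0.95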

