What is $R^{2}$?
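$R^{2}$ (R-squared), also called the coefficient of determination, usually falls between 0 and 1 and is a common model evaluation metric. It measures how much of the variance in the dependent variable is explained by the independent variables in the model.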
A common way to state the definition: R-Squared is a coefficient of determination that ranges from 0 to 1 (or 0% to 100%) and tells us how much of the variance in the dependent variable is explained by the independent variables in the model. Sound a bit abstract? The formulas below should clear it up.
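Formula
First, two definitions: RSS and TSS.
1. Residual sum of squares (RSS, Sum of Squared Residuals)
The residual sum of squares is written differently in different places: RSS, SSR, SSres, and so on. I will use RSS throughout. (If you remember "Sum of Squared Residuals", you should be able to recognize it under any of these names.)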
We write $y_{i}$ for the actual value and $\hat{y}_{i}$ for the model's prediction. RSS measures the discrepancy between the actual values and the predictions.
$$RSS=\sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^{2}$$
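The RSS formula can be sketched in a few lines of plain Python. This is only an illustration: the function name `residual_sum_of_squares` is my own and not part of any library, and the sample values are the same ones used in the code section at the end of this post.

```python
def residual_sum_of_squares(y_actual, y_predicted):
    """Sum of squared differences between predictions and actual values."""
    if len(y_actual) != len(y_predicted):
        raise ValueError("y_actual and y_predicted must have the same length")
    # (y_hat_i - y_i)^2 summed over all observations
    return sum((y_hat - y) ** 2 for y, y_hat in zip(y_actual, y_predicted))


y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]
rss = residual_sum_of_squares(y_actual, y_predicted)
print(rss)  # 1.5  (0.25 + 0.25 + 0 + 1)
```

Because each residual is squared, the order of subtraction does not matter, and large misses weigh much more than small ones.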
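2. Total sum of squares (TSS, Total Sum of Squares)
As before, $y_{i}$ is the actual value and $\hat{y}_{i}$ the model's prediction; $\bar{y}$ is the mean of the actual values. The mean is the most basic estimate available without any model: just take the average (it is a baseline prediction). TSS measures the discrepancy between the actual values and their mean, that is, how far the actual values would be from the prediction if the mean were the only prediction available.
$$TSS=\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}$$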
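A matching sketch for TSS, again in plain Python. The helper name `total_sum_of_squares` is hypothetical, and the data are the sample values from the code section at the end of this post.

```python
def total_sum_of_squares(y_actual):
    """Sum of squared differences between actual values and their mean."""
    if not y_actual:
        raise ValueError("y_actual must not be empty")
    mean_y = sum(y_actual) / len(y_actual)  # the baseline prediction
    return sum((y - mean_y) ** 2 for y in y_actual)


y_actual = [3, -0.5, 2, 7]
tss = total_sum_of_squares(y_actual)
print(tss)  # 29.1875  (the mean is 2.875)
```

Note that TSS depends only on the actual values, so it is the same for every model evaluated on the same data.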
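3. $R^{2}$
$$R^{2}=1-\frac{RSS}{TSS}$$
RSS captures the gap between the model's predictions and the actual values; the smaller it is, the closer the predictions are to the actual values. So a larger $R^{2}$ means the model's predictions are closer to the actual values.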
It can also be read this way:
$$R^{2}=1-\frac{\text{Unexplained Variation}}{\text{Total Variation}}$$
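Putting the two sums together, here is a minimal from-scratch sketch of $R^{2}=1-RSS/TSS$. The function name `r_squared` is my own; the guard for a zero TSS is an assumption about how to handle constant data, not something the formula itself specifies.

```python
def r_squared(y_actual, y_predicted):
    """Compute R^2 = 1 - RSS / TSS from two equal-length sequences."""
    if len(y_actual) != len(y_predicted):
        raise ValueError("y_actual and y_predicted must have the same length")
    mean_y = sum(y_actual) / len(y_actual)
    rss = sum((y_hat - y) ** 2 for y, y_hat in zip(y_actual, y_predicted))
    tss = sum((y - mean_y) ** 2 for y in y_actual)
    if tss == 0:
        # Assumption: treat constant actual values as an error case
        raise ValueError("TSS is zero: all actual values are identical")
    return 1 - rss / tss


y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]
print(f"{r_squared(y_actual, y_predicted):.4f}")  # 0.9486

# Predicting the mean for every point is the baseline: RSS equals TSS
baseline = [sum(y_actual) / len(y_actual)] * len(y_actual)
print(r_squared(y_actual, baseline))  # 0.0
```

The second call shows the "baseline" reading of the formula: a model that only ever predicts the mean explains none of the variation.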
4. Summary
1. RSS (Sum of Squared Residuals) = Total squared differences between actual and predicted values.
$$RSS=\sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^{2}$$
2. TSS (Total Sum of Squares) = Total squared differences between actual values and the mean.
$$TSS=\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}$$
3. R-Squared is a coefficient of determination that ranges from 0 to 1 (or 0% to 100%) and tells us how much of the variance in the dependent variable is explained by the independent variables in the model.
$$R^{2}=1-\frac{RSS}{TSS}$$
Question: does a larger $R^{2}$ always mean a better model?
Nope.
1) R-Squared is a good indicator of overall model fit but should not be used alone.
2) A high R² does not mean the model is good; it could be overfitting.
3) Plain R² is best suited to simple linear regression. In a least-squares fit it never decreases when a predictor is added, so for multiple regression use Adjusted R² to account for the number of predictors.
$$R_{adj}^{2}=1-\frac{(1-R^{2})(n-1)}{n-p-1}$$
n = number of observations (sample size)
p = number of predictors (independent variables)
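To see how the adjustment behaves, here is a small sketch of the Adjusted R² formula. The function name `adjusted_r_squared` and the input numbers (R² = 0.90, n = 50) are made up purely for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the adjustment to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same raw R^2 and sample size, different numbers of predictors
print(f"{adjusted_r_squared(0.90, n=50, p=5):.4f}")   # 0.8886
print(f"{adjusted_r_squared(0.90, n=50, p=20):.4f}")  # 0.8310
```

With the raw R² held fixed, the adjusted value drops as p grows, which is how it penalizes predictors that do not pull their weight.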
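Code
Use the metrics module of the machine learning library sklearn (scikit-learn). Model evaluation metrics such as $R^{2}$ and cross-entropy (the topic of the next post) generally live in metrics.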
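from sklearn.metrics import r2_score
# Actual (observed) values and the model's predictions
y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]
# r2_score expects the true values first, then the predictions
r2 = r2_score(y_actual, y_predicted)
print(f"R-Squared: {r2:.2f}")  # prints "R-Squared: 0.95"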