Simple Linear Regression: Derivation and Proofs
Structure of the regression equation
We model two-dimensional data $(x_i, y_i)$ with the regression equation $y_i=\beta_0+\beta_1 x_i+u$. Fitting the equation requires estimating the unknown parameters $\hat{\beta_0}$ and $\hat{\beta_1}$.
The parameter estimates can be derived in the following two ways.
Proof 1.1: Method of moments estimation of $\hat{\beta_0}$ and $\hat{\beta_1}$
The method of moments relies on the zero conditional mean assumption: $E(u|x)=0$. This assumption means that, given $x$, the error between the value $\hat{y}$ obtained from the regression equation and the actual $y$ averages to zero; that is, in terms of the causal relationship, $y$ is systematically influenced only through $\beta_0$ and $\beta_1$.
The zero conditional mean assumption implies two results:
$$E(u)=E(E(u|x))=0$$
$$\begin{aligned} cov(x,u)&=E(xu)-E(x)E(u)\\ &=E(xu)\\ &=E(E(xu|x))\\ &=E(xE(u|x))\\ &=0 \end{aligned}$$
These two moment conditions are the key to solving for the unknown parameters.
We also know that
$$y_i=\beta_0+\beta_1 x_i+u_i \tag{1}$$
$$\bar{y}=\beta_0+\beta_1\bar{x}+\bar{u} \tag{2}$$
Subtracting (2) from (1) gives
$$y_i-\bar{y}=\beta_1(x_i-\bar{x})+(u_i-\bar{u})$$
Multiplying both sides by $(x_i-\bar{x})$:
$$(x_i-\bar{x})(y_i-\bar{y})=\beta_1(x_i-\bar{x})^2+(u_i-\bar{u})(x_i-\bar{x})$$
Summing over all observations $i=1,2,\dots,N$ and averaging:
$$\frac{1}{N}\sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})=\frac{1}{N}\sum_{i=1}^N\beta_1(x_i-\bar{x})^2+\frac{1}{N}\sum_{i=1}^N(u_i-\bar{u})(x_i-\bar{x})$$
Since $cov(x,u)=0$, the corresponding sample moment is set to zero, $\frac{1}{N}\sum_{i=1}^N(u_i-\bar{u})(x_i-\bar{x})=0$. This yields
$$\begin{aligned} \sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})&=\sum_{i=1}^N\beta_1(x_i-\bar{x})^2 \\ \hat{\beta_1}&=\frac{\sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^N(x_i-\bar{x})^2} \\&=\frac{cov(x,y)}{var(x)} \end{aligned}$$
Then, averaging equation (1) over the sample and imposing the moment condition $E(u)=0$ (so $\bar{u}$ is replaced by $0$), we obtain
$$\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$$
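As a quick numerical sanity check of these formulas, here is a minimal Python sketch (assuming NumPy is available; the data-generating coefficients 2 and 3 are made up for illustration) that computes $\hat{\beta_1}=cov(x,y)/var(x)$ and $\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$ directly from the sample moments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data generated from y = 2 + 3x + u with E(u|x) = 0
x = rng.normal(size=200)
u = rng.normal(scale=0.5, size=200)
y = 2.0 + 3.0 * x + u

# Method-of-moments / closed-form estimates derived above
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta1_hat, beta0_hat)  # should be close to the true values 3 and 2
```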
Proof 1.2: Ordinary least squares (OLS) estimation of $\hat{\beta_0}$ and $\hat{\beta_1}$
Simply put, the task is to find the $\beta_0$ and $\beta_1$ that minimize the mean squared error. In mathematical terms:
$$\begin{aligned} \mathop{min}\limits_{\hat{\beta_1},\hat{\beta_0}}\frac{1}{N}\sum_{i=1}^N(\hat{u}_i-\bar{u})^2 &=\mathop{min}\limits_{\hat{\beta_1},\hat{\beta_0}}\frac{1}{N}\sum_{i=1}^N(\hat{u}_i)^2 \\&=\mathop{min}\limits_{\hat{\beta_1},\hat{\beta_0}}\frac{1}{N}\sum_{i=1}^N(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)^2 \tag{3} \end{aligned}$$
Differentiating the last expression in (3) with respect to $\hat{\beta_0}$ and $\hat{\beta_1}$ and setting the derivatives to zero gives
$$\begin{aligned} -2\times\frac{1}{N}\sum_{i=1}^N(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)&=0\\ -2\times\frac{1}{N}\sum_{i=1}^N x_i(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)&=0 \end{aligned}$$
These are equivalent to the sample moment conditions; solving the two equations simultaneously yields the same result as in Proof 1.1.
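To see that minimizing the squared-error objective in (3) reproduces the same estimates, the sketch below (a minimal illustration assuming NumPy and SciPy; the simulated model is made up) minimizes the objective numerically and compares the result with the closed-form solution from Proof 1.1:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=100)
y = 1.5 - 0.7 * x + rng.normal(scale=0.3, size=100)  # illustrative model

def objective(params):
    """Objective (3): mean squared residual for a candidate (beta0, beta1)."""
    b0, b1 = params
    return np.mean((y - b0 - b1 * x) ** 2)

res = minimize(objective, x0=[0.0, 0.0])             # numerical OLS
b1_closed = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # cov(x, y) / var(x)
b0_closed = y.mean() - b1_closed * x.mean()

print(res.x)                  # [beta0_hat, beta1_hat] from minimization
print(b0_closed, b1_closed)   # closed-form estimates; the two agree closely
```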
Goodness of fit
We define the following metrics to evaluate how well the equation fits the data.
| Name | Abbreviation | Formula |
|---|---|---|
| Total sum of squares | SST | $\sum_{i=1}^N(y_i-\bar{y})^2$ |
| Explained sum of squares | SSE | $\sum_{i=1}^N(\hat{y_i}-\bar{y})^2$ |
| Residual sum of squares | SSR | $\sum_{i=1}^N(\hat{u}_i)^2$ |
They are related by SST = SSE + SSR, which is proved below.
Proof 2.1: SST = SSE + SSR
$$\begin{aligned} SST&=\sum_{i=1}^N(y_i-\bar{y})^2 \\&=\sum_{i=1}^N(y_i-\hat{y_i}+\hat{y_i}-\bar{y})^2 \\&=\sum_{i=1}^N(\hat{u_i}+\hat{y_i}-\bar{y})^2 \\&=\sum_{i=1}^N(\hat{u_i})^2+\sum_{i=1}^N(\hat{y_i}-\bar{y})^2+2\sum_{i=1}^N\hat{u_i}(\hat{y_i}-\bar{y}) \tag{4} \\&=SSR+SSE+2\sum_{i=1}^N\hat{u_i}(\hat{y_i}-\bar{y}) \end{aligned}$$
Next, combining this with the zero conditional mean assumption (in its sample form), consider the cross term $\sum_{i=1}^N\hat{u_i}(\hat{y_i}-\bar{y})$:
$$\begin{aligned} \sum_{i=1}^N\hat{u_i}(\hat{y_i}-\bar{y})&=\sum_{i=1}^N\hat{u_i}\hat{y_i}-\bar{y}\sum_{i=1}^N\hat{u_i} \\&=\sum_{i=1}^N\hat{u_i}(\hat{\beta_0}+\hat{\beta_1}x_i)-0 \\&=\hat{\beta_0}\sum_{i=1}^N\hat{u_i}+\hat{\beta_1}\sum_{i=1}^N\hat{u_i}x_i \\&=0 \end{aligned}$$
where the last step uses the sample moment conditions $\sum_{i=1}^N\hat{u_i}=0$ and $\sum_{i=1}^N\hat{u_i}x_i=0$. Together with (4), this gives SST = SSE + SSR.
We define the goodness of fit as $R^2=\frac{SSE}{SST}=1-\frac{SSR}{SST}$. The goodness of fit can also be computed from the correlation coefficient: $R^2=corr^2(y,\hat{y})=corr^2(x,y)$.
Proof 2.2: $R^2=corr^2(y,\hat{y})=corr^2(x,y)$
Clearly $corr(y,\hat{y})=corr(y,\hat{\beta_1}x+\hat{\beta_0})=corr(x,y)$ (a linear transformation of one variable does not change the correlation, up to a sign that vanishes after squaring). By definition,
$$\begin{aligned} R^2&=\frac{SSE}{SST}=\frac{\sum_{i=1}^{N}(\hat{y}_i-\bar{y})^2}{\sum_{i=1}^{N}(y_i-\bar{y})^2} \\&=\frac{\sum_{i=1}^{N}(\hat{\beta_1}x_i+\hat{\beta_0}-\hat{\beta_1}\bar{x}-\hat{\beta_0})^2}{\sum_{i=1}^{N}(y_i-\bar{y})^2} \\&=\frac{\sum_{i=1}^{N}(\hat{\beta_1}x_i-\hat{\beta_1}\bar{x})^2}{\sum_{i=1}^{N}(y_i-\bar{y})^2} \\&=\frac{var(\hat{\beta_1}x)}{var(y)} \\&=\hat{\beta_1}^2\frac{var(x)}{var(y)} \\&=\frac{cov^2(x,y)}{var^2(x)}\times\frac{var(x)}{var(y)} \\&=\left[\frac{cov(x,y)}{\sqrt{var(x)var(y)}}\right]^2 \\&=corr^2(x,y) \end{aligned}$$
which proves that $R^2=corr^2(y,\hat{y})=corr^2(x,y)$.
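Both the decomposition SST = SSE + SSR and the relation $R^2=corr^2(x,y)$ can be verified numerically. The following sketch (illustrative data; NumPy assumed) fits the line with the closed-form estimates and checks both identities:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = -1.0 + 0.8 * x + rng.normal(scale=0.4, size=500)  # illustrative model

# Closed-form OLS fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)
R2 = SSE / SST
corr_xy = np.corrcoef(x, y)[0, 1]

print(np.isclose(SST, SSE + SSR))    # SST = SSE + SSR
print(np.isclose(R2, corr_xy ** 2))  # R^2 = corr^2(x, y)
```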
Unbiasedness of the parameter estimates
The estimators $\hat{\beta_1}$ and $\hat{\beta_0}$ are unbiased. A proof is given below.
Proof 3.1: The estimator $\hat{\beta_1}$ is unbiased, i.e. $E(\hat{\beta}_1)=\beta_1$
It is known that
$$\hat{\beta}_1=\frac{\sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^N(x_i-\bar{x})^2} \tag{5}$$
and $y_i-\bar{y}=\beta_1(x_i-\bar{x})+(u_i-\bar{u})$.
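While the analytical argument is still to be completed, the unbiasedness claim can be illustrated by simulation: the sketch below (illustrative parameters; NumPy assumed) repeatedly draws samples from a fixed model satisfying $E(u|x)=0$ and averages the resulting $\hat{\beta}_1$ values, which center on the true $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 1.0, 2.5   # true parameters (illustrative)
n, reps = 50, 5000        # sample size and number of Monte Carlo replications

estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(scale=1.0, size=n)   # error with E(u|x) = 0 by construction
    y = beta0 + beta1 * x + u
    estimates[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(estimates.mean())  # close to beta1 = 2.5, consistent with E(beta1_hat) = beta1
```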
To be updated; if you find any issues, please point them out in the comments.