[Probability Theory and Mathematical Statistics (Graduate Course)] Knowledge Summary 9 (Regression Analysis)


Simple Linear Regression Model

$$
\begin{aligned}
&y=\beta_0+\beta_1x+\epsilon,\quad \epsilon \sim N(0, \sigma^2) \\
&E(\epsilon)=0,\ D(\epsilon)=\sigma^2>0 \Longrightarrow E(y)=\beta_0+\beta_1x
\end{aligned}
$$

Regression equation: $\hat{y}=\hat{\beta_0}+\hat{\beta_1}x$

Derivation (least squares): choose $\beta_0,\beta_1$ to minimize the sum of squared deviations $Q(\beta_0,\beta_1)$ by setting both partial derivatives to zero.
$$
\begin{aligned}
y_i-E(y_i)&=y_i-(\beta_0+\beta_1x_i) \\
Q(\beta_0, \beta_1)&=\sum\limits_{i=1}^{n}(y_i-E(y_i))^2 \\
&=\sum\limits_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2 \\
\text{set }\quad\frac{\partial{Q(\beta_0,\beta_1)}}{\partial{\beta_0}}&=-2\sum\limits_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)=0 \\
\text{set }\quad\frac{\partial{Q(\beta_0,\beta_1)}}{\partial{\beta_1}}&=-2\sum\limits_{i=1}^{n}x_i(y_i-\beta_0-\beta_1x_i)=0
\end{aligned}
$$
Rearranging gives the normal equations:
$$
\begin{aligned}
&n\hat{\beta_0}+n\bar{x}\hat{\beta_1}=n\bar{y}\quad (1)\\
&n\bar{x}\hat{\beta_0}+\left(\sum\limits^{n}_{i=1}{x_i^2}\right) \hat{\beta_1} =\sum\limits_{i=1}^{n}x_iy_i \quad (2)
\end{aligned}
$$
Solving this system gives:
$$
\begin{aligned}
&\hat{\beta_1}=\frac{L_{xy}}{L_{xx}} \\
&\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x} \\
&L_{xx}=\sum\limits_{i=1}^{n}(x_i-\bar{x})^2=\sum\limits_{i=1}^{n}x_i^2-n\bar{x}^2=\sum\limits_{i=1}^{n}x_i^2-\frac{1}{n}\left(\sum\limits_{i=1}^{n}x_i\right)^2 \\
&L_{yy}=\sum\limits_{i=1}^{n}(y_i-\bar{y})^2=\sum\limits_{i=1}^{n}y_i^2-n\bar{y}^2=\sum\limits_{i=1}^{n}y_i^2-\frac{1}{n}\left(\sum\limits_{i=1}^{n}y_i\right)^2 \\
&L_{xy}=\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}=\sum\limits_{i=1}^{n}x_iy_i-\frac{1}{n}\sum\limits_{i=1}^{n}x_i \sum\limits_{i=1}^{n}y_i
\end{aligned}
$$

If a problem gives the data in summation ($\sum$) form, $L_{xx},L_{yy},L_{xy}$ are usually computed with the rightmost expression of each formula above.
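The rightmost forms translate directly into code when only the sums are known. The following is a minimal sketch, not a reference implementation; the function name `fit_from_sums` and the five-point data set are made up for illustration.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Least-squares estimates computed from summary sums only."""
    l_xx = sum_xx - sum_x ** 2 / n        # L_xx = sum(x^2) - (sum x)^2 / n
    l_yy = sum_yy - sum_y ** 2 / n        # L_yy = sum(y^2) - (sum y)^2 / n
    l_xy = sum_xy - sum_x * sum_y / n     # L_xy = sum(xy) - (sum x)(sum y) / n
    beta1 = l_xy / l_xx                   # slope estimate
    beta0 = sum_y / n - beta1 * sum_x / n # intercept: y_bar - beta1 * x_bar
    return beta0, beta1, l_xx, l_yy, l_xy


# Hypothetical sample, used only to exercise the formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

beta0, beta1, l_xx, l_yy, l_xy = fit_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xx=sum(x * x for x in xs),
    sum_yy=sum(y * y for y in ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(f"y_hat = {beta0:.3f} + {beta1:.3f} x")
```

For this sample the sums are $\sum x_i=15$, $\sum x_i^2=55$, $\sum y_i=20$, $\sum y_i^2=86$, $\sum x_iy_i=66$, which give $L_{xx}=10$, $L_{yy}=6$, $L_{xy}=6$ and therefore $\hat{\beta_1}=0.6$, $\hat{\beta_0}=2.2$.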

Residual Sum of Squares

$$
Q_e=\sum\limits_{i=1}^{n}e_i^2=\sum\limits_{i=1}^{n}(y_i-\hat{y_i})^2=\sum\limits_{i=1}^{n}(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)^2=L_{yy}-\hat{\beta_1}L_{xy}=L_{yy}-\frac{L_{xy}^2}{L_{xx}}
$$

Theorem: $\frac{Q_e}{\sigma^2}\sim\chi^2(n-2)$
$$
\begin{aligned}
&E\left(\frac{Q_e}{\sigma^2}\right)=n-2 \\
\Longrightarrow \quad &E\left(\frac{Q_e}{n-2}\right)=\sigma^2 \\
\Longrightarrow \quad &\hat{\sigma}^2=\frac{Q_e}{n-2}
\end{aligned}
$$
That is, $\hat{\sigma}^2=\frac{Q_e}{n-2}$ is an unbiased estimator of $\sigma^2$.
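As a quick numerical check of the residual sum of squares, the sketch below computes $Q_e$ both from the residuals and from the shortcut $L_{yy}-L_{xy}^2/L_{xx}$, then forms $\hat{\sigma}^2=Q_e/(n-2)$. The data set is a hypothetical five-point sample chosen only for illustration.

```python
# Hypothetical sample, used only to exercise the formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
l_xx = sum((x - x_bar) ** 2 for x in xs)
l_yy = sum((y - y_bar) ** 2 for y in ys)
l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

beta1 = l_xy / l_xx
beta0 = y_bar - beta1 * x_bar

# Q_e from the definition: sum of squared residuals.
q_e_direct = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
# Q_e from the shortcut formula.
q_e_shortcut = l_yy - l_xy ** 2 / l_xx

# Unbiased estimate of sigma^2 uses n - 2 degrees of freedom.
sigma2_hat = q_e_shortcut / (n - 2)
print(q_e_direct, q_e_shortcut, sigma2_hat)
```

Both routes give $Q_e=2.4$ here (up to floating-point rounding), so $\hat{\sigma}^2=2.4/3=0.8$.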

Properties of the Least-Squares Estimators

The least-squares estimators of $\beta_0,\beta_1$ are both unbiased: $E(\hat{\beta_0})=\beta_0,\quad E(\hat{\beta_1})=\beta_1$

$$
\hat{\beta_0}\sim N\left(\beta_0, \left(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\right)\sigma^2\right)
$$

$$
\hat{\beta_1}\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right)
$$

$$
Cov(\hat{\beta_0},\hat{\beta_1})=-\frac{\bar{x}}{L_{xx}}\sigma^2
$$

$$
\hat{y_0}\sim N\left(\beta_0+\beta_1x_0, \left(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\right)\sigma^2\right)
$$
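A short sketch of where the mean and variance of $\hat{\beta_1}$ come from may help. It assumes the errors $\epsilon_i$ are mutually independent, which is the usual assumption for this model although it is not stated explicitly above:

$$
\begin{aligned}
\hat{\beta_1}&=\frac{L_{xy}}{L_{xx}}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})y_i}{L_{xx}} \qquad \text{since } \sum_{i=1}^{n}(x_i-\bar{x})\bar{y}=0 \\
E(\hat{\beta_1})&=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(\beta_0+\beta_1x_i)}{L_{xx}}=\frac{\beta_1\sum_{i=1}^{n}(x_i-\bar{x})x_i}{L_{xx}}=\beta_1 \\
D(\hat{\beta_1})&=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2\sigma^2}{L_{xx}^2}=\frac{\sigma^2}{L_{xx}}
\end{aligned}
$$

The second line uses $\sum(x_i-\bar{x})=0$ and $\sum(x_i-\bar{x})x_i=L_{xx}$. Because $\hat{\beta_1}$ is a linear combination of the normal variables $y_i$, it is itself normally distributed.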

Significance Test of the Regression Equation (t, F, r)

  1. State the null and alternative hypotheses (whether the regression equation is significant comes down to whether the slope is 0):

$$H_0: \beta_1=0; \quad H_1:\beta_1\neq0$$

  2. Choose the test statistic:
    $$
    \begin{aligned}
    &\hat{\beta_1}\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right) \\
    \Longrightarrow \quad &\frac{\hat{\beta_1}-\beta_1}{\sqrt{\frac{\sigma^2}{L_{xx}}}}\sim N(0,1) \\
    \xrightarrow{H_0} \quad &\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\sigma}\sim N(0,1)
    \end{aligned}
    $$
    To construct a $t$ test we also need a $\chi^2$ distribution, and $\frac{Q_e}{\sigma^2}\sim\chi^2(n-2)$, so:
    $$
    T=\frac{\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\sigma}}{\sqrt{\frac{Q_e}{\sigma^2}/(n-2)}}\xrightarrow{\hat{\sigma}^2=\frac{Q_e}{n-2}}\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\hat\sigma} \sim t(n-2)
    $$
    To use the $F$ test, compute the regression sum of squares and the residual sum of squares:
    $$
    \begin{aligned}
    &S_T^2=\sum\limits_{i=1}^{n}(y_i-\bar{y})^2=L_{yy} \\
    &S_R^2=\sum\limits_{i=1}^{n}(\hat{y_i}-\bar{y})^2=\hat{\beta_1}L_{xy} \\
    &S_e^2=\sum\limits_{i=1}^{n}(y_i-\hat{y_i})^2=S_T^2-S_R^2=L_{yy}-\hat{\beta_1}L_{xy} \\
    &\text{under } H_0:\quad \frac{S_R^2}{\sigma^2}\sim \chi^2(1), \quad \frac{S_e^2}{\sigma^2}\sim \chi^2(n-2) \\
    &F=\frac{\frac{S_R^2}{\sigma^2}/1}{\frac{S_e^2}{\sigma^2}/(n-2)}=\frac{(n-2)S_R^2}{S_e^2}\sim F(1,n-2)
    \end{aligned}
    $$

  3. Rejection region

    $t$ test: $|T|=\left|\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\hat{\sigma}}\right|\ge t_{\frac{\alpha}{2}}(n-2)$

    $F$ test: $F\ge F_\alpha(1,n-2)$

  4. Look up the critical value $t_{\frac{\alpha}{2}}(n-2)$ or $F_{\alpha}(1,n-2)$

  5. Compute $|T|$ or $F$

  6. Draw the conclusion
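The test procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper `slope_significance` and the five-point sample are hypothetical, and the critical value is passed in by hand (3.182 is the tabulated $t_{0.025}(3)$) because the standard library has no $t$ distribution.

```python
import math


def slope_significance(xs, ys):
    """Return (T, F, degrees of freedom) for H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    beta1 = l_xy / l_xx
    s_r = beta1 * l_xy                 # regression sum of squares S_R^2
    s_e = l_yy - s_r                   # residual sum of squares S_e^2 (= Q_e)
    sigma_hat = math.sqrt(s_e / (n - 2))

    t_stat = beta1 * math.sqrt(l_xx) / sigma_hat
    f_stat = (n - 2) * s_r / s_e
    return t_stat, f_stat, n - 2


# Hypothetical sample, used only to exercise the formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

t_stat, f_stat, df = slope_significance(xs, ys)
T_CRIT = 3.182                         # t_{0.025}(3), read from a t table
reject_h0 = abs(t_stat) >= T_CRIT
print(f"T = {t_stat:.4f}, F = {f_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```

In simple linear regression the two tests are equivalent, since $F=T^2$; for this sample $F=4.5$ and $|T|=\sqrt{4.5}\approx 2.121<3.182$, so $H_0$ is not rejected at $\alpha=0.05$.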

Interval Estimation of the Regression Coefficient

$$
\begin{aligned}
&\hat{\beta_1}\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right) \\
\Longrightarrow \quad &\frac{\hat{\beta_1}-\beta_1}{\sqrt{\frac{\sigma^2}{L_{xx}}}}\sim N(0,1) \\
\Longrightarrow \quad &\frac{(\hat{\beta_1}-\beta_1)\sqrt{L_{xx}}}{\sigma}\sim N(0,1) \\
&T=\frac{\frac{(\hat{\beta_1}-\beta_1)\sqrt{L_{xx}}}{\sigma}}{\sqrt{\frac{Q_e}{\sigma^2}/(n-2)}}\xrightarrow{\hat{\sigma}^2=\frac{Q_e}{n-2}}\frac{(\hat{\beta_1}-\beta_1)\sqrt{L_{xx}}}{\hat\sigma} \sim t(n-2)
\end{aligned}
$$

The confidence interval for $\beta_1$ at confidence level $1-\alpha$ is: $\left(\hat{\beta_1}\pm \frac{\hat{\sigma}}{\sqrt{L_{xx}}}t_{\frac{\alpha}{2}}(n-2)\right)$
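A minimal sketch of the slope interval follows. The function name `slope_confidence_interval` and the sample data are hypothetical, and the critical value $t_{0.025}(3)=3.182$ is taken from a $t$ table and passed in as an argument.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Confidence interval for beta1: beta1_hat +/- t_crit * sigma_hat / sqrt(L_xx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    beta1 = l_xy / l_xx
    sigma_hat = math.sqrt((l_yy - beta1 * l_xy) / (n - 2))  # sqrt(Q_e / (n - 2))
    half_width = t_crit * sigma_hat / math.sqrt(l_xx)
    return beta1 - half_width, beta1 + half_width


# Hypothetical sample; t_crit = t_{0.025}(3) from a t table.
lower, upper = slope_confidence_interval(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182
)
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```

For this sample the interval is roughly $(-0.300,\ 1.500)$. It contains 0, which matches a two-sided $t$ test at the same $\alpha$ failing to reject $\beta_1=0$.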

Estimation

Let the regression equation be $\hat{y}=\hat{\beta_0}+\hat{\beta_1}x$. For any given $x=x_0$, the mean of $y_0$ is $E(y_0)=\beta_0+\beta_1 x_0$, and an unbiased estimator of $E(y_0)$ is $\hat{y_0}=\hat{\beta_0}+\hat{\beta_1}x_0$.

$$
\hat{\beta_0}\sim N\left(\beta_0, \left(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\right)\sigma^2\right)
$$

$$
\hat{\beta_1}\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right)
$$

$$
Cov(\hat{\beta_0},\hat{\beta_1})=-\frac{\bar{x}}{L_{xx}}\sigma^2
$$

$$
D(\hat{y_0})=D(\hat{\beta_0})+D(\hat{\beta_1}x_0)+2Cov(\hat{\beta_0},\hat{\beta_1}x_0)=\left(\frac{1}{n}+\frac{(\bar{x}-x_0)^2}{L_{xx}}\right)\sigma^2
$$

$$
\hat{y_0}\sim N\left(\beta_0+\beta_1x_0, \left(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\right)\sigma^2\right)
$$
Hence the confidence interval for $E(y_0)$ at confidence level $1-\alpha$ is:
$$
(\hat{y_0}-\delta_0,\ \hat{y_0}+\delta_0),\quad \delta_0=t_{\frac{\alpha}{2}}(n-2)\,\hat{\sigma}\sqrt{\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}
$$
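The interval for the mean response can be sketched as follows. The helper `mean_response_interval` and the sample are hypothetical, and the tabulated $t_{0.025}(3)=3.182$ is supplied by hand.

```python
import math


def mean_response_interval(xs, ys, x0, t_crit):
    """Return (y0_hat, delta0) for the confidence interval of E(y0) at x = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    beta1 = l_xy / l_xx
    beta0 = y_bar - beta1 * x_bar
    sigma_hat = math.sqrt((l_yy - beta1 * l_xy) / (n - 2))

    y0_hat = beta0 + beta1 * x0
    delta0 = t_crit * sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / l_xx)
    return y0_hat, delta0


# Hypothetical sample; t_crit = t_{0.025}(3) from a t table.
XS = [1, 2, 3, 4, 5]
YS = [2, 4, 5, 4, 5]

y_mid, d_mid = mean_response_interval(XS, YS, x0=3, t_crit=3.182)    # x0 = x_bar
y_edge, d_edge = mean_response_interval(XS, YS, x0=5, t_crit=3.182)  # far from x_bar
print(f"x0=3: {y_mid:.3f} +/- {d_mid:.3f}")
print(f"x0=5: {y_edge:.3f} +/- {d_edge:.3f}")
```

The half-width $\delta_0$ is smallest at $x_0=\bar{x}$ and grows as $x_0$ moves away from it, because of the $(x_0-\bar{x})^2/L_{xx}$ term.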

Interval Prediction

$$
\begin{aligned}
&y_0-\hat{y_0}\sim N\left(0,\left[1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\right]\sigma^2\right) \\
&U=\frac{y_0-\hat{y_0}}{\sigma\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}}\sim N(0,1) \\
&T=\frac{y_0-\hat{y_0}}{\hat\sigma\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}}\sim t(n-2)
\end{aligned}
$$

Therefore, the prediction interval for $y_0$ at confidence level $1-\alpha$ is
$$
(\hat{y_0}-\delta,\ \hat{y_0}+\delta),\quad \delta=t_{\frac{\alpha}{2}}(n-2)\,\hat{\sigma}\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}
$$
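To see how the extra "1" under the square root widens the interval, the sketch below computes both half-widths at the same $x_0$. The function name `interval_half_widths` and the data are hypothetical; the critical value $t_{0.025}(3)=3.182$ comes from a $t$ table.

```python
import math


def interval_half_widths(xs, ys, x0, t_crit):
    """Return (y0_hat, half-width for E(y0), half-width for a new y0)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    beta1 = l_xy / l_xx
    beta0 = y_bar - beta1 * x_bar
    sigma_hat = math.sqrt((l_yy - beta1 * l_xy) / (n - 2))

    leverage = 1 / n + (x0 - x_bar) ** 2 / l_xx
    delta_mean = t_crit * sigma_hat * math.sqrt(leverage)      # for E(y0)
    delta_pred = t_crit * sigma_hat * math.sqrt(1 + leverage)  # for a new y0
    return beta0 + beta1 * x0, delta_mean, delta_pred


# Hypothetical sample; t_crit = t_{0.025}(3) from a t table.
y0_hat, delta_mean, delta_pred = interval_half_widths(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182
)
print(f"y0_hat = {y0_hat:.3f}")
print(f"mean response: +/- {delta_mean:.3f}, new observation: +/- {delta_pred:.3f}")
```

The prediction interval is always wider than the interval for $E(y_0)$, because it also accounts for the noise $\epsilon$ of the new observation itself.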

Linearizable Simple Nonlinear Regression

[Figures: image20221022153739659.png, image20221022153756446.png, image20221022153810181.png]
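The figures for this section are not available here, so the following is only an illustrative sketch of the general idea, using the exponential model $y=ae^{bx}$ as an assumed example. Taking logarithms gives $\ln y=\ln a+bx$, which is linear in $x$, so the simple linear regression formulas apply to the pairs $(x_i,\ln y_i)$. The helper `fit_line` and the noise-free synthetic data are hypothetical.

```python
import math


def fit_line(us, vs):
    """Least-squares fit v = intercept + slope * u; returns (intercept, slope)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    l_uu = sum((u - u_bar) ** 2 for u in us)
    l_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    slope = l_uv / l_uu
    return v_bar - slope * u_bar, slope


# Synthetic, noise-free data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Linearize: regress ln(y) on x, then map the intercept back with exp().
intercept, b_hat = fit_line(xs, [math.log(y) for y in ys])
a_hat = math.exp(intercept)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

With noisy data the estimates minimize squared error on the $\ln y$ scale rather than on the original $y$ scale, which is worth keeping in mind when interpreting the fit.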
