Chapter 3: Simple Linear Regression

Latest recommended article published on 2022-06-16 21:03:18

喝醉酒的小白

Category column: 应用回归分析-俞昊东 (Applied Regression Analysis)

Article link: https://blog.csdn.net/hezuijiudexiaobai/article/details/104664445

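Simple linear regression model

Lot size and required labor hours

0 Import libraries & load data

Fixing garbled Chinese characters in Jupyter plots

!apt-get install ttf-wqy-zenhei -y

Import libraries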

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std # standard error and interval limits of predictions

from matplotlib.font_manager import FontProperties
font_set = FontProperties(fname=r"/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc", size=16) # font that can render the Chinese axis labels

Load data

x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50] # lot size
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268] # labor hours

1 Scatter plot

plt.scatter(x, y) # scatter plot of labor hours against lot size
plt.xlabel("批量大小", fontproperties=font_set) # x label: lot size
plt.ylabel("劳动工时数", fontproperties=font_set) # y label: labor hours

The scatter plot shows a strong positive linear relationship between lot size and the labor hours required.
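The visual impression of a "strong" linear relationship can be quantified with the Pearson correlation coefficient. The following is a minimal sketch that uses only NumPy and the data listed above; the variable name `r` is introduced here for illustration and does not appear in the original listing.

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours

# Pearson correlation coefficient between lot size and labor hours
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.4f}")
```

A value of `r` close to 1 supports reading the scatter plot as a strong positive linear relationship.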

2 Estimated regression equation

x_ = sm.add_constant(x) # prepend a column of ones for the intercept
mo = sm.OLS(y, x_) # ordinary least squares model
result = mo.fit() # fit the model to the data

print(result.summary())

result.params # parameter estimates

$\hat y = 28.12706767 + 3.90541353 x$

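Interpretation of the parameters

$\hat \beta_0$ : when the lot size is 0, the mean labor hours are 28.12706767. Since $x = 0$ lies outside the observed lot sizes (20 to 120), this reading is an extrapolation.
$\hat \beta_1$ : each additional unit of lot size increases the labor hours by 3.90541353 units on average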
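As a cross-check on the estimates reported by `result.params`, the least-squares formulas $\hat\beta_1 = L_{xy}/L_{xx}$ and $\hat\beta_0 = \overline{y} - \hat\beta_1 \overline{x}$ can be evaluated directly. This is a sketch using NumPy only; the names `L_xx`, `L_xy`, `beta0_hat` and `beta1_hat` are chosen here for illustration.

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours

x_bar, y_bar = x.mean(), y.mean()
L_xx = np.sum((x - x_bar) ** 2)           # sum of squared deviations of x
L_xy = np.sum((x - x_bar) * (y - y_bar))  # sum of cross deviations

beta1_hat = L_xy / L_xx                # slope estimate
beta0_hat = y_bar - beta1_hat * x_bar  # intercept estimate
print(beta0_hat, beta1_hat)
```

The two printed values should agree with the coefficients in the fitted equation above.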

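3 t-test

step1: hypotheses

Null hypothesis: $H_0: \beta_1=0$

Alternative hypothesis: $H_1: \beta_1 \neq 0$

step2: test statistic

Under $H_0$, $t = \cfrac {\hat\beta_1} {\sqrt {{\hat\sigma^2}/{L_{xx}}}} \sim t(n-2)$, where $\hat\sigma^2 = SSE/(n-2)$ and $L_{xx} = \sum_{i=1}^{n}(x_i - \overline{x})^2$

step3: observed value of the test statistic

$t_0 = \cfrac {\hat\beta_1} {\sqrt {{\hat\sigma^2}/{L_{xx}}}}$, evaluated on the sample data

step4: decision

The p-value is below 0.05, so we reject the null hypothesis and conclude that the linear relationship between lot size and labor hours is significant.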
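The four steps can be traced numerically. The sketch below computes $t_0$ from the formula with NumPy and, instead of a p-value, compares it with the two-sided 5% critical value of $t(16)$. The critical value 2.120 is taken from a standard t table and is an assumption of this sketch, not something reported in the original post.

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours
n = len(x)

# Least-squares estimates
L_xx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / L_xx
beta0_hat = y.mean() - beta1_hat * x.mean()

# Unbiased estimate of the error variance: SSE / (n - 2)
residuals = y - (beta0_hat + beta1_hat * x)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)

# Observed value of the test statistic
t0 = beta1_hat / np.sqrt(sigma2_hat / L_xx)

t_crit = 2.120  # two-sided 5% critical value of t(16), from a t table (assumed)
reject_h0 = abs(t0) > t_crit
print(f"t0 = {t0:.3f}, reject H0: {reject_h0}")
```

Comparing $|t_0|$ with the critical value leads to the same decision as comparing the p-value with 0.05.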

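4 Decomposition of the sum of squares

print('SST', result.centered_tss ) # total sum of squares
print('SSR', result.ess) # regression (explained) sum of squares
print('SSE', result.ssr) # residual sum of squares (statsmodels names this attribute ssr)

$\sum_{i=1}^{n}{(y_i - \overline{y})^2} = \sum_{i=1}^{n}(\hat{y}_i - \overline{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

SST = SSR + SSE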
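The decomposition can be verified numerically without statsmodels. In this sketch the three sums are computed from their definitions with NumPy; the names `sst`, `ssr` and `sse` follow the SST/SSR/SSE notation of this section (not the statsmodels attribute names).

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours

# Least-squares fit
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(sst, ssr, sse)
print(np.isclose(sst, ssr + sse))  # the decomposition holds up to rounding
```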

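5 Goodness of fit

Coefficient of determination $R^2$ for linear regression

$R^2 = \cfrac {SSR}{SST}$

(225394.43308270676 / 256936.5) # SSR / SST, approximately 0.877

The model fits well: 87.7% of the variation in labor hours is explained by lot size.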
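In simple linear regression, $R^2$ equals the square of the Pearson correlation coefficient between $x$ and $y$. The sketch below illustrates this with NumPy, computing $R^2$ as $1 - SSE/SST$ (equivalent to $SSR/SST$ given the decomposition in section 4); the variable names are chosen for illustration.

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours

# Least-squares fit
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

r_squared = 1 - sse / sst                  # coefficient of determination
r_squared_from_corr = np.corrcoef(x, y)[0, 1] ** 2  # squared correlation

print(f"R^2 = {r_squared:.4f}")
print(np.isclose(r_squared, r_squared_from_corr))
```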

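6 F-test

Null hypothesis: $H_0: \beta_1=0$

Alternative hypothesis: $H_1: \beta_1 \neq 0$

Test statistic: under $H_0$, $F = \cfrac {SSR/1} {SSE/(n-2)} \sim F(1, n-2)$

(225394.43308270676/1) / (31542.066917293232/(18-2)) # n=18

We reject the null hypothesis: the linear relationship between lot size and required labor hours is significant.

$P\{F > F_0\} = P\{|t| > |t_0|\}$

The F-test is equivalent to the t-test

Relationship between the test statistics: $F_0 = t^2_0$ -> $114.3 = 10.693^2$
Equal p-values -> $P\{F > F_0\} = P\{|t| > |t_0|\}$
Same conclusion -> reject the null hypothesis and conclude that the linear relationship is significant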
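The equivalence of the two tests can be checked numerically. This sketch computes $F_0$ and $t_0$ independently from their formulas with NumPy and confirms $F_0 = t_0^2$. The 5% critical value 4.49 for $F(1, 16)$ is taken from a standard F table and is an assumption of this sketch.

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours
n = len(x)

# Least-squares fit
L_xx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / L_xx
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # residual sum of squares

F0 = (ssr / 1) / (sse / (n - 2))                       # F statistic
t0 = beta1_hat / np.sqrt((sse / (n - 2)) / L_xx)       # t statistic

F_crit = 4.49  # 5% critical value of F(1, 16), from an F table (assumed)
print(f"F0 = {F0:.2f}, t0^2 = {t0 ** 2:.2f}, reject H0: {F0 > F_crit}")
```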

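7 Prediction

Point prediction & interval prediction

# Fitted values and 95% prediction limits; these definitions were missing from the listing and are assumed here
y_fitted = result.fittedvalues
_, confidence_interval_lower, confidence_interval_upper = wls_prediction_std(result, alpha=0.05)
order = np.argsort(x) # sort by x so the dashed limits draw as smooth curves
x_sorted = np.array(x)[order]
plt.scatter(x, y, label='original') # original data
plt.plot(x_sorted, y_fitted[order], c='r', label='fit') # fitted values
plt.plot(x_sorted, confidence_interval_lower[order], 'y--', label='lower') # lower limit
plt.plot(x_sorted, confidence_interval_upper[order], 'g--', label='upper') # upper limit
plt.xlabel("批量大小", fontproperties=font_set) # x label: lot size
plt.ylabel("劳动工时数", fontproperties=font_set) # y label: labor hours
plt.legend()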
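For a single new lot size $x_0$, the point prediction is $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$, and a 95% prediction interval is $\hat y_0 \pm t_{0.975}(n-2)\sqrt{\hat\sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \overline{x})^2}{L_{xx}}\right)}$. The sketch below evaluates both with NumPy. The value `x0 = 65` is a hypothetical lot size chosen for illustration, and the critical value 2.120 comes from a standard t table; neither appears in the original post.

```python
import numpy as np

# Data copied from the listing above
x = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50], dtype=float)  # lot size
y = np.array([399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 242, 389, 113, 435, 420, 212, 268], dtype=float)  # labor hours
n = len(x)

# Least-squares fit and error variance estimate
L_xx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / L_xx
beta0_hat = y.mean() - beta1_hat * x.mean()
sigma2_hat = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2) / (n - 2)

x0 = 65.0       # hypothetical new lot size (illustration only)
t_crit = 2.120  # two-sided 5% critical value of t(16), from a t table (assumed)

# Point prediction
y0_hat = beta0_hat + beta1_hat * x0

# Standard error of a single new observation at x0
se_pred = np.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x.mean()) ** 2 / L_xx))

lower = y0_hat - t_crit * se_pred
upper = y0_hat + t_crit * se_pred
print(f"point prediction: {y0_hat:.2f}")
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

Dropping the leading `1 +` inside the square root gives the narrower confidence interval for the mean response at `x0` rather than for a single new observation.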
