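Approach:
1. Generate 20 evenly spaced numbers from 0 to 10 to use as x.
2. Use the regression formula y = β0 + β1·x + e, with β0 = 2, β1 = 5, and e a Gaussian error term.
3. Compute the y values.
4. Fit a regression to the data and estimate the coefficients.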
# Generate 20 evenly spaced numbers between 0 and 10
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
nsample = 20
# Pick 20 evenly spaced points from 0 to 10 (both endpoints included)
x=np.linspace(0,10,nsample)
x
array([ 0. , 0.52631579, 1.05263158, 1.57894737, 2.10526316,
2.63157895, 3.15789474, 3.68421053, 4.21052632, 4.73684211,
5.26315789, 5.78947368, 6.31578947, 6.84210526, 7.36842105,
7.89473684, 8.42105263, 8.94736842, 9.47368421, 10. ])
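The spacing between neighbouring points is (10 - 0) / (20 - 1) ≈ 0.526, because both endpoints are included. A small sketch that checks this with the `retstep` option of `np.linspace` (the names `x_check` and `step` are illustrative and not part of the original):

```python
import numpy as np

nsample = 20
# retstep=True also returns the spacing between consecutive samples
x_check, step = np.linspace(0, 10, nsample, retstep=True)

print(step)                      # spacing, equal to 10 / 19
print(x_check[0], x_check[-1])   # both endpoints are included
```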
# For ordinary least squares, prepend a column of 1s to the array so that it pairs with the constant (intercept) term
X=sm.add_constant(x)
X
array([[ 1. , 0. ],
[ 1. , 0.52631579],
[ 1. , 1.05263158],
[ 1. , 1.57894737],
[ 1. , 2.10526316],
[ 1. , 2.63157895],
[ 1. , 3.15789474],
[ 1. , 3.68421053],
[ 1. , 4.21052632],
[ 1. , 4.73684211],
[ 1. , 5.26315789],
[ 1. , 5.78947368],
[ 1. , 6.31578947],
[ 1. , 6.84210526],
[ 1. , 7.36842105],
[ 1. , 7.89473684],
[ 1. , 8.42105263],
[ 1. , 8.94736842],
[ 1. , 9.47368421],
[ 1. , 10. ]])
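For intuition, the same design matrix can be built by hand. The following is a minimal sketch in plain NumPy, assuming the layout shown in the output above (constant column first, x second); `X_manual` is a name introduced here for illustration.

```python
import numpy as np

x = np.linspace(0, 10, 20)
# Column of ones (multiplies the intercept β0) followed by x (multiplies the slope β1)
X_manual = np.column_stack((np.ones_like(x), x))

print(X_manual.shape)
print(X_manual[:3])
```

With this layout, `X_manual @ [β0, β1]` evaluates β0 + β1·x for every row at once.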
# Coefficients used to construct y: β0 = 2 (intercept), β1 = 5 (slope)
bate = np.array([2,5])
bate
array([2, 5])
# Error term: draw samples from a standard normal (Gaussian) distribution
e=np.random.normal(size=nsample)
e
array([-0.08130226, -0.99898515, -0.46717904, -0.52487297, -0.85998302,
1.00102852, 0.61557834, 0.4359724 , 1.36966089, -0.17069984,
0.33877027, -1.602145 , -0.1940928 , 1.58914167, -2.09103106,
-0.87802483, -0.46069062, -2.32511203, -1.42386623, -0.22494043])
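Because `e` is random, these numbers change on every run. If reproducible output is wanted, one option is a seeded generator. This is only a sketch: the seed value 42 and the names `rng`, `e_seeded` and `e_again` are arbitrary choices that do not appear in the original, and the values drawn will differ from those printed above.

```python
import numpy as np

nsample = 20
rng = np.random.default_rng(42)        # seed chosen arbitrarily
e_seeded = rng.normal(size=nsample)    # standard normal noise (mean 0, std 1)

# A second generator created with the same seed reproduces the same draws
e_again = np.random.default_rng(42).normal(size=nsample)
print(np.array_equal(e_seeded, e_again))
```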
# Observed values: y = β0 + β1*x + e, the constructed "true" data used to test the fit
y=np.dot(X,bate)+e
y
array([ 1.91869774, 3.6325938 , 6.79597886, 9.36986387, 11.66633277,
16.15892325, 18.40505202, 20.85702504, 24.42229247, 25.51351069,
28.65455974, 29.34522342, 33.38485457, 37.79966799, 36.75107421,
40.59565938, 43.64457254, 44.41173008, 47.94455482, 51.77505957])
The data is now constructed; next, estimate the regression equation.
# Ordinary least squares
model=sm.OLS(y,X)
# Fit the model to the data
res=model.fit()
# Regression coefficients, i.e. the estimates of β0 and β1
res.params
array([2.15061173, 4.90034992])
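As a cross-check, the same coefficients can be recovered without statsmodels by solving the least-squares problem directly. The sketch below uses `np.linalg.lstsq` on the x and y values from this walkthrough. The y values are copied from the rounded printout, so agreement is only to about six decimal places, and `beta_hat` is an illustrative name.

```python
import numpy as np

x = np.linspace(0, 10, 20)
X = np.column_stack((np.ones_like(x), x))
# y values copied from the output printed earlier
y = np.array([
    1.91869774, 3.6325938, 6.79597886, 9.36986387, 11.66633277,
    16.15892325, 18.40505202, 20.85702504, 24.42229247, 25.51351069,
    28.65455974, 29.34522342, 33.38485457, 37.79966799, 36.75107421,
    40.59565938, 43.64457254, 44.41173008, 47.94455482, 51.77505957,
])

# Find beta that minimises ||X @ beta - y||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the fitted coefficients, about [2.1506, 4.9003]
```

The estimates land near, but not exactly on, the true values β0 = 2 and β1 = 5 because of the added noise `e`.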
# View the full summary of the fit
res.summary()
| Dep. Variable: | y | R-squared: | 0.996 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.995 |
| Method: | Least Squares | F-statistic: | 4072. |
| Date: | Thu, 13 Sep 2018 | Prob (F-statistic): | 1.15e-22 |
| Time: | 10:44:47 | Log-Likelihood: | -28.152 |
| No. Observations: | 20 | AIC: | 60.30 |
| Df Residuals: | 18 | BIC: | 62.30 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 2.1506 | 0.449 | 4.788 | 0.000 | 1.207 | 3.094 |
| x1 | 4.9003 | 0.077 | 63.815 | 0.000 | 4.739 | 5.062 |

| Omnibus: | 0.468 | Durbin-Watson: | 1.957 |
|---|---|---|---|
| Prob(Omnibus): | 0.791 | Jarque-Bera (JB): | 0.572 |
| Skew: | 0.274 | Prob(JB): | 0.751 |
| Kurtosis: | 2.378 | Cond. No. | 11.5 |

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
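To see where a few of these summary numbers come from, the sketch below recomputes R-squared, the coefficient standard errors and the t statistics by hand, under the usual nonrobust OLS assumptions. It uses only NumPy and the y values from this walkthrough, which are copied from the rounded printout. All variable names are illustrative.

```python
import numpy as np

x = np.linspace(0, 10, 20)
X = np.column_stack((np.ones_like(x), x))
# y values copied from the output printed earlier
y = np.array([
    1.91869774, 3.6325938, 6.79597886, 9.36986387, 11.66633277,
    16.15892325, 18.40505202, 20.85702504, 24.42229247, 25.51351069,
    28.65455974, 29.34522342, 33.38485457, 37.79966799, 36.75107421,
    40.59565938, 43.64457254, 44.41173008, 47.94455482, 51.77505957,
])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

ssr = np.sum(resid ** 2)              # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - ssr / sst

df_resid = len(y) - X.shape[1]        # 20 observations - 2 parameters
sigma2 = ssr / df_resid               # estimated error variance
# Standard errors: square roots of the diagonal of sigma2 * (X'X)^-1
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_values = beta_hat / std_err

print(round(r_squared, 3))
print(std_err)
print(t_values)
```

These values should agree with the `R-squared`, `std err` and `t` entries of the summary up to rounding.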
# Fitted (estimated) values
y_=res.fittedvalues
y_
array([ 2.15061173, 4.72974327, 7.30887481, 9.88800634, 12.46713788,
15.04626942, 17.62540096, 20.2045325 , 22.78366403, 25.36279557,
27.94192711, 30.52105865, 33.10019019, 35.67932172, 38.25845326,
40.8375848 , 43.41671634, 45.99584788, 48.57497942, 51.15411095])
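The fitted values are simply the estimated line evaluated at each x. A short sketch that reproduces them from the coefficients reported earlier (`params` holds the rounded values copied from that output, and `y_hat` is an illustrative name):

```python
import numpy as np

x = np.linspace(0, 10, 20)
X = np.column_stack((np.ones_like(x), x))
params = np.array([2.15061173, 4.90034992])  # rounded coefficients from the fit

# Matrix form of intercept + slope * x
y_hat = X @ params
print(y_hat[0], y_hat[-1])  # first and last fitted values
```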
# Plot
fig,ax=plt.subplots(figsize=(8,6))
ax.plot(x,y,'o',label='data')  # original data
ax.plot(x,y_,'r--',label='fitted')  # fitted values
ax.legend(loc='best')
plt.show()