一、经典最小二乘法
①经典最小二乘法原理介绍
最小二乘法的原理实质比较简单,本质的目的是要确定如下建立的一元线性回归模型的两个回归参数
a
1
a_1
a1和
b
1
b_1
b1:
y
=
a
1
x
+
b
1
y=a_1x+b_1
y=a1x+b1
若已知m组样本观测数据
(
x
i
,
y
i
)
(
i
=
1
,
2
,
3
,
4
⋅
⋅
⋅
m
)
(x_i,y_i)(i=1,2,3,4···m)
(xi,yi)(i=1,2,3,4⋅⋅⋅m),经典的做法是根据离差平方和来达到一种最小的准则来进行确定的,即确定满足下面条件的
a
1
′
a_1'
a1′和
b
1
′
b_1'
b1′,它们使得下面函数取到最小值:
Q
(
a
1
′
,
b
1
′
)
=
∑
i
=
1
m
(
y
i
−
a
1
′
x
i
−
b
1
′
)
2
Q(a_1',b_1')=\sum_{i=1}^m(y_i-a_1'x_i-b_1')^2
Q(a1′,b1′)=i=1∑m(yi−a1′xi−b1′)2
然后的问题就变得简单了,通过我们上述
Q
Q
Q函数的定义我们不难知道
Q
(
a
1
′
,
b
1
′
)
Q(a_1',b_1')
Q(a1′,b1′)是一个非负函数,关于
a
1
′
,
b
1
′
a_1',b_1'
a1′,b1′的导数是存在的,通过分别对
a
1
′
,
b
1
′
a_1',b_1'
a1′,b1′来进行求导并令其为零会得到:
∂
Q
(
a
1
′
,
b
1
′
)
∂
a
1
′
=
∑
i
=
1
m
−
2
x
i
(
y
i
−
a
1
′
x
i
−
b
1
′
)
=
0
⟶
∑
i
=
1
m
x
i
y
i
=
∑
i
=
1
m
b
1
′
x
i
+
a
1
′
∑
i
=
1
m
x
i
2
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
①
\frac{\partial Q(a_1',b_1')}{\partial a_1'}=\sum_{i=1}^m-2x_i(y_i-a_1'x_i-b_1')=0\longrightarrow \sum_{i=1}^mx_iy_i=\sum_{i=1}^mb_1'x_i+a_1'\sum_{i=1}^mx_i^2·······························①
∂a1′∂Q(a1′,b1′)=i=1∑m−2xi(yi−a1′xi−b1′)=0⟶i=1∑mxiyi=i=1∑mb1′xi+a1′i=1∑mxi2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅①
∂
Q
(
a
1
′
,
b
1
′
)
∂
b
1
′
=
∑
i
=
1
m
−
2
(
y
i
−
a
1
′
x
i
−
b
1
′
)
=
0
⟶
∑
i
=
1
m
y
i
=
m
b
1
′
+
a
1
′
∑
i
=
1
m
x
i
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
②
\frac{\partial Q(a_1',b_1')}{\partial b_1'}=\sum_{i=1}^m-2(y_i-a_1'x_i-b_1')=0\longrightarrow \sum_{i=1}^my_i=mb_1'+a_1'\sum_{i=1}^mx_i·······································②
∂b1′∂Q(a1′,b1′)=i=1∑m−2(yi−a1′xi−b1′)=0⟶i=1∑myi=mb1′+a1′i=1∑mxi⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅②
联立
①
,
②
①,②
①,②分别对
m
m
m取平均我们很容易就求出来了
a
1
′
和
b
1
′
a_1'和b_1'
a1′和b1′:
a
1
′
=
x
y
‾
−
x
ˉ
y
ˉ
x
2
‾
−
(
x
‾
)
2
a_1'={\overline {xy}-\bar x\bar y\over \overline{x^2}-(\overline{x})^2}
a1′=x2−(x)2xy−xˉyˉ
b
1
′
=
y
‾
−
a
1
′
x
‾
b_1'=\overline {y}-a_1'\overline {x}
b1′=y−a1′x
二、基于离差概率平方总和最小对最小二乘法的改进
①改进原理介绍
我们都知道最小二乘法并不能保证所有的样本测试数据点都在整个回归直线上面,而是比较“均匀”的分布在直线两边,我们可以考虑一种概率来重新对数据拟合误差的大小进行一种新规定:
若已知m组样本观测数据
(
x
i
,
y
i
)
,
y
i
≠
0
(
i
=
1
,
2
,
3
,
4
⋅
⋅
⋅
m
)
(x_i,y_i),y_i\not=0(i=1,2,3,4···m)
(xi,yi),yi=0(i=1,2,3,4⋅⋅⋅m),我们新规定线性回归模型如下:
y
=
a
2
x
+
b
2
y=a_2x+b_2
y=a2x+b2
下面定义概率
P
i
P_i
Pi如下:
P
i
=
∣
y
i
−
a
2
x
i
−
b
2
∣
y
i
P_i={|y_i-a_2x_i-b_2|\over y_i}
Pi=yi∣yi−a2xi−b2∣
发现绝对值的存在显然不便于我们处理,我们不妨改进
P
i
P_i
Pi的定义如下:
P
i
(
a
2
,
b
2
)
=
(
y
i
−
a
2
x
i
−
b
2
y
i
)
2
P_i(a_2,b_2)=({y_i-a_2x_i-b_2\over y_i})^2
Pi(a2,b2)=(yiyi−a2xi−b2)2
而我们所要求的即为满足如下的函数
R
R
R的最小值时的
a
2
和
b
2
a_2和b_2
a2和b2:
R
(
a
2
,
b
2
)
=
∑
i
=
1
m
P
i
R(a_2,b_2)=\sum_{i=1}^mP_i
R(a2,b2)=i=1∑mPi
根据
R
R
R函数的定义我们知道
R
(
a
2
,
b
2
)
R(a_2,b_2)
R(a2,b2)是一个非负函数,关于
a
2
,
b
2
a_2,b_2
a2,b2有偏导数令其偏导为零有:
∂
R
(
a
2
,
b
2
)
∂
a
2
=
∑
i
=
1
m
−
2
x
i
y
i
(
1
−
x
i
y
i
a
2
−
b
2
y
i
)
=
0
⟶
∑
i
=
1
m
x
i
y
i
=
a
2
∑
i
=
1
m
x
i
2
y
i
2
+
b
2
∑
i
=
1
m
x
i
y
i
2
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
③
\frac{\partial R(a_2,b_2)}{\partial a_2}=\sum_{i=1}^m-2{x_i\over y_i}(1-{x_i\over y_i}a_2-{b_2\over y_i})=0\longrightarrow \sum_{i=1}^m{x_i\over y_i}=a_2\sum_{i=1}^m{x_i^2\over y_i^2}+b_2\sum_{i=1}^m{x_i\over y_i^2}·····························③
∂a2∂R(a2,b2)=i=1∑m−2yixi(1−yixia2−yib2)=0⟶i=1∑myixi=a2i=1∑myi2xi2+b2i=1∑myi2xi⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅③
∂
R
(
a
2
,
b
2
)
∂
b
2
=
∑
i
=
1
m
−
2
1
y
i
(
1
−
x
i
y
i
a
2
−
b
2
y
i
)
=
0
⟶
∑
i
=
1
m
1
y
i
=
a
2
∑
i
=
1
m
x
i
y
i
2
+
b
2
∑
i
=
1
m
1
y
i
2
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
④
\frac{\partial R(a_2,b_2)}{\partial b_2}=\sum_{i=1}^m-2{1\over y_i}(1-{x_i\over y_i}a_2-{b_2\over y_i})=0\longrightarrow \sum_{i=1}^m{1\over y_i}=a_2\sum_{i=1}^m{x_i\over y_i^2}+b_2\sum_{i=1}^m{1\over y_i^2}·····························④
∂b2∂R(a2,b2)=i=1∑m−2yi1(1−yixia2−yib2)=0⟶i=1∑myi1=a2i=1∑myi2xi+b2i=1∑myi21⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅④
联立
③
,
④
③,④
③,④分别对
m
m
m取平均我们很容易就求出来了
a
2
和
b
2
a_2和b_2
a2和b2:
b
2
=
(
x
y
)
‾
×
(
x
y
2
)
‾
−
(
1
y
)
‾
×
(
x
2
y
2
)
‾
(
x
y
2
)
‾
×
(
x
y
2
)
‾
−
(
1
y
2
)
‾
×
(
x
2
y
2
)
‾
b_2={\overline{({x\over y})}\times\overline{({x\over y^2})}-\overline{({1\over y})}\times\overline{({x^2\over y^2})}\over\overline{({x\over y^2})}\times\overline{({x\over y^2})}-\overline{({1\over y^2})}\times\overline{({x^2\over y^2})}}
b2=(y2x)×(y2x)−(y21)×(y2x2)(yx)×(y2x)−(y1)×(y2x2)
a
2
=
(
1
y
)
‾
−
b
2
(
1
y
2
)
‾
(
x
y
2
)
‾
a_2={\overline{({1\over y})}-b_2\overline{({1\over y^2})}\over\overline{({x\over y^2})}}
a2=(y2x)(y1)−b2(y21)
之后通过利用编程来进行计算这两个参数即可,这种方法对我们的初始数据
(
x
i
,
y
i
)
(x_i,y_i)
(xi,yi)中的
y
i
y_i
yi有着特殊的要求。
三、基于垂线段总和最小对最小二乘法的改进
①改进原理介绍
我们仍假定所求回归直线为:
y
=
a
3
x
+
b
3
y=a_3x+b_3
y=a3x+b3
经典最小二乘法仅仅是考虑的两点之间的直线距离,我们不妨这样考虑:可否将我们所有的样本点
(
x
i
,
y
i
)
(
i
=
1
,
2
,
3
,
4
⋅
⋅
⋅
m
)
(x_i,y_i)(i=1,2,3,4···m)
(xi,yi)(i=1,2,3,4⋅⋅⋅m)向回归直线做垂线段,并将此平方总和作为度量指标来替代经典二乘法的距离呢?
据此想法,我们定义
S
S
S函数如下:
S
(
a
3
,
b
3
)
=
∑
i
=
1
m
(
a
3
x
i
−
y
i
+
b
3
)
2
a
3
2
+
1
S(a_3,b_3)=\sum_{i=1}^m {(a_3x_i-y_i+b_3)^2\over a_3^2+1}
S(a3,b3)=i=1∑ma32+1(a3xi−yi+b3)2
而我们所要求的即为满足如下的函数
S
S
S的最小值时的
a
3
和
b
3
a_3和b_3
a3和b3:
根据
S
S
S函数的定义我们知道
S
(
a
3
,
b
3
)
S(a_3,b_3)
S(a3,b3)是一个非负函数,关于
a
3
,
b
3
a_3,b_3
a3,b3有偏导数令其偏导为零有:
∂
S
(
a
3
,
b
3
)
∂
a
3
=
∑
i
=
1
m
2
x
i
(
a
3
x
i
−
y
i
+
b
3
)
(
a
3
2
+
1
)
−
2
a
3
(
a
3
x
i
−
y
i
+
b
3
)
2
a
3
2
+
1
=
0
\frac{\partial S(a_3,b_3)}{\partial a_3}=\sum_{i=1}^m {2x_i(a_3x_i-y_i+b_3)(a_3^2+1)-2a_3(a_3x_i-y_i+b_3)^2\over a_3^2+1}=0
∂a3∂S(a3,b3)=i=1∑ma32+12xi(a3xi−yi+b3)(a32+1)−2a3(a3xi−yi+b3)2=0
∂
S
(
a
3
,
b
3
)
∂
b
3
=
∑
i
=
1
m
2
(
a
3
x
i
−
y
i
+
b
3
)
a
3
2
+
1
=
0
\frac{\partial S(a_3,b_3)}{\partial b_3}=\sum_{i=1}^m {2(a_3x_i-y_i+b_3)\over a_3^2+1}=0
∂b3∂S(a3,b3)=i=1∑ma32+12(a3xi−yi+b3)=0
即:
∑
i
=
1
m
a
3
(
x
i
2
−
y
i
2
)
+
∑
i
=
1
m
(
a
3
2
−
1
)
x
i
y
i
+
∑
i
=
1
m
b
3
(
1
−
a
3
2
)
x
i
+
∑
i
=
1
m
2
a
3
b
3
y
i
=
a
3
b
3
2
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⑤
\sum_{i=1}^ma_3(x_i^2-y_i^2)+\sum_{i=1}^m(a_3^2-1)x_iy_i+\sum_{i=1}^mb_3(1-a_3^2)x_i+\sum_{i=1}^m2a_3b_3y_i=a_3b_3^2····································⑤
i=1∑ma3(xi2−yi2)+i=1∑m(a32−1)xiyi+i=1∑mb3(1−a32)xi+i=1∑m2a3b3yi=a3b32⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⑤
∑
i
=
1
m
(
a
3
x
i
−
y
i
+
b
3
)
=
0
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⑥
\sum_{i=1}^m(a_3x_i-y_i+b_3)=0·····························⑥
i=1∑m(a3xi−yi+b3)=0⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⑥
而由
⑥
⑥
⑥我们可知:
a
3
x
‾
−
y
‾
+
b
3
=
0
a_3\overline x-\overline y+b_3=0
a3x−y+b3=0
b
3
=
y
‾
−
a
3
x
‾
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⋅
⑦
b_3=\overline y-a_3\overline x·····························⑦
b3=y−a3x⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⑦
联立
⑤
,
⑦
⑤,⑦
⑤,⑦分别对
m
m
m取平均我们很容易就求出来了
a
3
和
b
3
a_3和b_3
a3和b3:
a
3
2
(
x
y
‾
−
x
ˉ
y
ˉ
)
+
a
3
(
x
2
−
y
2
‾
−
x
ˉ
x
ˉ
+
y
ˉ
y
ˉ
)
+
(
x
ˉ
y
ˉ
−
x
y
‾
)
=
0
a_3^2(\overline{xy}-\bar x\bar y)+a_3(\overline{x^2-y^2}-\bar x\bar x+\bar y\bar y)+(\bar x\bar y-\overline{xy})=0
a32(xy−xˉyˉ)+a3(x2−y2−xˉxˉ+yˉyˉ)+(xˉyˉ−xy)=0
a
3
=
−
(
x
2
−
y
2
‾
−
x
ˉ
x
ˉ
+
y
ˉ
y
ˉ
)
±
(
x
2
−
y
2
‾
−
x
ˉ
x
ˉ
+
y
ˉ
y
ˉ
)
2
+
4
(
x
y
‾
−
x
ˉ
y
ˉ
)
2
2
(
x
y
‾
−
x
ˉ
y
ˉ
)
(
(
x
y
‾
−
x
ˉ
y
ˉ
)
≠
0
)
a_3={-(\overline{x^2-y^2}-\bar x\bar x+\bar y\bar y)\pm\sqrt{(\overline{x^2-y^2}-\bar x\bar x+\bar y\bar y)^2+4(\overline{xy}-\bar x\bar y)^2}\over2(\overline{xy}-\bar x\bar y)}((\overline{xy}-\bar x\bar y)\not=0)
a3=2(xy−xˉyˉ)−(x2−y2−xˉxˉ+yˉyˉ)±(x2−y2−xˉxˉ+yˉyˉ)2+4(xy−xˉyˉ)2((xy−xˉyˉ)=0)
a
3
=
−
(
x
ˉ
y
ˉ
−
x
y
‾
)
(
x
2
−
y
2
‾
−
x
ˉ
x
ˉ
+
y
ˉ
y
ˉ
)
(
(
x
y
‾
−
x
ˉ
y
ˉ
)
=
0
)
a_3={-(\bar x\bar y-\overline{xy})\over(\overline{x^2-y^2}-\bar x\bar x+\bar y\bar y)}((\overline{xy}-\bar x\bar y)=0)
a3=(x2−y2−xˉxˉ+yˉyˉ)−(xˉyˉ−xy)((xy−xˉyˉ)=0)
再由
⑦
⑦
⑦可得
b
3
b_3
b3。
根据上面我们会发现有两个
a
3
a_3
a3供我们选择,当然我们选择哪一个是要根据题意而定的。并非都取,要结合实际数据来看。
下面笔者结合之前写过的最小二乘法改进算法利用MATLAB进行了编程实现检验,进一步证实了实际可行性。
我们假定现在有如下10个数据点如下所示:测试数据来源于应用回归分析(科学出版社,唐年胜,李会琼编著——p15)
(800,594),(1100,638),(1400,1122),(1700,1155)
(2000,1408),(2300,1595),(2600,1969),(2900,2068)
(3200,2585),(3500,2530)
①普通最小二乘法数据点和拟合曲线:
代码样例:
x=[800,1100,1400,1700,2000,2300,2600,2900,3200,3500];
y=[594,638,1122,1155,1408,1595,1969,2068,2585,2530];
avx=sum(x)/10;avy=sum(y)/10;
avxy=sum(x.*y)/10;avx2=sum(x.^2)/10;
a1=(avxy-avx*avy)/(avx2-avx^2);
b1=avy-a1*avx;
plot(x,y,'*')
hold on
y1=a1*x+b1;
plot(x,y1)
得到的拟合数据点结果和拟合回归曲线图如下所示:
回归直线①:
②概率改进最小二乘法数据点和拟合曲线:
代码样例:
x=[800,1100,1400,1700,2000,2300,2600,2900,3200,3500];
y=[594,638,1122,1155,1408,1595,1969,2068,2585,2530];
avx_y=sum((x./y))/10;
avx_y2=sum(x./(y.^2))/10;
avx2_y2=sum((x.^2)./(y.^2))/10;
av1_y=sum(1./y)/10;
av1_y2=sum(1./(y.^2))/10;
b2=(avx_y*avx_y2)-(av1_y*avx2_y2);
b2=((avx_y*avx_y2)-(av1_y*avx2_y2))/((avx_y2*avx_y2)-(av1_y2*avx2_y2));
a2=(av1_y-b2*(av1_y2))/avx_y2;
plot(x,y,'*')
hold on;
y2=a2*x+b2;
plot(x,y2);
得到的拟合数据点结果和拟合回归曲线图如下所示:
回归直线②:
③垂线段改进最小二乘法数据点和拟合曲线:
代码样例:
x=[800,1100,1400,1700,2000,2300,2600,2900,3200,3500];
y=[594,638,1122,1155,1408,1595,1969,2068,2585,2530];
avxy=sum(x.*y)/10;
avx=sum(x)/10;
avy=sum(y)/10;
avx2__y2=sum(x.^2-y.^2)/10;
q1=-(avx2__y2-avx*avx+avy*avy);
q2=sqrt(q1^2+4*((avxy-avx*avy)^2));
q3=2*(avxy-avx*avy);
a3=(q1+q2)/q3;
b3=avy-a3*avx;
plot(x,y,'*');
hold on;
y3=a3*x+b3;
plot(x,y3)
得到的拟合数据点结果和拟合回归曲线图如下所示:
回归直线③:
上述为笔者考虑到的最小二乘法的一些改进方案,主要考虑了概率和垂线两种情况,未尝难免出现数值稳定性考虑不周的情况,恳请大家批评指正,笔者在此深表感谢!