Least Squares Method

The method of least squares (also known as the least-squares technique) is a mathematical optimization technique. It finds the best functional match to the data by minimizing the sum of squared errors.

With least squares, unknown quantities can be estimated in a simple way so that the sum of squared errors between the estimates and the actual data is minimized.

Least squares can also be used for curve fitting.

Some other optimization problems can also be expressed in least-squares form, for example by minimizing energy or maximizing entropy.

Example

Figure: the data points (red), the best least-squares fit (blue), and the errors (green).

An experiment yields four data points (x, y): (1, 6), (2, 5), (3, 7), (4, 10) (the red points in the figure). We want to find a straight line y = \beta_1 + \beta_2 x that best matches these four points, i.e. to find the values of \beta_1 and \beta_2 that, in some "best" sense, approximately satisfy the overdetermined linear system

\begin{alignat}{2}
\beta_1 + 1\beta_2 &= 6 \\
\beta_1 + 2\beta_2 &= 5 \\
\beta_1 + 3\beta_2 &= 7 \\
\beta_1 + 4\beta_2 &= 10
\end{alignat}

The least-squares approach is to make the sum of squared differences between the two sides of these equations as small as possible, that is, to find the minimum of the function:

\begin{align}
S(\beta_1, \beta_2) = {} & \left[6-(\beta_1+1\beta_2)\right]^2+\left[5-(\beta_1+2\beta_2)\right]^2 \\
& +\left[7-(\beta_1+3\beta_2)\right]^2+\left[10-(\beta_1+4\beta_2)\right]^2.
\end{align}

The minimum is obtained by taking the partial derivatives of S(\beta_1, \beta_2) with respect to \beta_1 and \beta_2 and setting them to zero:

\frac{\partial S}{\partial \beta_1}=0=8\beta_1 + 20\beta_2 -56
\frac{\partial S}{\partial \beta_2}=0=20\beta_1 + 60\beta_2 -154.

This gives a system of two equations in two unknowns, which is easily solved:

\beta_1=3.5
\beta_2=1.4

In other words, the line y = 3.5 + 1.4x is the best fit.
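As a quick numerical check, the two normal equations above can be solved directly with a few lines of C++ (a minimal sketch; the coefficients 8, 20, 56 and 20, 60, 154 are copied from the partial derivatives given above):

#include <iostream>

int main()
{
    // Normal equations from dS/d(beta1) = 0 and dS/d(beta2) = 0:
    //    8*b1 + 20*b2 = 56
    //   20*b1 + 60*b2 = 154
    double a11 = 8,  a12 = 20, c1 = 56;
    double a21 = 20, a22 = 60, c2 = 154;

    // Solve the 2x2 system by Cramer's rule.
    double det = a11 * a22 - a12 * a21;        // 8*60 - 20*20 = 80
    double b1  = (c1 * a22 - a12 * c2) / det;  // = 3.5
    double b2  = (a11 * c2 - a21 * c1) / det;  // = 1.4

    std::cout << "beta1 = " << b1 << ", beta2 = " << b2 << std::endl;
    return 0;
}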

Background

Carl Friedrich Gauss

In 1801 the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After 40 days of tracking, Piazzi lost sight of it when Ceres passed behind the Sun. Astronomers around the world then tried to relocate Ceres using Piazzi's observations, but the searches based on most of their calculations came to nothing. Gauss, then 24 years old, also computed an orbit for Ceres, and the astronomer Heinrich Olbers rediscovered Ceres based on the orbit Gauss had calculated.

Gauss published his method of least squares in 1809 in his work Theoria Motus Corporum Coelestium (Theory of the Motion of the Heavenly Bodies). The French scientist Legendre had independently developed the method of least squares in 1806, but it remained little known at the time. The two later disputed who had first established the principle of least squares.

In 1829 Gauss provided a proof that least squares is optimal among a class of alternative methods; see the Gauss–Markov theorem.

Method

One is interested in a dependent variable y that is related to a single variable t or to several variables t_1, ..., t_q. For instance, the deformation of a spring depends on the applied force, and a company's profit depends on its turnover, return on investment and initial capital. To express the relationship between these variables and y, the independent variables are used to construct y through a function model

y_m = f(t_1,\dots, t_q;b_1,\dots,b_p),

with q independent variables and p parameters to be fitted.

Usually one chooses as the function model a type of function that can be built from the independent variables t without difficulty (for example a parabola or an exponential function). The parameters b serve to adapt the chosen model to the observed values y (for instance, when measuring spring deformation, the applied force must be related to the spring constant). The goal is to choose the parameters so that the function model fits the observations as well as possible. In general there are far more observations than parameters.

The next question is how to judge the quality of different fits. The approach of Gauss and Legendre assumes that the measurement errors have mean zero, that each measurement error is a random variable uncorrelated with (independent of) the others, and that the errors contain no systematic component: they are purely random errors with a fixed variance, fluctuating around the true value. In addition, the measurement errors are assumed to follow a normal distribution, which ensures that large deviations have a negligible influence on the final result y.

The criterion for the fit must be considered carefully: measurements with larger errors should be given smaller weight. The following rule is adopted: the parameters should be chosen so that the sum of squared differences between the computed function values and the observed values is as small as possible. As a formula:

\min_{\vec{b}} \sum_{i=1}^{n}\left(y_{m,i} - y_i\right)^2 .

Expressed with the Euclidean norm:

\min_{ \vec{b} } \| \vec{y}_{m} ( \vec{b} ) - \vec{y} \|_{2} \ .

The accuracy of this minimization depends on the choice of the function model.

Linear function model

A typical class of function models is the linear function model. The simplest linear form is y = b_0 + b_1 t. Written in matrix form, the problem becomes

 \min_{b_0,b_1}\left\|\begin{pmatrix}1 & t_1 \\ \vdots & \vdots \\ 1 & t_n  \end{pmatrix} \begin{pmatrix} b_0\\ b_1\end{pmatrix} - \begin{pmatrix} y_1 \\ \vdots \\ y_{n}\end{pmatrix}\right\|_{2} = \min_b\|Ab-Y\|_2.

The parameters of this problem have a direct closed-form solution:

b_1 = \frac{\sum_{i=1}^n t_iy_i - n \cdot \bar t \bar y}{\sum_{i=1}^n t_i^2- n \cdot (\bar t)^2} \quad \text{and} \quad b_0 = \bar y - b_1 \bar t

where \bar t = \frac{1}{n} \sum_{i=1}^n t_i is the arithmetic mean of the t values (and \bar y is defined analogously). The slope b_1 can also be written as:

b_1 = \frac{\sum_{i=1}^n (t_i - \bar t)(y_i - \bar y)}{\sum_{i=1}^n (t_i - \bar t)^2}

Example of the simple linear model y = b0 + b1t

Ten warships are selected at random, and their lengths and widths are analysed to find the relationship between the two. The scatter plot shows that a warship's length (t) and width (y) are roughly linearly related. (Figure: scatter plot.)

The table below lists the data for each warship; the following steps use least squares to determine the linear relationship between the two variables.

 i    Length ti (m)   Width yi (m)   ti* = ti − t̄   yi* = yi − ȳ    ti*·yi*     ti*·ti*     yi*·yi*
 1        208             21.6            40.2            3.19        128.238     1616.04     10.1761
 2        152             15.5           -15.8           -2.91         45.978      249.64      8.4681
 3        113             10.4           -54.8           -8.01        438.948     3003.04     64.1601
 4        227             31.0            59.2           12.59        745.328     3504.64    158.5081
 5        137             13.0           -30.8           -5.41        166.628      948.64     29.2681
 6        238             32.4            70.2           13.99        982.098     4928.04    195.7201
 7        178             19.0            10.2            0.59          6.018      104.04      0.3481
 8        104             10.4           -63.8           -8.01        511.038     4070.44     64.1601
 9        191             19.0            23.2            0.59         13.688      538.24      0.3481
10        130             11.8           -37.8           -6.61        249.858     1428.84     43.6921
Σ        1678            184.1             0.0            0.00       3287.820    20391.60    574.8490


Following the formulas given above,

\bar t = \frac {\sum_{i=1}^n t_i}{n} = \frac {1678}{10} = 167.8 \quad \text{and correspondingly} \quad \bar y = 18.41.

The slope b1 is then determined:

b_1 = \frac{\sum_{i=1}^n (t_i- \bar {t})(y_i - \bar y)}{\sum_{i=1}^n (t_i- \bar t)^2} = \frac{3287.820}{20391.60} = 0.1612 \;,

This means that when a warship's length changes by 1 m, its width changes by about 16 cm. The constant term b0 is then obtained from:

b_0 = \bar y - b_1 \bar t = 18.41 - 0.1612 \cdot 167.8 = -8.6394\;,

The underlying stochastic theory is not discussed here. The fit of the points is very good; the correlation between length and width is about 96.03%. (Figure: fitted line obtained with Matlab.)
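The numbers in the table can be reproduced with a short C++ program (a minimal sketch; the data are the ten length/width pairs listed above):

#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    // Warship data from the table above: length t (m) and width y (m).
    std::vector<double> t = {208, 152, 113, 227, 137, 238, 178, 104, 191, 130};
    std::vector<double> y = {21.6, 15.5, 10.4, 31.0, 13.0, 32.4, 19.0, 10.4, 19.0, 11.8};
    const int n = static_cast<int>(t.size());

    // Means of t and y.
    double tbar = 0, ybar = 0;
    for (int i = 0; i < n; ++i) { tbar += t[i]; ybar += y[i]; }
    tbar /= n; ybar /= n;                         // 167.8 and 18.41

    // Centred sums: sum (t-tbar)(y-ybar), sum (t-tbar)^2, sum (y-ybar)^2.
    double sty = 0, stt = 0, syy = 0;
    for (int i = 0; i < n; ++i) {
        sty += (t[i] - tbar) * (y[i] - ybar);
        stt += (t[i] - tbar) * (t[i] - tbar);
        syy += (y[i] - ybar) * (y[i] - ybar);
    }

    double b1 = sty / stt;                        // slope, about 0.1612
    double b0 = ybar - b1 * tbar;                 // intercept, about -8.64
    double r  = sty / std::sqrt(stt * syy);       // correlation, about 0.9603

    std::cout << "b1 = " << b1 << ", b0 = " << b0 << ", r = " << r << std::endl;
    return 0;
}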

The general linear case

If there are more independent model variables t_1, ..., t_q, the linear function can be written in the form

y(t_1,\dots,t_q;b_0, b_1, \dots, b_q )= b_0 + b_1 t_1 + \cdots + b_q t_q

with the corresponding system of linear equations

\begin{matrix}
b_0 + b_1 t_{11} + \cdots + b_j t_{1j}+ \cdots +b_q t_{1q} = y_1\\
b_0 + b_1 t_{21} + \cdots + b_j t_{2j}+ \cdots +b_q t_{2q} = y_2\\
\vdots \\
b_0 + b_1 t_{i1} + \cdots + b_j t_{ij}+ \cdots +b_q t_{iq}= y_i\\
\vdots\\
b_0 + b_1 t_{n1} + \cdots + b_j t_{nj}+ \cdots +b_q t_{nq}= y_n
\end{matrix}

Writing the t_{ij} as the data matrix A, the parameters b_j as the parameter vector b, and the observations y_i as Y, the linear system can be written as:

\begin{pmatrix}
1 & t_{11} & \cdots & t_{1j} & \cdots & t_{1q}\\
1 & t_{21} & \cdots & t_{2j} & \cdots & t_{2q}\\
\vdots \\
1 & t_{i1} & \cdots & t_{ij} & \cdots & t_{iq}\\
\vdots\\
1 & t_{n1} & \cdots & t_{nj} & \cdots & t_{nq}
\end{pmatrix}
\cdot
\begin{pmatrix}
b_0\\ b_1\\ b_2\\ \vdots \\ b_j\\ \vdots\\ b_q
\end{pmatrix}
=
\begin{pmatrix}
y_1\\ y_2\\ \vdots \\ y_i\\ \vdots\\ y_n
\end{pmatrix}
\quad\text{i.e.}\quad Ab = Y

Applying least squares to this system leads to the minimization problem:

\min_b\|Ab-Y\|_2

Solution of the least-squares problem

A particular solution of

\min_b \left \|\boldsymbol{Ab}- \boldsymbol{Y} \right \|_2,\quad \boldsymbol{A}\in\mathbf{C}^{m\times n},\ \boldsymbol{Y}\in\mathbf{C}^{m}

is the product of the Moore–Penrose pseudoinverse of A with Y; this is also the solution of minimal 2-norm. The general solution is this particular solution plus an arbitrary element of the null space of A. Proof:

First split Y into its components in the range of A and in the orthogonal complement of that range:

\boldsymbol{Y}=\boldsymbol{Y}_{1}+\boldsymbol{Y}_{2}
\boldsymbol{Y}_{1}=\boldsymbol{A}\boldsymbol{A}^\dagger\boldsymbol{Y}\in R\left(\boldsymbol{A} \right)
\boldsymbol{Y}_{2}=\left(\boldsymbol{I}- \boldsymbol{A}\boldsymbol{A}^\dagger \right)\boldsymbol{Y}\in R\left(\boldsymbol{A} \right)^{\bot}

Since \boldsymbol{Ab}-\boldsymbol{Y}_{1}\in R\left(\boldsymbol{A} \right), we obtain

\left \| \boldsymbol{Ab}- \boldsymbol{Y} \right \|^{2}=\left \| \boldsymbol{Ab}- \boldsymbol{Y}_{1} +\left(-\boldsymbol{Y}_{2}\right) \right \|^{2}=\left \| \boldsymbol{Ab}- \boldsymbol{Y}_{1} \right \|^{2}+\left \|\boldsymbol{Y}_{2} \right \|^{2}

Hence \boldsymbol{b} is a least-squares solution if and only if it solves \boldsymbol{Ab}= \boldsymbol{Y}_{1} =\boldsymbol{A}\boldsymbol{A}^\dagger\boldsymbol{Y}. In particular \boldsymbol{b}=\boldsymbol{A}^\dagger \boldsymbol{Y} is such a solution; when A has full column rank, \boldsymbol{A}^\dagger = (\boldsymbol{A}^H\boldsymbol{A})^{-1}\boldsymbol{A}^H, so \boldsymbol{b}=(\boldsymbol{A}^H\boldsymbol{A})^{-1}\boldsymbol{A}^H\boldsymbol{Y}.

Furthermore, since

N\left(\boldsymbol{A}\right)=N\left(\boldsymbol{A}^\dagger \boldsymbol{A}\right)=R\left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A}\right)=\left\{ \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h}:\boldsymbol{h}\in\mathbf{C}^{n}  \right\}

the general solution of \boldsymbol{Ab}=\boldsymbol{A}\boldsymbol{A}^\dagger\boldsymbol{Y} is

\boldsymbol{b}=\boldsymbol{A}^\dagger\boldsymbol{Y}+\left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h},\quad \boldsymbol{h}\in\mathbf{C}^{n}.

Since

\begin{align}
\left \| \boldsymbol{A}^\dagger\boldsymbol{Y}\right \|^{2} & < \left \| \boldsymbol{A}^\dagger\boldsymbol{Y} \right \|^{2}+ \left \| \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h} \right \|^{2} \\
& = \left \| \boldsymbol{A}^\dagger\boldsymbol{Y} + \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h} \right \|^{2}, \qquad \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h}\neq\boldsymbol{0},
\end{align}

it follows that \boldsymbol{A}^\dagger \boldsymbol{Y} is the least-squares solution of minimal 2-norm.
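As an illustration of this minimum-norm solution (a minimal sketch, assuming OpenCV's C++ API is available; the data are the four points from the first example): for an over-determined system, cv::solve with the cv::DECOMP_SVD flag returns the pseudoinverse-based least-squares solution, which here reproduces beta1 = 3.5 and beta2 = 1.4.

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Over-determined system A*b = Y built from the points (1,6), (2,5), (3,7), (4,10).
    float a[8] = {1, 1,
                  1, 2,
                  1, 3,
                  1, 4};
    float yv[4] = {6, 5, 7, 10};

    cv::Mat A(4, 2, CV_32FC1, a);
    cv::Mat Y(4, 1, CV_32FC1, yv);
    cv::Mat b;

    // DECOMP_SVD computes the minimum-norm least-squares solution (via the pseudoinverse).
    cv::solve(A, Y, b, cv::DECOMP_SVD);

    std::cout << "b = " << b << std::endl;   // expected: [3.5; 1.4]
    return 0;
}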


(The material above is excerpted from the Wikipedia article on the method of least squares.)


Age (years)   DBH (inch)
     97          12.5
     93          12.5
     88           8.0
     81           9.5
     75          16.5
     57          11.0
     52          10.5
     45           9.0
     28           6.0
     15           1.5
     12           1.0
     11           1.0

For more, see here.

Curve Fitting, Regression
 

Field data is often accompanied by noise. Even though all control parameters (independent variables) remain constant, the resultant outcomes (dependent variables) vary. A process of quantitatively estimating the trend of the outcomes, also known as regression or curve fitting, therefore becomes necessary.

The curve fitting process fits equations of approximating curves to the raw field data. Nevertheless, for a given set of data, the fitting curves of a given type are generally NOT unique. Thus, a curve with a minimal deviation from all data points is desired. This best-fitting curve can be obtained by the method of least squares.

 
The Method of Least Squares
 

The method of least squares assumes that the best-fit curve of a given type is the curve that has the minimal sum of the deviations squared (least square error) from a given set of data.

Suppose that the data points are (x1,y1), (x2,y2), ..., (xn,yn), where x is the independent variable and y is the dependent variable. The fitting curve f(x) has a deviation (error) d from each data point, i.e., d1 = y1 - f(x1), d2 = y2 - f(x2), ..., dn = yn - f(xn). According to the method of least squares, the best fitting curve has the property that:

\Pi = d_1^2 + d_2^2 + \cdots + d_n^2 = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 \ \text{is a minimum.}

Polynomials Least-Squares Fitting
 

Polynomials are among the most commonly used types of curves in regression. The application of least-squares curve fitting with polynomials is briefly discussed below.

The Least-Squares Line
 

The least-squares line uses a straight line

y=a+bx

to approximate the given set of data, (x1,y1), (x2,y2), ..., (xn,yn), where n >= 2. The best fitting curve f(x) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 = \sum_{i=1}^{n} \left[y_i - (a + b x_i)\right]^2 = \min.

Please note that a and b are unknown coefficients while all xi and yi are given. To obtain the least square error, the unknown coefficients a and b must yield zero first derivatives:

\frac{\partial \Pi}{\partial a} = -2\sum_{i=1}^{n} \left[y_i - (a + b x_i)\right] = 0, \qquad \frac{\partial \Pi}{\partial b} = -2\sum_{i=1}^{n} x_i\left[y_i - (a + b x_i)\right] = 0.

Expanding the above equations, we have the normal equations:

a\,n + b\sum x_i = \sum y_i, \qquad a\sum x_i + b\sum x_i^2 = \sum x_i y_i.

The unknown coefficients a and b can therefore be obtained:

a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},

where each \sum stands for \sum_{i=1}^{n}.

The Least-Squares Parabola
 

The least-squares parabola uses a second degree curve

y=a+bx+cx^2

to approximate the given set of data, (x1,y1), (x2,y2), ..., (xn,yn), where n >= 3. The best fitting curve f(x) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[y_i - (a + b x_i + c x_i^2)\right]^2 = \min.

Please note that a, b, and c are unknown coefficients while all xi and yi are given. To obtain the least square error, the unknown coefficients a, b, and c must yield zero first derivatives, \partial\Pi/\partial a = \partial\Pi/\partial b = \partial\Pi/\partial c = 0.

Expanding the above equations, we have the normal equations:

\begin{align}
a\,n + b\sum x_i + c\sum x_i^2 &= \sum y_i \\
a\sum x_i + b\sum x_i^2 + c\sum x_i^3 &= \sum x_i y_i \\
a\sum x_i^2 + b\sum x_i^3 + c\sum x_i^4 &= \sum x_i^2 y_i
\end{align}

The unknown coefficients a, b, and c can hence be obtained by solving the above linear equations.

The Least-Squares mth Degree Polynomials
 

When using an mth degree polynomial

y=a0+a1x+a2x^2+...+amx^m

to approximate the given set of data, (x1,y1), (x2,y2), ..., (xn,yn), where n >= m+1, the best fitting curve f(x) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[y_i - (a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m)\right]^2 = \min.

Please note that a0, a1, a2, ..., and am are unknown coefficients while all xi and yi are given. To obtain the least square error, the unknown coefficients a0, a1, a2, ..., and am must yield zero first derivatives, \partial\Pi/\partial a_k = 0 for k = 0, 1, \dots, m.

Expanding the above equations, we have the normal equations:

\sum_{k=0}^{m} a_k \sum_{i=1}^{n} x_i^{\,j+k} = \sum_{i=1}^{n} x_i^{\,j} y_i, \qquad j = 0, 1, \dots, m.

The unknown coefficients a0, a1, a2, ..., and am can hence be obtained by solving the above linear equations.
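A sketch of this general case in C++ (the helper polyFit below is hypothetical, introduced only for illustration): it assembles the normal equations above and solves them by Gaussian elimination. The straight line and the parabola are the special cases m = 1 and m = 2.

#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

// Fit y = a0 + a1*x + ... + am*x^m by least squares (hypothetical helper, for illustration).
std::vector<double> polyFit(const std::vector<double>& x,
                            const std::vector<double>& y, int m)
{
    int n = static_cast<int>(x.size());
    int p = m + 1;
    // Normal equations N*a = r, with N[j][k] = sum_i x_i^(j+k) and r[j] = sum_i x_i^j * y_i.
    std::vector<std::vector<double>> N(p, std::vector<double>(p, 0.0));
    std::vector<double> r(p, 0.0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < p; ++j) {
            r[j] += std::pow(x[i], j) * y[i];
            for (int k = 0; k < p; ++k)
                N[j][k] += std::pow(x[i], j + k);
        }
    }
    // Gaussian elimination with partial pivoting.
    for (int col = 0; col < p; ++col) {
        int piv = col;
        for (int row = col + 1; row < p; ++row)
            if (std::fabs(N[row][col]) > std::fabs(N[piv][col])) piv = row;
        std::swap(N[col], N[piv]);
        std::swap(r[col], r[piv]);
        for (int row = col + 1; row < p; ++row) {
            double f = N[row][col] / N[col][col];
            for (int k = col; k < p; ++k) N[row][k] -= f * N[col][k];
            r[row] -= f * r[col];
        }
    }
    // Back substitution.
    std::vector<double> a(p, 0.0);
    for (int j = p - 1; j >= 0; --j) {
        double s = r[j];
        for (int k = j + 1; k < p; ++k) s -= N[j][k] * a[k];
        a[j] = s / N[j][j];
    }
    return a;
}

int main()
{
    // Example: the four points from the first section, fitted with m = 1 (a straight line).
    std::vector<double> x = {1, 2, 3, 4}, y = {6, 5, 7, 10};
    std::vector<double> a = polyFit(x, y, 1);
    std::cout << "a0 = " << a[0] << ", a1 = " << a[1] << std::endl;  // expected 3.5 and 1.4
    return 0;
}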

Multiple Regression
 

Multiple regression estimates the outcomes (dependent variables) that may be affected by more than one control parameter (independent variable), or cases in which more than one control parameter is changed at the same time.

An example is two independent variables x and y and one dependent variable z in the linear relationship case:

z = ax + by + c

For a given data set (x1,y1,z1), (x2,y2,z2), ..., (xn,yn,zn), where n >= 3, the best fitting curve f(x,y) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[z_i - (a x_i + b y_i + c)\right]^2 = \min.

Please note that a, b, and c are unknown coefficients while all xi, yi, and zi are given. To obtain the least square error, the unknown coefficients a, b, and c must yield zero first derivatives.

Expanding the above equations, we have the normal equations:

\begin{align}
a\sum x_i^2 + b\sum x_i y_i + c\sum x_i &= \sum x_i z_i \\
a\sum x_i y_i + b\sum y_i^2 + c\sum y_i &= \sum y_i z_i \\
a\sum x_i + b\sum y_i + c\,n &= \sum z_i
\end{align}

The unknown coefficients a, b, and c can hence be obtained by solving the above linear equations.

A simpler presentation of the theory follows:

Line of Best Fit (Least Squares Method)

A line of best fit is a straight line that is the best approximation of the given set of data.

It is used to study the nature of the relation between two variables.

A line of best fit can be roughly determined using an eyeball method by drawing a straight line on a scatter plot so that the number of points above the line and below the line is about equal (and the line passes through as many points as possible).

A more accurate way of finding the line of best fit is the least squares method.

Use the following steps to find the equation of line of best fit for a set of ordered pairs.

Step 1: Calculate the mean of the x-values and the mean of the y-values.

Step 2: Compute the sum of the squares of the x-values.

Step 3: Compute the sum of each x-value multiplied by its corresponding y-value.

Step 4: Calculate the slope m of the line using the formula:

m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}

where n is the total number of data points.

Step 5: Compute the y-intercept b of the line by using the formula:

b = \bar y - m\,\bar x

where \bar x and \bar y are the means of the x- and y-coordinates of the data points, respectively.

Step 6: Use the slope and the y-intercept to form the equation of the line.

Example:

Use the least square method to determine the equation of line of best fit for the data. Then plot the line.

Solution:

Plot the points on a coordinate plane.

Calculate the means of the x-values and the y-values, the sum of squares of the x-values, and the sum of each x-value multiplied by its corresponding y-value.

Calculate the slope.

Calculate the y-intercept.

First, calculate the mean of the x-values and that of the y-values.

Use the formula to compute the y-intercept.

Use the slope and y-intercept to form the equation of the line of best fit.

The slope of the line is –1.1 and the y-intercept is 14.0.

Therefore, the equation is y = –1.1x + 14.0.

Draw the line on the scatter plot.


Least squares for regression analysis in machine learning

A simple, concrete implementation of least squares:

The most basic version:

#include <iostream>
using namespace std;

int main()
{
    const int MAXN = 10;                 // maximum number of sample points the arrays can hold
    int n, i;
    float a, a0, a1, x[MAXN], f[MAXN];
    float sumx = 0, sumy = 0, sumxy = 0, sumx2 = 0;

    cout << "Enter no of sample points (max 10) ? ";
    cin >> n;
    if (n < 2 || n > MAXN) return 1;     // need at least two points, and no more than the buffers hold

    cout << "Enter all sample points: " << endl;
    for (i = 0; i < n; i++)
    {
        cin >> x[i] >> f[i];             // read both (x, f(x))
        sumx  += x[i];                   // sum of x
        sumy  += f[i];                   // sum of y
        sumxy += x[i] * f[i];            // sum of x*y
        sumx2 += x[i] * x[i];            // sum of x^2
    }
    cout << "your sample x ? ";
    cin >> a;

    // Closed-form least-squares coefficients of the line f(x) = a0 + a1*x.
    a0 = (sumy * sumx2 - sumx * sumxy) / (n * sumx2 - sumx * sumx);
    a1 = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx);

    cout << "The coefficients are : " << endl << a0 << endl << a1;
    cout << endl << "f(" << a << "): " << (a0 + a1 * a) << endl;
    return 0;
}

CvMat form (legacy OpenCV C API)

#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>   // CvMat/cvSolve belong to the legacy C API and are only available in older OpenCV releases
using namespace std;
using namespace cv;

int main()
{
    float a[9] = {1, 2, 3, 4, 5, 7, 6, 8, 9};   // 3x3 coefficient matrix A (row major)
    float b[3] = {2, 3, 1};                     // right-hand side B

    CvMat* A = cvCreateMat(3, 3, CV_32FC1);
    CvMat* X = cvCreateMat(3, 1, CV_32FC1);
    CvMat* B = cvCreateMat(3, 1, CV_32FC1);

    cvSetData(A, a, CV_AUTOSTEP);
    cvSetData(B, b, CV_AUTOSTEP);
    cvSolve(A, B, X, CV_LU);    // solve A*x = B for x (LU decomposition; exact solution of the square system)

    printf("A:");
    for (int i = 0; i < 9; i++)
    {
        if (i % 3 == 0) printf("\n");
        printf("\t%f", A->data.fl[i]);
    }

    printf("\nX:\n");
    for (int i = 0; i < 3; i++)
        printf("\t%f", X->data.fl[i]);

    printf("\nb:\n");
    for (int i = 0; i < 3; i++)
        printf("\t%f", B->data.fl[i]);
    printf("\n");

    return 0;
}
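With newer OpenCV releases, where the legacy C API above is no longer available, the same system can be solved through the C++ cv::Mat interface; a minimal equivalent sketch:

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Same 3x3 system as above, using the C++ API.
    float a[9] = {1, 2, 3, 4, 5, 7, 6, 8, 9};
    float b[3] = {2, 3, 1};

    cv::Mat A(3, 3, CV_32FC1, a);
    cv::Mat B(3, 1, CV_32FC1, b);
    cv::Mat X;

    cv::solve(A, B, X, cv::DECOMP_LU);   // solve A*x = B for x

    std::cout << "A = " << std::endl << A << std::endl;
    std::cout << "X = " << std::endl << X << std::endl;
    std::cout << "B = " << std::endl << B << std::endl;
    return 0;
}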


For more detail, see here.

