Least Squares Method

The method of least squares (also known as the least-squares technique) is a mathematical optimization technique. It finds the best functional match to the data by minimizing the sum of squared errors.

With least squares, unknown quantities can be estimated in a simple way so that the sum of squared errors between the estimates and the actual data is minimized.

Least squares can also be used for curve fitting.

Some other optimization problems can also be expressed in least-squares form, for example by minimizing energy or maximizing entropy.

Example

Figure: the data points (red), the best least-squares fit (blue), and the errors (green).

An experiment yields four data points (x, y): (1, 6), (2, 5), (3, 7), (4, 10) (the red points in the figure). We want to find a straight line y = \beta_1 + \beta_2 x that best matches these four points, i.e. to find the values of \beta_1 and \beta_2 that, in some "best" sense, approximately satisfy the overdetermined linear system

\begin{alignat}{2}
\beta_1 + 1\beta_2 &= 6 \\
\beta_1 + 2\beta_2 &= 5 \\
\beta_1 + 3\beta_2 &= 7 \\
\beta_1 + 4\beta_2 &= 10
\end{alignat}

The least-squares approach is to make the sum of squared differences between the two sides of these equations as small as possible, that is, to find the minimum of the function:

\begin{align}
S(\beta_1, \beta_2) = {} & \left[6-(\beta_1+1\beta_2)\right]^2+\left[5-(\beta_1+2\beta_2)\right]^2 \\
& +\left[7-(\beta_1+3\beta_2)\right]^2+\left[10-(\beta_1+4\beta_2)\right]^2.
\end{align}

The minimum is obtained by taking the partial derivatives of S(\beta_1, \beta_2) with respect to \beta_1 and \beta_2 and setting them to zero:

\frac{\partial S}{\partial \beta_1}=0=8\beta_1 + 20\beta_2 -56
\frac{\partial S}{\partial \beta_2}=0=20\beta_1 + 60\beta_2 -154.

This gives a system of two equations in two unknowns, which is easily solved:

\beta_1=3.5
\beta_2=1.4

In other words, the line y = 3.5 + 1.4x is the best fit.
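As a quick numerical check, the two normal equations above can be solved directly with a few lines of C++ (a minimal sketch; the coefficients 8, 20, 56 and 20, 60, 154 are copied from the partial derivatives given above):

#include <iostream>

int main()
{
    // Normal equations from dS/d(beta1) = 0 and dS/d(beta2) = 0:
    //    8*b1 + 20*b2 = 56
    //   20*b1 + 60*b2 = 154
    double a11 = 8,  a12 = 20, c1 = 56;
    double a21 = 20, a22 = 60, c2 = 154;

    // Solve the 2x2 system by Cramer's rule.
    double det = a11 * a22 - a12 * a21;        // 8*60 - 20*20 = 80
    double b1  = (c1 * a22 - a12 * c2) / det;  // = 3.5
    double b2  = (a11 * c2 - a21 * c1) / det;  // = 1.4

    std::cout << "beta1 = " << b1 << ", beta2 = " << b2 << std::endl;
    return 0;
}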

Background

Carl Friedrich Gauss

In 1801 the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After 40 days of tracking, Piazzi lost sight of it when Ceres passed behind the Sun. Astronomers around the world then tried to relocate Ceres using Piazzi's observations, but the searches based on most of their calculations came to nothing. Gauss, then 24 years old, also computed an orbit for Ceres, and the astronomer Heinrich Olbers rediscovered Ceres based on the orbit Gauss had calculated.

Gauss published his method of least squares in 1809 in his work Theoria Motus Corporum Coelestium (Theory of the Motion of the Heavenly Bodies). The French scientist Legendre had independently developed the method of least squares in 1806, but it remained little known at the time. The two later disputed who had first established the principle of least squares.

In 1829 Gauss provided a proof that least squares is optimal among a class of alternative methods; see the Gauss–Markov theorem.

Method

One is interested in a dependent variable y that is related to a single variable t or to several variables t_1, ..., t_q. For instance, the deformation of a spring depends on the applied force, and a company's profit depends on its turnover, return on investment and initial capital. To express the relationship between these variables and y, the independent variables are used to construct y through a function model

y_m = f(t_1,\dots, t_q;b_1,\dots,b_p),

with q independent variables and p parameters to be fitted.

Usually one chooses as the function model a type of function that can be built from the independent variables t without difficulty (for example a parabola or an exponential function). The parameters b serve to adapt the chosen model to the observed values y (for instance, when measuring spring deformation, the applied force must be related to the spring constant). The goal is to choose the parameters so that the function model fits the observations as well as possible. In general there are far more observations than parameters.

The next question is how to judge the quality of different fits. The approach of Gauss and Legendre assumes that the measurement errors have mean zero, that each measurement error is a random variable uncorrelated with (independent of) the others, and that the errors contain no systematic component: they are purely random errors with a fixed variance, fluctuating around the true value. In addition, the measurement errors are assumed to follow a normal distribution, which ensures that large deviations have a negligible influence on the final result y.

The criterion for the fit must be considered carefully: measurements with larger errors should be given smaller weight. The following rule is adopted: the parameters should be chosen so that the sum of squared differences between the computed function values and the observed values is as small as possible. As a formula:

\min_{\vec{b}} \sum_{i=1}^{n}\left(y_{m,i} - y_i\right)^2 .

Expressed with the Euclidean norm:

\min_{ \vec{b} } \| \vec{y}_{m} ( \vec{b} ) - \vec{y} \|_{2} \ .

The accuracy of this minimization depends on the choice of the function model.

Linear function model

A typical class of function models is the linear function model. The simplest linear form is y = b_0 + b_1 t. Written in matrix form, the problem becomes

 \min_{b_0,b_1}\left\|\begin{pmatrix}1 & t_1 \\ \vdots & \vdots \\ 1 & t_n  \end{pmatrix} \begin{pmatrix} b_0\\ b_1\end{pmatrix} - \begin{pmatrix} y_1 \\ \vdots \\ y_{n}\end{pmatrix}\right\|_{2} = \min_b\|Ab-Y\|_2.

The parameters of this problem have a direct closed-form solution:

b_1 = \frac{\sum_{i=1}^n t_iy_i - n \cdot \bar t \bar y}{\sum_{i=1}^n t_i^2- n \cdot (\bar t)^2} \quad \text{and} \quad b_0 = \bar y - b_1 \bar t

where \bar t = \frac{1}{n} \sum_{i=1}^n t_i is the arithmetic mean of the t values (and \bar y is defined analogously). The slope b_1 can also be written as:

b_1 = \frac{\sum_{i=1}^n (t_i - \bar t)(y_i - \bar y)}{\sum_{i=1}^n (t_i - \bar t)^2}

Example of the simple linear model y = b0 + b1t

Ten warships are selected at random, and their lengths and widths are analysed to find the relationship between the two. The scatter plot shows that a warship's length (t) and width (y) are roughly linearly related. (Figure: scatter plot.)

The table below lists the data for each warship; the following steps use least squares to determine the linear relationship between the two variables.

 i    Length ti (m)   Width yi (m)   ti* = ti − t̄   yi* = yi − ȳ    ti*·yi*     ti*·ti*     yi*·yi*
 1        208             21.6            40.2            3.19        128.238     1616.04     10.1761
 2        152             15.5           -15.8           -2.91         45.978      249.64      8.4681
 3        113             10.4           -54.8           -8.01        438.948     3003.04     64.1601
 4        227             31.0            59.2           12.59        745.328     3504.64    158.5081
 5        137             13.0           -30.8           -5.41        166.628      948.64     29.2681
 6        238             32.4            70.2           13.99        982.098     4928.04    195.7201
 7        178             19.0            10.2            0.59          6.018      104.04      0.3481
 8        104             10.4           -63.8           -8.01        511.038     4070.44     64.1601
 9        191             19.0            23.2            0.59         13.688      538.24      0.3481
10        130             11.8           -37.8           -6.61        249.858     1428.84     43.6921
Σ        1678            184.1             0.0            0.00       3287.820    20391.60    574.8490


Following the formulas given above,

\bar t = \frac {\sum_{i=1}^n t_i}{n} = \frac {1678}{10} = 167.8 \quad \text{and correspondingly} \quad \bar y = 18.41.

The slope b1 is then determined:

b_1 = \frac{\sum_{i=1}^n (t_i- \bar {t})(y_i - \bar y)}{\sum_{i=1}^n (t_i- \bar t)^2} = \frac{3287.820}{20391.60} = 0.1612 \;,

This means that when a warship's length changes by 1 m, its width changes by about 16 cm. The constant term b0 is then obtained from:

b_0 = \bar y - b_1 \bar t = 18.41 - 0.1612 \cdot 167.8 = -8.6394\;,

The underlying stochastic theory is not discussed here. The fit of the points is very good; the correlation between length and width is about 96.03%. (Figure: fitted line obtained with Matlab.)
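The numbers in the table can be reproduced with a short C++ program (a minimal sketch; the data are the ten length/width pairs listed above):

#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    // Warship data from the table above: length t (m) and width y (m).
    std::vector<double> t = {208, 152, 113, 227, 137, 238, 178, 104, 191, 130};
    std::vector<double> y = {21.6, 15.5, 10.4, 31.0, 13.0, 32.4, 19.0, 10.4, 19.0, 11.8};
    const int n = static_cast<int>(t.size());

    // Means of t and y.
    double tbar = 0, ybar = 0;
    for (int i = 0; i < n; ++i) { tbar += t[i]; ybar += y[i]; }
    tbar /= n; ybar /= n;                         // 167.8 and 18.41

    // Centred sums: sum (t-tbar)(y-ybar), sum (t-tbar)^2, sum (y-ybar)^2.
    double sty = 0, stt = 0, syy = 0;
    for (int i = 0; i < n; ++i) {
        sty += (t[i] - tbar) * (y[i] - ybar);
        stt += (t[i] - tbar) * (t[i] - tbar);
        syy += (y[i] - ybar) * (y[i] - ybar);
    }

    double b1 = sty / stt;                        // slope, about 0.1612
    double b0 = ybar - b1 * tbar;                 // intercept, about -8.64
    double r  = sty / std::sqrt(stt * syy);       // correlation, about 0.9603

    std::cout << "b1 = " << b1 << ", b0 = " << b0 << ", r = " << r << std::endl;
    return 0;
}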

The general linear case

If there are more independent model variables t_1, ..., t_q, the linear function can be written in the form

y(t_1,\dots,t_q;b_0, b_1, \dots, b_q )= b_0 + b_1 t_1 + \cdots + b_q t_q

with the corresponding system of linear equations

\begin{matrix}
b_0 + b_1 t_{11} + \cdots + b_j t_{1j}+ \cdots +b_q t_{1q} = y_1\\
b_0 + b_1 t_{21} + \cdots + b_j t_{2j}+ \cdots +b_q t_{2q} = y_2\\
\vdots \\
b_0 + b_1 t_{i1} + \cdots + b_j t_{ij}+ \cdots +b_q t_{iq}= y_i\\
\vdots\\
b_0 + b_1 t_{n1} + \cdots + b_j t_{nj}+ \cdots +b_q t_{nq}= y_n
\end{matrix}

Writing the t_{ij} as the data matrix A, the parameters b_j as the parameter vector b, and the observations y_i as Y, the linear system can be written as:

\begin{pmatrix}
1 & t_{11} & \cdots & t_{1j} & \cdots & t_{1q}\\
1 & t_{21} & \cdots & t_{2j} & \cdots & t_{2q}\\
\vdots \\
1 & t_{i1} & \cdots & t_{ij} & \cdots & t_{iq}\\
\vdots\\
1 & t_{n1} & \cdots & t_{nj} & \cdots & t_{nq}
\end{pmatrix}
\cdot
\begin{pmatrix}
b_0\\ b_1\\ b_2\\ \vdots \\ b_j\\ \vdots\\ b_q
\end{pmatrix}
=
\begin{pmatrix}
y_1\\ y_2\\ \vdots \\ y_i\\ \vdots\\ y_n
\end{pmatrix}
\quad\text{i.e.}\quad Ab = Y

Applying least squares to this system leads to the minimization problem:

\min_b\|Ab-Y\|_2

Solution of the least-squares problem

A particular solution of

\min_b \left \|\boldsymbol{Ab}- \boldsymbol{Y} \right \|_2,\quad \boldsymbol{A}\in\mathbf{C}^{m\times n},\ \boldsymbol{Y}\in\mathbf{C}^{m}

is the product of the Moore–Penrose pseudoinverse of A with Y; this is also the solution of minimal 2-norm. The general solution is this particular solution plus an arbitrary element of the null space of A. Proof:

First split Y into its components in the range of A and in the orthogonal complement of that range:

\boldsymbol{Y}=\boldsymbol{Y}_{1}+\boldsymbol{Y}_{2}
\boldsymbol{Y}_{1}=\boldsymbol{A}\boldsymbol{A}^\dagger\boldsymbol{Y}\in R\left(\boldsymbol{A} \right)
\boldsymbol{Y}_{2}=\left(\boldsymbol{I}- \boldsymbol{A}\boldsymbol{A}^\dagger \right)\boldsymbol{Y}\in R\left(\boldsymbol{A} \right)^{\bot}

Since \boldsymbol{Ab}-\boldsymbol{Y}_{1}\in R\left(\boldsymbol{A} \right), we obtain

\left \| \boldsymbol{Ab}- \boldsymbol{Y} \right \|^{2}=\left \| \boldsymbol{Ab}- \boldsymbol{Y}_{1} +\left(-\boldsymbol{Y}_{2}\right) \right \|^{2}=\left \| \boldsymbol{Ab}- \boldsymbol{Y}_{1} \right \|^{2}+\left \|\boldsymbol{Y}_{2} \right \|^{2}

Hence \boldsymbol{b} is a least-squares solution if and only if it solves \boldsymbol{Ab}= \boldsymbol{Y}_{1} =\boldsymbol{A}\boldsymbol{A}^\dagger\boldsymbol{Y}. In particular \boldsymbol{b}=\boldsymbol{A}^\dagger \boldsymbol{Y} is such a solution; when A has full column rank, \boldsymbol{A}^\dagger = (\boldsymbol{A}^H\boldsymbol{A})^{-1}\boldsymbol{A}^H, so \boldsymbol{b}=(\boldsymbol{A}^H\boldsymbol{A})^{-1}\boldsymbol{A}^H\boldsymbol{Y}.

Furthermore, since

N\left(\boldsymbol{A}\right)=N\left(\boldsymbol{A}^\dagger \boldsymbol{A}\right)=R\left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A}\right)=\left\{ \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h}:\boldsymbol{h}\in\mathbf{C}^{n}  \right\}

the general solution of \boldsymbol{Ab}=\boldsymbol{A}\boldsymbol{A}^\dagger\boldsymbol{Y} is

\boldsymbol{b}=\boldsymbol{A}^\dagger\boldsymbol{Y}+\left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h},\quad \boldsymbol{h}\in\mathbf{C}^{n}.

Since

\begin{align}
\left \| \boldsymbol{A}^\dagger\boldsymbol{Y}\right \|^{2} & < \left \| \boldsymbol{A}^\dagger\boldsymbol{Y} \right \|^{2}+ \left \| \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h} \right \|^{2} \\
& = \left \| \boldsymbol{A}^\dagger\boldsymbol{Y} + \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h} \right \|^{2}, \qquad \left(\boldsymbol{I}-\boldsymbol{A}^\dagger \boldsymbol{A} \right) \boldsymbol{h}\neq\boldsymbol{0},
\end{align}

it follows that \boldsymbol{A}^\dagger \boldsymbol{Y} is the least-squares solution of minimal 2-norm.
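As an illustration of this minimum-norm solution (a minimal sketch, assuming OpenCV's C++ API is available; the data are the four points from the first example): for an over-determined system, cv::solve with the cv::DECOMP_SVD flag returns the pseudoinverse-based least-squares solution, which here reproduces beta1 = 3.5 and beta2 = 1.4.

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Over-determined system A*b = Y built from the points (1,6), (2,5), (3,7), (4,10).
    float a[8] = {1, 1,
                  1, 2,
                  1, 3,
                  1, 4};
    float yv[4] = {6, 5, 7, 10};

    cv::Mat A(4, 2, CV_32FC1, a);
    cv::Mat Y(4, 1, CV_32FC1, yv);
    cv::Mat b;

    // DECOMP_SVD computes the minimum-norm least-squares solution (via the pseudoinverse).
    cv::solve(A, Y, b, cv::DECOMP_SVD);

    std::cout << "b = " << b << std::endl;   // expected: [3.5; 1.4]
    return 0;
}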


(The material above is excerpted from the Wikipedia article on the method of least squares.)


Age (years)   DBH (inch)
     97          12.5
     93          12.5
     88           8.0
     81           9.5
     75          16.5
     57          11.0
     52          10.5
     45           9.0
     28           6.0
     15           1.5
     12           1.0
     11           1.0

For more, see here.

Curve Fitting, Regression
 

Field data is often accompanied by noise. Even though all control parameters (independent variables) remain constant, the resultant outcomes (dependent variables) vary. A process of quantitatively estimating the trend of the outcomes, also known as regression or curve fitting, therefore becomes necessary.

The curve fitting process fits equations of approximating curves to the raw field data. Nevertheless, for a given set of data, the fitting curves of a given type are generally NOT unique. Thus, a curve with a minimal deviation from all data points is desired. This best-fitting curve can be obtained by the method of least squares.

 
The Method of Least Squares
 

The method of least squares assumes that the best-fit curve of a given type is the curve that has the minimal sum of the deviations squared (least square error) from a given set of data.

Suppose that the data points are (x1,y1), (x2,y2), ..., (xn,yn), where x is the independent variable and y is the dependent variable. The fitting curve f(x) has a deviation (error) d from each data point, i.e., d1 = y1 - f(x1), d2 = y2 - f(x2), ..., dn = yn - f(xn). According to the method of least squares, the best fitting curve has the property that:

\Pi = d_1^2 + d_2^2 + \cdots + d_n^2 = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 \ \text{is a minimum.}

Polynomials Least-Squares Fitting
 

Polynomials are among the most commonly used types of curves in regression. The application of least-squares curve fitting with polynomials is briefly discussed below.

The Least-Squares Line
 

The least-squares line uses a straight line

y=a+bx

to approximate the given set of data, (x1,y1), (x2,y2), ..., (xn,yn), where n >= 2. The best fitting curve f(x) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 = \sum_{i=1}^{n} \left[y_i - (a + b x_i)\right]^2 = \min.

Please note that a and b are unknown coefficients while all xi and yi are given. To obtain the least square error, the unknown coefficients a and b must yield zero first derivatives:

\frac{\partial \Pi}{\partial a} = -2\sum_{i=1}^{n} \left[y_i - (a + b x_i)\right] = 0, \qquad \frac{\partial \Pi}{\partial b} = -2\sum_{i=1}^{n} x_i\left[y_i - (a + b x_i)\right] = 0.

Expanding the above equations, we have the normal equations:

a\,n + b\sum x_i = \sum y_i, \qquad a\sum x_i + b\sum x_i^2 = \sum x_i y_i.

The unknown coefficients a and b can therefore be obtained:

a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},

where each \sum stands for \sum_{i=1}^{n}.

The Least-Squares Parabola
 

The least-squares parabola uses a second degree curve

y=a+bx+cx^2

to approximate the given set of data, (x1,y1), (x2,y2), ..., (xn,yn), where n >= 3. The best fitting curve f(x) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[y_i - (a + b x_i + c x_i^2)\right]^2 = \min.

Please note that a, b, and c are unknown coefficients while all xi and yi are given. To obtain the least square error, the unknown coefficients a, b, and c must yield zero first derivatives, \partial\Pi/\partial a = \partial\Pi/\partial b = \partial\Pi/\partial c = 0.

Expanding the above equations, we have the normal equations:

\begin{align}
a\,n + b\sum x_i + c\sum x_i^2 &= \sum y_i \\
a\sum x_i + b\sum x_i^2 + c\sum x_i^3 &= \sum x_i y_i \\
a\sum x_i^2 + b\sum x_i^3 + c\sum x_i^4 &= \sum x_i^2 y_i
\end{align}

The unknown coefficients a, b, and c can hence be obtained by solving the above linear equations.

The Least-Squares mth Degree Polynomials
 

When using an mth degree polynomial

y=a0+a1x+a2x^2+...+amx^m

to approximate the given set of data, (x1,y1), (x2,y2), ..., (xn,yn), where n >= m+1, the best fitting curve f(x) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[y_i - (a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m)\right]^2 = \min.

Please note that a0, a1, a2, ..., and am are unknown coefficients while all xi and yi are given. To obtain the least square error, the unknown coefficients a0, a1, a2, ..., and am must yield zero first derivatives, \partial\Pi/\partial a_k = 0 for k = 0, 1, \dots, m.

Expanding the above equations, we have the normal equations:

\sum_{k=0}^{m} a_k \sum_{i=1}^{n} x_i^{\,j+k} = \sum_{i=1}^{n} x_i^{\,j} y_i, \qquad j = 0, 1, \dots, m.

The unknown coefficients a0, a1, a2, ..., and am can hence be obtained by solving the above linear equations.
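A sketch of this general case in C++ (the helper polyFit below is hypothetical, introduced only for illustration): it assembles the normal equations above and solves them by Gaussian elimination. The straight line and the parabola are the special cases m = 1 and m = 2.

#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

// Fit y = a0 + a1*x + ... + am*x^m by least squares (hypothetical helper, for illustration).
std::vector<double> polyFit(const std::vector<double>& x,
                            const std::vector<double>& y, int m)
{
    int n = static_cast<int>(x.size());
    int p = m + 1;
    // Normal equations N*a = r, with N[j][k] = sum_i x_i^(j+k) and r[j] = sum_i x_i^j * y_i.
    std::vector<std::vector<double>> N(p, std::vector<double>(p, 0.0));
    std::vector<double> r(p, 0.0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < p; ++j) {
            r[j] += std::pow(x[i], j) * y[i];
            for (int k = 0; k < p; ++k)
                N[j][k] += std::pow(x[i], j + k);
        }
    }
    // Gaussian elimination with partial pivoting.
    for (int col = 0; col < p; ++col) {
        int piv = col;
        for (int row = col + 1; row < p; ++row)
            if (std::fabs(N[row][col]) > std::fabs(N[piv][col])) piv = row;
        std::swap(N[col], N[piv]);
        std::swap(r[col], r[piv]);
        for (int row = col + 1; row < p; ++row) {
            double f = N[row][col] / N[col][col];
            for (int k = col; k < p; ++k) N[row][k] -= f * N[col][k];
            r[row] -= f * r[col];
        }
    }
    // Back substitution.
    std::vector<double> a(p, 0.0);
    for (int j = p - 1; j >= 0; --j) {
        double s = r[j];
        for (int k = j + 1; k < p; ++k) s -= N[j][k] * a[k];
        a[j] = s / N[j][j];
    }
    return a;
}

int main()
{
    // Example: the four points from the first section, fitted with m = 1 (a straight line).
    std::vector<double> x = {1, 2, 3, 4}, y = {6, 5, 7, 10};
    std::vector<double> a = polyFit(x, y, 1);
    std::cout << "a0 = " << a[0] << ", a1 = " << a[1] << std::endl;  // expected 3.5 and 1.4
    return 0;
}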

Multiple Regression
 

Multiple regression estimates the outcomes (dependent variables) that may be affected by more than one control parameter (independent variable), or cases in which more than one control parameter is changed at the same time.

An example is two independent variables x and y and one dependent variable z in the linear relationship case:

z = ax + by + c

For a given data set (x1,y1,z1), (x2,y2,z2), ..., (xn,yn,zn), where n >= 3, the best fitting curve f(x,y) has the least square error, i.e.,

\Pi = \sum_{i=1}^{n} \left[z_i - (a x_i + b y_i + c)\right]^2 = \min.

Please note that a, b, and c are unknown coefficients while all xi, yi, and zi are given. To obtain the least square error, the unknown coefficients a, b, and c must yield zero first derivatives.

Expanding the above equations, we have the normal equations:

\begin{align}
a\sum x_i^2 + b\sum x_i y_i + c\sum x_i &= \sum x_i z_i \\
a\sum x_i y_i + b\sum y_i^2 + c\sum y_i &= \sum y_i z_i \\
a\sum x_i + b\sum y_i + c\,n &= \sum z_i
\end{align}

The unknown coefficients a, b, and c can hence be obtained by solving the above linear equations.

A simpler presentation of the theory follows:

Line of Best Fit (Least Squares Method)

A line of best fit is a straight line that is the best approximation of the given set of data.

It is used to study the nature of the relation between two variables.

A line of best fit can be roughly determined using an eyeball method by drawing a straight line on a scatter plot so that the number of points above the line and below the line is about equal (and the line passes through as many points as possible).

A more accurate way of finding the line of best fit is the least squares method.

Use the following steps to find the equation of line of best fit for a set of ordered pairs.

Step 1: Calculate the mean of the x-values and the mean of the y-values.

Step 2: Compute the sum of the squares of the x-values.

Step 3: Compute the sum of each x-value multiplied by its corresponding y-value.

Step 4: Calculate the slope m of the line using the formula:

m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}

where n is the total number of data points.

Step 5: Compute the y-intercept b of the line by using the formula:

b = \bar y - m\,\bar x

where \bar x and \bar y are the means of the x- and y-coordinates of the data points, respectively.

Step 6: Use the slope and the y-intercept to form the equation of the line.

Example:

Use the least square method to determine the equation of line of best fit for the data. Then plot the line.

Solution:

Plot the points on a coordinate plane.

Calculate the means of the x-values and the y-values, the sum of squares of the x-values, and the sum of each x-value multiplied by its corresponding y-value.

Calculate the slope.

Calculate the y-intercept.

First, calculate the mean of the x-values and that of the y-values.

Use the formula to compute the y-intercept.

Use the slope and y-intercept to form the equation of the line of best fit.

The slope of the line is –1.1 and the y-intercept is 14.0.

Therefore, the equation is y = –1.1x + 14.0.

Draw the line on the scatter plot.


Least squares for regression analysis in machine learning

A simple, concrete implementation of least squares:

The most basic version:

#include <iostream>
using namespace std;

int main()
{
    const int MAXN = 10;                 // maximum number of sample points the arrays can hold
    int n, i;
    float a, a0, a1, x[MAXN], f[MAXN];
    float sumx = 0, sumy = 0, sumxy = 0, sumx2 = 0;

    cout << "Enter no of sample points (max 10) ? ";
    cin >> n;
    if (n < 2 || n > MAXN) return 1;     // need at least two points, and no more than the buffers hold

    cout << "Enter all sample points: " << endl;
    for (i = 0; i < n; i++)
    {
        cin >> x[i] >> f[i];             // read both (x, f(x))
        sumx  += x[i];                   // sum of x
        sumy  += f[i];                   // sum of y
        sumxy += x[i] * f[i];            // sum of x*y
        sumx2 += x[i] * x[i];            // sum of x^2
    }
    cout << "your sample x ? ";
    cin >> a;

    // Closed-form least-squares coefficients of the line f(x) = a0 + a1*x.
    a0 = (sumy * sumx2 - sumx * sumxy) / (n * sumx2 - sumx * sumx);
    a1 = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx);

    cout << "The coefficients are : " << endl << a0 << endl << a1;
    cout << endl << "f(" << a << "): " << (a0 + a1 * a) << endl;
    return 0;
}

CvMat form (legacy OpenCV C API)

#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>   // CvMat/cvSolve belong to the legacy C API and are only available in older OpenCV releases
using namespace std;
using namespace cv;

int main()
{
    float a[9] = {1, 2, 3, 4, 5, 7, 6, 8, 9};   // 3x3 coefficient matrix A (row major)
    float b[3] = {2, 3, 1};                     // right-hand side B

    CvMat* A = cvCreateMat(3, 3, CV_32FC1);
    CvMat* X = cvCreateMat(3, 1, CV_32FC1);
    CvMat* B = cvCreateMat(3, 1, CV_32FC1);

    cvSetData(A, a, CV_AUTOSTEP);
    cvSetData(B, b, CV_AUTOSTEP);
    cvSolve(A, B, X, CV_LU);    // solve A*x = B for x (LU decomposition; exact solution of the square system)

    printf("A:");
    for (int i = 0; i < 9; i++)
    {
        if (i % 3 == 0) printf("\n");
        printf("\t%f", A->data.fl[i]);
    }

    printf("\nX:\n");
    for (int i = 0; i < 3; i++)
        printf("\t%f", X->data.fl[i]);

    printf("\nb:\n");
    for (int i = 0; i < 3; i++)
        printf("\t%f", B->data.fl[i]);
    printf("\n");

    return 0;
}
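With newer OpenCV releases, where the legacy C API above is no longer available, the same system can be solved through the C++ cv::Mat interface; a minimal equivalent sketch:

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Same 3x3 system as above, using the C++ API.
    float a[9] = {1, 2, 3, 4, 5, 7, 6, 8, 9};
    float b[3] = {2, 3, 1};

    cv::Mat A(3, 3, CV_32FC1, a);
    cv::Mat B(3, 1, CV_32FC1, b);
    cv::Mat X;

    cv::solve(A, B, X, cv::DECOMP_LU);   // solve A*x = B for x

    std::cout << "A = " << std::endl << A << std::endl;
    std::cout << "X = " << std::endl << X << std::endl;
    std::cout << "B = " << std::endl << B << std::endl;
    return 0;
}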


For more detail, see here.

