Source: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)
Least squares problems come in two flavors: linear least squares and nonlinear least squares.
Linear function: if the n unknowns to be solved for are written as [x1, x2, x3, ..., xn]^T, a linear function is f(x1, x2, x3, ..., xn) = k1*x1 + k2*x2 + k3*x3 + ... + kn*xn, which in matrix form is:
f(x1, x2, x3, ..., xn) = [k1, k2, k3, ..., kn] * [x1, x2, x3, ..., xn]^T, exactly as described for linear optimization:
Linear programming (LP) (also called linear optimization)
Linear optimization: both the objective function and the constraints are linear, i.e.:
In linear algebra, a linear functional or linear form (also called a one-form or covector) is a linear map from a vector space to its field of scalars. In ℝ^n, if vectors are represented as column vectors, then linear functionals are represented as row vectors, and their action on vectors is given by the dot product, or the matrix product with the row vector on the left and the column vector on the right. In general, if V is a vector space over a field k, then a linear functional f is a function from V to k that is linear:
f(v + w) = f(v) + f(w) for all v, w ∈ V,
f(a v) = a f(v) for all v ∈ V and all scalars a ∈ k.
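As a tiny illustration of the row-vector view, here is a sketch in NumPy with made-up numbers:

```python
import numpy as np

# A linear functional on R^3 represented as a row vector k,
# acting on a column vector x via the dot product.
k = np.array([1.0, 2.0, 3.0])    # made-up coefficients k1, k2, k3
x = np.array([4.0, 5.0, 6.0])    # made-up vector x1, x2, x3

f_x = k @ x                      # k1*x1 + k2*x2 + k3*x3
print(f_x)                       # 32.0
```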
So what exactly does linear least squares mean? The explanation the wiki gives is:
Consider an overdetermined system of m linear equations in n unknown coefficients, β1, β2, …, βn, with m > n. This can be written in matrix form as
X β = y,
where X is the m×n matrix of known coefficients X_ij, β = [β1, β2, …, βn]^T, and y = [y1, y2, …, ym]^T.
Such a system usually has no solution, so the goal is instead to find the coefficients which fit the equations "best," in the sense of solving the quadratic minimization problem
β̂ = argmin_β S(β),
where the objective function S is given by
S(β) = Σ_i ( y_i − Σ_j X_ij β_j )^2 = ||y − X β||^2.
From this we can see that a linear least-squares objective can be written in the form above: the unknown parameter vector is left-multiplied by a matrix, and the residual against another vector is then taken. As I recall from linear algebra class, left-multiplying a vector by a matrix applies a linear transformation to that vector.
The solution of linear least squares
This minimization problem has a unique solution, provided that the n columns of the matrix X are linearly independent, given by solving the normal equations
(X^T X) β̂ = X^T y.
That is, the residual y − X β̂ is orthogonal to every column of X: X^T (y − X β̂) = 0.
The matrix X^T X is known as the Gramian matrix of X, which possesses several nice properties such as being a positive semi-definite matrix, and the matrix X^T y is known as the moment matrix of regressand by regressors.[1] Finally, β̂ is the coefficient vector of the least-squares hyperplane, expressed as
β̂ = (X^T X)^{-1} X^T y.
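The normal equations are easy to check numerically. A minimal NumPy sketch with made-up data (m = 4 equations, n = 2 unknowns):

```python
import numpy as np

# Made-up overdetermined system X beta ≈ y: m = 4 equations, n = 2 unknowns.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])

# Solve (X^T X) beta = X^T y; np.linalg.solve avoids forming the inverse explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)                      # [3.5 1.4]

# Check: the residual y - X beta is orthogonal to every column of X.
residual = y - X @ beta
print(X.T @ residual)            # numerically ~ [0, 0]
```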
When the n column vectors of the matrix X are linearly independent, the optimal solution of the objective is unique. Why does it take this form? The wiki gives several derivations:
<1> Derivation of the normal equations
Take the partial derivative of the objective with respect to each unknown parameter, set all of the partials to zero, and write the result in matrix form:
X^T X β̂ = X^T y.
<2> Derivation directly in terms of matrices
The normal equations can be derived directly from a matrix representation of the problem as follows. The objective is to minimize
S(β) = (y − X β)^T (y − X β) = y^T y − β^T X^T y − y^T X β + β^T X^T X β.
Note that β^T X^T y has the dimension 1×1, so it is a scalar and equal to its own transpose, hence β^T X^T y = y^T X β, and the quantity to minimize becomes
S(β) = y^T y − 2 β^T X^T y + β^T X^T X β.
Differentiating this with respect to β and equating to zero to satisfy the first-order conditions gives
−2 X^T y + 2 X^T X β̂ = 0,
which is equivalent to the normal equations.
Note: differentiating the matrix-form objective above with respect to the unknown parameter vector is a scalar-by-vector derivative; following the table of identities in matrix calculus, one can work it out by hand.
Note that computing β̂ this way requires inverting the matrix X^T X; when the problem is large, forming the inverse directly is impractical, so matrix decompositions are used instead.
<3> Orthogonal decomposition methods
Orthogonal decomposition methods of solving the least squares problem are slower than the normal equations method but are more numerically stable because they avoid forming the product X^T X.
The residuals are written in matrix notation as
r = y − X β̂.
The matrix X is subjected to an orthogonal decomposition, e.g., the QR decomposition as follows.
X = Q [R; 0],
where Q is an m×m orthogonal matrix (Q^T Q = I), R is an n×n upper triangular matrix with nonzero diagonal elements, and 0 denotes an (m−n)×n zero block.
The residual vector is left-multiplied by Q^T:
Q^T r = Q^T y − (Q^T Q) [R; 0] β = [u; v], with u = (Q^T y)_{1..n} − R β and v = (Q^T y)_{n+1..m}.
Because Q is orthogonal, the sum of squares of the residuals, s, may be written as:
s = ||r||^2 = ||Q^T r||^2 = ||u||^2 + ||v||^2.
Since v doesn't depend on β, the minimum value of s is attained when the upper block, u, is zero. Therefore the parameters are found by solving:
R β̂ = (Q^T y)_{1..n}.
These equations are easily solved as R is upper triangular. Compared with inverting the n×n matrix X^T X, back-substituting through an n×n upper triangular R is far more welcome.
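The QR route above can be sketched in NumPy with made-up data; note that `np.linalg.qr` returns the reduced factorization, so Q here already corresponds to the upper block:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])

Q, R = np.linalg.qr(X)        # reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular
u = Q.T @ y                   # the upper block of Q^T y
beta = np.linalg.solve(R, u)  # effectively back substitution, since R is triangular
print(beta)                   # same solution as the normal equations, computed more stably
```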
<4> One can also compute the singular value decomposition (SVD) of X:
X = U Σ V^T,
where U is an m×m orthogonal matrix, V is an n×n orthogonal matrix and Σ is an m×n matrix with all its elements outside of the main diagonal equal to 0. The pseudoinverse of Σ is easily obtained by inverting its non-zero diagonal elements and transposing. Hence,
X^+ = V Σ^+ U^T,
and thus,
β̂ = X^+ y = V Σ^+ U^T y
is a solution of the least squares problem.
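The SVD route, sketched in NumPy on the same kind of made-up data:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Pseudoinverse of Sigma: invert the non-zero singular values, then transpose.
X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
beta = X_pinv @ y
print(beta)
```

NumPy's `np.linalg.pinv` computes the same pseudoinverse, with a tolerance cutoff for tiny singular values.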
<5> The methods above solve linear least squares analytically; another approach solves it iteratively, e.g. the Gauss–Seidel method.
The Gauss–Seidel method is an iterative technique for solving a square system of n linear equations with unknown x:
A x = b.
It is defined by the iteration
L x^(k+1) = b − U x^(k),
where x^(k) is the kth approximation or iteration of x, x^(k+1) is the next or (k+1)th iteration of x, and the matrix A is decomposed into a lower triangular component L and a strictly upper triangular component U: A = L + U.[2]
In more detail, write out A, x and b in their components: A = (a_ij), x = [x_1, …, x_n]^T, b = [b_1, …, b_n]^T.
Then the decomposition of A into its lower triangular component (the entries a_ij with i ≥ j, including the diagonal) and its strictly upper triangular component (the entries a_ij with i < j) is given by:
A = L + U.
The system of linear equations may be rewritten as:
L x = b − U x.
The Gauss–Seidel method now solves the left hand side of this expression for x, using the previous value of x on the right hand side. Analytically, this may be written as:
x^(k+1) = L^{-1} (b − U x^(k)).
However, by taking advantage of the triangular form of L, the elements of x^(k+1) can be computed sequentially using forward substitution:
x_i^(k+1) = (1 / a_ii) ( b_i − Σ_{j<i} a_ij x_j^(k+1) − Σ_{j>i} a_ij x_j^(k) ),  i = 1, 2, …, n.
The procedure is generally continued until the changes made by an iteration are below some tolerance, such as a sufficiently small residual.
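The forward-substitution loop above can be sketched as follows (a made-up 2×2 system; A is symmetric positive-definite and diagonally dominant, so convergence is guaranteed):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)                            # initial guess x^(0)
for _ in range(100):
    x_new = x.copy()
    for i in range(len(b)):
        s1 = A[i, :i] @ x_new[:i]          # lower part: already-updated entries (iteration k+1)
        s2 = A[i, i+1:] @ x[i+1:]          # strictly upper part: previous entries (iteration k)
        x_new[i] = (b[i] - s1 - s2) / A[i, i]
    if np.linalg.norm(x_new - x) < 1e-12:  # stop once the change per iteration is tiny
        break
    x = x_new

print(x_new)                               # agrees with np.linalg.solve(A, b)
```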
Convergence
The convergence properties of the Gauss–Seidel method are dependent on the matrix A. Namely, the procedure is known to converge if either:
- A is symmetric positive-definite,[4] or
- A is strictly or irreducibly diagonally dominant.
The Gauss–Seidel method sometimes converges even if these conditions are not satisfied.
Weighted linear least squares
In some cases the observations may be weighted; for example, they may not be equally reliable. In this case, one can minimize the weighted sum of squares:
S(β) = Σ_i w_i ( y_i − Σ_j X_ij β_j )^2,
where wi > 0 is the weight of the ith observation, and W is the diagonal matrix of such weights.
The weights should, ideally, be equal to the reciprocal (for a scalar, the reciprocal; for a covariance matrix, the inverse) of the variance of the measurement.[6][7] The normal equations are then:
(X^T W X) β̂ = X^T W y.
This method is used in iteratively reweighted least squares.
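The weighted normal equations, sketched in NumPy with made-up weights:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])
w = np.array([1.0, 1.0, 1.0, 4.0])   # made-up weights: the last observation is trusted most
W = np.diag(w)

# Solve (X^T W X) beta = X^T W y.
beta_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_w)

# With equal weights (W = I) this reduces to ordinary least squares.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols)                      # [3.5 1.4]
```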
Nonlinear least squares
******************************************************************************************************************
Nonlinear least squares problems are usually solved iteratively; see the wiki on the Gauss–Newton method.