Linear least squares

Reposted from: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)

Least squares problems come in two kinds: linear least squares and nonlinear least squares.

Linear function: if the n unknowns are written as the vector [x_1, x_2, x_3, \dots, x_n]^T, a linear function has the form f(x_1, x_2, x_3, \dots, x_n) = k_1 x_1 + k_2 x_2 + k_3 x_3 + \cdots + k_n x_n, which in matrix form is

f(x_1, x_2, x_3, \dots, x_n) = [k_1, k_2, k_3, \dots, k_n]\,[x_1, x_2, x_3, \dots, x_n]^T,

just as described for linear optimization:

Linear programming (LP) (also called linear optimization)

Linear optimization: both the objective function and the constraints are linear, that is:

In linear algebra, a linear functional or linear form (also called a one-form or covector) is a linear map from a vector space to its field of scalars. In \mathbb{R}^n, if vectors are represented as column vectors, then linear functionals are represented as row vectors, and their action on vectors is given by the dot product, or the matrix product with the row vector on the left and the column vector on the right. In general, if V is a vector space over a field k, then a linear functional f is a function from V to k that is linear:

f(\vec{v}+\vec{w}) = f(\vec{v})+f(\vec{w}) for all \vec{v}, \vec{w}\in V

f(a\vec{v}) = af(\vec{v}) for all \vec{v}\in V, a\in k.
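
For instance, on V = \mathbb{R}^{3} the row vector [2, -1, 5] defines the linear functional

f(\vec{v}) = 2v_{1} - v_{2} + 5v_{3} = [2, -1, 5]\,[v_{1}, v_{2}, v_{3}]^{\rm T},

which satisfies both conditions above.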

 

So what exactly does linear least squares mean? The explanation Wikipedia gives: consider an overdetermined system

\sum _{j=1}^{n}X_{ij}\beta _{j}=y_{i},\ (i=1,2,\dots ,m),

of m linear equations in n unknown coefficients β_1, β_2, \dots, β_n, with m > n. This can be written in matrix form as

\mathbf {X} {\boldsymbol {\beta }}=\mathbf {y} ,

where

\mathbf{X} = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{n} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix}.

Such a system usually has no solution, so the goal is instead to find the coefficients {\boldsymbol {\beta }} which fit the equations "best," in the sense of solving the quadratic minimization problem

{\hat {\boldsymbol {\beta }}}={\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,S({\boldsymbol {\beta }}),

where the objective function S is given by

S(\boldsymbol{\beta}) = \sum_{i=1}^{m} \bigl| y_{i} - \sum_{j=1}^{n} X_{ij}\beta_{j} \bigr|^{2} = \bigl\| \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \bigr\|^{2}.

From this we can see the form a linear least-squares objective takes: the unknown parameter vector \boldsymbol{\beta} is left-multiplied by a matrix, and then the residual with respect to another vector is formed. As I recall from linear algebra class, left-multiplying a vector by a matrix applies a linear transformation to that vector.

 

The solution of the linear least squares problem

This minimization problem has a unique solution, provided that the n columns of the matrix \mathbf {X} are linearly independent, given by solving the normal equations

(\mathbf{X}^{\rm T}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\rm T}\mathbf{y}. That is, the residual \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} is orthogonal to every column of the coefficient matrix X: \mathbf{X}^{\rm T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0.

The matrix \mathbf {X} ^{\rm {T}}\mathbf {X} is known as the Gramian matrix of \mathbf {X}, which possesses several nice properties such as being a positive semi-definite matrix, and the matrix {\displaystyle \mathbf {X} ^{\rm {T}}\mathbf {y} } is known as the moment matrix of regressand by regressors.[1] Finally, {\hat {\boldsymbol {\beta }}} is the coefficient vector of the least-squares hyperplane, expressed as

\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\rm T}\mathbf{X})^{-1}\mathbf{X}^{\rm T}\mathbf{y}.
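
As a quick sanity check of this closed form, here is a minimal NumPy sketch (the small data set X, y is my own illustration, not from the article) that solves the normal equations and compares the result with NumPy's built-in least-squares routine:

import numpy as np

# A small over-determined system: m = 5 observations, n = 2 unknown coefficients.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Solve the normal equations (X^T X) beta = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)  # the two estimates agree to numerical precision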

 

When the n column vectors of the matrix \mathbf{X} are linearly independent, the minimizer of the objective is unique. Why does it take this form? Wikipedia gives several derivations:

<1>  Derivation of the normal equations

Take the partial derivative of the objective function with respect to each unknown parameter, set every partial derivative to zero, and write the resulting equations in matrix form: (\mathbf{X}^{\rm T}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\rm T}\mathbf{y}
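
Spelled out, the kth first-order condition for the objective S(\boldsymbol{\beta}) defined above is

\frac{\partial S}{\partial \beta_{k}} = -2 \sum_{i=1}^{m} X_{ik} \Bigl( y_{i} - \sum_{j=1}^{n} X_{ij}\beta_{j} \Bigr) = 0
\quad \Longrightarrow \quad
\sum_{j=1}^{n} \Bigl( \sum_{i=1}^{m} X_{ik} X_{ij} \Bigr) \beta_{j} = \sum_{i=1}^{m} X_{ik} y_{i}, \qquad k = 1, 2, \dots, n,

which is exactly the kth row of (\mathbf{X}^{\rm T}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\rm T}\mathbf{y}.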

 

<2> Derivation directly in terms of matrices

The normal equations can be derived directly from a matrix representation of the problem as follows. The objective is to minimize

S({\boldsymbol {\beta }})={\bigl \|}\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }}{\bigr \|}^{2}=(\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }})^{\rm {T}}(\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }})=\mathbf {y} ^{\rm {T}}\mathbf {y} -{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {y} -\mathbf {y} ^{\rm {T}}\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {X} {\boldsymbol {\beta }}.

Note that (\boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{y})^{\rm T} = \mathbf{y}^{\rm T}\mathbf{X}\boldsymbol{\beta} has dimension 1×1, so it is a scalar and equal to its own transpose, hence \boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{y} = \mathbf{y}^{\rm T}\mathbf{X}\boldsymbol{\beta} and the quantity to minimize becomes

S({\boldsymbol {\beta }})=\mathbf {y} ^{\rm {T}}\mathbf {y} -2{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {y} +{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {X} {\boldsymbol {\beta }}.

Differentiating this with respect to {\boldsymbol {\beta }} and equating to zero to satisfy the first-order conditions gives

-\mathbf {X} ^{\rm {T}}\mathbf {y} +(\mathbf {X} ^{\rm {T}}\mathbf {X} ){\boldsymbol {\beta }}=0,

A note: differentiating the matrix-form objective above with respect to the unknown parameter vector \boldsymbol{\beta} is a scalar-by-vector derivative; it can be worked out from the table of matrix-calculus identities.
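
Concretely, the two scalar-by-vector identities needed (in denominator layout, with \mathbf{X}^{\rm T}\mathbf{X} symmetric) are

\frac{\partial}{\partial \boldsymbol{\beta}} \bigl( \boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{y} \bigr) = \mathbf{X}^{\rm T}\mathbf{y},
\qquad
\frac{\partial}{\partial \boldsymbol{\beta}} \bigl( \boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{X}\boldsymbol{\beta} \bigr) = 2\,\mathbf{X}^{\rm T}\mathbf{X}\boldsymbol{\beta},

so differentiating S(\boldsymbol{\beta}) term by term gives -2\mathbf{X}^{\rm T}\mathbf{y} + 2\mathbf{X}^{\rm T}\mathbf{X}\boldsymbol{\beta} = 0, i.e. the normal equations after dividing by 2.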

Note that computing \hat{\boldsymbol{\beta}} this way requires inverting a matrix. When the problem is large, computing the inverse directly is impractical, so matrix-decomposition methods are used instead.
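
The article turns to QR and SVD below; another common decomposition route (not covered in the article, sketched here for comparison) is a Cholesky factorization of the Gramian X^T X, which replaces the explicit inverse by two triangular solves:

import numpy as np
from scipy.linalg import solve_triangular

# Reusing the small X, y example from the earlier sketch.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

A = X.T @ X                      # Gramian matrix, symmetric positive definite here
b = X.T @ y

L = np.linalg.cholesky(A)        # A = L L^T with L lower triangular
z = solve_triangular(L, b, lower=True)             # forward substitution: L z = b
beta_hat = solve_triangular(L.T, z, lower=False)   # back substitution: L^T beta = z

print(beta_hat)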

 

<3> Orthogonal decomposition methods

Orthogonal decomposition methods of solving the least squares problem are slower than the normal equations method but are more numerically stable because they avoid forming the product X^T X.

The residuals are written in matrix notation as

\mathbf {r} =\mathbf {y} -X{\hat {\boldsymbol {\beta }}}.

The matrix X is subjected to an orthogonal decomposition, e.g., the QR decomposition as follows.

X = Q \begin{pmatrix} R \\ 0 \end{pmatrix},

where Q is an m×m orthogonal matrix (Q^{\rm T}Q = I) and R is an n×n upper triangular matrix with r_{ii} > 0.

The residual vector is left-multiplied by QT.

Q^{\rm T}\mathbf{r} = Q^{\rm T}\mathbf{y} - \left(Q^{\rm T}Q\right)\begin{pmatrix} R \\ 0 \end{pmatrix}\hat{\boldsymbol{\beta}} = \begin{bmatrix} \left(Q^{\rm T}\mathbf{y}\right)_{n} - R\hat{\boldsymbol{\beta}} \\ \left(Q^{\rm T}\mathbf{y}\right)_{m-n} \end{bmatrix} = \begin{bmatrix} \mathbf{u} \\ \mathbf{v} \end{bmatrix}

Because Q is orthogonal, the sum of squares of the residuals, s, may be written as:

s=\|\mathbf {r} \|^{2}=\mathbf {r} ^{\rm {T}}\mathbf {r} =\mathbf {r} ^{\rm {T}}QQ^{\rm {T}}\mathbf {r} =\mathbf {u} ^{\rm {T}}\mathbf {u} +\mathbf {v} ^{\rm {T}}\mathbf {v}

Since v doesn't depend on β, the minimum value of s is attained when the upper block, u, is zero. Therefore the parameters are found by solving:

R{\hat {\boldsymbol {\beta }}}=\left(Q^{\rm {T}}\mathbf {y} \right)_{n}.

These equations are easily solved as R is upper triangular. Compared with inverting the n×n matrix X^T X, back-substituting against an n×n upper-triangular R is far more pleasant.
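
A minimal NumPy sketch of this QR route (np.linalg.qr returns the reduced factorization, so Q^T y here already corresponds to the (Q^T y)_n block above):

import numpy as np
from scipy.linalg import solve_triangular

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

Q, R = np.linalg.qr(X)   # reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular
beta_hat = solve_triangular(R, Q.T @ y, lower=False)   # back substitution on R beta = Q^T y

print(beta_hat)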

 

<4> Alternatively, a singular value decomposition (SVD) of X can be computed:

 

X = U\Sigma V^{\rm T}

where U is an m by m orthogonal matrix, V is an n by n orthogonal matrix and \Sigma is an m by n matrix with all its elements outside of the main diagonal equal to 0. The pseudoinverse of \Sigma is easily obtained by inverting its non-zero diagonal elements and transposing. Hence,

\mathbf{X}^{+} = V\Sigma^{+}U^{\rm T},

and thus,

\hat{\boldsymbol{\beta}} = \mathbf{X}^{+}\mathbf{y} = V\Sigma^{+}U^{\rm T}\mathbf{y}

is a solution of a least squares problem.
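
The same computation with NumPy's SVD; the last line checks it against the pseudoinverse route np.linalg.pinv:

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T (reduced form)
beta_hat = Vt.T @ ((U.T @ y) / s)                  # V Sigma^+ U^T y

print(beta_hat, np.linalg.pinv(X) @ y)             # the two agree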

 

 

<5> The methods above solve linear least squares analytically; an alternative is to solve iteratively, for example with the Gauss–Seidel method.

 

The Gauss–Seidel method is an iterative technique for solving a square system of n linear equations with unknown x:

A\mathbf {x} =\mathbf {b}.

It is defined by the iteration

L_{*}\mathbf {x} ^{(k+1)}=\mathbf {b} -U\mathbf {x} ^{(k)},

where \mathbf{x}^{(k)} is the kth approximation or iteration of \mathbf{x}, \mathbf{x}^{(k+1)} is the next or (k+1)th iteration of \mathbf{x}, and the matrix A is decomposed into a lower triangular component L_{*} and a strictly upper triangular component U, so that A = L_{*} + U.[2]

In more detail, write out A, x and b in their components:

A={\begin{bmatrix}a_{11}&a_{12}&\cdots &a_{1n}\\a_{21}&a_{22}&\cdots &a_{2n}\\\vdots &\vdots &\ddots &\vdots \\a_{n1}&a_{n2}&\cdots &a_{nn}\end{bmatrix}},\qquad \mathbf {x} ={\begin{bmatrix}x_{1}\\x_{2}\\\vdots \\x_{n}\end{bmatrix}},\qquad \mathbf {b} ={\begin{bmatrix}b_{1}\\b_{2}\\\vdots \\b_{n}\end{bmatrix}}.

Then the decomposition of A into its lower triangular component and its strictly upper triangular component is given by:

A=L_{*}+U\qquad {\text{where}}\qquad L_{*}={\begin{bmatrix}a_{11}&0&\cdots &0\\a_{21}&a_{22}&\cdots &0\\\vdots &\vdots &\ddots &\vdots \\a_{n1}&a_{n2}&\cdots &a_{nn}\end{bmatrix}},\quad U={\begin{bmatrix}0&a_{12}&\cdots &a_{1n}\\0&0&\cdots &a_{2n}\\\vdots &\vdots &\ddots &\vdots \\0&0&\cdots &0\end{bmatrix}}.

The system of linear equations may be rewritten as:

L_{*}\mathbf {x} =\mathbf {b} -U\mathbf {x}

The Gauss–Seidel method now solves the left hand side of this expression for x, using the previous value of x on the right hand side. Analytically, this may be written as:

\mathbf {x} ^{(k+1)}=L_{*}^{-1}(\mathbf {b} -U\mathbf {x} ^{(k)}).

However, by taking advantage of the triangular form of L_{*}, the elements of x(k+1) can be computed sequentially using forward substitution:

x_{i}^{(k+1)} = \frac{1}{a_{ii}} \left( b_{i} - \sum_{j=1}^{i-1} a_{ij} x_{j}^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_{j}^{(k)} \right), \quad i = 1, 2, \dots, n. [3]

The procedure is generally continued until the changes made by an iteration are below some tolerance, such as a sufficiently small residual.
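
A minimal Python sketch of this sweep, applied here to the normal equations of the earlier example (X^T X is symmetric positive definite, so the iteration converges):

import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    # Solve A x = b by Gauss-Seidel sweeps (forward substitution through L*).
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]              # already-updated components x^(k+1)
            s2 = A[i, i + 1:] @ x_old[i + 1:]  # old components x^(k)
            x[i] = (b[i] - s1 - s2) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:    # stop once the change is small
            break
    return x

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
print(gauss_seidel(X.T @ X, X.T @ y))   # matches the direct solutions above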

Convergence

The convergence properties of the Gauss–Seidel method depend on the matrix A. Namely, the procedure is known to converge if either A is symmetric positive-definite, or A is strictly or irreducibly diagonally dominant.

The Gauss–Seidel method sometimes converges even if these conditions are not satisfied.

 

 

Weighted linear least squares

In some cases the observations may be weighted—for example, they may not be equally reliable. In this case, one can minimize the weighted sum of squares:

{\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,\sum _{i=1}^{m}w_{i}\left|y_{i}-\sum _{j=1}^{n}X_{ij}\beta _{j}\right|^{2}={\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,{\big \|}W^{1/2}(\mathbf {y} -X{\boldsymbol {\beta }}){\big \|}^{2}.

where wi > 0 is the weight of the ith observation, and W is the diagonal matrix of such weights.

The weights should, ideally, be equal to the reciprocal of the variance of the measurement (for a covariance matrix this means taking the inverse; for a scalar it is simply the reciprocal).[6][7] The normal equations are then:

\left(X^{\rm {T}}WX\right){\hat {\boldsymbol {\beta }}}=X^{\rm {T}}W\mathbf {y} .

This method is used in iteratively reweighted least squares.
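
A short NumPy sketch of the weighted fit, both via the weighted normal equations and via the W^{1/2} row-scaling form above (the weights are made up for illustration):

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
w = np.array([1.0, 1.0, 0.5, 2.0, 1.0])   # hypothetical per-observation weights

# Option 1: weighted normal equations (X^T W X) beta = X^T W y.
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Option 2: scale rows by sqrt(w) and reuse an ordinary least-squares solver.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_wls, beta_scaled)   # the two agree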

 

Nonlinear least squares

 

******************************************************************************************************************

Nonlinear least squares problems are usually solved iteratively; see the Wikipedia article on the Gauss–Newton method.
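
To make that pointer concrete, here is a minimal Gauss–Newton sketch (the exponential model, data and starting point are my own illustration, not from the article): each iteration linearizes the residual and solves a linear least-squares subproblem for the update.

import numpy as np

# Hypothetical model y ≈ a * exp(b * t); residual r(beta) = y - a * exp(b * t).
def residual(beta, t, y):
    a, b = beta
    return y - a * np.exp(b * t)

def jacobian(beta, t):
    a, b = beta
    # Partial derivatives of the residual with respect to (a, b).
    return np.column_stack([-np.exp(b * t), -a * t * np.exp(b * t)])

def gauss_newton(beta, t, y, iters=20):
    for _ in range(iters):
        r = residual(beta, t, y)
        J = jacobian(beta, t)
        # Each step is a linear least-squares problem: J * delta ≈ -r.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        beta = beta + delta
    return beta

t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * t)                        # noiseless synthetic data
print(gauss_newton(np.array([1.0, 1.0]), t, y))  # approaches [2.0, 1.5]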

 
