Linear least squares

Reposted from: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)

Least squares problems come in two kinds: linear least squares and nonlinear least squares.

Linear function: if the n unknowns are written as the vector [x_1, x_2, x_3, \dots, x_n]^T, a linear function has the form f(x_1, x_2, x_3, \dots, x_n) = k_1 x_1 + k_2 x_2 + k_3 x_3 + \cdots + k_n x_n, which in matrix form is

f(x_1, x_2, x_3, \dots, x_n) = [k_1, k_2, k_3, \dots, k_n]\,[x_1, x_2, x_3, \dots, x_n]^T,

just as described for linear optimization:

Linear programming (LP) (also called linear optimization)

Linear optimization: both the objective function and the constraints are linear, that is:

In linear algebra, a linear functional or linear form (also called a one-form or covector) is a linear map from a vector space to its field of scalars. In \mathbb{R}^n, if vectors are represented as column vectors, then linear functionals are represented as row vectors, and their action on vectors is given by the dot product, or the matrix product with the row vector on the left and the column vector on the right. In general, if V is a vector space over a field k, then a linear functional f is a function from V to k that is linear:

f(\vec{v}+\vec{w}) = f(\vec{v})+f(\vec{w}) for all \vec{v}, \vec{w}\in V

f(a\vec{v}) = af(\vec{v}) for all \vec{v}\in V, a\in k.
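
For instance, on V = \mathbb{R}^{3} the row vector [2, -1, 5] defines the linear functional

f(\vec{v}) = 2v_{1} - v_{2} + 5v_{3} = [2, -1, 5]\,[v_{1}, v_{2}, v_{3}]^{\rm T},

which satisfies both conditions above.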

 

So what exactly does linear least squares mean? The explanation Wikipedia gives: consider an overdetermined system

\sum _{j=1}^{n}X_{ij}\beta _{j}=y_{i},\ (i=1,2,\dots ,m),

of m linear equations in n unknown coefficients β_1, β_2, \dots, β_n, with m > n. This can be written in matrix form as

\mathbf {X} {\boldsymbol {\beta }}=\mathbf {y} ,

where

\mathbf{X} = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{n} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix}.

Such a system usually has no solution, so the goal is instead to find the coefficients {\boldsymbol {\beta }} which fit the equations "best," in the sense of solving the quadratic minimization problem

{\hat {\boldsymbol {\beta }}}={\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,S({\boldsymbol {\beta }}),

where the objective function S is given by

S(\boldsymbol{\beta}) = \sum_{i=1}^{m} \bigl| y_{i} - \sum_{j=1}^{n} X_{ij}\beta_{j} \bigr|^{2} = \bigl\| \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \bigr\|^{2}.

From this we can see the form a linear least-squares objective takes: the unknown parameter vector \boldsymbol{\beta} is left-multiplied by a matrix, and then the residual with respect to another vector is formed. As I recall from linear algebra class, left-multiplying a vector by a matrix applies a linear transformation to that vector.

 

The solution of the linear least squares problem

This minimization problem has a unique solution, provided that the n columns of the matrix \mathbf {X} are linearly independent, given by solving the normal equations

(\mathbf{X}^{\rm T}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\rm T}\mathbf{y}. That is, the residual \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} is orthogonal to every column of the coefficient matrix X: \mathbf{X}^{\rm T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0.

The matrix \mathbf {X} ^{\rm {T}}\mathbf {X} is known as the Gramian matrix of \mathbf {X}, which possesses several nice properties such as being a positive semi-definite matrix, and the matrix {\displaystyle \mathbf {X} ^{\rm {T}}\mathbf {y} } is known as the moment matrix of regressand by regressors.[1] Finally, {\hat {\boldsymbol {\beta }}} is the coefficient vector of the least-squares hyperplane, expressed as

\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\rm T}\mathbf{X})^{-1}\mathbf{X}^{\rm T}\mathbf{y}.
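
As a quick sanity check of this closed form, here is a minimal NumPy sketch (the small data set X, y is my own illustration, not from the article) that solves the normal equations and compares the result with NumPy's built-in least-squares routine:

import numpy as np

# A small over-determined system: m = 5 observations, n = 2 unknown coefficients.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Solve the normal equations (X^T X) beta = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)  # the two estimates agree to numerical precision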

 

When the n column vectors of the matrix \mathbf{X} are linearly independent, the minimizer of the objective is unique. Why does it take this form? Wikipedia gives several derivations:

<1>  Derivation of the normal equations

Take the partial derivative of the objective function with respect to each unknown parameter, set every partial derivative to zero, and write the resulting equations in matrix form: (\mathbf{X}^{\rm T}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\rm T}\mathbf{y}
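
Spelled out, the kth first-order condition for the objective S(\boldsymbol{\beta}) defined above is

\frac{\partial S}{\partial \beta_{k}} = -2 \sum_{i=1}^{m} X_{ik} \Bigl( y_{i} - \sum_{j=1}^{n} X_{ij}\beta_{j} \Bigr) = 0
\quad \Longrightarrow \quad
\sum_{j=1}^{n} \Bigl( \sum_{i=1}^{m} X_{ik} X_{ij} \Bigr) \beta_{j} = \sum_{i=1}^{m} X_{ik} y_{i}, \qquad k = 1, 2, \dots, n,

which is exactly the kth row of (\mathbf{X}^{\rm T}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\rm T}\mathbf{y}.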

 

<2> Derivation directly in terms of matrices

The normal equations can be derived directly from a matrix representation of the problem as follows. The objective is to minimize

S({\boldsymbol {\beta }})={\bigl \|}\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }}{\bigr \|}^{2}=(\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }})^{\rm {T}}(\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }})=\mathbf {y} ^{\rm {T}}\mathbf {y} -{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {y} -\mathbf {y} ^{\rm {T}}\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {X} {\boldsymbol {\beta }}.

Note that (\boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{y})^{\rm T} = \mathbf{y}^{\rm T}\mathbf{X}\boldsymbol{\beta} has dimension 1×1, so it is a scalar and equal to its own transpose, hence \boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{y} = \mathbf{y}^{\rm T}\mathbf{X}\boldsymbol{\beta} and the quantity to minimize becomes

S({\boldsymbol {\beta }})=\mathbf {y} ^{\rm {T}}\mathbf {y} -2{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {y} +{\boldsymbol {\beta }}^{\rm {T}}\mathbf {X} ^{\rm {T}}\mathbf {X} {\boldsymbol {\beta }}.

Differentiating this with respect to {\boldsymbol {\beta }} and equating to zero to satisfy the first-order conditions gives

-\mathbf {X} ^{\rm {T}}\mathbf {y} +(\mathbf {X} ^{\rm {T}}\mathbf {X} ){\boldsymbol {\beta }}=0,

A note: differentiating the matrix-form objective above with respect to the unknown parameter vector \boldsymbol{\beta} is a scalar-by-vector derivative; it can be worked out from the table of matrix-calculus identities.
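
Concretely, the two scalar-by-vector identities needed (in denominator layout, with \mathbf{X}^{\rm T}\mathbf{X} symmetric) are

\frac{\partial}{\partial \boldsymbol{\beta}} \bigl( \boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{y} \bigr) = \mathbf{X}^{\rm T}\mathbf{y},
\qquad
\frac{\partial}{\partial \boldsymbol{\beta}} \bigl( \boldsymbol{\beta}^{\rm T}\mathbf{X}^{\rm T}\mathbf{X}\boldsymbol{\beta} \bigr) = 2\,\mathbf{X}^{\rm T}\mathbf{X}\boldsymbol{\beta},

so differentiating S(\boldsymbol{\beta}) term by term gives -2\mathbf{X}^{\rm T}\mathbf{y} + 2\mathbf{X}^{\rm T}\mathbf{X}\boldsymbol{\beta} = 0, i.e. the normal equations after dividing by 2.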

Note that computing \hat{\boldsymbol{\beta}} this way requires inverting a matrix. When the problem is large, computing the inverse directly is impractical, so matrix-decomposition methods are used instead.
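
The article turns to QR and SVD below; another common decomposition route (not covered in the article, sketched here for comparison) is a Cholesky factorization of the Gramian X^T X, which replaces the explicit inverse by two triangular solves:

import numpy as np
from scipy.linalg import solve_triangular

# Reusing the small X, y example from the earlier sketch.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

A = X.T @ X                      # Gramian matrix, symmetric positive definite here
b = X.T @ y

L = np.linalg.cholesky(A)        # A = L L^T with L lower triangular
z = solve_triangular(L, b, lower=True)             # forward substitution: L z = b
beta_hat = solve_triangular(L.T, z, lower=False)   # back substitution: L^T beta = z

print(beta_hat)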

 

<3> Orthogonal decomposition methods

Orthogonal decomposition methods of solving the least squares problem are slower than the normal equations method but are more numerically stable because they avoid forming the product X^T X.

The residuals are written in matrix notation as

\mathbf {r} =\mathbf {y} -X{\hat {\boldsymbol {\beta }}}.

The matrix X is subjected to an orthogonal decomposition, e.g., the QR decomposition as follows.

X = Q \begin{pmatrix} R \\ 0 \end{pmatrix},

where Q is an m×m orthogonal matrix (Q^{\rm T}Q = I) and R is an n×n upper triangular matrix with r_{ii} > 0.

The residual vector is left-multiplied by QT.

Q^{\rm T}\mathbf{r} = Q^{\rm T}\mathbf{y} - \left(Q^{\rm T}Q\right)\begin{pmatrix} R \\ 0 \end{pmatrix}\hat{\boldsymbol{\beta}} = \begin{bmatrix} \left(Q^{\rm T}\mathbf{y}\right)_{n} - R\hat{\boldsymbol{\beta}} \\ \left(Q^{\rm T}\mathbf{y}\right)_{m-n} \end{bmatrix} = \begin{bmatrix} \mathbf{u} \\ \mathbf{v} \end{bmatrix}

Because Q is orthogonal, the sum of squares of the residuals, s, may be written as:

s=\|\mathbf {r} \|^{2}=\mathbf {r} ^{\rm {T}}\mathbf {r} =\mathbf {r} ^{\rm {T}}QQ^{\rm {T}}\mathbf {r} =\mathbf {u} ^{\rm {T}}\mathbf {u} +\mathbf {v} ^{\rm {T}}\mathbf {v}

Since v doesn't depend on β, the minimum value of s is attained when the upper block, u, is zero. Therefore the parameters are found by solving:

R{\hat {\boldsymbol {\beta }}}=\left(Q^{\rm {T}}\mathbf {y} \right)_{n}.

These equations are easily solved as R is upper triangular. Compared with inverting the n×n matrix X^T X, back-substituting against an n×n upper-triangular R is far more pleasant.
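
A minimal NumPy sketch of this QR route (np.linalg.qr returns the reduced factorization, so Q^T y here already corresponds to the (Q^T y)_n block above):

import numpy as np
from scipy.linalg import solve_triangular

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

Q, R = np.linalg.qr(X)   # reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular
beta_hat = solve_triangular(R, Q.T @ y, lower=False)   # back substitution on R beta = Q^T y

print(beta_hat)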

 

<4> Alternatively, a singular value decomposition (SVD) of X can be computed:

 

X = U\Sigma V^{\rm T}

where U is an m by m orthogonal matrix, V is an n by n orthogonal matrix and \Sigma is an m by n matrix with all its elements outside of the main diagonal equal to 0. The pseudoinverse of \Sigma is easily obtained by inverting its non-zero diagonal elements and transposing. Hence,

\mathbf{X}^{+} = V\Sigma^{+}U^{\rm T},

and thus,

\hat{\boldsymbol{\beta}} = \mathbf{X}^{+}\mathbf{y} = V\Sigma^{+}U^{\rm T}\mathbf{y}

is a solution of a least squares problem.
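
The same computation with NumPy's SVD; the last line checks it against the pseudoinverse route np.linalg.pinv:

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T (reduced form)
beta_hat = Vt.T @ ((U.T @ y) / s)                  # V Sigma^+ U^T y

print(beta_hat, np.linalg.pinv(X) @ y)             # the two agree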

 

 

<5> The methods above solve linear least squares analytically; an alternative is to solve iteratively, for example with the Gauss–Seidel method.

 

The Gauss–Seidel method is an iterative technique for solving a square system of n linear equations with unknown x:

A\mathbf {x} =\mathbf {b}.

It is defined by the iteration

L_{*}\mathbf {x} ^{(k+1)}=\mathbf {b} -U\mathbf {x} ^{(k)},

where \mathbf{x}^{(k)} is the kth approximation or iteration of \mathbf{x}, \mathbf{x}^{(k+1)} is the next or (k+1)th iteration of \mathbf{x}, and the matrix A is decomposed into a lower triangular component L_{*} and a strictly upper triangular component U, so that A = L_{*} + U.[2]

In more detail, write out A, x and b in their components:

A={\begin{bmatrix}a_{11}&a_{12}&\cdots &a_{1n}\\a_{21}&a_{22}&\cdots &a_{2n}\\\vdots &\vdots &\ddots &\vdots \\a_{n1}&a_{n2}&\cdots &a_{nn}\end{bmatrix}},\qquad \mathbf {x} ={\begin{bmatrix}x_{1}\\x_{2}\\\vdots \\x_{n}\end{bmatrix}},\qquad \mathbf {b} ={\begin{bmatrix}b_{1}\\b_{2}\\\vdots \\b_{n}\end{bmatrix}}.

Then the decomposition of A into its lower triangular component and its strictly upper triangular component is given by:

A=L_{*}+U\qquad {\text{where}}\qquad L_{*}={\begin{bmatrix}a_{11}&0&\cdots &0\\a_{21}&a_{22}&\cdots &0\\\vdots &\vdots &\ddots &\vdots \\a_{n1}&a_{n2}&\cdots &a_{nn}\end{bmatrix}},\quad U={\begin{bmatrix}0&a_{12}&\cdots &a_{1n}\\0&0&\cdots &a_{2n}\\\vdots &\vdots &\ddots &\vdots \\0&0&\cdots &0\end{bmatrix}}.

The system of linear equations may be rewritten as:

L_{*}\mathbf {x} =\mathbf {b} -U\mathbf {x}

The Gauss–Seidel method now solves the left hand side of this expression for x, using the previous value of x on the right hand side. Analytically, this may be written as:

\mathbf {x} ^{(k+1)}=L_{*}^{-1}(\mathbf {b} -U\mathbf {x} ^{(k)}).

However, by taking advantage of the triangular form of L_{*}, the elements of x(k+1) can be computed sequentially using forward substitution:

x_{i}^{(k+1)} = \frac{1}{a_{ii}} \left( b_{i} - \sum_{j=1}^{i-1} a_{ij} x_{j}^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_{j}^{(k)} \right), \quad i = 1, 2, \dots, n. [3]

The procedure is generally continued until the changes made by an iteration are below some tolerance, such as a sufficiently small residual.
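
A minimal Python sketch of this sweep, applied here to the normal equations of the earlier example (X^T X is symmetric positive definite, so the iteration converges):

import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    # Solve A x = b by Gauss-Seidel sweeps (forward substitution through L*).
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]              # already-updated components x^(k+1)
            s2 = A[i, i + 1:] @ x_old[i + 1:]  # old components x^(k)
            x[i] = (b[i] - s1 - s2) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:    # stop once the change is small
            break
    return x

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
print(gauss_seidel(X.T @ X, X.T @ y))   # matches the direct solutions above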

Convergence

The convergence properties of the Gauss–Seidel method depend on the matrix A. Namely, the procedure is known to converge if either A is symmetric positive-definite, or A is strictly or irreducibly diagonally dominant.

The Gauss–Seidel method sometimes converges even if these conditions are not satisfied.

 

 

Weighted linear least squares

In some cases the observations may be weighted—for example, they may not be equally reliable. In this case, one can minimize the weighted sum of squares:

{\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,\sum _{i=1}^{m}w_{i}\left|y_{i}-\sum _{j=1}^{n}X_{ij}\beta _{j}\right|^{2}={\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,{\big \|}W^{1/2}(\mathbf {y} -X{\boldsymbol {\beta }}){\big \|}^{2}.

where wi > 0 is the weight of the ith observation, and W is the diagonal matrix of such weights.

The weights should, ideally, be equal to the reciprocal of the variance of the measurement (for a covariance matrix this means taking the inverse; for a scalar it is simply the reciprocal).[6][7] The normal equations are then:

\left(X^{\rm {T}}WX\right){\hat {\boldsymbol {\beta }}}=X^{\rm {T}}W\mathbf {y} .

This method is used in iteratively reweighted least squares.
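
A short NumPy sketch of the weighted fit, both via the weighted normal equations and via the W^{1/2} row-scaling form above (the weights are made up for illustration):

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
w = np.array([1.0, 1.0, 0.5, 2.0, 1.0])   # hypothetical per-observation weights

# Option 1: weighted normal equations (X^T W X) beta = X^T W y.
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Option 2: scale rows by sqrt(w) and reuse an ordinary least-squares solver.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_wls, beta_scaled)   # the two agree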

 

Nonlinear least squares

 

******************************************************************************************************************

Nonlinear least squares problems are usually solved iteratively; see the Wikipedia article on the Gauss–Newton method.
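
To make that pointer concrete, here is a minimal Gauss–Newton sketch (the exponential model, data and starting point are my own illustration, not from the article): each iteration linearizes the residual and solves a linear least-squares subproblem for the update.

import numpy as np

# Hypothetical model y ≈ a * exp(b * t); residual r(beta) = y - a * exp(b * t).
def residual(beta, t, y):
    a, b = beta
    return y - a * np.exp(b * t)

def jacobian(beta, t):
    a, b = beta
    # Partial derivatives of the residual with respect to (a, b).
    return np.column_stack([-np.exp(b * t), -a * t * np.exp(b * t)])

def gauss_newton(beta, t, y, iters=20):
    for _ in range(iters):
        r = residual(beta, t, y)
        J = jacobian(beta, t)
        # Each step is a linear least-squares problem: J * delta ≈ -r.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        beta = beta + delta
    return beta

t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * t)                        # noiseless synthetic data
print(gauss_newton(np.array([1.0, 1.0]), t, y))  # approaches [2.0, 1.5]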

 
