16. Projection Matrices and Least Squares
-
Proving the two extreme cases
Prove that when $\vec{b}$ lies in the column space of $A$, its projection is $\vec{b}$ itself:
Let $\vec{b} = A\vec{x}$. Then $P\vec{b} = A(A^TA)^{-1}A^TA\vec{x} = A\vec{x} = \vec{b}$.
Prove that when $\vec{b}$ is orthogonal to the column space of $A$, its projection is $\vec{0}$:
Since $\vec{b}$ is orthogonal to every column of $A$, $A^T\vec{b} = \vec{0}$, so $P\vec{b} = A(A^TA)^{-1}A^T\vec{b} = \vec{0}$.
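Both extreme cases can be checked exactly on a small matrix. This is a minimal sketch in plain Python using `fractions.Fraction` so no rounding hides anything; $A$ is the matrix from the worked example at the end of these notes, and the helper names (`matmul`, `inverse_2x2`, `projection_matrix`) and test vectors are our own hypothetical choices. `projection_matrix` only handles an $A$ with two independent columns.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    # Product of an m x k and a k x n matrix stored as lists of rows
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    # Explicit inverse of a 2x2 matrix; assumes det(M) != 0
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def projection_matrix(A):
    # P = A (A^T A)^{-1} A^T; only valid here for A with two independent columns
    At = transpose(A)
    return matmul(matmul(A, inverse_2x2(matmul(At, A))), At)

A = [[Fraction(x), Fraction(1)] for x in (1, 2, 3)]
P = projection_matrix(A)

# Case 1: b = A x is in the column space, so P b should equal b
b_in = matmul(A, [[Fraction(2)], [Fraction(-1)]])
print(matmul(P, b_in) == b_in)                                  # True

# Case 2: b is orthogonal to both columns (A^T b = 0), so P b should be 0
b_perp = [[Fraction(1)], [Fraction(-2)], [Fraction(1)]]
print(all(row[0] == 0 for row in matmul(transpose(A), b_perp)))  # True: A^T b = 0
print(all(row[0] == 0 for row in matmul(P, b_perp)))            # True: P b = 0
```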
-
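When $P$ projects $\vec{b}$ onto a subspace to give $\vec{p}$, the other component of the decomposition $\vec{b} = \vec{p} + \vec{e}$ is $\vec{e}$, which is the projection of $\vec{b}$ onto the orthogonal complement of that subspace. The projection matrix onto that orthogonal complement is $I - P$, since $(I - P)\vec{b} = \vec{b} - \vec{p} = \vec{e}$. Hence the projection matrices of two subspaces that are orthogonal complements of each other add up to $I$.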
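A small exact check of the complementary projection, sketched for the simplest case: the subspace is the line through a hypothetical vector $\vec{v} = (1, 2, 2)$, so $P = \dfrac{\vec{v}\vec{v}^T}{\vec{v}^T\vec{v}}$ (the one-column case of $A(A^TA)^{-1}A^T$). It confirms that $Q = I - P$ is itself a projection ($Q^2 = Q$), that $PQ = 0$, and that $P\vec{b}$ and $Q\vec{b}$ are orthogonal pieces adding up to $\vec{b}$ for a hypothetical $\vec{b} = (3, 0, 1)$.

```python
from fractions import Fraction

def matmul(X, Y):
    # Product of an m x k and a k x n matrix stored as lists of rows
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# P projects onto the line spanned by v: P = v v^T / (v^T v)
v = [Fraction(1), Fraction(2), Fraction(2)]
vv = sum(c * c for c in v)
P = [[vi * vj / vv for vj in v] for vi in v]

# Q = I - P should project onto the orthogonal complement of that line
I = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
Q = [[I[i][j] - P[i][j] for j in range(3)] for i in range(3)]

print(matmul(Q, Q) == Q)                                    # True: Q is a projection
print(all(x == 0 for row in matmul(P, Q) for x in row))     # True: PQ = 0

b = [[Fraction(3)], [Fraction(0)], [Fraction(1)]]
p, e = matmul(P, b), matmul(Q, b)                           # p = P b, e = (I - P) b
print(sum(p[i][0] * e[i][0] for i in range(3)))             # 0: p is orthogonal to e
print(all(p[i][0] + e[i][0] == b[i][0] for i in range(3)))  # True: p + e = b
```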
-
Least squares
Given points $(x_1, y_1), \cdots, (x_n, y_n)$, let the fitted line be $\widehat{y} = \widehat{b}x + \widehat{a}$.
Then
$$A = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \quad \widehat{\vec{x}} = \begin{bmatrix} \widehat{b} \\ \widehat{a} \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},$$
and the equations $\widehat{b}x_i + \widehat{a} = y_i$ read $A\widehat{\vec{x}} = \vec{b}$.
To minimize the total error, we need to find the vector $\vec{b}'$ in the column space of $A$ that minimizes $|\vec{b} - \vec{b}'|$, i.e. that minimizes $(y_1 - \widehat{y}_1)^2 + \cdots + (y_n - \widehat{y}_n)^2$.
Proof that the error is smallest when $\vec{b}' = \vec{p}$:
In that case the error is $|\vec{e}|$.
Any other vector of the column space of $A$ can be written as $\vec{b}' = \vec{p} + \vec{a}$ with $\vec{a} \neq \vec{0}$; then $\vec{b} - \vec{b}' = \vec{e} - \vec{a}$.
We know $\vec{e}$ is perpendicular to every vector in the column space of $A$, and $\vec{a} = \vec{b}' - \vec{p}$ is a difference of two vectors in that column space, so it lies there too. Therefore $\vec{e} \perp \vec{a}$.
Hence $|\vec{e} - \vec{a}|^2 = |\vec{e}|^2 + |\vec{a}|^2 > |\vec{e}|^2$, i.e. $|\vec{e} - \vec{a}| > |\vec{e}|$.
So choosing any vector of the column space of $A$ other than $\vec{p}$ strictly increases the error.
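The Pythagoras argument can be watched numerically. A minimal sketch, assuming the three example points $(1,1), (2,2), (3,2)$ from the end of these notes and the closed-form $\widehat{b}, \widehat{a}$ derived below: it perturbs $\vec{p}$ by a few vectors $\vec{a} = A(\delta_b, \delta_a)^T$ of the column space (the perturbations are arbitrary choices of ours) and checks $|\vec{e} - \vec{a}|^2 = |\vec{e}|^2 + |\vec{a}|^2$ exactly.

```python
from fractions import Fraction

def sq_norm(vec):
    # Squared Euclidean length |vec|^2
    return sum(c * c for c in vec)

xs = [Fraction(1), Fraction(2), Fraction(3)]
ys = [Fraction(1), Fraction(2), Fraction(2)]
n = len(xs)

# Least-squares coefficients from the closed-form formulas derived below
xbar, ybar = sum(xs) / n, sum(ys) / n
b_hat = ((sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar)
         / (sum(x * x for x in xs) - n * xbar ** 2))
a_hat = ybar - b_hat * xbar

p = [b_hat * x + a_hat for x in xs]         # p: the projection of b onto C(A)
e = [y - pi for y, pi in zip(ys, p)]        # e = b - p
best = sq_norm(e)

# Every other vector of C(A) is p + a with a = A (db, da), (db, da) != (0, 0)
for db, da in [(1, 0), (0, 1), (Fraction(-1, 10), Fraction(1, 5))]:
    a = [db * x + da for x in xs]
    other = sq_norm([ei - ai for ei, ai in zip(e, a)])  # |b - (p + a)|^2 = |e - a|^2
    assert other == best + sq_norm(a)                   # Pythagoras, since e is orthogonal to a
    print(other > best)                                 # True
print(best)                                             # 1/6
```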
In coordinates, $\vec{p} = (\widehat{y}_1, \cdots, \widehat{y}_n)$ and $\vec{e} = (y_1 - \widehat{y}_1, \cdots, y_n - \widehat{y}_n)$.
Deriving the least-squares formulas:
Method 1: Since $P = A(A^TA)^{-1}A^T$,
$$
\begin{aligned}
\vec{p} = P\vec{b} &= \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}
\left(\begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\right)^{-1}
\begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}
\begin{bmatrix} \sum x_i^2 & n\overline{x} \\ n\overline{x} & n \end{bmatrix}^{-1}
\begin{bmatrix} \sum x_i y_i \\ n\overline{y} \end{bmatrix} \\
&= \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}
\begin{bmatrix} \dfrac{1}{t} & -\dfrac{\overline{x}}{t} \\ -\dfrac{\overline{x}}{t} & \dfrac{\sum x_i^2}{nt} \end{bmatrix}
\begin{bmatrix} \sum x_i y_i \\ n\overline{y} \end{bmatrix} \\
&= \begin{bmatrix} \dfrac{n(x_1 - \overline{x})}{nt} & \dfrac{\sum x_i^2 - n\overline{x}x_1}{nt} \\ \vdots & \vdots \\ \dfrac{n(x_n - \overline{x})}{nt} & \dfrac{\sum x_i^2 - n\overline{x}x_n}{nt} \end{bmatrix}
\begin{bmatrix} \sum x_i y_i \\ n\overline{y} \end{bmatrix} \\
&= \begin{bmatrix} \dfrac{1}{t}\left(\sum x_i y_i\,(x_1 - \overline{x}) + \overline{y}\left(\sum x_i^2 - n\overline{x}x_1\right)\right) \\ \vdots \\ \dfrac{1}{t}\left(\sum x_i y_i\,(x_n - \overline{x}) + \overline{y}\left(\sum x_i^2 - n\overline{x}x_n\right)\right) \end{bmatrix}
\end{aligned}
$$
where $t = \sum x_i^2 - n\overline{x}^2$. Since $\det(A^TA) = n\sum x_i^2 - n^2\overline{x}^2 = nt$, this step requires $t \neq 0$, i.e. the $x_i$ must not all be equal.
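The last matrix product above is easy to get wrong by hand, so here is a small exact cross-check. It is a sketch on made-up data points $(0,1), (1,3), (2,2), (4,5)$ chosen only for illustration: it evaluates the row-by-row formula for $\vec{p}$ and compares it with $A(A^TA)^{-1}A^T\vec{b}$ computed from the explicit $2\times 2$ inverse.

```python
from fractions import Fraction

# Hypothetical data, used only to test the formula
xs = [Fraction(v) for v in (0, 1, 2, 4)]
ys = [Fraction(v) for v in (1, 3, 2, 5)]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
xbar, ybar = sx / n, sy / n
t = sxx - n * xbar ** 2                      # t = sum x_i^2 - n * xbar^2

# Row-by-row formula for p from the end of Method 1
p_formula = [(sxy * (x - xbar) + ybar * (sxx - n * xbar * x)) / t for x in xs]

# Direct route: A^T A = [[sxx, sx], [sx, n]], A^T b = [sxy, sy]
det = sxx * n - sx * sx                      # det(A^T A) = n * t
inv = [[n / det, -sx / det], [-sx / det, sxx / det]]
coef = [inv[0][0] * sxy + inv[0][1] * sy,    # (A^T A)^{-1} A^T b
        inv[1][0] * sxy + inv[1][1] * sy]
p_direct = [x * coef[0] + coef[1] for x in xs]  # multiply by the rows (x_i, 1) of A

print(p_formula == p_direct)  # True
```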
Since $\vec{p} = A\widehat{\vec{x}}$, its $i$-th entry is $\widehat{b}x_i + \widehat{a}$. Comparing the entries belonging to $(x_1, y_1)$ and $(x_2, y_2)$ (with $x_1 \neq x_2$) gives:
$$
\begin{cases}
\widehat{b}x_1 + \widehat{a} = \dfrac{1}{t}\left(\sum x_i y_i\,(x_1 - \overline{x}) + \overline{y}\left(\sum x_i^2 - n\overline{x}x_1\right)\right) \\
\widehat{b}x_2 + \widehat{a} = \dfrac{1}{t}\left(\sum x_i y_i\,(x_2 - \overline{x}) + \overline{y}\left(\sum x_i^2 - n\overline{x}x_2\right)\right)
\end{cases}
$$
Subtracting the two equations and dividing by $x_1 - x_2$ gives $\widehat{b}$; substituting back gives $\widehat{a}$:
$$
\begin{cases}
\widehat{b} = \dfrac{1}{t}\left(\sum x_i y_i - n\overline{x}\,\overline{y}\right) = \dfrac{\sum x_i y_i - n\overline{x}\,\overline{y}}{\sum x_i^2 - n\overline{x}^2} \\
\widehat{a} = \overline{y} - \widehat{b}\,\overline{x}
\end{cases}
$$
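The closed-form result translates directly into code. A minimal sketch assuming exact `Fraction` arithmetic; the function name `fit_line` is hypothetical, and the `t == 0` guard reflects the requirement above that the $x_i$ are not all equal.

```python
from fractions import Fraction

def fit_line(points):
    """Least-squares line y = b*x + a from the closed-form formulas for b_hat and a_hat.

    Uses exact Fractions; raises ValueError when all x_i are equal (t = 0).
    """
    n = len(points)
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    t = sum(x * x for x in xs) - n * xbar ** 2
    if t == 0:
        raise ValueError("all x_i are equal; the columns of A are dependent")
    b_hat = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / t
    a_hat = ybar - b_hat * xbar
    return b_hat, a_hat

print(fit_line([(1, 1), (2, 2), (3, 2)]))  # (Fraction(1, 2), Fraction(2, 3))
```

Points that already lie on a line are reproduced exactly, e.g. `fit_line([(0, 1), (1, 3), (2, 5)])` gives slope 2 and intercept 1.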
Method 2: For the fitted line, the previous lecture gives the normal equations $A^TA\widehat{\vec{x}} = A^T\vec{b}$.
A convenient way to set these up is to compute $A^T\begin{bmatrix} A \mid \vec{b} \end{bmatrix} = \begin{bmatrix} A^TA \mid A^T\vec{b} \end{bmatrix}$ in a single product, then write $A^TA\widehat{\vec{x}} = A^T\vec{b}$ out as a system of equations and solve for $\widehat{\vec{x}}$.
Here $A^T\vec{b} = \begin{bmatrix} \sum x_i y_i \\ n\overline{y} \end{bmatrix}$ and $A^TA = \begin{bmatrix} \sum x_i^2 & n\overline{x} \\ n\overline{x} & n \end{bmatrix}$.
So the system is
$$\begin{cases} \widehat{b}\sum x_i^2 + n\overline{x}\,\widehat{a} = \sum x_i y_i \\ n\overline{x}\,\widehat{b} + n\widehat{a} = n\overline{y} \end{cases}$$
and solving it gives
$$\begin{cases} \widehat{b} = \dfrac{\sum x_i y_i - n\overline{x}\,\overline{y}}{\sum x_i^2 - n\overline{x}^2} \\ \widehat{a} = \overline{y} - \widehat{b}\,\overline{x} \end{cases}$$
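Method 2 can be sketched the same way: build $[A^TA \mid A^T\vec{b}]$ as $A^T[A \mid \vec{b}]$, then solve the $2\times 2$ system. The helper names are hypothetical, and solving with Cramer's rule is our own choice for a $2\times 2$ system (the notes solve the equations by hand). Run on the example points $(1,1), (2,2), (3,2)$, it should reproduce the augmented matrix and solution of the worked example.

```python
from fractions import Fraction

def normal_equations(points):
    """Return the 2x3 augmented matrix [A^T A | A^T b] for the line y = b*x + a."""
    # Rows of [A | b] are (x_i, 1, y_i)
    Ab = [[Fraction(x), Fraction(1), Fraction(y)] for x, y in points]
    # Left-multiplying by A^T: entry (r, c) is column r of A dotted with column c of [A | b]
    return [[sum(row[r] * row[c] for row in Ab) for c in range(3)] for r in range(2)]

def solve_augmented_2x2(M):
    """Solve p*b + q*a = r, s*b + u*a = v by Cramer's rule (assumes a nonzero determinant)."""
    (p, q, r), (s, u, v) = M
    det = p * u - q * s
    return (r * u - q * v) / det, (p * v - r * s) / det

M = normal_equations([(1, 1), (2, 2), (3, 2)])
print([[str(x) for x in row] for row in M])  # [['14', '6', '11'], ['6', '3', '5']]
print(solve_augmented_2x2(M))                # (Fraction(1, 2), Fraction(2, 3))
```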
Example: find a fitting line for the three points $(1,1), (2,2), (3,2)$.
Let the line be $\widehat{y} = \widehat{b}x + \widehat{a}$.
Then
$$\begin{cases} \widehat{b} + \widehat{a} = 1 \\ 2\widehat{b} + \widehat{a} = 2 \\ 3\widehat{b} + \widehat{a} = 2 \end{cases}, \quad \text{i.e.} \quad \underbrace{\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} \widehat{b} \\ \widehat{a} \end{bmatrix}}_{\widehat{\vec{x}}} = \underbrace{\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}}_{\vec{b}}$$
$\vec{b}$ is not in the column space of $A$: the first two equations force $\widehat{b} = 1$, $\widehat{a} = 0$, but then the third gives $3 \neq 2$.
From
$$\underbrace{\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix}}_{A^T} \underbrace{\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 1 & 2 \\ 3 & 1 & 2 \end{array}\right]}_{[A \mid \vec{b}]} = \underbrace{\left[\begin{array}{cc|c} 14 & 6 & 11 \\ 6 & 3 & 5 \end{array}\right]}_{[A^TA \mid A^T\vec{b}]}$$
we get $\begin{cases} 14\widehat{b} + 6\widehat{a} = 11 \\ 6\widehat{b} + 3\widehat{a} = 5 \end{cases}$, which gives $\begin{cases} \widehat{b} = \dfrac{1}{2} \\ \widehat{a} = \dfrac{2}{3} \end{cases}$.
So $\widehat{y} = \dfrac{1}{2}x + \dfrac{2}{3}$.
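As a final check on the example, a short sketch that plugs $\widehat{b} = \frac{1}{2}$, $\widehat{a} = \frac{2}{3}$ back in, computes $\vec{p}$ and $\vec{e}$, and confirms that $\vec{e}$ is orthogonal to both columns of $A$ (that is, $A^T\vec{e} = \vec{0}$), which is exactly the condition the normal equations express.

```python
from fractions import Fraction

points = [(1, 1), (2, 2), (3, 2)]
b_hat, a_hat = Fraction(1, 2), Fraction(2, 3)    # the fit found above

p = [b_hat * x + a_hat for x, _ in points]       # fitted values y_hat_i
e = [y - yh for (_, y), yh in zip(points, p)]    # residuals y_i - y_hat_i

# e must be orthogonal to both columns of A: (x_1, ..., x_n) and (1, ..., 1)
print(sum(x * ei for (x, _), ei in zip(points, e)))  # 0
print(sum(e))                                        # 0
print([str(v) for v in p], [str(v) for v in e])      # ['7/6', '5/3', '13/6'] ['-1/6', '1/3', '-1/6']
print(sum(ei * ei for ei in e))                      # 1/6, the minimal squared error
```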