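1. Least Squares
Least squares is a mathematical optimization method that fits a model to data points by minimizing the sum of squared errors.
Take the linear regression model as an example. Solving for its coefficients by least squares means minimizing the mean squared error: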
$$
\begin{aligned}
err &= \frac{1}{n}\sum_{i=1}^n (y_i-\hat y_i)^2 = \frac{1}{n}\sum_{i=1}^n (y_i-w^Tx_i)^2\\
&= \frac{1}{n}(y-Xw)^T(y-Xw) \\
&= \frac{1}{n}(y^Ty-2w^TX^Ty+w^TX^TXw)
\end{aligned}
$$
Here $X$ is the matrix whose $i$-th row is $x_i^T$, $y$ is the vector of targets $y_i$, and $\hat y_i = w^Tx_i$ is the prediction for sample $i$.
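To find the minimum of this expression, we differentiate it with respect to $w$ and look for the point where the derivative is zero.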
$$
\begin{aligned}
\frac{\partial}{\partial w}err &= \frac{1}{n}(2X^TXw-2X^Ty) = 0\\
X^TXw &= X^Ty\\
w &= (X^TX)^{-1}X^Ty
\end{aligned}
$$
This gives a closed-form expression for the parameters, valid whenever $X^TX$ is invertible.
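The normal equations $X^TXw = X^Ty$ can be checked on a small case. The following is a minimal sketch, assuming a one-feature model $y \approx w_0 + w_1 x$ so that $X^TX$ is only $2\times 2$ and can be solved by Cramer's rule. The function name `fit_line` and the sample data are illustrative and not part of the original text.

```python
def fit_line(xs, ys):
    """Fit y ~ w0 + w1 * x by solving the normal equations X^T X w = X^T y."""
    n = len(xs)
    # Each row of the design matrix X is (1, x_i), so
    # X^T X = [[n, s_x], [s_x, s_xx]] and X^T y = [s_y, s_xy].
    s_x = sum(xs)
    s_xx = sum(x * x for x in xs)
    s_y = sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    # Solve the 2x2 system with Cramer's rule.
    det = n * s_xx - s_x * s_x
    if det == 0:
        raise ValueError("X^T X is singular: the x values must not all be equal")
    w0 = (s_xx * s_y - s_x * s_xy) / det
    w1 = (n * s_xy - s_x * s_y) / det
    return w0, w1


# Hypothetical data lying exactly on y = 1 + 2x.
w0, w1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w0, w1)  # prints 1.0 2.0
```

For more than a handful of features, forming and inverting $X^TX$ explicitly tends to be numerically fragile, which is one motivation for solving the same problem through a QR decomposition instead.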
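2. QR Decomposition
QR decomposition factors a matrix into the product of an orthogonal matrix and an upper triangular matrix. That is, for a real matrix $A$ we have $A=Q\times R$, where $Q$ is an orthogonal matrix ($Q^T\cdot Q=I$) and $R$ is an upper triangular matrix. Common algorithms for computing a QR decomposition are Gram-Schmidt orthogonalization, Householder transformations, and Givens rotations.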
2.1 Gram-Schmidt Orthogonalization
Let $A=(\vec{a_1},\vec{a_2},...,\vec{a_n})$ and apply the Gram-Schmidt orthogonalization process to the columns of $A$. Here $\vec{p_i}$ are the orthogonal vectors and $\vec{q_i}$ are the corresponding normalized (orthonormal) vectors, for $i=1,2,...,n$.
$$
\begin{aligned}
\vec{p_1} &= \vec{a_1} = \lVert \vec{p_1}\rVert\vec{q_1}=r_{11}\vec{q_1}\\
\vec{p_2} &= \vec{a_2} - \frac{\vec{a_2}\cdot \vec{p_1}}{\lVert \vec{p_1}\rVert^2}\cdot \vec{p_1} = \lVert \vec{p_2}\rVert\vec{q_2}\\
\vec{a_2} &= \lVert \vec{p_2}\rVert\vec{q_2} + \frac{\vec{a_2}\cdot \vec{p_1}}{\lVert \vec{p_1}\rVert^2}\cdot \vec{p_1}\\
&= \lVert \vec{p_2}\rVert\vec{q_2} + \frac{\vec{a_2}\cdot \vec{p_1}}{\lVert \vec{p_1}\rVert^2}\lVert \vec{p_1}\rVert \vec{q_1}\\
&= r_{21}\vec{q_1} + r_{22}\vec{q_2}\\
\vec{p_3} &= \vec{a_3} - \frac{\vec{a_3}\cdot \vec{p_1}}{\lVert \vec{p_1}\rVert^2}\cdot \vec{p_1} - \frac{\vec{a_3}\cdot \vec{p_2}}{\lVert \vec{p_2}\rVert^2}\cdot \vec{p_2}\\
&= \lVert \vec{p_3}\rVert\vec{q_3}\\
\vec{a_3} &= \lVert \vec{p_3}\rVert\vec{q_3} + \frac{\vec{a_3}\cdot \vec{p_1}}{\lVert \vec{p_1}\rVert^2}\cdot \vec{p_1} + \frac{\vec{a_3}\cdot \vec{p_2}}{\lVert \vec{p_2}\rVert^2}\cdot \vec{p_2}\\
&= r_{31}\vec{q_1} + r_{32}\vec{q_2} + r_{33}\vec{q_3}
\end{aligned}
$$
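The recurrence above can be sketched in code. This is a minimal illustration of classical Gram-Schmidt, assuming the input columns are linearly independent. The function names `dot` and `gram_schmidt_qr` and the sample vectors are hypothetical. The sketch uses the identity $\frac{\vec{a}\cdot\vec{p_k}}{\lVert\vec{p_k}\rVert^2}\vec{p_k} = (\vec{a}\cdot\vec{q_k})\,\vec{q_k}$, and it stores the coefficient of $\vec{q_k}$ in the expansion of $\vec{a_j}$ at `r[k][j]` (zero-based, upper triangular layout), which is the transposed index order compared with the $r_{21}$, $r_{31}$ labels in the formulas.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_qr(columns):
    """Classical Gram-Schmidt on a list of linearly independent column vectors.

    Returns (qs, r): qs[j] is the orthonormal vector q_j, and r[k][j] is the
    coefficient of q_k in the expansion of column a_j (upper triangular).
    """
    n = len(columns)
    qs = []
    r = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        p = list(a)
        # Subtract the projection of a_j onto every previously built q_k.
        for k, q in enumerate(qs):
            r[k][j] = dot(a, q)
            p = [pi - r[k][j] * qi for pi, qi in zip(p, q)]
        norm = math.sqrt(dot(p, p))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        r[j][j] = norm  # the diagonal entry is the length of p_j
        qs.append([pi / norm for pi in p])
    return qs, r


# Hypothetical 3x3 example, given column by column.
cols = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
qs, r = gram_schmidt_qr(cols)

# Each column should be recovered as a_j = sum_k r[k][j] * q_k.
for j, a in enumerate(cols):
    rebuilt = [sum(r[k][j] * qs[k][i] for k in range(len(qs))) for i in range(len(a))]
    print(a, rebuilt)
```

Classical Gram-Schmidt is easy to read but can lose orthogonality in floating point for ill-conditioned inputs, so this sketch is meant for following the derivation rather than for production use.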