(Ordinary) Least Squares Linear Regression
I. Assumptions
The samples are assumed to follow a (roughly) linear relationship between the features and the label.
II. Principle
Fit the sample set with a hyperplane (a straight line in the one-dimensional case) so that the sum of squared differences between each sample's label and its predicted value is minimized.
Note: this minimizes the vertical residuals, not the perpendicular distances from the sample points to the line.
$$h(x_{i1},x_{i2},\cdots,x_{id}) = \sum_{j=1}^d w_j x_{ij} - \theta,\qquad i=1,2,\cdots,n$$
Let $x_{i0}=1$ and $w_0=-\theta$ for $i=1,2,\cdots,n$, i.e. $\vec x_i = \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{id} \end{bmatrix}^T$ and $\vec w = \begin{bmatrix} -\theta & w_1 & w_2 & \cdots & w_d \end{bmatrix}^T$. Then $h(\vec x_i)=\vec w^T\vec x_i$, with the bias absorbed into the weight vector (as sketched below).
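A minimal numpy sketch of this reformulation (the names `X_raw`, `theta`, `w_rest`, and the values are illustrative, not from the text): prepend a constant column of ones so the bias is absorbed into the weight vector.

```python
import numpy as np

# Toy feature matrix: n = 4 samples, d = 2 features (illustrative values).
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
n, d = X_raw.shape

# Absorb the bias: x_i0 = 1, w_0 = -theta.
X = np.hstack([np.ones((n, 1)), X_raw])    # shape (n, d+1)

theta = 0.5
w_rest = np.array([1.0, -2.0])             # w_1, ..., w_d
w = np.concatenate([[-theta], w_rest])     # [-theta, w_1, ..., w_d]

# h(x_i) = w^T x_i for all samples at once; equals X_raw @ w_rest - theta.
h = X @ w
assert np.allclose(h, X_raw @ w_rest - theta)
```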
- Construct the loss function
$$L(h) = \frac{1}{n}\sum_{i=1}^n \left(h(\vec x_i)-y_i\right)^2,$$
i.e. the mean squared error (MSE); a code version follows below.
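The loss as a direct translation of the sum above (a sketch; `X` is an augmented design matrix and `w` an augmented weight vector as in the previous snippet, `y` the label vector):

```python
import numpy as np

def mse_loss(w, X, y):
    """Mean squared error: L(w) = (1/n) * sum_i (w^T x_i - y_i)^2."""
    residuals = X @ w - y          # h(x_i) - y_i for every sample
    return np.mean(residuals ** 2)
```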
- Find the hypothesis $h$ at which the loss attains its minimum
Since $h$ is determined by $\vec w$, rewrite $L(h)$ as a function of $\vec w$:
$$L(\vec w) = \frac{1}{n}\sum_{i=1}^n(\vec w^T\vec x_i-y_i)^2$$
Let $\mathbf X=\begin{bmatrix} \vec x_1 & \vec x_2 & \cdots & \vec x_n \end{bmatrix}^T$ (the $n\times(d+1)$ matrix whose $i$-th row is $\vec x_i^T$) and $\vec y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}^T$. Then
$$L(\vec w) = \frac{1}{n}(\mathbf X\vec w- \vec y)^T(\mathbf X\vec w- \vec y) = \frac{1}{n}\left(\vec w^T\mathbf X^T\mathbf X\vec w-\vec w^T\mathbf X^T\vec y-\vec y^T\mathbf X\vec w+\vec y^T\vec y\right) = \frac{1}{n}\left(\vec w^T\mathbf X^T\mathbf X\vec w-2\vec w^T\mathbf X^T\vec y+\vec y^T\vec y\right),$$
where the last step uses the fact that $\vec w^T\mathbf X^T\vec y$ and $\vec y^T\mathbf X\vec w$ are both $1\times1$ matrices and are transposes of each other, hence equal. A numerical check of this matrix form appears below, followed by the two ways to minimize the loss.
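One can confirm numerically that the matrix form agrees with the summation form (a quick self-contained check on random data; the seed and shapes are arbitrary):

```python
import numpy as np

def mse_loss_matrix(w, X, y):
    """Vectorized loss: L(w) = (1/n) (Xw - y)^T (Xw - y)."""
    r = X @ w - y
    return (r @ r) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)

sum_form = np.mean((X @ w - y) ** 2)   # (1/n) * sum_i (w^T x_i - y_i)^2
assert np.isclose(sum_form, mse_loss_matrix(w, X, y))
```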
- Gradient descent: start from an initial $\vec w$ and iterate $\vec w \leftarrow \vec w - \eta\,\frac{\partial}{\partial \vec w}L(\vec w)$ with step size $\eta$, using the gradient derived in the next bullet (a sketch follows).
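A minimal gradient-descent sketch for this loss (the function name, learning rate, and iteration count are arbitrary choices, not from the text):

```python
import numpy as np

def fit_gd(X, y, lr=0.01, n_iters=10_000):
    """Minimize L(w) by gradient descent.

    Uses the gradient (2/n) X^T (Xw - y) derived under the
    analytic-solution bullet below.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = (2.0 / len(y)) * (X.T @ (X @ w - y))
        w -= lr * grad
    return w
```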
- Analytic (closed-form) solution
Find $\vec w^*$ such that $\frac{\partial}{\partial \vec w}L(\vec w^*) = 0$; because this is a convex optimization problem, such a $\vec w^*$ is the global minimizer of $L(\vec w)$.
$$\frac{\partial}{\partial \vec w}L(\vec w) = \frac{2}{n}\left(\mathbf X^T\mathbf X\vec w-\mathbf X^T\vec y\right)$$
Setting the gradient to zero,
$$\mathbf X^T\mathbf X\vec w^*-\mathbf X^T\vec y = 0$$
$$\vec w^*=(\mathbf X^T\mathbf X)^{-1}\mathbf X^T\vec y,$$
provided $\mathbf X^T\mathbf X$ is invertible.
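The normal equations in code (a sketch; the data are synthetic and illustrative). In practice one solves the linear system rather than forming the explicit inverse, and `np.linalg.lstsq` handles the case where $\mathbf X^T\mathbf X$ is singular:

```python
import numpy as np

def fit_normal_equations(X, y):
    """w* = (X^T X)^{-1} X^T y, computed without an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic check: recover known weights from noisy linear data.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])  # augmented design matrix
y = X @ np.array([0.5, 1.0, -2.0]) + 0.1 * rng.normal(size=20)

w_star = fit_normal_equations(X, y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_star, w_lstsq)  # both solve the same least-squares problem
```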