This article follows shuhuai008's whiteboard derivation video on bilibili: Linear Regression (70 min).
Index of all notes in this series: Machine Learning - Whiteboard Derivation Series Notes
1. Overview
Suppose we have the following data:
$$
D=\left\{(x_{1},y_{1}),(x_{2},y_{2}),\cdots,(x_{N},y_{N})\right\},\quad x_{i}\in\mathbb{R}^{p},\ y_{i}\in\mathbb{R},\ i=1,2,\cdots,N
$$

$$
X=(x_{1},x_{2},\cdots,x_{N})^{T}=\begin{pmatrix} x_{1}^{T}\\ x_{2}^{T}\\ \vdots\\ x_{N}^{T} \end{pmatrix}=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N\times p},\quad Y=\begin{pmatrix} y_{1}\\ y_{2}\\ \vdots\\ y_{N} \end{pmatrix}_{N\times 1}
$$
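As a quick concrete check of these shapes, the setup above can be sketched in NumPy with random placeholder values (the sample count `N` and dimension `p` below are arbitrary choices for illustration, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3                    # hypothetical sample count and feature dimension

# X stacks the row vectors x_i^T, giving an N x p design matrix;
# Y stacks the scalar targets y_i into an N x 1 column vector.
X = rng.normal(size=(N, p))
Y = rng.normal(size=(N, 1))

print(X.shape)  # (5, 3)
print(Y.shape)  # (5, 1)
```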
These data follow the relationship shown in the figure below (taking one-dimensional data as an example); the function $f(w)$ here ignores the bias $b$:
2. Least Squares Estimation
$$
\begin{aligned}
L(w)&=\sum_{i=1}^{N}\left\|w^{T}x_{i}-y_{i}\right\|_{2}^{2}=\sum_{i=1}^{N}(w^{T}x_{i}-y_{i})^{2}\\
&=\underset{w^{T}X^{T}-Y^{T}}{\underbrace{\begin{pmatrix} w^{T}x_{1}-y_{1} & w^{T}x_{2}-y_{2} & \cdots & w^{T}x_{N}-y_{N} \end{pmatrix}}}\begin{pmatrix} w^{T}x_{1}-y_{1}\\ w^{T}x_{2}-y_{2}\\ \vdots\\ w^{T}x_{N}-y_{N} \end{pmatrix}\\
&=(w^{T}X^{T}-Y^{T})(Xw-Y)\\
&=w^{T}X^{T}Xw-w^{T}X^{T}Y-Y^{T}Xw+Y^{T}Y\\
&=w^{T}X^{T}Xw-2w^{T}X^{T}Y+Y^{T}Y
\end{aligned}
$$

Here the row vector collapses because $\begin{pmatrix} w^{T}x_{1} & \cdots & w^{T}x_{N} \end{pmatrix}-\begin{pmatrix} y_{1} & \cdots & y_{N} \end{pmatrix}=w^{T}(x_{1},\cdots,x_{N})-Y^{T}=w^{T}X^{T}-Y^{T}$. The last step uses the fact that $w^{T}X^{T}Y$ and $Y^{T}Xw$ are scalars and transposes of each other, hence equal, so the two cross terms merge into $-2w^{T}X^{T}Y$.
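The equality between the elementwise sum and the final matrix form can be verified numerically; the sketch below uses random data and an arbitrary $w$ purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 6, 3
X = rng.normal(size=(N, p))    # N x p design matrix, rows are x_i^T
Y = rng.normal(size=(N, 1))    # N x 1 target vector
w = rng.normal(size=(p, 1))    # arbitrary weight vector

# Elementwise form: sum_i (w^T x_i - y_i)^2
loss_sum = sum((X[i] @ w - Y[i]).item() ** 2 for i in range(N))

# Matrix form: w^T X^T X w - 2 w^T X^T Y + Y^T Y
loss_mat = (w.T @ X.T @ X @ w - 2 * w.T @ X.T @ Y + Y.T @ Y).item()

print(abs(loss_sum - loss_mat) < 1e-10)  # True
```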