Linear Models
Univariate Linear Regression
- Basic form
$$f(\boldsymbol{x})=w_{1} x_{1}+w_{2} x_{2}+\ldots+w_{d} x_{d}+b$$
Vector form
$$f(\boldsymbol{x})=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$$
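As a minimal sketch of the vector form, the snippet below evaluates $f(\boldsymbol{x})=\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}+b$ for a single sample using plain Python lists. The function name `linear_model` and the numeric values are hypothetical, chosen only for illustration.

```python
def linear_model(w, x, b):
    """Evaluate f(x) = w^T x + b for one sample with d features."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    # Inner product w^T x, then add the bias term b.
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# Hypothetical weights, sample, and bias (not taken from the text).
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]
b = 0.5

prediction = linear_model(w, x, b)  # 2*1 - 1*3 + 0.5*4 + 0.5 = 1.5
print(prediction)
```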
Objective: minimize the mean squared error
$$\begin{aligned}\left(w^{*}, b^{*}\right) &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \\ &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2} \end{aligned}$$
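To make the objective concrete, here is a small sketch that evaluates the sum of squared residuals $\sum_{i=1}^{m}(y_i-wx_i-b)^2$ for a candidate pair $(w, b)$. The helper name `squared_error` and the toy data (generated from $y = 2x + 1$) are assumptions for illustration only.

```python
def squared_error(w, b, xs, ys):
    """Sum over i of (y_i - w*x_i - b)^2 for a one-feature model."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on the line y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

exact_fit = squared_error(2.0, 1.0, xs, ys)  # every residual is 0
poor_fit = squared_error(1.0, 1.0, xs, ys)   # residuals 0, 1, 2, 3

print(exact_fit, poor_fit)
```

The pair that minimizes this quantity is the $(w^{*}, b^{*})$ sought by the objective; on this toy data the true line gives zero error, while a wrong slope gives a strictly larger value.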
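Method: least-squares "parameter estimation" for the linear regression model. Writing $E_{(w,b)}=\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}$ for the objective, differentiating $E_{(w,b)}$ with respect to $w$ and $b$ gives: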
$$\begin{aligned}\frac{\partial E_{(w, b)}}{\partial w} &=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\ &=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\ &=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\end{aligned}$$
$$\frac{\partial E_{(w, b)}}{\partial b}=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)$$
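One way to sanity-check the two partial derivatives is to compare them against central finite differences of $E_{(w,b)}$. The sketch below does this on hypothetical data; the function names, the sample values, and the evaluation point `(w0, b0)` are all illustrative assumptions.

```python
def squared_error(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad_w(w, b, xs, ys):
    """dE/dw = 2 * (w * sum x_i^2 - sum (y_i - b) * x_i)."""
    return 2 * (w * sum(x * x for x in xs)
                - sum((y - b) * x for x, y in zip(xs, ys)))


def grad_b(w, b, xs, ys):
    """dE/db = 2 * (m * b - sum (y_i - w * x_i))."""
    m = len(xs)
    return 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))


def central_difference(f, t, h=1e-5):
    """Numerical derivative of a one-argument function f at t."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Hypothetical noisy data and an arbitrary evaluation point.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.2, 2.9, 5.1, 7.0]
w0, b0 = 0.7, -0.3

analytic_w = grad_w(w0, b0, xs, ys)
analytic_b = grad_b(w0, b0, xs, ys)
numeric_w = central_difference(lambda w: squared_error(w, b0, xs, ys), w0)
numeric_b = central_difference(lambda b: squared_error(w0, b, xs, ys), b0)

print(analytic_w, numeric_w)
print(analytic_b, numeric_b)
```

Because $E_{(w,b)}$ is quadratic in each parameter, the central difference should agree with the analytic formula up to floating-point rounding.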
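Here $E_{(w,b)}$ is a convex function of $w$ and $b$, so the optimal $w$ and $b$ are obtained at the point where its partial derivatives with respect to $w$ and $b$ are both zero.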
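Setting both partial derivatives to zero gives two linear equations in $w$ and $b$. Solving them (a step not spelled out in this passage) yields $b = \frac{1}{m}\sum_{i=1}^{m}(y_i - w x_i)$ and $w = \dfrac{\sum_i x_i y_i - \frac{1}{m}\sum_i x_i \sum_i y_i}{\sum_i x_i^2 - \frac{1}{m}\left(\sum_i x_i\right)^2}$. The sketch below implements that solution under the assumption that the $x_i$ are not all equal; `fit_line` and the toy data are hypothetical names and values.

```python
def fit_line(xs, ys):
    """Solve dE/dw = 0 and dE/db = 0 for a one-feature linear model."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = sum_xx - sum_x ** 2 / m
    if denom == 0:
        # All x_i identical: the slope is not uniquely determined.
        raise ValueError("x values must not all be equal")

    w = (sum_xy - sum_x * sum_y / m) / denom
    b = (sum_y - w * sum_x) / m
    return w, b


# Toy data lying exactly on y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_star, b_star = fit_line(xs, ys)
print(w_star, b_star)
```

On data that lies exactly on a line, the recovered parameters should match that line's slope and intercept, which is a quick way to test the formulas.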
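Checking convexity:
Let $f(x,y)$ have continuous second-order partial derivatives on a region $D$, and write $A=f''_{xx}(x,y)$, $B=f''_{xy}(x,y)$, $C=f''_{yy}(x,y)$. Then:
(1) if $A>0$ and $AC-B^{2} \geq 0$ hold everywhere on $D$, then $f(x,y)$ is convex on $D$;
(2) if $A<0$ and $AC-B^{2} \geq 0$ hold everywhere on $D$, then $f(x,y)$ is concave on $D$.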
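The criterion can be sketched as a small helper that takes the three second partials and reports which case applies. It is only a direct transcription of conditions (1) and (2): for a quadratic function the second partials are constant, so one evaluation covers all of $D$, whereas for a general function the check would have to hold at every point. The function name and the three example functions are hypothetical.

```python
def classify_by_second_partials(A, B, C):
    """Apply conditions (1) and (2) to given values of f_xx, f_xy, f_yy."""
    det = A * C - B * B
    if A > 0 and det >= 0:
        return "convex"
    if A < 0 and det >= 0:
        return "concave"
    # Neither condition holds; the criterion as stated does not decide.
    return "inconclusive"


# f(x, y) = x^2 + x*y + y^2  ->  A = 2, B = 1, C = 2
bowl = classify_by_second_partials(2.0, 1.0, 2.0)

# f(x, y) = -x^2 - y^2       ->  A = -2, B = 0, C = -2
dome = classify_by_second_partials(-2.0, 0.0, -2.0)

# f(x, y) = x^2 - y^2        ->  A = 2, B = 0, C = -2
saddle = classify_by_second_partials(2.0, 0.0, -2.0)

print(bowl, dome, saddle)
```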
$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial w}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial w}\left[2 w \sum_{i=1}^{m} x_{i}^{2}\right] \\ &=2 \sum_{i=1}^{m} x_{i}^{2} \end{aligned}$$
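The result $\frac{\partial^{2} E_{(w,b)}}{\partial w^{2}} = 2\sum_{i=1}^{m} x_i^2$ can be checked numerically with a second-order central difference in $w$. The sketch below is illustrative only; the data, the evaluation point, and the helper names are assumptions.

```python
def squared_error(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def second_difference(f, t, h=1e-3):
    """Numerical second derivative of a one-argument function f at t."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)


# Hypothetical data and evaluation point.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.2, 2.9, 5.1, 7.0]
w0, b0 = 0.7, -0.3

analytic = 2 * sum(x * x for x in xs)  # 2 * (0 + 1 + 4 + 9) = 28
numeric = second_difference(lambda w: squared_error(w, b0, xs, ys), w0)

print(analytic, numeric)
```

Note that the analytic value depends only on the $x_i$, not on $w$, $b$, or the $y_i$, so the same number should come out at any evaluation point.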
∂ 2 E ( w , b ) ∂ w ∂ b = ∂ ∂ b ( ∂ E ( w , b ) ∂ w ) = ∂ ∂ b [ 2 ( w ∑ i = 1 m x i 2 − ∑ i = 1 m ( y i − b ) x i ) ] = ∂ ∂ b [ − 2 ∑ i = 1 m y i x i + 2 ∑ i = 1 m b x i ) = ∂ ∂ b ( 2