【Datawhale - Machine Learning - Task02 - Linear Regression】

   Preface
   Datawhale open-source learning: Machine Learning, June 2024
   Watermelon Book + Pumpkin Book, Chapter 3: Linear Regression

To start, a figure summarizing the basic workflow.
[Figure: linear regression workflow]

Maximum Likelihood Estimation
Probability: given a known model, infer how likely each outcome of running it is.
Likelihood: given the facts (observed data), infer the most plausible values of the function's parameters.
For example, given a batch of observed samples from a normal distribution $X \sim N(\mu, \sigma^2)$, the probability density function of the random variable $X$ is:
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
which yields the likelihood function:
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} p(x_i; \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$
Maximum likelihood: solve for the $\mu$ and $\sigma^2$ that maximize $L(\mu, \sigma^2)$.
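For the normal distribution this maximization has a closed form: setting the partial derivatives of the log-likelihood to zero gives the sample mean and the biased sample variance. A minimal sketch on synthetic data (variable names are my own):

```python
import numpy as np

# MLE for i.i.d. normal samples: mu_hat is the sample mean, sigma2_hat is
# the *biased* sample variance (divide by n, not n-1) -- both follow from
# setting the gradient of the log-likelihood to zero.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)  # synthetic: mu = 2, sigma = 1.5

mu_hat = x.mean()                        # argmax over mu
sigma2_hat = ((x - mu_hat) ** 2).mean()  # argmax over sigma^2

print(mu_hat, np.sqrt(sigma2_hat))       # should be close to 2.0 and 1.5
```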

Definition 1:
   Convex function. Let $D \subset \mathbb{R}^n$ be a nonempty convex set and let $f$ be a function defined on $D$. If for any $x^1, x^2 \in D$ and any $\alpha \in (0, 1)$ we have
$$f(\alpha x^1 + (1-\alpha) x^2) \le \alpha f(x^1) + (1-\alpha) f(x^2)$$
then $f$ is called a convex function on $D$.
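As a quick toy check (my own illustration, not from the book), the inequality can be sampled numerically for $f(x) = x^2$:

```python
import numpy as np

# Verify Definition 1 at sampled points for the convex function f(x) = x^2:
# f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2) for a in (0, 1).
f = lambda x: x ** 2
x1, x2 = -1.5, 3.0
for a in np.linspace(0.1, 0.9, 5):
    lhs = f(a * x1 + (1 - a) * x2)
    rhs = a * f(x1) + (1 - a) * f(x2)
    assert lhs <= rhs, (a, lhs, rhs)
print("convexity inequality holds at all sampled points")
```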

Theorem 1: If the Hessian matrix $\nabla^2 f(x)$ of $f(x)$ is positive semidefinite on $D$, then $f(x)$ is convex on $D$; if $\nabla^2 f(x)$ is positive definite on $D$, then $f(x)$ is strictly convex on $D$.

Theorem 2: If $f(x)$ is a convex function and is continuously differentiable, then $x^*$ is a global minimizer if and only if its gradient is the zero vector, i.e. $\nabla f(x^*) = 0$.

Definition 2: Gradient. If a multivariate function has partial derivatives with respect to every component $x_i$, then $f(x)$ is said to be first-order differentiable at $x$, and its gradient (first-order derivative) is
$$\nabla f(x) = \frac{\partial f(x)}{\partial x} = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix}$$
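Where no closed form is handy, a gradient can be sanity-checked by central finite differences. A small sketch (my own, using the made-up test function $f(x) = x_1^2 + 3x_2^2$, whose analytic gradient is $[2x_1, 6x_2]^T$):

```python
import numpy as np

# Approximate the gradient by central finite differences, one component
# at a time, and compare with the analytic gradient.
def num_grad(f, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)  # central difference
    return g

f = lambda x: x[0] ** 2 + 3 * x[1] ** 2
x0 = np.array([1.0, -2.0])
print(num_grad(f, x0))   # ~ [ 2., -12.], matching [2*x1, 6*x2]
```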

Additionally, the Hessian matrix is the matrix of second-order partial derivatives of $f(x)$.
Leading principal minors:
$$H_i = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1i} \\ a_{21} & a_{22} & \dots & a_{2i} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i1} & a_{i2} & \dots & a_{ii} \end{vmatrix}$$
   where $i = 1, 2, \dots, n$; these are called the leading principal minors of the matrix $A = (a_{ij})_{n \times n}$.
   If all leading principal minors are positive, the matrix is positive definite (Sylvester's criterion). Note that for positive semidefiniteness, nonnegative leading principal minors alone are not sufficient; all principal minors (not just the leading ones) must be nonnegative.
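A quick numerical sketch of these criteria (my own illustration): compute the leading principal minors and cross-check with eigenvalues, including a counterexample showing why nonnegative leading minors alone do not certify semidefiniteness:

```python
import numpy as np

# Sylvester's criterion: all leading principal minors > 0 <=> positive
# definite (for a symmetric matrix). Eigenvalues give the reliable test
# for semidefiniteness.
def leading_minors(A):
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(leading_minors(A))      # [2.0, 3.0] -> all > 0, positive definite
print(np.linalg.eigvalsh(A))  # [1., 3.]   -> confirms

B = np.diag([0.0, -1.0])      # leading minors are 0 and 0 (nonnegative) ...
print(leading_minors(B))
print(np.linalg.eigvalsh(B))  # ... yet eigenvalue -1 shows B is NOT semidefinite
```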
   The crux of linear regression is solving for the optimal $w$ and $b$ in the formula below, which first requires showing that the objective is convex.
$$(w^*, b^*) = \underset{(w, b)}{\arg\min} \sum_{i=1}^{m} (y_i - w x_i - b)^2$$

Let
$$E(w, b) = \sum_{i=1}^{m} (y_i - w x_i - b)^2.$$
Then we have
$$\frac{\partial E(w,b)}{\partial w} = 2 \sum_{i=1}^{m} (y_i - w x_i - b)(-x_i)$$
$$\frac{\partial E(w,b)}{\partial w} = 2 \sum_{i=1}^{m} (w x_i + b - y_i)\, x_i$$
$$\frac{\partial E(w,b)}{\partial w} = 2w \sum_{i=1}^{m} x_i^2 + 2 \sum_{i=1}^{m} (b - y_i)\, x_i$$
Similarly,
$$\frac{\partial E(w,b)}{\partial b} = 2 \sum_{i=1}^{m} (y_i - w x_i - b)(-1)$$
$$\frac{\partial E(w,b)}{\partial b} = 2 \sum_{i=1}^{m} (w x_i + b - y_i)$$
$$\frac{\partial E(w,b)}{\partial b} = 2\left(m b - \sum_{i=1}^{m} (y_i - w x_i)\right)$$
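These partial derivatives can be double-checked numerically. A small sketch (my own, with synthetic data):

```python
import numpy as np

# Compare the analytic partials of E(w, b) = sum_i (y_i - w*x_i - b)^2
# against central finite differences at an arbitrary point (w0, b0).
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=50)
y = 1.7 * x + 0.8 + rng.normal(0, 0.3, size=50)

E = lambda w, b: np.sum((y - w * x - b) ** 2)
w0, b0, eps = 0.5, 0.1, 1e-6

dE_dw = 2 * w0 * np.sum(x ** 2) + 2 * np.sum((b0 - y) * x)  # analytic, as derived
dE_db = 2 * (len(x) * b0 - np.sum(y - w0 * x))              # analytic, as derived

print(dE_dw, (E(w0 + eps, b0) - E(w0 - eps, b0)) / (2 * eps))  # should agree
print(dE_db, (E(w0, b0 + eps) - E(w0, b0 - eps)) / (2 * eps))  # should agree
```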
   The convexity proof via Theorem 1 is omitted; people with day jobs really don't have time to write it all out.
   By Theorem 2, we have
$$\begin{cases} \dfrac{\partial E(w,b)}{\partial w} = 0 \\[4pt] \dfrac{\partial E(w,b)}{\partial b} = 0 \end{cases}$$
which gives
$$\begin{cases} 2w \sum_{i=1}^{m} x_i^2 + 2 \sum_{i=1}^{m} (b - y_i)\, x_i = 0 \\[4pt] b = \dfrac{1}{m} \sum_{i=1}^{m} (y_i - w x_i) \end{cases}$$
Expanding $b$ gives
$$b = \frac{1}{m} \sum_{i=1}^{m} y_i - \frac{1}{m} \sum_{i=1}^{m} w x_i$$
$$b = \bar{y} - w \bar{x}$$
Substituting this into $2w \sum_{i=1}^{m} x_i^2 + 2 \sum_{i=1}^{m} (b - y_i)\, x_i = 0$, we have
$$w \sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} (y_i - b)\, x_i$$
$$w \sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i b$$
$$w \sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i (\bar{y} - w \bar{x})$$
$$w \sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \bar{y} + w \sum_{i=1}^{m} x_i \bar{x}$$
$$w \sum_{i=1}^{m} x_i^2 - w \sum_{i=1}^{m} x_i \bar{x} = \sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \bar{y}$$
$$w = \frac{\sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \bar{y}}{\sum_{i=1}^{m} x_i^2 - \sum_{i=1}^{m} x_i \bar{x}}$$
Finally, using $\sum_{i=1}^{m} x_i \bar{y} = \bar{x} \sum_{i=1}^{m} y_i = \sum_{i=1}^{m} \bar{x}\, y_i$ in the numerator and $\sum_{i=1}^{m} x_i \bar{x} = \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2$ in the denominator,
$$w = \frac{\sum_{i=1}^{m} y_i (x_i - \bar{x})}{\sum_{i=1}^{m} x_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2}$$
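The closed form translates directly into code. A minimal sketch (my own, on synthetic data), cross-checked against NumPy's least squares:

```python
import numpy as np

# Closed-form simple linear regression:
#   w = sum_i y_i*(x_i - x_bar) / (sum_i x_i^2 - (sum_i x_i)^2 / m)
#   b = y_bar - w * x_bar
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 4.0 + rng.normal(0, 1.0, size=200)   # true w = 3, b = 4

x_bar, y_bar, m = x.mean(), y.mean(), len(x)
w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
b = y_bar - w * x_bar

print(w, b)                  # close to 3.0 and 4.0
print(np.polyfit(x, y, 1))   # [w, b] from NumPy's least squares, for comparison
```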

Multivariate linear regression:
$$\hat{w}^* = \underset{\hat{w}}{\arg\min}\, (y - X\hat{w})^T (y - X\hat{w})$$
Put plainly, this is finding the optimum of a multivariate function, the same kind of convex minimization as before. Two steps are needed: first show the objective is convex (again omitted), then solve. Let $E_{\hat{w}} = (y - X\hat{w})^T (y - X\hat{w})$ and differentiate with respect to $\hat{w}$:
$$\frac{\partial E_{\hat{w}}}{\partial \hat{w}} = \frac{\partial\left( y^T y - \hat{w}^T X^T y - y^T X \hat{w} + \hat{w}^T X^T X \hat{w} \right)}{\partial \hat{w}}$$
$$\frac{\partial E_{\hat{w}}}{\partial \hat{w}} = \frac{\partial\left( -\hat{w}^T X^T y - y^T X \hat{w} + \hat{w}^T X^T X \hat{w} \right)}{\partial \hat{w}}$$
$$\frac{\partial E_{\hat{w}}}{\partial \hat{w}} = -2 X^T y + \frac{\partial\left( \hat{w}^T X^T X \hat{w} \right)}{\partial \hat{w}}$$
$$\frac{\partial E_{\hat{w}}}{\partial \hat{w}} = -2 X^T y + 2 X^T X \hat{w}$$
$$\frac{\partial E_{\hat{w}}}{\partial \hat{w}} = 2 X^T (X\hat{w} - y)$$
   Here we use the identities $\frac{\partial a^T x}{\partial x} = \frac{\partial x^T a}{\partial x} = a$ and $\frac{\partial x^T A x}{\partial x} = (A + A^T)\, x$.

  Setting this gradient to zero finally gives $\hat{w}^* = (X^T X)^{-1} X^T y$, assuming $X^T X$ is invertible.
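A sketch of the normal-equation solution (my own illustration; as in the book, each sample gets a constant 1 appended so that $b$ is absorbed into $\hat{w}$). Numerically it is better to solve the linear system than to form the explicit inverse:

```python
import numpy as np

# Solve w_hat = (X^T X)^{-1} X^T y via the normal equations.
rng = np.random.default_rng(3)
X_raw = rng.uniform(0, 1, size=(100, 3))
true_w, true_b = np.array([1.0, -2.0, 0.5]), 0.3
y = X_raw @ true_w + true_b + rng.normal(0, 0.05, size=100)

X = np.hstack([X_raw, np.ones((100, 1))])    # append the bias column
w_hat = np.linalg.solve(X.T @ X, X.T @ y)    # avoids forming the inverse
print(w_hat)                                 # ~ [1.0, -2.0, 0.5, 0.3]
print(np.linalg.lstsq(X, y, rcond=None)[0])  # cross-check with lstsq
```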

Linear regression models:
   Linear regression model: $y = w^T x + b$
  Log-linear model: $\ln y = w^T x + b$
  Generalized linear model: $y = g^{-1}(w^T x + b)$
  where the link function $g(\cdot)$ is monotone and differentiable: continuous and sufficiently smooth.
  Understanding: the log-linear and generalized linear models exist to handle complex nonlinear relationships between the data and the labels while keeping the model easy to understand and compute; at heart they are function mappings.
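For instance, a log-linear model can be fit by running ordinary linear regression on the log-transformed targets and mapping back through $g^{-1} = \exp$. A small sketch (my own, with synthetic data):

```python
import numpy as np

# Fit ln(y) = w*x + b by least squares on the log-transformed targets,
# i.e. the generalized linear model y = exp(w*x + b) with g = ln.
rng = np.random.default_rng(4)
x = rng.uniform(0, 2, size=100)
y = np.exp(1.2 * x + 0.5 + rng.normal(0, 0.05, size=100))  # true w = 1.2, b = 0.5

w, b = np.polyfit(x, np.log(y), 1)   # ordinary least squares on ln(y)
print(w, b)                          # ~ 1.2 and 0.5
y_pred = np.exp(w * x + b)           # map predictions back with g^{-1} = exp
```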

Video takeaway: the three elements of machine learning are the model, the strategy, and the algorithm.

Model: choose between, say, $y = wx + b$ and $y = Ax^2$;
Strategy: based on an evaluation criterion, how to select the best model, which produces the loss function;
Algorithm: work out what values of $w$ and $b$ are appropriate.

Thanks to the Datawhale team for their contributions. This session mainly referenced the following videos:
https://www.bilibili.com/video/BV1Mh411e7VU?p=3&vd_source=7f1a93b833d8a7093eb3533580254fe4
https://www.bilibili.com/video/BV1Mh411e7VU?p=4&vd_source=7f1a93b833d8a7093eb3533580254fe4
