Machine Learning (1): Linear Regression

1. Linear Regression

Cost function (least squares):

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)}-y^{(i)}\right)^{2}$$
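
As a quick illustration, here is a minimal NumPy sketch of this cost function. The toy data and the names `X`, `y`, `theta` are made up for the example and are not from the original derivation.

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    residual = X @ theta - y          # vector of prediction errors, shape (m,)
    return 0.5 * residual @ residual  # scalar

# Toy data: m = 4 samples, n + 1 = 2 features (bias column included).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = np.zeros(2)
print(cost(theta, X, y))  # 42.0 = 0.5 * (1 + 9 + 25 + 49)
```
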
Using gradient descent, each parameter is updated by

$$\theta_j := \theta_j-\alpha\frac{\partial J(\theta)}{\partial\theta_j}$$

so we need the derivative of $J(\theta)$. There are two ways to obtain it (a runnable gradient-descent sketch using the result appears at the end of this section):

  • Differentiate element-wise:

$$\begin{aligned}\frac{\partial J(\theta)}{\partial\theta_j}&=\sum_{i=1}^{m}\frac{\partial}{\partial\theta_j}\,\frac{1}{2}\left(\theta^{T}x^{(i)}-y^{(i)}\right)^{2}\\&=\sum_{i=1}^{m}\left(\theta^{T}x^{(i)}-y^{(i)}\right)\frac{\partial}{\partial\theta_j}\left(\theta^{T}x^{(i)}-y^{(i)}\right)\\&=\sum_{i=1}^{m}\left(\theta^{T}x^{(i)}-y^{(i)}\right)x_j^{(i)}\end{aligned}$$

  • Convert to a matrix derivative:

Let

$$X=\begin{bmatrix}(x^{(1)})^{T}\\(x^{(2)})^{T}\\\vdots\\(x^{(m)})^{T}\end{bmatrix},\qquad \theta=\begin{bmatrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{bmatrix},\qquad y=\begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}$$

so that

$$h_{\theta}(x^{(i)})=(x^{(i)})^{T}\theta$$

and therefore

$$X\theta-y=\begin{bmatrix}(x^{(1)})^{T}\theta-y^{(1)}\\(x^{(2)})^{T}\theta-y^{(2)}\\\vdots\\(x^{(m)})^{T}\theta-y^{(m)}\end{bmatrix}$$

From this it follows that

$$\frac{1}{2}(X\theta-y)^{T}(X\theta-y)=\frac{1}{2}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)}-y^{(i)}\right)^{2}=J(\theta)$$

Let $w=X\theta-y$, so the cost can be written as $J(\theta)=\frac{1}{2}w^{T}w$. Taking the differential,

$$dJ(\theta)=\frac{1}{2}\,d(w^{T})\,w+\frac{1}{2}\,w^{T}\,dw$$

Since $dJ(\theta)$ is a scalar it equals its own trace, and by the definition of the matrix derivative it can also be written as $\mathrm{tr}\!\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^{T}dw\right)$. Moreover,

$$\begin{aligned}\mathrm{tr}\!\left(\frac{1}{2}\,d(w^{T})\,w+\frac{1}{2}\,w^{T}dw\right)&=\mathrm{tr}\!\left(\frac{1}{2}(dw)^{T}w\right)+\mathrm{tr}\!\left(\frac{1}{2}\,w^{T}dw\right)\\&=\frac{1}{2}\mathrm{tr}\!\left((w^{T}dw)^{T}\right)+\frac{1}{2}\mathrm{tr}\!\left(w^{T}dw\right)\\&=\frac{1}{2}\mathrm{tr}\!\left(w^{T}dw\right)+\frac{1}{2}\mathrm{tr}\!\left(w^{T}dw\right)\\&=\mathrm{tr}\!\left(w^{T}dw\right)\end{aligned}$$

Comparing the two expressions for $dJ(\theta)$,

$$\mathrm{tr}\!\left(w^{T}dw\right)=\mathrm{tr}\!\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^{T}dw\right)$$

and since $dw$ is arbitrary,

$$\frac{\partial J(\theta)}{\partial w}=w$$

Also, because $w=X\theta-y$,

$$dw=X\,d\theta$$

so

$$\begin{aligned}dJ(\theta)&=\mathrm{tr}\!\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^{T}dw\right)=\mathrm{tr}\!\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^{T}X\,d\theta\right)=\mathrm{tr}\!\left(w^{T}X\,d\theta\right)\\dJ(\theta)&=\mathrm{tr}\!\left(\left(\frac{\partial J(\theta)}{\partial\theta}\right)^{T}d\theta\right)\end{aligned}$$

Hence

$$\mathrm{tr}\!\left(w^{T}X\,d\theta\right)=\mathrm{tr}\!\left(\left(\frac{\partial J(\theta)}{\partial\theta}\right)^{T}d\theta\right)$$

and since $d\theta$ is arbitrary,

$$\frac{\partial J(\theta)}{\partial\theta}=(w^{T}X)^{T}=X^{T}(X\theta-y)$$
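
To sanity-check the result, the sketch below computes the gradient both element-wise and in the matrix form $X^{T}(X\theta-y)$ on random toy data and confirms that the two agree. The function names and data are illustrative, not from the original post.

```python
import numpy as np

def grad_matrix(theta, X, y):
    """Matrix form of the gradient: X^T (X theta - y)."""
    return X.T @ (X @ theta - y)

def grad_elementwise(theta, X, y):
    """Element-wise form: dJ/dtheta_j = sum_i (theta^T x_i - y_i) * x_ij."""
    m, n = X.shape
    g = np.zeros(n)
    for j in range(n):
        for i in range(m):
            g[j] += (X[i] @ theta - y[i]) * X[i, j]
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
theta = rng.normal(size=3)
print(np.allclose(grad_matrix(theta, X, y), grad_elementwise(theta, X, y)))  # True
```
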
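And here is a minimal batch gradient-descent loop that plugs this gradient into the update rule above. The learning rate and iteration count are assumptions chosen for the toy data, not values from the original post.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, iters=1000):
    """Batch gradient descent: theta <- theta - alpha * X^T (X theta - y)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (X @ theta - y)
    return theta

# Toy data generated from y = 1 + 2x, so theta should approach [1, 2].
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(gradient_descent(X, y))  # approximately [1. 2.]
```
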
