1. Linear Regression
Cost function: least squares
$$
J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(\theta^T x^{(i)}-y^{(i)}\right)^2
$$
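As a minimal sketch of this cost, the sum can be evaluated directly in plain Python. The function name `cost` and the toy data are illustrative assumptions, not part of the original text.

```python
def cost(theta, xs, ys):
    """Least-squares cost J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(t * xj for t, xj in zip(theta, x))  # theta^T x^(i)
        total += (prediction - y) ** 2
    return 0.5 * total


# Toy data (hypothetical): each x starts with 1 so theta[0] acts as the intercept.
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [2.0, 4.0, 6.0]

print(cost([0.0, 2.0], xs, ys))  # perfect fit y = 2x, prints 0.0
print(cost([0.0, 1.0], xs, ys))  # residuals -1, -2, -3, prints 7.0
```

The factor 1/2 has no effect on where the minimum lies; it only cancels the 2 that appears when the square is differentiated.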
Applying gradient descent, each parameter is updated by:
$$
\theta_j=\theta_j-\alpha\frac{\partial J(\theta)}{\partial \theta_j}
$$
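A single application of this update rule can be sketched as follows. The helper `gradient_step`, the learning rate value, and the toy objective f(θ) = θ_0² + θ_1² are all assumptions chosen for illustration; the gradient of J itself is derived in the rest of this section.

```python
def gradient_step(theta, grad, alpha):
    """Return theta_j - alpha * dJ/dtheta_j for every j (simultaneous update)."""
    return [t - alpha * g for t, g in zip(theta, grad)]


def toy_gradient(theta):
    # Gradient of the hypothetical objective f(theta) = theta_0^2 + theta_1^2.
    return [2.0 * t for t in theta]


theta = [4.0, -2.0]
alpha = 0.25  # learning rate (assumed value)
for _ in range(3):
    theta = gradient_step(theta, toy_gradient(theta), alpha)

print(theta)  # each step halves theta: [0.5, -0.25]
```

All components are updated from the same old θ, which is why the new list is built in one pass rather than overwriting entries in place.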
We therefore need the derivative of $J(\theta)$:
- Differentiating with respect to each element directly
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
&=\sum_{i=1}^{m}\frac{\partial}{\partial \theta_j}\frac{1}{2}\left(\theta^T x^{(i)}-y^{(i)}\right)^2\\
&=\sum_{i=1}^{m}\left\{\left(\theta^T x^{(i)}-y^{(i)}\right)\frac{\partial}{\partial \theta_j}\left(\theta^T x^{(i)}-y^{(i)}\right)\right\}\\
&=\sum_{i=1}^{m}\left\{\left(\theta^T x^{(i)}-y^{(i)}\right)x^{(i)}_j\right\}
\end{aligned}
$$
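The last line of this derivation can be checked numerically. The sketch below computes the partial derivatives entry by entry and compares them with central finite differences; the function names and the toy data are assumptions made for illustration.

```python
def cost(theta, xs, ys):
    return 0.5 * sum(
        (sum(t * xj for t, xj in zip(theta, x)) - y) ** 2 for x, y in zip(xs, ys)
    )


def gradient(theta, xs, ys):
    """dJ/dtheta_j = sum_i (theta^T x_i - y_i) * x_ij, computed entry by entry."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = sum(t * xj for t, xj in zip(theta, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return grad


def numerical_gradient(theta, xs, ys, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, xs, ys) - cost(down, xs, ys)) / (2 * eps))
    return grad


# Toy data (hypothetical): first component of each x is the intercept term.
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [2.0, 4.0, 6.0]
theta = [0.0, 1.0]

analytic = gradient(theta, xs, ys)
numeric = numerical_gradient(theta, xs, ys)
print(analytic)  # residuals -1, -2, -3 give [-6.0, -14.0]
print(numeric)   # should agree with the analytic values up to rounding
```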
- Converting to matrix differentiation:
Let (each row of $X$ is one transposed sample $(x^{(i)})^T$):
$$
X=\begin{bmatrix}
(x^{(1)})^T\\
(x^{(2)})^T\\
\vdots\\
(x^{(m)})^T
\end{bmatrix},\quad
\theta=\begin{bmatrix}
\theta_0\\
\theta_1\\
\vdots\\
\theta_n
\end{bmatrix},\quad
y=\begin{bmatrix}
y^{(1)}\\
y^{(2)}\\
\vdots\\
y^{(m)}
\end{bmatrix}
$$
Then the hypothesis (the model's prediction) for sample $i$ is:
$$
h_{\theta}(x^{(i)})=(x^{(i)})^T\theta
$$
Therefore:
$$
X\theta-y=\begin{bmatrix}
(x^{(1)})^T\theta-y^{(1)}\\
(x^{(2)})^T\theta-y^{(2)}\\
\vdots\\
(x^{(m)})^T\theta-y^{(m)}
\end{bmatrix}
$$
Since the inner product of a vector with itself is the sum of its squared entries, and $(x^{(i)})^T\theta=\theta^Tx^{(i)}$, it follows that:
$$
\frac{1}{2}(X\theta-y)^T(X\theta-y)=\frac{1}{2}\sum_{i=1}^{m}\left(\theta^T x^{(i)}-y^{(i)}\right)^2=J(\theta)
$$
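The equality of the sum form and the matrix form can be confirmed on a small example. This is a sketch using NumPy with made-up data; the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Toy design matrix (hypothetical): rows are (x^(i))^T, first column is the intercept.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 1.0])

# Sum form: 1/2 * sum_i (theta^T x^(i) - y^(i))^2
J_sum = 0.5 * sum((theta @ X[i] - y[i]) ** 2 for i in range(X.shape[0]))

# Matrix form: 1/2 * (X theta - y)^T (X theta - y)
residual = X @ theta - y
J_matrix = 0.5 * (residual @ residual)

print(J_sum, J_matrix)  # both evaluate to 7.0 for this data
```

For 1-D arrays, `residual @ residual` is the inner product, so no explicit transpose is needed.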
Let $w=X\theta-y$; the cost can then be written as $J(\theta)=\frac{1}{2}w^Tw$. Taking the differential gives:
$$
d(J(\theta))=\frac{1}{2}d(w^T)w+\frac{1}{2}w^Td(w)
$$
Since $J(\theta)$ is a scalar, so is its differential, and a scalar equals its own trace. Combined with the defining relation between the differential and the derivative, this gives:
$$
d(J(\theta))=\operatorname{tr}\left(d(J(\theta))\right)=\operatorname{tr}\left(\frac{1}{2}d(w^T)w+\frac{1}{2}w^Td(w)\right)=\operatorname{tr}\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)\right)
$$
On the other hand, using $d(w^T)=(d(w))^T$ and $\operatorname{tr}(A^T)=\operatorname{tr}(A)$:
$$
\begin{aligned}
\operatorname{tr}\left(\frac{1}{2}d(w^T)w+\frac{1}{2}w^Td(w)\right)
&=\operatorname{tr}\left(\frac{1}{2}(d(w))^Tw\right)+\operatorname{tr}\left(\frac{1}{2}w^Td(w)\right)\\
&=\frac{1}{2}\operatorname{tr}\left(\left(w^Td(w)\right)^T\right)+\frac{1}{2}\operatorname{tr}\left(w^Td(w)\right)\\
&=\frac{1}{2}\operatorname{tr}\left(w^Td(w)\right)+\frac{1}{2}\operatorname{tr}\left(w^Td(w)\right)\\
&=\operatorname{tr}\left(w^Td(w)\right)=\operatorname{tr}\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)\right)
\end{aligned}
$$
Because this holds for every $d(w)$, we obtain:
$$
w^Td(w)=\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)
$$
Hence:
$$
\frac{\partial J(\theta)}{\partial w}=w
$$
Moreover, since $w=X\theta-y$ and both $X$ and $y$ are constant:
$$
d(w)=Xd(\theta)
$$
Therefore, substituting into the differential and comparing with its definition in terms of $\theta$:
$$
\begin{aligned}
d(J(\theta))&=\operatorname{tr}\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)\right)=\operatorname{tr}\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^TXd(\theta)\right)=\operatorname{tr}\left(w^TXd(\theta)\right)\\
d(J(\theta))&=\operatorname{tr}\left(\left(\frac{\partial J(\theta)}{\partial \theta}\right)^Td(\theta)\right)
\end{aligned}
$$
It follows that:
$$
w^TXd(\theta)=\left(\frac{\partial J(\theta)}{\partial \theta}\right)^Td(\theta)
$$
so $\left(\frac{\partial J(\theta)}{\partial \theta}\right)^T=w^TX$, and transposing both sides gives:
$$
\frac{\partial J(\theta)}{\partial \theta}=X^Tw=X^T(X\theta-y)
$$
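Putting the matrix-form gradient together with the gradient descent update, a minimal NumPy sketch might look like the following. The learning rate, the step count, the toy data, and the function names are assumptions; `np.linalg.lstsq` is used only as an independent reference solution.

```python
import numpy as np


def gradient(theta, X, y):
    """Matrix-form gradient of J: X^T (X theta - y)."""
    return X.T @ (X @ theta - y)


def gradient_descent(X, y, alpha=0.05, steps=5000):
    """Repeat theta <- theta - alpha * gradient, starting from zeros."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta = theta - alpha * gradient(theta, X, y)
    return theta


# Toy data (hypothetical): generated from y = 1 + 2x, first column is the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_gd = gradient_descent(X, y)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference least-squares solution

print(theta_gd)  # approximately [1, 2]
print(theta_ls)  # should agree with theta_gd up to numerical tolerance
```

The step size matters here: for this cost, the iteration only converges when α is smaller than 2 divided by the largest eigenvalue of $X^TX$, so a different data set would likely need a different α.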