While working on ex_2, I ran into a formula for the gradient, so here is a derivation by hand. The formula is:
$$\frac{\partial J(\theta)}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}_j$$
- Premises: cost function

$$J(\theta)=\frac{1}{m}\sum_{i=1}^m\left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

hypothesis

$$h_\theta(x^{(i)})=g(x^{(i)}\theta)$$

Logistic function

$$g(x)=\frac{1}{1+\exp(-x)},\quad \exp(x)=e^x$$

- Derivation:
Writing $h'_\theta(x^{(i)})$ for $\partial h_\theta(x^{(i)})/\partial\theta_j$ and differentiating $J(\theta)$ term by term:

$$\frac{\partial J(\theta)}{\partial\theta_j}=-\frac{1}{m}\sum_{i=1}^m\left(y^{(i)}\frac{h'_\theta(x^{(i)})}{h_\theta(x^{(i)})}+(1-y^{(i)})\frac{-h'_\theta(x^{(i)})}{1-h_\theta(x^{(i)})}\right)$$

$$=-\frac{1}{m}\sum_{i=1}^m\frac{y^{(i)}h'_\theta(x^{(i)})(1-h_\theta(x^{(i)}))-(1-y^{(i)})h'_\theta(x^{(i)})h_\theta(x^{(i)})}{h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))}$$

$$=-\frac{1}{m}\sum_{i=1}^m\frac{(y^{(i)}-h_\theta(x^{(i)}))\,h'_\theta(x^{(i)})}{h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))}\tag{1}$$
Let $H(x)=x^{(i)}\theta$, and write $H'(x)$ for $\partial H(x)/\partial\theta_j$. Since $h_\theta(x^{(i)})=\frac{1}{1+e^{-H(x)}}$, the chain rule gives

$$h'_\theta(x^{(i)})=\frac{e^{-H(x)}}{(1+e^{-H(x)})^2}H'(x)\tag{2}$$

$$h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))=\frac{e^{-H(x)}}{(1+e^{-H(x)})^2}\tag{3}$$

$$H'(x)=x_j^{(i)}\tag{4}$$
Substituting (2), (3) and (4) into (1), the factor shared by (2) and (3) cancels, which gives

$$\frac{\partial J(\theta)}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}_j$$
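One way to sanity-check the result is to compare the closed-form gradient against central finite differences of $J(\theta)$. The sketch below is not part of ex_2. The function names, the toy data and the step size `eps` are all assumptions made up for illustration, and it uses only the Python standard library.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(x . theta) for one example x (a list of features)."""
    return sigmoid(sum(xj * tj for xj, tj in zip(x, theta)))

def cost(theta, X, y):
    """Cross-entropy cost J(theta), averaged over the m examples."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / m

def analytic_gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h_theta(x_i) - y_i) * x_ij."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = hypothesis(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij / m
    return grad

def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences of J, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += eps
        down = list(theta)
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad

# Made-up toy data (assumption): the first feature is a constant intercept term.
X = [[1.0, 0.5, 1.5], [1.0, -1.0, 0.3], [1.0, 2.0, -0.7], [1.0, 0.1, 0.9]]
y = [1, 0, 1, 0]
theta = [0.1, -0.2, 0.3]

analytic = analytic_gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_diff)
```

If the derivation is right, the two gradients should agree up to finite-difference error. A large gap would point to a sign or indexing slip, for example using $\theta_j$ where $x_j^{(i)}$ belongs.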