# 2 What Is Logistic Regression

$$
\phi(z) = \begin{cases} 0 & \text{if } z < 0 \\ 0.5 & \text{if } z = 0 \\ 1 & \text{if } z > 0 \end{cases}
$$

$$
\phi(z) = \dfrac{1}{1 + e^{-z}}
$$

$$
\hat{y} = \begin{cases} 1 & \text{if } \phi(z) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}
$$
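In code, the sigmoid activation and the 0.5-threshold prediction rule above can be sketched like this (a minimal pure-Python sketch; the names `sigmoid` and `predict` are our own):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) activation: phi(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z):
    """Threshold the activation at 0.5; since phi is monotonic and
    phi(0) = 0.5, this is the same as thresholding z itself at 0."""
    return 1 if sigmoid(z) >= 0.5 else 0
```

Note the symmetry `sigmoid(-z) == 1 - sigmoid(z)`, which is what makes the two class probabilities below sum to 1.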

# 3 The Cost Function of Logistic Regression

$$
J(w) = \sum_{i} \dfrac{1}{2} \left( \phi(z^{(i)}) - y^{(i)} \right)^2
$$

$$
p(y=1 \mid x; w) = \phi(w^T x + b) = \phi(z)
$$

$$
p(y=0 \mid x; w) = 1 - \phi(z)
$$

$$
p(y \mid x; w) = \phi(z)^{y} \left( 1 - \phi(z) \right)^{1-y}
$$

$$
L(w) = \prod_{i=1}^{n} p(y^{(i)} \mid x^{(i)}; w) = \prod_{i=1}^{n} \left( \phi(z^{(i)}) \right)^{y^{(i)}} \left( 1 - \phi(z^{(i)}) \right)^{1-y^{(i)}}
$$

$$
l(w) = \ln L(w) = \sum_{i=1}^{n} \left[ y^{(i)} \ln\left( \phi(z^{(i)}) \right) + (1 - y^{(i)}) \ln\left( 1 - \phi(z^{(i)}) \right) \right]
$$

$$
J(w) = -l(w) = -\sum_{i=1}^{n} \left[ y^{(i)} \ln\left( \phi(z^{(i)}) \right) + (1 - y^{(i)}) \ln\left( 1 - \phi(z^{(i)}) \right) \right]
$$

$$
J(\phi(z), y; w) = -y \ln\left( \phi(z) \right) - (1-y) \ln\left( 1 - \phi(z) \right)
$$

$$
J(\phi(z), y; w) = \begin{cases} -\ln(\phi(z)) & \text{if } y = 1 \\ -\ln(1 - \phi(z)) & \text{if } y = 0 \end{cases}
$$
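The single-sample cost above is easy to sanity-check in code (a sketch; the function name `cross_entropy` is our own). Note how confident correct predictions cost almost nothing, while the cost grows without bound as the predicted probability moves away from the true label:

```python
import math

def cross_entropy(phi_z, y):
    """Per-sample cost J(phi(z), y) = -y ln(phi) - (1 - y) ln(1 - phi).
    phi_z must lie strictly in (0, 1), which sigmoid guarantees."""
    return -y * math.log(phi_z) - (1 - y) * math.log(1 - phi_z)
```

For example, `cross_entropy(0.9, 1)` is small (the model is confident and right), while `cross_entropy(0.1, 1)` is much larger (confident and wrong).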

# 4 Finding the Parameters with Gradient Descent

$$
\phi'(z) = \phi(z) \left( 1 - \phi(z) \right)
$$
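This neat identity for the sigmoid's derivative is what makes the gradient below collapse so cleanly. If you don't want to take it on faith, a quick central finite-difference check confirms it numerically (a sketch; the variable names are our own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Compare a central finite difference of phi at z against the
# closed form phi'(z) = phi(z) * (1 - phi(z)).
z, eps = 0.7, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
```

The two values agree to roughly `eps**2` accuracy, as expected for a central difference.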

$$
f(x + \delta) - f(x) \approx f'(x) \cdot \delta
$$

$$
f'(x) \cdot \delta = \|f'(x)\| \cdot \|\delta\| \cdot \cos\theta
$$

The right-hand side is minimized when $\theta = \pi$, that is, when $\delta$ points in the negative direction of $f'(x)$, so that is the direction of fastest descent~

Okay? Good, buckle up, we're about to start descending.

$$
w := w + \Delta w, \quad \Delta w = -\eta \nabla J(w)
$$

$$
w_j := w_j + \Delta w_j, \quad \Delta w_j = -\eta \dfrac{\partial J(w)}{\partial w_j}
$$

$$
\begin{aligned}
\dfrac{\partial J(w)}{\partial w_j} &= -\sum_{i=1}^{n} \left( y^{(i)} \dfrac{1}{\phi(z^{(i)})} - (1 - y^{(i)}) \dfrac{1}{1 - \phi(z^{(i)})} \right) \dfrac{\partial \phi(z^{(i)})}{\partial w_j} \\
&= -\sum_{i=1}^{n} \left( y^{(i)} \dfrac{1}{\phi(z^{(i)})} - (1 - y^{(i)}) \dfrac{1}{1 - \phi(z^{(i)})} \right) \phi(z^{(i)}) \left( 1 - \phi(z^{(i)}) \right) \dfrac{\partial z^{(i)}}{\partial w_j} \\
&= -\sum_{i=1}^{n} \left( y^{(i)} \left( 1 - \phi(z^{(i)}) \right) - (1 - y^{(i)}) \phi(z^{(i)}) \right) x_j^{(i)} \\
&= -\sum_{i=1}^{n} \left( y^{(i)} - \phi(z^{(i)}) \right) x_j^{(i)}
\end{aligned}
$$

$$
w_j := w_j + \eta \sum_{i=1}^{n} \left( y^{(i)} - \phi(z^{(i)}) \right) x_j^{(i)}
$$
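Putting the batch update rule to work on a tiny linearly separable toy dataset looks something like this (a pure-Python sketch under our own naming; real code would use vectorized NumPy, and the bias $b$ is folded in as a constant feature $x_0 = 1$):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, eta=0.1, epochs=500):
    """Batch gradient descent: w_j := w_j + eta * sum_i (y_i - phi(z_i)) * x_ij.
    X is a list of feature lists; a bias feature 1.0 is prepended to each row."""
    X = [[1.0] + row for row in X]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        # Activations phi(z^{(i)}) for every sample, using the current w.
        phis = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        # Update every weight with the full-batch gradient from Section 4.
        for j in range(len(w)):
            w[j] += eta * sum((yi - pi) * row[j]
                              for yi, pi, row in zip(y, phis, X))
    return w

def predict(w, x):
    """Threshold the learned activation at 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, [1.0] + x))) >= 0.5 else 0
```

On a 1-D dataset like `X = [[0.], [1.], [2.], [3.]]`, `y = [0, 0, 1, 1]`, the learned decision boundary settles between the two classes and all four points are classified correctly.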

$$
w_j := w_j + \eta \left( y^{(i)} - \phi(z^{(i)}) \right) x_j^{(i)}, \quad \text{for } i \text{ in range}(n)
$$

