Gradient Descent for Logistic Regression
The cost function for logistic regression is:
$$
J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]
$$
The corresponding gradient descent update rule is:
$$
\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}
$$
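The update rule above can be sketched in vectorized NumPy. This is a minimal illustration, not part of the original post; the function and variable names (`gradient_descent`, `alpha`, `iters`) are my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.

    X: (m, n) feature matrix (include a column of ones for the intercept).
    y: (m,) labels in {0, 1}.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for every example i
        grad = X.T @ (h - y) / m      # (1/m) * sum_i (h - y^(i)) x_j^(i)
        theta -= alpha * grad         # simultaneous update of all theta_j
    return theta
```

Note that all components of `theta` are updated simultaneously from the same gradient, matching the `:=` assignment in the formula.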
Detailed Derivation
Formula Review
The sigmoid function and its derivative (see the references for a detailed derivation):
$$
g(x)=\frac{1}{1+e^{-x}}
$$
$$
g^{\prime}(x)=g(x)\left(1-g(x)\right)
$$
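For completeness, the derivative identity follows from a direct application of the chain rule (this step is filled in here; the original defers it to the linked reference):

$$
g^{\prime}(x)=\frac{d}{dx}\left(1+e^{-x}\right)^{-1}
=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}
=\frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
=g(x)\left(1-g(x)\right)
$$

where the last step uses $\frac{e^{-x}}{1+e^{-x}}=1-\frac{1}{1+e^{-x}}=1-g(x)$.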
The derivation proceeds as follows:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_{j}}
&=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \frac{1}{h_{\theta}\left(x^{(i)}\right)} \frac{\partial h_{\theta}\left(x^{(i)}\right)}{\partial \theta_{j}}-\left(1-y^{(i)}\right) \frac{1}{1-h_{\theta}\left(x^{(i)}\right)} \frac{\partial h_{\theta}\left(x^{(i)}\right)}{\partial \theta_{j}}\right) \\
&=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \frac{1}{g\left(\theta^{T} x^{(i)}\right)}-\left(1-y^{(i)}\right) \frac{1}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot \frac{\partial g\left(\theta^{T} x^{(i)}\right)}{\partial \theta_{j}} \\
&=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \frac{1}{g\left(\theta^{T} x^{(i)}\right)}-\left(1-y^{(i)}\right) \frac{1}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot g\left(\theta^{T} x^{(i)}\right)\left(1-g\left(\theta^{T} x^{(i)}\right)\right) x_{j}^{(i)} \\
&=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}\left(1-g\left(\theta^{T} x^{(i)}\right)\right)-\left(1-y^{(i)}\right) g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \\
&=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}-g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \\
&=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \cdot x_{j}^{(i)}
\end{aligned}
$$
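The final line of the derivation can be sanity-checked numerically by comparing the analytic gradient against a central finite difference of the cost. This check is my own addition (the helper names `cost`, `analytic_grad`, and `numeric_grad` are not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost J(theta)."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def analytic_grad(theta, X, y):
    """Closed-form gradient: (1/m) * sum_i (h - y^(i)) x^(i)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numeric_grad(theta, X, y, eps=1e-6):
    """Central finite-difference approximation of dJ/dtheta_j."""
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)
    return g
```

If the derivation is correct, the two gradients should agree to within finite-difference error on any data set.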
References
Handbook of essential math formulas: https://blog.csdn.net/zhaohongfei_358/article/details/106039576
Derivation of the sigmoid function's derivative: https://blog.csdn.net/zhaohongfei_358/article/details/119274445