Notation:
- $\mathbf x_t\in \mathbb R^d$: a sample instance in $d$-dimensional space, with label $y_t\in \{0, 1\}$
- $p_t$ is the predicted value for sample $\mathbf x_t$. In the logistic regression setting:
  - $p_t=\sigma(\mathbf w_t\cdot \mathbf x_t)$, where $\sigma(a)=1/(1+\exp(-a))$
  - $\sigma'(a)=\sigma(a)(1-\sigma(a))$
  - $p'_t=p_t(1-p_t)\cdot \mathbf x_t$, the gradient of $p_t$ with respect to $\mathbf w_t$, obtained by the chain rule
- The loss function is the logarithmic loss (LogLoss):
  - $\ell_t(\mathbf w_t)=-y_t\log p_t-(1-y_t)\log(1-p_t)$
  - Since $y_t\in \{0, 1\}$, this reduces to $\ell_t(\mathbf w_t)=-\log p_t$ when $y_t=1$, or $\ell_t(\mathbf w_t)=-\log(1-p_t)$ when $y_t=0$
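The definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `sigmoid`, `predict`, and `log_loss`, and the sample values of `w` and `x`, are assumptions made for the example and do not come from the original text. The sigmoid is written in a branch form that is algebraically equal to $1/(1+\exp(-a))$ but avoids overflow for large negative inputs.

```python
import math

def sigmoid(a):
    """Logistic function sigma(a) = 1 / (1 + exp(-a)), evaluated stably."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)  # a < 0 here, so exp(a) cannot overflow
    return e / (1.0 + e)

def predict(w, x):
    """p_t = sigma(w . x) for a weight vector w and a feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def log_loss(p, y):
    """LogLoss for a label y in {0, 1} and a prediction p strictly in (0, 1)."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# Hypothetical weights and sample, chosen so that w . x = 0.
w = [0.5, -0.25]
x = [1.0, 2.0]

p = predict(w, x)
loss_if_positive = log_loss(p, 1)  # equals -log(p)
loss_if_negative = log_loss(p, 0)  # equals -log(1 - p)
```

With $y_t=1$ only the $-\log p_t$ term survives, and with $y_t=0$ only the $-\log(1-p_t)$ term does, matching the two cases listed above.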
1. Compute $\nabla\ell_t(\mathbf w)$

$$
\begin{aligned}
\frac{\partial\ell_t(\mathbf w)}{\partial \mathbf w}
&=-\frac{y_t}{p_t}p'_t+\frac{1-y_t}{1-p_t}p'_t\\
&=\left(-\frac{y_t}{p_t}+\frac{1-y_t}{1-p_t}\right)p'_t\\
&=\frac{p_t-y_t}{p_t(1-p_t)}\cdot p_t(1-p_t)\cdot \mathbf x_t\\
&=(p_t-y_t)\mathbf x_t
\end{aligned}
$$
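One way to sanity-check the closed form $(p_t-y_t)\mathbf x_t$ is to compare it with a central finite-difference approximation of the LogLoss. The sketch below does this in plain Python. The helper names (`loss`, `analytic_grad`, `numeric_grad`), the step size `eps`, and the test vectors are all assumptions chosen for illustration, not part of the original derivation.

```python
import math

def sigmoid(a):
    """Logistic function sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def loss(w, x, y):
    """LogLoss of one sample (x, y) under weights w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def analytic_grad(w, x, y):
    """Closed-form gradient (p - y) * x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def numeric_grad(w, x, y, eps=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    g = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        g.append((loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * eps))
    return g

# Hypothetical weights and sample used only for the check.
w = [0.3, -0.7, 0.1]
x = [1.0, 2.0, -1.5]

# Largest absolute gap between the two gradients, for each possible label.
errors = {}
for y in (0, 1):
    a = analytic_grad(w, x, y)
    n = numeric_grad(w, x, y)
    errors[y] = max(abs(ai - ni) for ai, ni in zip(a, n))
    print(f"y={y}: max |analytic - numeric| = {errors[y]:.2e}")
```

If the derivation is right, the printed gaps should be tiny compared with the gradient entries themselves, for both labels.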