The softmax regression cost function:
$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y^{(i)}=j\}\log\frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\right]$$
Here $1\{y^{(i)}=j\}$ is the indicator function: $1\{y^{(i)}=j\}=1$ when example $i$ belongs to class $j$, and $1\{y^{(i)}=j\}=0$ otherwise.
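The cost above can be sketched directly in NumPy. This is a minimal illustration, not code from the original; the function name, argument shapes, and the max-subtraction trick for numerical stability are all my assumptions:

```python
import numpy as np

def softmax_cost(Theta, X, y, k):
    """Cross-entropy cost J(theta) for softmax regression.

    Theta : (k, n) weight matrix, one row theta_j per class (assumed layout)
    X     : (m, n) design matrix, one example X^{(i)} per row
    y     : (m,)   integer class labels in {0, ..., k-1}
    """
    m = X.shape[0]
    logits = X @ Theta.T                         # (m, k): theta_j^T X^{(i)}
    logits -= logits.max(axis=1, keepdims=True)  # stabilize exp (cancels in the ratio)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax probabilities
    # The indicator 1{y^{(i)} = j} selects the log-probability of the true class
    return -np.mean(np.log(probs[np.arange(m), y]))
```

With all weights zero every class gets probability $1/k$, so the cost reduces to $\log k$, which is a quick sanity check.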
Differentiating the cost function with respect to $\theta_j$:
Writing $\log\frac{e^{\theta_{j'}^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}} = \theta_{j'}^T X^{(i)} - \log\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}$, and noting that $\sum_{j'=1}^{k}1\{y^{(i)}=j'\}=1$ (each example belongs to exactly one class), so the $-\log\sum_l$ term appears exactly once per example:

$$\begin{aligned}
\nabla_{\theta_j} J(\theta)
&= -\frac{1}{m}\sum_{i=1}^{m}\nabla_{\theta_j}\left[\sum_{j'=1}^{k}1\{y^{(i)}=j'\}\left(\theta_{j'}^T X^{(i)} - \log\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}\right)\right]\\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[1\{y^{(i)}=j\}\,X^{(i)} - \frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\,X^{(i)}\right]\\
&= -\frac{1}{m}\sum_{i=1}^{m}X^{(i)}\left(1\{y^{(i)}=j\} - p\!\left(y^{(i)}=j \mid X^{(i)};\theta\right)\right)
\end{aligned}$$

where $p(y^{(i)}=j \mid X^{(i)};\theta) = \frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}$ is the model's predicted probability that example $i$ belongs to class $j$.
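The gradient derivation can be checked numerically against finite differences of the cost. Again a sketch under my own naming and shape conventions, not the original's code; the one-hot matrix encodes the indicator $1\{y^{(i)}=j\}$ for all classes at once:

```python
import numpy as np

def softmax_grad(Theta, X, y, k):
    """Gradient of J(theta): row j is nabla_{theta_j} J.

    Theta : (k, n) weight matrix (assumed layout)
    X     : (m, n) design matrix
    y     : (m,)   integer labels in {0, ..., k-1}
    """
    m = X.shape[0]
    logits = X @ Theta.T
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)  # p(y = j | X^{(i)}; theta)
    Y = np.eye(k)[y]                           # one-hot rows: 1{y^{(i)} = j}
    # -(1/m) * sum_i X^{(i)} (1{y^{(i)}=j} - p_j), vectorized over all j
    return -(Y - probs).T @ X / m
```

Comparing this analytic gradient to a central-difference approximation of $J(\theta)$ entry by entry is the standard way to validate the derivation.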