$$P(y^{(i)}=k \mid x^{(i)};\theta)=\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}$$
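As a minimal sketch of this formula, the following pure-Python function computes the K class probabilities for a single example. The names `softmax_probs`, `theta` (a list of K parameter vectors, one per class) and `x` are illustrative assumptions, not part of the original notes. Subtracting the largest score before exponentiating is a common numerical-stability trick; it cancels in the ratio and leaves the probabilities unchanged.

```python
import math

def softmax_probs(theta, x):
    """Return [P(y=k | x; theta) for k in 0..K-1].

    theta: list of K parameter vectors (hypothetical layout, one per class)
    x:     feature vector with the same length as each theta[k]
    """
    # Scores theta^(k)^T x for every class k.
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
    # Shift by the max score; the numerator/denominator ratio is unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three classes, two features; the class scores here are 2, 1 and 0.
probs = softmax_probs([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [2.0, 1.0])
print(probs)
```

The returned values sum to 1, and the class with the largest score gets the largest probability.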
Likelihood function:
$$L=\prod_{i=1}^{m}\prod_{k=1}^{K}P(y^{(i)}=k \mid x^{(i)};\theta)^{1\{y^{(i)}=k\}}$$
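The log loss function is:
$$J(\theta)=-\left[\sum_{i=1}^{m}\sum_{k=1}^{K}1\{y^{(i)}=k\}\log\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}\right]$$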
Here $1\{\cdot\}$ is the "indicator function," so that $1\{\text{a true statement}\}=1$ and $1\{\text{a false statement}\}=0$.
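The loss can be evaluated directly from this definition. The sketch below assumes labels are integers in 0..K-1 and uses hypothetical names (`log_loss`, `theta`, `xs`, `ys`). Because the indicator keeps only the term with k equal to the true label, the inner sum over k reduces to a single lookup.

```python
import math

def log_loss(theta, xs, ys):
    """J(theta) = -sum_i log P(y_i | x_i; theta), labels assumed in 0..K-1."""
    total = 0.0
    for x, y in zip(xs, ys):
        # Scores theta^(k)^T x for every class k.
        scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
        top = max(scores)
        # Log of the softmax denominator, computed stably (log-sum-exp).
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # The indicator 1{y_i = k} selects only the term with k == y.
        total += scores[y] - log_norm
    return -total

# With all-zero parameters every class has probability 1/K,
# so the loss equals m * log(K).
theta0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
xs = [[1.0, 2.0], [0.5, -1.0]]
ys = [0, 2]
loss0 = log_loss(theta0, xs, ys)
print(loss0)
```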
Now take the partial derivative of the log loss with respect to $\theta^{(n)}$:
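$$\nabla_{\theta^{(n)}}J(\theta)=-\sum_{i=1}^{m}\left[1\{y^{(i)}=n\}\frac{\partial\log P(y^{(i)}=n\mid x^{(i)};\theta)}{\partial\theta^{(n)}}+\sum_{k=1,k\neq n}^{K}1\{y^{(i)}=k\}\frac{\partial\log P(y^{(i)}=k\mid x^{(i)};\theta)}{\partial\theta^{(n)}}\right]$$
where
$$\log P(y^{(i)}=n\mid x^{(i)};\theta)=\log\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}$$
$$\log P(y^{(i)}=k\mid x^{(i)};\theta)=\log\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)},\quad k\neq n$$
Differentiating the first expression gives
$$\begin{aligned}
\frac{\partial\log P(y^{(i)}=n\mid x^{(i)};\theta)}{\partial\theta^{(n)}}
&=\frac{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}{\exp\left(\theta^{(n)\top}x^{(i)}\right)}\left(\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}-\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\left[\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)\right]^{2}}\right)\\
&=x^{(i)}-\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}\\
&=x^{(i)}\left(1-P(y^{(i)}=n\mid x^{(i)};\theta)\right)
\end{aligned}$$
For the other term ($k\neq n$),
$$\begin{aligned}
\frac{\partial\log P(y^{(i)}=k\mid x^{(i)};\theta)}{\partial\theta^{(n)}}
&=\frac{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}{\exp\left(\theta^{(k)\top}x^{(i)}\right)}\left(-\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\left[\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)\right]^{2}}\right)\\
&=-\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}\\
&=-P(y^{(i)}=n\mid x^{(i)};\theta)\,x^{(i)}
\end{aligned}$$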
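Substituting these two derivatives into the gradient, using $\sum_{k=1}^{K}1\{y^{(i)}=k\}=1$, and renaming $n$ to $k$ gives:
$$\nabla_{\theta^{(k)}}J(\theta)=-\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=k\}-P(y^{(i)}=k\mid x^{(i)};\theta)\right)\right]$$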
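One way to sanity-check this closed-form gradient is to compare it with a central finite-difference approximation of the loss. The sketch below is illustrative only: the function names, the integer labels in 0..K-1, the small made-up data set, and the step size `eps` are all assumptions.

```python
import math

def softmax_probs(theta, x):
    """Class probabilities P(y=k | x; theta) for one example."""
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
    top = max(scores)  # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(theta, xs, ys):
    """J(theta) = -sum_i log P(y_i | x_i; theta)."""
    return -sum(math.log(softmax_probs(theta, x)[y]) for x, y in zip(xs, ys))

def gradient(theta, xs, ys):
    """grad[k] = -sum_i x_i * (1{y_i = k} - P(y_i = k | x_i; theta))."""
    grad = [[0.0] * len(theta_k) for theta_k in theta]
    for x, y in zip(xs, ys):
        probs = softmax_probs(theta, x)
        for k in range(len(theta)):
            coeff = (1.0 if y == k else 0.0) - probs[k]
            for d, xd in enumerate(x):
                grad[k][d] -= xd * coeff
    return grad

def numeric_gradient(theta, xs, ys, eps=1e-6):
    """Central finite differences of log_loss, one parameter at a time."""
    grad = [[0.0] * len(theta_k) for theta_k in theta]
    for k in range(len(theta)):
        for d in range(len(theta[k])):
            saved = theta[k][d]
            theta[k][d] = saved + eps
            up = log_loss(theta, xs, ys)
            theta[k][d] = saved - eps
            down = log_loss(theta, xs, ys)
            theta[k][d] = saved  # restore the original value
            grad[k][d] = (up - down) / (2 * eps)
    return grad

# Made-up data: 3 classes, 2 features, 3 examples.
theta = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
ys = [0, 2, 1]
analytic = gradient(theta, xs, ys)
numeric = numeric_gradient(theta, xs, ys)
max_diff = max(abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn))
print(max_diff)
```

If the derivation is right, `max_diff` should be very small. As a second check, the per-class gradients sum to zero in every feature dimension, because the indicator terms and the probabilities each sum to 1 over the classes.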