Cross-Entropy Loss

Reference:

https://en.wikipedia.org/wiki/Cross_entropy

https://d2l.ai/chapter_linear-networks/softmax-regression.html#loss-function

Definition: Cross-Entropy

The cross-entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows:
$$H(p,q)=-E_p[\log q]\tag{1}$$
where $E_p[\cdot]$ is the expected value operator with respect to the distribution $p$.

For discrete probability distributions $p$ and $q$ with the same support $\mathcal X$, this means:
$$H(p,q)=-\sum_{x\in \mathcal X}p(x)\log q(x)\tag{2}$$
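To make (2) concrete, here is a minimal NumPy sketch; the probability vectors `p` and `q` below are made-up values for illustration, not taken from the text. It evaluates the discrete cross-entropy and shows that it is smallest when $q=p$:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x) for two discrete distributions
    given as probability vectors over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q + eps))  # eps guards against log(0)

p = np.array([0.7, 0.2, 0.1])        # "true" distribution p
q = np.array([0.6, 0.3, 0.1])        # model distribution q
print(cross_entropy(p, q))           # ≈ 0.829 (natural log)
print(cross_entropy(p, p))           # ≈ 0.802 = H(p), the minimum over q
```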
The situation for continuous distributions is analogous. If $P$ and $Q$ are the densities of $p$ and $q$ with respect to a reference measure $r$ on $\mathcal X$, then:
$$H(p,q)=-\int_{\mathcal X}P(x)\log Q(x)\,\mathrm{d}r(x)\tag{3}$$
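As a sanity check on (3), the sketch below numerically integrates the cross-entropy of two Gaussians; the particular parameters $N(0,1)$ and $N(1,2^2)$ are arbitrary choices for illustration, and the reference measure $r$ is taken to be the Lebesgue measure. The result is compared against the known closed form for Gaussian cross-entropy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(loc=0.0, scale=1.0)   # p = N(0, 1), density P
q = norm(loc=1.0, scale=2.0)   # q = N(1, 2^2), density Q

# H(p, q) = -∫ P(x) log Q(x) dx, integrated numerically over the real line.
h_pq, _ = quad(lambda x: -p.pdf(x) * q.logpdf(x), -np.inf, np.inf)

# Closed form for two Gaussians, used here only to check the quadrature.
closed = 0.5 * np.log(2 * np.pi * q.var()) \
    + (p.var() + (p.mean() - q.mean()) ** 2) / (2 * q.var())

print(h_pq, closed)   # both ≈ 1.862
```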
N.B.: The notation $H(p,q)$ is also used for the joint entropy of $p$ and $q$.

Relation to Log-likelihood

In classification problems we want to estimate the probability of different outcomes. Suppose that the entire dataset $\{\mathbf X, \mathbf y\}$ has $N$ samples, where the sample indexed by $i$ consists of a feature vector $\mathbf x^{(i)}$ and a label $y^{(i)}$. Let the estimated probability of outcome $k\in\mathcal K$ be $\hat p(y=k\mid\mathbf x;\mathbf w)$ and let the frequency (empirical probability) of outcome $k$ in the training set be $q(y=k\mid\mathbf x)$. The likelihood of the parameters $\mathbf w$ is
$$L(\mathbf w)=\prod_{k\in\mathcal K}(\text{est. prob. of }k)^{\text{num. of occurrences of }k}=\prod_{k\in\mathcal K}\hat p(y=k\mid\mathbf x;\mathbf w)^{N\,q(y=k\mid\mathbf x)}\tag{4}$$
Taking the logarithm and dividing by $N$ gives
$$\frac{1}{N}\log L(\mathbf w)=\sum_{k\in\mathcal K}q(y=k\mid\mathbf x)\log\hat p(y=k\mid\mathbf x;\mathbf w)=-H(q,\hat p)\tag{5}$$
so maximizing the likelihood of $\mathbf w$ is equivalent to minimizing the cross-entropy $H(q,\hat p)$, which is why cross-entropy is used as the loss function in classification.
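To illustrate this equivalence, here is a small NumPy sketch. The logits and labels are arbitrary made-up values, and a softmax is assumed to map the model outputs to the probabilities $\hat p$. It shows that the average negative log-likelihood of the observed labels equals the average cross-entropy between the one-hot empirical distribution and the model distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))       # hypothetical model outputs for N=5 samples, K=3 classes
labels = np.array([0, 2, 1, 1, 0])     # hypothetical labels y^(i)

# Softmax turns the outputs into estimated probabilities p̂(y = k | x; w).
p_hat = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Average negative log-likelihood of the observed labels.
nll = -np.mean(np.log(p_hat[np.arange(len(labels)), labels]))

# The same quantity written as a cross-entropy -sum_k q(k) log p̂(k) with one-hot q.
q_onehot = np.eye(3)[labels]
ce = -np.mean(np.sum(q_onehot * np.log(p_hat), axis=1))

print(nll, ce)   # identical values
```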
