Is cross-entropy always non-negative?

Ian Goodfellow's Deep Learning Book contains the following passage:

One unusual property of the cross-entropy cost used to perform maximum likelihood estimation is that it usually does not have a minimum value when applied to the models commonly used in practice. For discrete output variables, most models are parametrized in such a way that they cannot represent a probability of zero or one, but can come arbitrarily close to doing so. Logistic regression is an example of such a model. For real-valued output variables, if the model can control the density of the output distribution (for example, by learning the variance parameter of a Gaussian output distribution) then it becomes possible to assign extremely high density to the correct training set outputs, resulting in cross-entropy approaching negative infinity.
(http://www.deeplearningbook.org/contents/mlp.html, p. 175)

The first time I read this I was quite confused: I had never seen a negative cross-entropy. We know that cross-entropy can be written as the sum of entropy and KL divergence:

$$H(p, q) = H(p) + \mathrm{KL}(p \,\|\, q)$$

where $\mathrm{KL}(p \,\|\, q)$ is non-negative. The proof is simple:
$$\mathrm{KL}(p \,\|\, q) = \mathbb{E}_p\!\left[\log\frac{p}{q}\right] = -\mathbb{E}_p\!\left[\log\frac{q}{p}\right] \ge -\log \mathbb{E}_p\!\left[\frac{q}{p}\right] = -\log\left(\sum_x p(x)\,\frac{q(x)}{p(x)}\right) = -\log 1 = 0$$

The $\ge$ in the middle comes from Jensen's inequality ($\log$ is concave).
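
As a quick numerical sanity check, here is a minimal numpy sketch (the helper name kl_divergence is mine) that draws random discrete distributions and confirms the KL divergence never goes below zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p(x) = 0 contribute 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

for _ in range(1000):
    # Two random distributions over 10 outcomes.
    p = rng.dirichlet(np.ones(10))
    q = rng.dirichlet(np.ones(10))
    assert kl_divergence(p, q) >= 0  # guaranteed by Jensen's inequality
```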

As for $H(p)$, I vaguely remembered that entropy should also be non-negative:

$$H(p) = \sum_x p(x) \log\frac{1}{p(x)}$$

Since $p(x)$ is a probability it must lie in $[0, 1]$, so $\log\frac{1}{p(x)} \ge 0$, and therefore $H(p) \ge 0$.
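
Putting the two pieces together, a short numpy check (a sketch; the helper names entropy and cross_entropy are mine) confirms the decomposition and shows that all three quantities stay non-negative in the discrete case:

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    mask = p > 0  # 0 * log(0) is taken as 0
    return -np.sum(p[mask] * np.log(p[mask]))

def cross_entropy(p, q):
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

h_p = entropy(p)
h_pq = cross_entropy(p, q)
kl = h_pq - h_p  # H(p, q) = H(p) + KL(p || q)

assert h_p >= 0 and h_pq >= 0 and kl >= 0
print(f"H(p) = {h_p:.4f}, H(p, q) = {h_pq:.4f}, KL = {kl:.4f}")
```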

By that argument, cross-entropy must be non-negative, so how could it ever approach negative infinity?

Read the sentence containing negative infinity again, carefully.

For real-valued output variables, if the model can control the density of the output distribution (for example, by learning the variance parameter of a Gaussian output distribution) then it becomes possible to assign extremely high density to the correct training set outputs, resulting in cross-entropy approaching negative infinity.

The key phrase turns out to be real-valued output variables. For a continuous random variable, entropy has to be written in integral form:

$$H(p) = \int p(x) \log\frac{1}{p(x)}\,dx$$

Here $p$ is a probability density, which takes values in $[0, +\infty)$ rather than $[0, 1]$, so this integral no longer necessarily has a lower bound. In the extreme case where $p(x)$ is a Dirac delta function, the integral is negative infinity.

So for continuous random variables, the (differential) entropy can be negative.
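
For a concrete example (a small sketch using the Gaussian case), the differential entropy of $N(0, \sigma^2)$ has the closed form $\frac{1}{2}\ln(2\pi e \sigma^2)$, which goes negative once $\sigma$ is small enough and tends to $-\infty$ as $\sigma \to 0$, the Dirac-delta limit:

```python
import numpy as np

def gaussian_differential_entropy(sigma):
    """Differential entropy of N(0, sigma^2): 0.5 * ln(2 * pi * e * sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

for sigma in [1.0, 0.2, 0.01]:
    print(f"sigma = {sigma:5}: H = {gaussian_differential_entropy(sigma):+.3f}")
# sigma =   1.0: H = +1.419
# sigma =   0.2: H = -0.190
# sigma =  0.01: H = -3.186
```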

Now look at the definition of cross-entropy:

$$H(p, q) = \int p(x) \log\frac{1}{q(x)}\,dx$$

If we likewise let $q(x)$ be a Dirac delta function (placed where $p$ puts its mass), this cross-entropy also becomes negative infinity.
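
This is easy to reproduce numerically (a sketch under my own simplifying assumption that the training data is a single point $x$, so the empirical $p$ is a delta at $x$ and the cross-entropy reduces to $-\log q(x)$; the model "learns the variance" by shrinking $\sigma$):

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

x = 0.7  # the "correct training set output"

# Cross-entropy between the empirical delta at x and a Gaussian q centered
# on x is just -log q(x); it goes to negative infinity as sigma shrinks.
for sigma in [1.0, 0.1, 1e-3, 1e-6]:
    print(f"sigma = {sigma:8}: cross-entropy = {-gaussian_logpdf(x, x, sigma):+.3f}")
# sigma =      1.0: cross-entropy = +0.919
# sigma =      0.1: cross-entropy = -1.384
# sigma =    0.001: cross-entropy = -5.989
# sigma =    1e-06: cross-entropy = -12.897
```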

Which is exactly what the second half of that passage is saying:

it becomes possible to assign extremely high density to the correct training set outputs, resulting in cross-entropy approaching negative infinity.

So our conclusions are:

  • For discrete random variables, cross-entropy is non-negative. If your softmax + cross_entropy_loss classifier ever produces a negative loss, the computation is definitely wrong (see the sanity check below).
  • For continuous random variables, cross-entropy can be negative.
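
For the first bullet, here is a minimal numpy sketch of softmax + cross-entropy (function names are mine, not any particular library's API) showing why the loss cannot go negative: the probability assigned to the true class never exceeds 1, so its negative log is never below 0.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, labels):
    probs = softmax(logits)
    # -log of the probability assigned to the true class, averaged over the batch
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = rng.normal(size=(32, 10)) * 10  # a random batch, 10 classes
labels = rng.integers(0, 10, size=32)

loss = cross_entropy_loss(logits, labels)
assert loss >= 0  # probs <= 1 implies -log(prob) >= 0
print(f"loss = {loss:.4f}")
```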