Let x = logits denote the discriminator's output features and z = labels the corresponding targets; the sigmoid cross-entropy is
z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
The derivation is as follows:
z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= z * -log(1 / (1 + e^{-x})) + (1 - z) * -log(e^{-x} / (1 + e^{-x}))
= z * log(1 + e^{-x}) + (1 - z) * (-log(e^{-x}) + log(1 + e^{-x}))
= z * log(1 + e^{-x}) + (1 - z) * (x + log(1 + e^{-x}))
= (1 - z) * x + log(1 + e^{-x})
= x - x * z + log(1 + e^{-x})

For x < 0, this can be further simplified (avoiding overflow in e^{-x}):
= log(e^x) - x * z + log(1 + e^{-x})
= -x * z + log(1 + e^x)
The logistic loss formula from above is x - x * z + log(1 + exp(-x))
For x < 0, a more numerically stable formula is -x * z + log(1 + exp(x))
Note that these two expressions can be combined into the following: max(x, 0) - x * z + log(1 + exp(-abs(x)))
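As a quick numerical check (a standalone NumPy sketch, not part of the original post; `naive_loss` and `stable_loss` are hypothetical helper names), the direct formula and the combined stable formula agree for moderate logits, while the stable form also stays finite for extreme ones:

```python
import numpy as np

def naive_loss(x, z):
    # Direct form: z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
    s = 1.0 / (1.0 + np.exp(-x))
    return z * -np.log(s) + (1 - z) * -np.log(1 - s)

def stable_loss(x, z):
    # Combined stable form: max(x, 0) - x * z + log(1 + exp(-|x|))
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
z = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
print(np.allclose(naive_loss(x, z), stable_loss(x, z)))  # True

# The stable form remains finite even where exp(-x) would overflow:
print(stable_loss(np.array([-1000.0]), np.array([1.0])))  # [1000.]
```

The naive form fails for x = -1000 because exp(1000) overflows and sigmoid underflows to 0, making log(sigmoid(x)) infinite.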
When z = 1, the loss for a real sample is:
-log(sigmoid(x)) = log(e^{-x} + 1) = log(e^x + 1) - x
When z = 0, the loss for a generated sample is:
-log(1 - sigmoid(x)) = x + log(e^{-x} + 1) = log(e^x + 1), where
softplus(x) = log_e(1 + e^x).
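The two per-case losses above are therefore just softplus evaluated at -x and x. A small NumPy check of both identities (a sketch; the `softplus` helper below is an assumed stable implementation, not from the original post):

```python
import numpy as np

def softplus(x):
    # softplus(x) = log(1 + e^x), computed stably as max(x, 0) + log1p(exp(-|x|))
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

x = np.linspace(-5, 5, 11)
sigmoid = 1.0 / (1.0 + np.exp(-x))

# Real-sample loss (z = 1): -log(sigmoid(x)) == softplus(-x)
print(np.allclose(-np.log(sigmoid), softplus(-x)))  # True
# Generated-sample loss (z = 0): -log(1 - sigmoid(x)) == softplus(x)
print(np.allclose(-np.log(1 - sigmoid), softplus(x)))  # True
```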
# first term of the discriminator loss, for real samples: -log[D(x)]
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_real_logits, labels=tf.ones_like(D_real_logits)))
# second term of the discriminator loss, for fake samples: -log[1 - D(G(z))]
# D_fake_logits are the discriminator's logits for generated samples G(z)
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_fake_logits, labels=tf.zeros_like(D_fake_logits)))
d_loss = d_loss_real + d_loss_fake
# generator loss: -log[D(G(z))]
g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_fake_logits, labels=tf.ones_like(D_fake_logits)))
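The same three losses can be reproduced without TensorFlow using the stable formula derived above (a NumPy sketch; the random logits and the `bce_with_logits` helper are illustrative stand-ins, not the original model's outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
D_real_logits = rng.normal(size=8)  # stand-in for discriminator logits on real samples
D_fake_logits = rng.normal(size=8)  # stand-in for discriminator logits on G(z)

def bce_with_logits(x, z):
    # Stable sigmoid cross-entropy: max(x, 0) - x * z + log(1 + exp(-|x|))
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

d_loss_real = bce_with_logits(D_real_logits, np.ones_like(D_real_logits)).mean()   # -log D(x)
d_loss_fake = bce_with_logits(D_fake_logits, np.zeros_like(D_fake_logits)).mean()  # -log(1 - D(G(z)))
d_loss = d_loss_real + d_loss_fake
g_loss = bce_with_logits(D_fake_logits, np.ones_like(D_fake_logits)).mean()        # -log D(G(z))
print(d_loss, g_loss)
```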