Multi-class softmax activation & binary sigmoid activation
(1) Multi-class: the probability that a sample belongs to class $k$ (out of $K$ classes) is

$$S_k=\frac{e^{x_k}}{\sum\limits_{i=1}^K e^{x_i}}$$
where $x_k$ is the linear combination the sample produces at the hidden layer for class $k$.
(2) Binary: the probability that a sample belongs to the positive class 1 (positive class 1, negative class 0) is

$$P(Y=1|x)=\frac{1}{1+e^{-h(x)}}=\frac{1}{1+e^{-(\theta^T x + b)}}$$
where $h(x)=\theta^T x+b$ is the hidden layer's linear combination: $\theta$ are the hidden-layer weights and $b$ is the bias term.
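Sigmoid is in fact the two-class special case of softmax: pinning the second logit at 0 gives $e^x/(e^x+e^0)=1/(1+e^{-x})$. A minimal NumPy sketch of this identity (the helper names `sigmoid` and `softmax` are illustrative, not from any library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(v):
    e = np.exp(v - np.max(v))  # shift by the max for numerical stability
    return e / e.sum()

# softmax over the two logits [x, 0] reproduces sigmoid(x):
# e^x / (e^x + e^0) = 1 / (1 + e^{-x})
for x in [-2.0, 0.0, 1.0, 3.5]:
    print(x, sigmoid(x), softmax(np.array([x, 0.0]))[0])
```

This is why a binary classifier can be built either with one sigmoid output or with a two-unit softmax output.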
Usage of a few basic functions
(1) Verifying softmax
import tensorflow as tf  # TensorFlow 1.x
hidden_layer = tf.Variable([1.0, 2.0, 3.0])
softmax_active_predict = tf.nn.softmax(hidden_layer)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(softmax_active_predict))
# Output: probabilities of one sample over the three classes
# [0.09003057 0.24472848 0.66524094]
import numpy as np
aa = np.array([1.0, 2.0, 3.0])
aa_e = np.exp(aa)
fm = sum(aa_e)  # denominator: sum of the exponentials
print(aa_e, fm)
for i in aa:
    print(np.exp(i) / fm)
# Output:
# [ 2.71828183  7.3890561  20.08553692] 30.19287485057736
# 0.09003057317038046
# 0.24472847105479767
# 0.6652409557748219
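One caveat worth noting about the naive computation above: with large logits, `np.exp` overflows. Because softmax is invariant to shifting all logits by a constant, subtracting the maximum first fixes this without changing the result. A sketch (the helper name `softmax_stable` is illustrative):

```python
import numpy as np

def softmax_stable(x):
    # shifting by max(x) leaves the probabilities unchanged but avoids overflow in exp
    e = np.exp(x - np.max(x))
    return e / e.sum()

small = softmax_stable(np.array([1.0, 2.0, 3.0]))
print(small)  # same probabilities as above

# naive np.exp(1003.0) overflows; the shifted version is fine,
# and by shift invariance it equals softmax([1, 2, 3])
big = softmax_stable(np.array([1001.0, 1002.0, 1003.0]))
print(big)
```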
(2) Verifying sigmoid
import tensorflow as tf
hidden_layer = tf.Variable([1.0, 2.0, 3.0])
sigmoid_active_predict = tf.nn.sigmoid(hidden_layer)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(sigmoid_active_predict))
# Output: probability of the positive class 1 for each of the three samples
# [0.7310586 0.880797 0.95257413]
import numpy as np
aa = np.array([1.0, 2.0, 3.0])
for i in aa:
    print(1 / (1 + np.exp(-i)))
# Output:
# 0.7310585786300049
# 0.8807970779778823
# 0.9525741268224334
(3) Verifying tf.nn.sigmoid_cross_entropy_with_logits
This API fuses the sigmoid and cross-entropy computations into one op; labels and logits must have the same shape and dtype.
Case 1: binary classification with labels 1 (positive) and -1 (negative)
Note that tf.nn.sigmoid_cross_entropy_with_logits plugs the label value straight into the {0,1} cross-entropy formula, whatever that value is. With labels $y^{(i)}_{true}$ and logits $y^{(i)}_{predict}$, and tf.reduce_mean as the outer reduction, it therefore computes

$$logloss=\frac{1}{n}\sum\limits_{i}\Big[-y^{(i)}_{true}\log\sigma\big(y^{(i)}_{predict}\big)-\big(1-y^{(i)}_{true}\big)\log\Big(1-\sigma\big(y^{(i)}_{predict}\big)\Big)\Big]$$

where $\sigma(x)=\frac{1}{1+e^{-x}}$. With a label of -1 this is not the usual $\pm1$ logistic loss $-\log\sigma(y_{true}\,y_{predict})$; the op simply substitutes -1 into the formula, as the verification below confirms.
import tensorflow as tf
hidden_layer = tf.Variable([[1.0, 2.0, 3.0]])
y_true = tf.Variable([[-1.0, 1.0, 1.0]])
diff = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=hidden_layer)
cross_entropy = tf.reduce_mean(diff)
# Note: here the positive class is 1 and the negative class is -1
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print('real&pred diff:', sess.run(diff))
    print('logloss/cross_entropy:', sess.run(cross_entropy))

import numpy as np
aa = np.array([1.0, 2.0, 3.0])
label = np.array([-1, 1, 1])
sig = []
for i in range(len(aa)):
    sig.append(label[i] * -np.log(1 / (1 + np.exp(-aa[i]))) +
               (1 - label[i]) * -np.log(1 - 1 / (1 + np.exp(-aa[i]))))
logloss = np.mean(np.array(sig))
print('sigmoid:', sig)
print('logloss:', logloss)
# Output:
# real&pred diff: [[2.3132617  0.126928   0.04858735]]
# logloss/cross_entropy: 0.8295924
# sigmoid: [2.3132616875182226, 0.12692801104297263, 0.04858735157374191]
# logloss: 0.8295923500449791
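Internally, TensorFlow evaluates this loss in an overflow-safe form, $\max(x,0)-xz+\log(1+e^{-|x|})$ for logit $x$ and label $z$, which is algebraically equal to $-z\log\sigma(x)-(1-z)\log(1-\sigma(x))$. A NumPy sketch reproducing the numbers above with that stable form (the helper name `sigmoid_xent` is illustrative):

```python
import numpy as np

def sigmoid_xent(logits, labels):
    # overflow-safe form of -z*log(sigmoid(x)) - (1-z)*log(1 - sigmoid(x))
    x, z = np.asarray(logits), np.asarray(labels)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

per_elem = sigmoid_xent([1.0, 2.0, 3.0], [-1.0, 1.0, 1.0])
print(per_elem)        # [2.31326169 0.12692801 0.04858735]
print(per_elem.mean()) # 0.82959235...
```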
Case 2: binary classification with labels 1 (positive) and 0 (negative)

$$logloss=-\sum\limits_{i} \Big[y^{(i)}_{true} \log{y^{(i)}_{predict}} + \big(1-y^{(i)}_{true}\big) \log\big(1-{y^{(i)}_{predict}}\big)\Big]$$

where $y^{(i)}_{predict}=\sigma\big(h^{(i)}\big)$ is the sigmoid of the logit.
import tensorflow as tf
hidden_layer = tf.Variable([[1.0, 2.0, 3.0]])
y_true = tf.Variable([[1.0, 0.0, 1.0]])
cross_entropy = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=hidden_layer))
# Note: here the positive class is 1 and the negative class is 0
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print('logloss/cross_entropy:', sess.run(cross_entropy))
# Output:
# logloss/cross_entropy: 2.4887772

import numpy as np
h = np.array([1.0, 2.0, 3.0])
label = np.array([1, 0, 1])
pred = 1 / (1 + np.exp(-h))  # sigmoid probabilities
print(pred)
sig = []
for i in range(len(h)):
    sig.append(label[i] * np.log(pred[i]) + (1 - label[i]) * np.log(1 - pred[i]))
print('logloss:', -sum(sig))
# Output:
# [0.73105858 0.88079708 0.95257413]
# logloss: 2.488777050134936
(4) Verifying tf.nn.softmax_cross_entropy_with_logits
Softmax loss: [a] the softmax output of sample $i$ for class $k$ is

$$a^{(k)}_i=\frac{e^{h^{(k)}_i}}{\sum\limits_{j=1}^{K}e^{h^{(j)}_i}}$$

where $h^{(k)}_i$ is sample $i$'s logit for class $k$. [b] The loss summed over all samples is

$$logloss=-\sum\limits_{i}\sum\limits_{k} y^{(k)}_i \log\big(a^{(k)}_i\big)$$
import tensorflow as tf
hidden_layer = tf.Variable([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])  # hidden-layer outputs (logits) for three samples
y_true = tf.Variable([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
cross_entropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=hidden_layer))
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print('logloss/cross_entropy:', sess.run(cross_entropy))
# Output:
# logloss/cross_entropy: 1.2228179

import numpy as np
h = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
label = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
soft_max = []
for i in h:
    temp_sum = sum(np.exp(i))
    soft_max.append(list(np.exp(i) / temp_sum))
print('softmax_matrix:', soft_max)
pred = np.log(np.array(soft_max))
print(-sum(sum(np.multiply(label, pred))))  # element-wise product, then sum
# Output:
# softmax_matrix: [[0.09003057317038046, 0.24472847105479767, 0.6652409557748219],
#                  [0.09003057317038046, 0.24472847105479767, 0.6652409557748219],
#                  [0.09003057317038046, 0.24472847105479767, 0.6652409557748219]]
# 1.2228178933331408
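Because the labels here are one-hot, the double sum collapses: each sample contributes $-\log$ of the predicted probability of its true class. A quick check that three identical samples with true class index 2 indeed give the total above:

```python
import numpy as np

logits = np.array([1.0, 2.0, 3.0])
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# with a one-hot label, the per-sample loss is -log(prob of the true class)
per_sample = -np.log(probs[2])   # true class is index 2 for every sample
total = 3 * per_sample           # three identical samples, reduce_sum
print(per_sample)  # 0.40760596...
print(total)       # 1.2228179 (matches the TF output above)
```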
(5) tf.placeholder(dtype, shape=None, name=None)
dtype: data type. shape: tensor shape. name: node name.
A placeholder has no initial value; TensorFlow only reserves the necessary memory for it. Inside a session, data is fed to placeholders through feed_dict, a dictionary that supplies a value for every placeholder the fetched ops depend on. When training a neural network on a large dataset, samples are passed in batch by batch: if each iteration's batch were stored in a constant, TensorFlow would add a new node to the computation graph on every iteration and the graph would grow huge, whereas with a placeholder every batch reuses the same node.