For binary classification, the activation function is usually sigmoid or softmax, and for the binary case the two are in fact equivalent.
First, denote the output of the hidden layer by $h$.
If we use sigmoid, the output layer has a single node whose value is $\theta h$. The predicted probabilities of classes 1 and 2 are then:
$$p_{1} = \mathrm{sigmoid}(\theta h) = \frac{1}{1+e^{-\theta h}}$$

$$p_{2} = 1-\mathrm{sigmoid}(\theta h) = \frac{1}{1+e^{\theta h}}$$
Here $\theta$ is the weight parameter of the sigmoid network's output layer.
If instead we use softmax, the output layer has two nodes whose values are $\theta_{1}h$ and $\theta_{2}h$. The predicted probabilities of classes 1 and 2 are then:
$$p_{1} = \frac{e^{\theta_{1}h}}{e^{\theta_{1}h}+e^{\theta_{2}h}} = \frac{1}{1+e^{(\theta_{2}-\theta_{1})h}}$$

$$p_{2} = \frac{e^{\theta_{2}h}}{e^{\theta_{1}h}+e^{\theta_{2}h}} = \frac{1}{1+e^{(\theta_{1}-\theta_{2})h}}$$
Here $\theta_{1}$ and $\theta_{2}$ are the two weight parameters of the softmax network.
As we can see, $\theta$ in the sigmoid network is equivalent to $(\theta_{1}-\theta_{2})$ in the softmax network. In other words, whatever predictions the sigmoid network can produce, there is always a softmax network that produces exactly the same predictions: just set $\theta_{1}-\theta_{2} = \theta$.
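This equivalence is easy to check numerically. A minimal sketch in PyTorch, where the values of `theta1`, `theta2`, and `h` are arbitrary choices for illustration (not from the text):

```python
import torch

# Arbitrary illustrative values (assumptions, not from the derivation above)
theta1, theta2 = 1.3, -0.4
theta = theta1 - theta2               # weight of the equivalent sigmoid network
h = torch.linspace(-3, 3, 7)          # some hidden-layer outputs

# Sigmoid network: single output node with value theta*h
p1_sigmoid = torch.sigmoid(theta * h)

# Softmax network: two output nodes with values theta1*h and theta2*h
logits = torch.stack([theta1 * h, theta2 * h], dim=1)
p1_softmax = torch.softmax(logits, dim=1)[:, 0]

print(torch.allclose(p1_sigmoid, p1_softmax))  # True
```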
Training the softmax network can therefore be seen as directly optimizing $\theta_{1}-\theta_{2}$, so the result should be essentially no different from sigmoid. Personally I just use softmax, which also makes it easy to convert the model to a multi-class classifier.
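For completeness, a minimal sketch of what the two output heads could look like in PyTorch; the hidden width of 64 is an assumed value for illustration:

```python
import torch.nn as nn

hidden_dim = 64  # assumed hidden-layer width

# Sigmoid head: one node, outputs P(class 1)
sigmoid_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

# Softmax head: two nodes, outputs [P(class 1), P(class 2)];
# extending to K classes only requires nn.Linear(hidden_dim, K)
softmax_head = nn.Sequential(nn.Linear(hidden_dim, 2), nn.Softmax(dim=1))
```

In actual training code it is usually preferable to output raw logits and use `nn.BCEWithLogitsLoss` or `nn.CrossEntropyLoss`, which fold the activation into the loss in a numerically stable way.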
Another practical note. In one run, the raw output of my final linear layer was the following tensor of shape [128, 1] (a batch of 128 samples, one output node):
```
tensor([[ 0.0056],
[-0.0120],
[-0.0119],
[-0.0058],
[-0.0110],
[-0.0191],
[-0.0165],
[-0.0230],
[-0.0024],
[ 0.0033],
[-0.0204],
[ 0.0007],
[-0.0144],
[-0.0303],
[-0.0115],
[-0.0089],
[-0.0129],
[-0.0128],
[-0.0011],
[-0.0178],
[-0.0031],
[-0.0119],
[-0.0035],
[ 0.0074],
[-0.0165],
[-0.0058],
[-0.0393],
[ 0.0149],
[-0.0209],
[-0.0335],
[-0.0154],
[-0.0019],
[-0.0015],
[-0.0252],
[-0.0104],
[-0.0215],
[-0.0116],
[-0.0048],
[-0.0143],
[ 0.0003],
[-0.0025],
[-0.0292],
[ 0.0009],
[-0.0217],
[-0.0207],
[-0.0073],
[-0.0280],
[-0.0219],
[-0.0233],
[-0.0145],
[-0.0391],
[-0.0103],
[-0.0184],
[ 0.0005],
[-0.0251],
[-0.0156],
[-0.0254],
[-0.0123],
[-0.0313],
[-0.0188],
[-0.0318],
[-0.0167],
[-0.0001],
[-0.0005],
[-0.0105],
[-0.0266],
[-0.0133],
[-0.0164],
[-0.0216],
[-0.0181],
[-0.0036],
[-0.0052],
[-0.0310],
[-0.0131],
[-0.0067],
[-0.0049],
[-0.0141],
[-0.0188],
[-0.0215],
[-0.0438],
[-0.0172],
[-0.0152],
[-0.0290],
[-0.0239],
[ 0.0038],
[-0.0191],
[-0.0283],
[-0.0015],
[-0.0101],
[-0.0054],
[-0.0108],
[-0.0198],
[-0.0089],
[-0.0106],
[-0.0277],
[-0.0057],
[-0.0043],
[-0.0065],
[-0.0308],
[-0.0225],
[-0.0183],
[ 0.0059],
[-0.0261],
[-0.0289],
[-0.0140],
[-0.0283],
[-0.0134],
[-0.0251],
[-0.0115],
[-0.0189],
[-0.0172],
[-0.0098],
[-0.0162],
[-0.0167],
[ 0.0166],
[-0.0070],
[-0.0063],
[-0.0272],
[-0.0140],
[-0.0071],
[ 0.0053],
[-0.0182],
[ 0.0041],
[-0.0163],
[-0.0188],
[-0.0033],
[-0.0310],
[-0.0084]], device='cuda:0', grad_fn=<AddmmBackward>)
```
Applying softmax to this tensor, I got an all-ones matrix. This is expected: softmax normalizes along a dimension, and each row of a [128, 1] tensor contains only a single element, so every entry becomes 1:
```
out =
tensor([[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.]], device='cuda:0', grad_fn=<SoftmaxBackward>)
```
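This behaviour is easy to reproduce; a minimal sketch, assuming only the [128, 1] shape from above:

```python
import torch

logits = torch.randn(128, 1)          # one output node, as above
out = torch.softmax(logits, dim=1)    # each row is normalized alone -> all ones
print(out.unique())                   # tensor([1.])

# For softmax to be meaningful, the output layer needs two nodes:
logits2 = torch.randn(128, 2)
probs = torch.softmax(logits2, dim=1)  # each row: two probabilities summing to 1
```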
With sigmoid, on the other hand, each sample gets a sensible probability:
```
data =
tensor([[0.5014],
[0.4970],
[0.4970],
[0.4985],
[0.4973],
[0.4952],
[0.4959],
[0.4943],
[0.4994],
[0.5008],
[0.4949],
[0.5002],
[0.4964],
[0.4924],
[0.4971],
[0.4978],
[0.4968],
[0.4968],
[0.4997],
[0.4956],
[0.4992],
[0.4970],
[0.4991],
[0.5018],
[0.4959],
[0.4985],
[0.4902],
[0.5037],
[0.4948],
[0.4916],
[0.4962],
[0.4995],
[0.4996],
[0.4937],
[0.4974],
[0.4946],
[0.4971],
[0.4988],
[0.4964],
[0.5001],
[0.4994],
[0.4927],
[0.5002],
[0.4946],
[0.4948],
[0.4982],
[0.4930],
[0.4945],
[0.4942],
[0.4964],
[0.4902],
[0.4974],
[0.4954],
[0.5001],
[0.4937],
[0.4961],
[0.4937],
[0.4969],
[0.4922],
[0.4953],
[0.4921],
[0.4958],
[0.5000],
[0.4999],
[0.4974],
[0.4934],
[0.4967],
[0.4959],
[0.4946],
[0.4955],
[0.4991],
[0.4987],
[0.4923],
[0.4967],
[0.4983],
[0.4988],
[0.4965],
[0.4953],
[0.4946],
[0.4891],
[0.4957],
[0.4962],
[0.4928],
[0.4940],
[0.5009],
[0.4952],
[0.4929],
[0.4996],
[0.4975],
[0.4987],
[0.4973],
[0.4951],
[0.4978],
[0.4974],
[0.4931],
[0.4986],
[0.4989],
[0.4984],
[0.4923],
[0.4944],
[0.4954],
[0.5015],
[0.4935],
[0.4928],
[0.4965],
[0.4929],
[0.4967],
[0.4937],
[0.4971],
[0.4953],
[0.4957],
[0.4976],
[0.4960],
[0.4958],
[0.5041],
[0.4983],
[0.4984],
[0.4932],
[0.4965],
[0.4982],
[0.5013],
[0.4955],
[0.5010],
[0.4959],
[0.4953],
[0.4992],
[0.4923],
[0.4979]])
```
Comparing the two, the distribution obtained with softmax (once the output layer is given two nodes) is roughly the same as with sigmoid; the remaining differences come down to the different initial weight parameters.