For binary classification, should you choose sigmoid or softmax?

For binary classification, the output activation is usually either sigmoid or softmax, and for a two-class problem the two are in fact equivalent.
First, denote the hidden-layer output by $h$.
If sigmoid is used, the output layer has a single node whose value is $\theta h$. The predicted probabilities of classes 1 and 2 are then:
$p_{1} = \mathrm{sigmoid}(\theta h) = \frac{1}{1+e^{-\theta h}}$
$p_{2} = 1-\mathrm{sigmoid}(\theta h) = \frac{1}{1+e^{\theta h}}$
Here $\theta$ is the output-layer weight of the sigmoid network, the counterpart of a weight parameter in the softmax layer.
If softmax is used instead, the output layer has two nodes with values $\theta_{1}h$ and $\theta_{2}h$, and the predicted probabilities of classes 1 and 2 are:
$p_{1} = \frac{e^{\theta_{1}h}}{e^{\theta_{1}h}+e^{\theta_{2}h}} = \frac{1}{1+e^{(\theta_{2}-\theta_{1})h}}$
$p_{2} = \frac{e^{\theta_{2}h}}{e^{\theta_{1}h}+e^{\theta_{2}h}} = \frac{1}{1+e^{(\theta_{1}-\theta_{2})h}}$
Here $\theta_{1}$ and $\theta_{2}$ are the two weight parameters of the softmax output layer.
As you can see, $\theta$ in the sigmoid network is equivalent to $(\theta_{1}-\theta_{2})$ in the softmax network. In other words, for any prediction a sigmoid network can produce, there is a softmax network that produces exactly the same prediction: simply set $\theta_{1}-\theta_{2} = \theta$.
Training the softmax network can therefore be viewed as directly optimizing $\theta_{1}-\theta_{2}$, so the result should be essentially the same as with sigmoid. Personally I just use softmax, since it is also easier to extend to a multi-class model.
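A minimal sketch of this equivalence in PyTorch (the weights `theta1`, `theta2` and the random `h` are hypothetical values chosen for illustration, not taken from the post):

```python
import torch

# Hypothetical numbers, just to check the derivation above numerically.
h = torch.randn(5)            # hidden-layer output h (one scalar per sample)
theta1, theta2 = 0.7, -0.3    # the two softmax output-layer weights
theta = theta1 - theta2       # the equivalent sigmoid weight

# Sigmoid network: one output node with value theta * h
p1_sigmoid = torch.sigmoid(theta * h)

# Softmax network: two output nodes with values theta1 * h and theta2 * h
logits = torch.stack([theta1 * h, theta2 * h], dim=1)
p1_softmax = torch.softmax(logits, dim=1)[:, 0]

print(torch.allclose(p1_sigmoid, p1_softmax))  # True: identical class-1 probabilities
```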
In addition, here is what happens in practice. Given the following matrix of raw outputs (one logit per sample):

tensor([[ 0.0056],
        [-0.0120],
        [-0.0119],
        [-0.0058],
        [-0.0110],
        [-0.0191],
        [-0.0165],
        [-0.0230],
        [-0.0024],
        [ 0.0033],
        [-0.0204],
        [ 0.0007],
        [-0.0144],
        [-0.0303],
        [-0.0115],
        [-0.0089],
        [-0.0129],
        [-0.0128],
        [-0.0011],
        [-0.0178],
        [-0.0031],
        [-0.0119],
        [-0.0035],
        [ 0.0074],
        [-0.0165],
        [-0.0058],
        [-0.0393],
        [ 0.0149],
        [-0.0209],
        [-0.0335],
        [-0.0154],
        [-0.0019],
        [-0.0015],
        [-0.0252],
        [-0.0104],
        [-0.0215],
        [-0.0116],
        [-0.0048],
        [-0.0143],
        [ 0.0003],
        [-0.0025],
        [-0.0292],
        [ 0.0009],
        [-0.0217],
        [-0.0207],
        [-0.0073],
        [-0.0280],
        [-0.0219],
        [-0.0233],
        [-0.0145],
        [-0.0391],
        [-0.0103],
        [-0.0184],
        [ 0.0005],
        [-0.0251],
        [-0.0156],
        [-0.0254],
        [-0.0123],
        [-0.0313],
        [-0.0188],
        [-0.0318],
        [-0.0167],
        [-0.0001],
        [-0.0005],
        [-0.0105],
        [-0.0266],
        [-0.0133],
        [-0.0164],
        [-0.0216],
        [-0.0181],
        [-0.0036],
        [-0.0052],
        [-0.0310],
        [-0.0131],
        [-0.0067],
        [-0.0049],
        [-0.0141],
        [-0.0188],
        [-0.0215],
        [-0.0438],
        [-0.0172],
        [-0.0152],
        [-0.0290],
        [-0.0239],
        [ 0.0038],
        [-0.0191],
        [-0.0283],
        [-0.0015],
        [-0.0101],
        [-0.0054],
        [-0.0108],
        [-0.0198],
        [-0.0089],
        [-0.0106],
        [-0.0277],
        [-0.0057],
        [-0.0043],
        [-0.0065],
        [-0.0308],
        [-0.0225],
        [-0.0183],
        [ 0.0059],
        [-0.0261],
        [-0.0289],
        [-0.0140],
        [-0.0283],
        [-0.0134],
        [-0.0251],
        [-0.0115],
        [-0.0189],
        [-0.0172],
        [-0.0098],
        [-0.0162],
        [-0.0167],
        [ 0.0166],
        [-0.0070],
        [-0.0063],
        [-0.0272],
        [-0.0140],
        [-0.0071],
        [ 0.0053],
        [-0.0182],
        [ 0.0041],
        [-0.0163],
        [-0.0188],
        [-0.0033],
        [-0.0310],
        [-0.0084]], device='cuda:0', grad_fn=<AddmmBackward>)

applying the softmax activation here gives an all-ones matrix:

out = 
tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]], device='cuda:0', grad_fn=<SoftmaxBackward>)

whereas the sigmoid activation gives the following values:

data = 
tensor([[0.5014],
        [0.4970],
        [0.4970],
        [0.4985],
        [0.4973],
        [0.4952],
        [0.4959],
        [0.4943],
        [0.4994],
        [0.5008],
        [0.4949],
        [0.5002],
        [0.4964],
        [0.4924],
        [0.4971],
        [0.4978],
        [0.4968],
        [0.4968],
        [0.4997],
        [0.4956],
        [0.4992],
        [0.4970],
        [0.4991],
        [0.5018],
        [0.4959],
        [0.4985],
        [0.4902],
        [0.5037],
        [0.4948],
        [0.4916],
        [0.4962],
        [0.4995],
        [0.4996],
        [0.4937],
        [0.4974],
        [0.4946],
        [0.4971],
        [0.4988],
        [0.4964],
        [0.5001],
        [0.4994],
        [0.4927],
        [0.5002],
        [0.4946],
        [0.4948],
        [0.4982],
        [0.4930],
        [0.4945],
        [0.4942],
        [0.4964],
        [0.4902],
        [0.4974],
        [0.4954],
        [0.5001],
        [0.4937],
        [0.4961],
        [0.4937],
        [0.4969],
        [0.4922],
        [0.4953],
        [0.4921],
        [0.4958],
        [0.5000],
        [0.4999],
        [0.4974],
        [0.4934],
        [0.4967],
        [0.4959],
        [0.4946],
        [0.4955],
        [0.4991],
        [0.4987],
        [0.4923],
        [0.4967],
        [0.4983],
        [0.4988],
        [0.4965],
        [0.4953],
        [0.4946],
        [0.4891],
        [0.4957],
        [0.4962],
        [0.4928],
        [0.4940],
        [0.5009],
        [0.4952],
        [0.4929],
        [0.4996],
        [0.4975],
        [0.4987],
        [0.4973],
        [0.4951],
        [0.4978],
        [0.4974],
        [0.4931],
        [0.4986],
        [0.4989],
        [0.4984],
        [0.4923],
        [0.4944],
        [0.4954],
        [0.5015],
        [0.4935],
        [0.4928],
        [0.4965],
        [0.4929],
        [0.4967],
        [0.4937],
        [0.4971],
        [0.4953],
        [0.4957],
        [0.4976],
        [0.4960],
        [0.4958],
        [0.5041],
        [0.4983],
        [0.4984],
        [0.4932],
        [0.4965],
        [0.4982],
        [0.5013],
        [0.4955],
        [0.5010],
        [0.4959],
        [0.4953],
        [0.4992],
        [0.4923],
        [0.4979]])

The all-ones result is not softmax agreeing with sigmoid: because the output layer here has only one node, each row's softmax denominator contains a single term, so every row normalizes to exactly 1 and the information in the logit is lost. With a single output node you should use sigmoid; to use softmax as in the derivation above, the output layer needs two nodes, and then the predicted distribution is essentially the same as sigmoid's, differing only in the weight parameterization and initialization.
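A small sketch of this pitfall, assuming only the tensor shapes (not the post's actual model):

```python
import torch

# A batch of single-node outputs of shape [N, 1], like the tensor above.
logits = torch.randn(4, 1)

# Softmax normalizes over dim=1, which here has size 1, so every row
# becomes e^x / e^x = 1 regardless of the logit value.
print(torch.softmax(logits, dim=1))   # all ones

# Sigmoid keeps the information carried by the single logit.
print(torch.sigmoid(logits))          # values near 0.5 for near-zero logits

# To use softmax for binary classification, give the output layer two nodes:
logits2 = torch.randn(4, 2)
print(torch.softmax(logits2, dim=1))  # each row is a proper 2-class distribution
```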
