The most widely used activation function in binary classification is sigmoid, and its multi-class counterpart is softmax. Both map real values into the interval (0, 1). Training typically uses the cross-entropy loss (equivalently, the negative log-likelihood) and optimizes it iteratively with methods such as stochastic gradient descent, Newton's method, AdaGrad, or AdaDelta.
The sigmoid function is:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The softmax function is:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

where $x_i$ denotes the $i$-th component of the vector $x$.
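As a quick illustrative sketch (not from the original post): sigmoid is exactly the two-class special case of softmax, since $\sigma(x) = e^{x}/(e^{x} + e^{0})$.

```python
import numpy as np

def sigmoid(x):
    # Elementwise logistic function: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

# sigmoid(x) equals the first component of softmax over the pair [x, 0]
x = 1.5
two_class = np.exp([x, 0.0]) / np.exp([x, 0.0]).sum()
print(sigmoid(x), two_class[0])  # the two values agree
```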
Both the numerator and the denominator of softmax contain exp terms. When the input to exp is too large, the result cannot be represented in finite precision and overflows to the familiar inf. If the input data have not been normalized and some feature spans a wide range of values, this is actually very easy to trigger. The fix is simple: subtract a constant from the argument of every exp term in both the numerator and the denominator; the most common choice is the row-wise maximum. That is:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i - \max_j x_j}}{\sum_k e^{x_k - \max_j x_j}}$$
The proof is straightforward: the transformed expression is exactly equal to the original, because for any constant $c$:

$$\frac{e^{x_i - c}}{\sum_j e^{x_j - c}} = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

Taking $c = \max_j x_j$ makes every exponent non-positive, so no exp term can overflow.
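The identity above can be checked numerically on a small vector (an illustrative sketch, not part of the original post):

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0])
c = x.max()

# Softmax computed directly, and with the max subtracted first
original = np.exp(x) / np.exp(x).sum()
shifted = np.exp(x - c) / np.exp(x - c).sum()

print(np.allclose(original, shifted))  # True
```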
Here is a Python implementation:
import numpy as np

def normal_softmax(x):
    # Naive row-wise softmax; overflows when entries of x are large
    return (np.exp(x).T / np.exp(x).sum(axis=1)).T

def transform_softmax(x):
    # Numerically stable softmax: subtract each row's max before exp,
    # so every exponent is <= 0 and exp cannot overflow
    max_of_dim1 = np.max(x, axis=1, keepdims=True)
    e = np.exp(x - max_of_dim1)
    return e / e.sum(axis=1, keepdims=True)
data = np.random.randint(0,5,(10,5))
normal_data = normal_softmax(data)
normal_data
array([[ 0.20738626, 0.07629314, 0.07629314, 0.56373431, 0.07629314],
[ 0.23890664, 0.64941559, 0.08788884, 0.01189446, 0.01189446],
[ 0.35899605, 0.13206727, 0.01787336, 0.35899605, 0.13206727],
[ 0.39169577, 0.01950138, 0.14409682, 0.05301026, 0.39169577],
[ 0.01015357, 0.20393995, 0.20393995, 0.02760027, 0.55436626],
[ 0.09847516, 0.09847516, 0.26768323, 0.26768323, 0.26768323],
[ 0.56373431, 0.07629314, 0.20738626, 0.07629314, 0.07629314],
[ 0.43094948, 0.05832267, 0.05832267, 0.02145571, 0.43094948],
[ 0.52059439, 0.07045479, 0.19151597, 0.19151597, 0.02591887],
[ 0.46437643, 0.17083454, 0.17083454, 0.02311994, 0.17083454]])
transform_data = transform_softmax(data)
transform_data
array([[ 0.20738626, 0.07629314, 0.07629314, 0.56373431, 0.07629314],
[ 0.23890664, 0.64941559, 0.08788884, 0.01189446, 0.01189446],
[ 0.35899605, 0.13206727, 0.01787336, 0.35899605, 0.13206727],
[ 0.39169577, 0.01950138, 0.14409682, 0.05301026, 0.39169577],
[ 0.01015357, 0.20393995, 0.20393995, 0.02760027, 0.55436626],
[ 0.09847516, 0.09847516, 0.26768323, 0.26768323, 0.26768323],
[ 0.56373431, 0.07629314, 0.20738626, 0.07629314, 0.07629314],
[ 0.43094948, 0.05832267, 0.05832267, 0.02145571, 0.43094948],
[ 0.52059439, 0.07045479, 0.19151597, 0.19151597, 0.02591887],
[ 0.46437643, 0.17083454, 0.17083454, 0.02311994, 0.17083454]])
np.allclose(normal_data,transform_data)
True
As you can see, the two results are identical: with small inputs, both functions compute fine. Now let's change the inputs to very large values.
data = np.random.randint(10000000,10000005,(10,5))
Running the following line now produces these warnings:
normal_data = normal_softmax(data)
lib\site-packages\ipykernel\__main__.py:2: RuntimeWarning: overflow encountered in exp
from ipykernel import kernelapp as app
lib\site-packages\ipykernel\__main__.py:2: RuntimeWarning: invalid value encountered in true_divide
from ipykernel import kernelapp as app
Inspecting the value of normal_data now gives:
array([[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan],
[ nan, nan, nan, nan, nan]])
All entries are nan, whereas the transformed softmax function still computes correctly:
transform_data = transform_softmax(data)
transform_data
array([[ 0.08015892, 0.08015892, 0.21789455, 0.02948882, 0.59229879],
[ 0.05832267, 0.02145571, 0.43094948, 0.05832267, 0.43094948],
[ 0.09501737, 0.0128592 , 0.09501737, 0.70208868, 0.09501737],
[ 0.00800164, 0.05912455, 0.43687463, 0.05912455, 0.43687463],
[ 0.14884758, 0.14884758, 0.14884758, 0.14884758, 0.40460968],
[ 0.20393995, 0.01015357, 0.55436626, 0.20393995, 0.02760027],
[ 0.44744543, 0.06055515, 0.44744543, 0.022277 , 0.022277 ],
[ 0.03106277, 0.08443737, 0.22952458, 0.6239125 , 0.03106277],
[ 0.01499127, 0.11077134, 0.01499127, 0.0407505 , 0.81849562],
[ 0.03875395, 0.77839397, 0.10534417, 0.03875395, 0.03875395]])
As shown, the transformed softmax function handles inputs with very large values, so this trick is well worth applying in any engineering implementation.
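The same max-subtraction idea extends naturally to the cross-entropy loss mentioned at the start: computing log(softmax(x)) directly would underflow to log(0) for large inputs, while a log-softmax built on the same trick stays finite. A minimal sketch (illustrative, not from the original post; `log_softmax` is a name I introduce here):

```python
import numpy as np

def log_softmax(x):
    # Row-wise log-softmax using the same max-subtraction trick:
    # log softmax(x)_i = (x_i - m) - log(sum_j exp(x_j - m)), m = row max
    m = np.max(x, axis=1, keepdims=True)
    shifted = x - m
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Works on the same huge-valued inputs that break the naive version
data = np.array([[10000000.0, 10000001.0, 10000004.0]])
print(log_softmax(data))  # finite values, no overflow warnings
```

Exponentiating the result recovers a valid softmax: each row of `np.exp(log_softmax(data))` sums to 1.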