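When I pulled the softmax function out to use on its own, I unexpectedly ran into quite a few bugs, so I tested several implementations found online and compared them.
One-dimensional case
For a one-dimensional vector it is simple; the following code can be used directly: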
import numpy as np

def softmax(x):
    # Subtract the maximum first so that np.exp cannot overflow.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
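As a quick sanity check, the sketch below applies the one-dimensional softmax to a small vector and to the same vector shifted by 1000. The test values and the helper `naive_softmax` are illustrative assumptions, not part of the code being compared; the point is only that subtracting the maximum leaves the result unchanged while keeping `np.exp` from overflowing.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax for a 1-D vector."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def naive_softmax(x):
    """Hypothetical unshifted version, shown only for comparison."""
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=0)

small = np.array([1.0, 2.2, 3.3])
large = small + 1000.0  # np.exp(1000.0) overflows to inf in float64

p_small = softmax(small)
p_large = softmax(large)

print(p_small, p_small.sum())
# Adding the same constant to every entry does not change the softmax.
print(np.allclose(p_small, p_large))

# Without the shift, every term becomes inf and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(large))
```

The shifted and unshifted inputs give the same probabilities, which is why the subtraction is safe to apply unconditionally.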
Two-dimensional case
For two-dimensional arrays, softmax can be written in the following three ways (the first two are based on https://www.cnblogs.com/programmerwang/p/14772552.html):
First version
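def softmax2_1(x):
    print("Input:\n", x)
    print("ndim of x:", x.ndim)
    if x.ndim == 2:
        # Transpose so each sample becomes a column and axis=0 reduces per sample.
        x = x.T
        print("x after transposing:\n", x)
        x = x - np.max(x, axis=0)  # guard against overflow
        print("After the overflow guard:\n", x)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        print("Final result:\n", y.T)
        return y.T
    x = x - np.max(x)  # guard against overflow
    print("1-D case after the overflow guard:\n", x)
    print("Final result:", np.exp(x) / np.sum(np.exp(x)))
    return np.exp(x) / np.sum(np.exp(x))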
x1 = np.array([[1.,3.3,2.5],[2.1, 3.2 , 5.3]])
x2 = np.array([1.0, 2.2, 3.3])
x3 = np.array([[1.0, 2.2, 3.3]])
x4 = np.array([[1.0], [2.2], [3.3]])
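An input like x4 would not occur in practice, because a single value per row is not a multi-class task; it is included here only to exercise the code.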
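To summarize how the first version behaves on the four test inputs, here is a minimal sketch with the debugging prints removed. The name `softmax_rows` and the `inputs` dictionary are hypothetical, introduced only for this check; the arithmetic is the same as in softmax2_1.

```python
import numpy as np

def softmax_rows(x):
    """Same logic as softmax2_1, without the debugging output."""
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T
    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))

inputs = {
    "x1": np.array([[1.0, 3.3, 2.5], [2.1, 3.2, 5.3]]),
    "x2": np.array([1.0, 2.2, 3.3]),
    "x3": np.array([[1.0, 2.2, 3.3]]),
    "x4": np.array([[1.0], [2.2], [3.3]]),
}

for name, x in inputs.items():
    y = softmax_rows(x)
    # Each row (or the whole vector in the 1-D case) should sum to 1.
    print(name, y.shape, y.sum(axis=-1))
```

The output keeps the input shape in every case. For x4 each row holds a single value, so every entry of the result is 1, which is why that input is uninformative as a classification example.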
Second version
def softmax2_2(x):
    print("Input:\n", x)
    print("ndim of x:", x.ndim)
    temp = np.max(x, axis=1)  # maximum of each row, shape (n,)
    # print(temp.shape)
    temp = temp.reshape(temp.size, 1)  # column vector (n, 1) so it broadcasts per row
    x = x - temp
    print("x after the overflow guard:\n", x)
    temp2 = np.sum(np.exp(x), axis=1)
    print(temp2)
    temp2 = temp2.reshape(temp2.size, 1)
    y = np.exp(x) / temp2
    print("Final result:\n", y)
    return y
x1 = np.array([[1.,3.3,2.5],[2.1, 3.2 , 5.3]])
x2 = np.array([1.0, 2.2, 3.3])
# Runtime error: numpy.AxisError: axis 1 is out of bounds for array of dimension 1
x3 = np.array([[1.0, 2.2, 3.3]])
x4 = np.array([[1.0], [2.2], [3.3]])
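The second version fails on the 1-D input x2 because it hard-codes axis=1. One possible way around this, sketched below under the assumption that softmax should always normalize over the last axis, is to reduce with axis=-1 and keepdims=True, which also makes the reshape calls unnecessary. The name `softmax_lastaxis` is hypothetical.

```python
import numpy as np

def softmax_lastaxis(x):
    """Softmax over the last axis; accepts 1-D vectors and 2-D arrays (rows = samples)."""
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axis with length 1, so the result
    # broadcasts back against x without any reshape.
    shifted = x - np.max(x, axis=-1, keepdims=True)  # guard against overflow
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

x1 = np.array([[1.0, 3.3, 2.5], [2.1, 3.2, 5.3]])
x2 = np.array([1.0, 2.2, 3.3])
x3 = np.array([[1.0, 2.2, 3.3]])

for x in (x1, x2, x3):
    y = softmax_lastaxis(x)
    print(y.shape, y.sum(axis=-1))
```

With axis=-1 the 1-D vector is treated as a single sample, so x2 and x3 give the same probabilities apart from the extra dimension.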
Third version
The third version is incorrect code:
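def softmax2_3(x):
    print("Input:\n", x)
    print("ndim of x:", x.ndim)
    row_max = np.max(x, axis=1)  # shape (n,); renamed to avoid shadowing the built-in max
    print("Maximum of each row:", row_max, "shape:", row_max.shape)
    y = x - row_max  # the broadcasting error is raised here
    return np.exp(y) / np.sum(np.exp(y), axis=1)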
x1 = np.array([[1.,3.3,2.5],[2.1, 3.2 , 5.3]])
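Runtime error: ValueError: operands could not be broadcast together with shapes (2,3) (2,)
In general, the input x is a matrix with n rows and m columns, and taking the maximum of each row (axis=1) yields an array of shape (n,). When n differs from m, the two shapes cannot be broadcast together, so the subtraction fails.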
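The ValueError is in fact the lucky case. When the input happens to be square, the shapes are compatible and the third version runs without any error but returns wrong values. The sketch below illustrates this with a hypothetical 2x2 input; `softmax2_3_quiet` repeats the third version's arithmetic without the prints, and `softmax_1d` applied row by row serves as the reference. Both names are assumptions made for this example.

```python
import numpy as np

def softmax_1d(v):
    """Reference softmax for a single row."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

def softmax2_3_quiet(x):
    """The third version's logic with the debugging prints removed."""
    row_max = np.max(x, axis=1)  # shape (n,)
    y = x - row_max              # aligned with the last axis (columns), not with rows
    return np.exp(y) / np.sum(np.exp(y), axis=1)

square = np.array([[1.0, 2.0], [3.0, 5.0]])  # n == m == 2, so broadcasting succeeds
reference = np.stack([softmax_1d(row) for row in square])
wrong = softmax2_3_quiet(square)

print(wrong.sum(axis=1))              # rows do not sum to 1
print(reference.sum(axis=1))          # rows sum to 1
print(np.allclose(wrong, reference))  # False
```

A shape check on the reduced array, or a test that every output row sums to 1, catches this kind of silent misalignment.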
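Note: in NumPy, when a matrix and a 1-D vector are added or subtracted, the number of columns of the matrix must equal the length of the vector; the vector is then added to or subtracted from every row of the matrix. For example: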
>>> x = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
>>> print(x)  # shape (4, 3)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
>>> y = np.array([1,2,3])
>>> print(x-y)
[[0 0 0]
 [3 3 3]
 [6 6 6]
 [9 9 9]]
>>> z = np.array([1,2,3,4])
>>> print(x-z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,3) (4,)
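If the intent is to subtract one value per row, as with the per-row maximum in softmax, the length-4 vector has to become a column of shape (4, 1) first. A minimal sketch using the same x and z as in the session above; the names `z_col` and `result` are introduced here for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)
z = np.array([1, 2, 3, 4])                                      # shape (4,)

# Adding a trailing axis turns z into a column, so z_col[i] is applied to row i.
z_col = z[:, np.newaxis]  # shape (4, 1); z.reshape(-1, 1) is equivalent
result = x - z_col

print(z_col.shape)
print(result)
```

This is the same step that the reshape(temp.size, 1) calls perform in the second version.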