A few notes on softmax

yuexiaomao

Original article: https://towardsdatascience.com/softmax-activation-function-how-it-actually-works-d292d335bd78

I use softmax all the time but had never really looked into its properties, so after reading a blog post about it, here are my notes:

Definition:

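Softmax is an activation function that turns numbers/logits into probabilities. Its output is a vector (say v) holding the probability of each class, and the probabilities in v sum to 1 across all possible outcomes or classes.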
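Written out, the standard softmax over a logit vector $\mathbf{z} = (z_1, \dots, z_K)$ for $K$ classes is the following (the symbols $\mathbf{z}$, $K$ and $\sigma$ are notation introduced here, not taken from the original post):

```latex
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```

Every numerator is positive and the denominator is the sum of all numerators, so the $K$ outputs are positive and add up to 1.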

Example:

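Consider a CNN model designed to classify an image as a dog, cat, horse, or cheetah (4 possible outcomes/classes). The last (fully connected) layer of the CNN outputs a vector of logits L, which is passed through a softmax layer that converts the logits into probabilities P. These probabilities are the model's predictions for each of the 4 classes.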
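A minimal sketch of that last step, assuming hypothetical logits for the four classes (the values and the names `CLASSES`, `logits`, `probs` and `predicted` are illustrative, not output from a trained model): softmax turns the raw scores into one probability per class, and the predicted class is simply the one with the highest probability.

```python
from math import exp

# Hypothetical class names and logits from the CNN's last fully connected layer
CLASSES = ["dog", "cat", "horse", "cheetah"]
logits = [3.2, 1.3, 0.2, 0.8]

def softmax(input_vector):
    # Exponentiate each logit, then divide by the sum so the outputs add up to 1
    exponents = [exp(x) for x in input_vector]
    total = sum(exponents)
    return [e / total for e in exponents]

probs = softmax(logits)
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")  # dog: 0.775, cat: 0.116, horse: 0.039, cheetah: 0.070

# The prediction is the class with the largest probability
predicted = CLASSES[probs.index(max(probs))]
print("predicted:", predicted)  # predicted: dog
```

Softmax preserves the ordering of the logits, so the arg-max over P is the same as the arg-max over L; the probabilities add information about how confident the model is.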

Python implementation:

```python
from math import exp

def softmax(input_vector):
    # Calculate the exponent of each element in the input vector
    exponents = [exp(j) for j in input_vector]
    # Divide the exponent of each value by the sum of the
    # exponents and round off to 3 decimal places
    total = sum(exponents)
    p = [round(e / total, 3) for e in exponents]
    return p

print(softmax([3.2, 1.3, 0.2, 0.8]))  # [0.775, 0.116, 0.039, 0.07]
```
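One caveat about the implementation above: `math.exp` raises `OverflowError` once its argument exceeds roughly 709, so very large logits crash it. A common remedy, sketched here as an alternative rather than the original post's code, is to subtract the maximum logit before exponentiating. This does not change the result, because softmax is invariant to adding the same constant to every input (the factor $e^{-m}$ cancels between numerator and denominator). The name `stable_softmax` is hypothetical.

```python
from math import exp

def stable_softmax(input_vector):
    # Subtract the largest value so every exponent is <= 0 and exp() cannot overflow
    m = max(input_vector)
    exponents = [exp(x - m) for x in input_vector]
    total = sum(exponents)
    # Same rounding as the softmax above: 3 decimal places
    return [round(e / total, 3) for e in exponents]

print(stable_softmax([3.2, 1.3, 0.2, 0.8]))  # [0.775, 0.116, 0.039, 0.07]
# Shifting every logit by 1000 leaves the probabilities unchanged
print(stable_softmax([1003.2, 1001.3, 1000.2, 1000.8]))  # [0.775, 0.116, 0.039, 0.07]
```

The plain version would fail on the second call, since `exp(1003.2)` is far beyond the largest representable float.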

Properties:

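One might ask, though: why not use standard normalization, i.e. divide each logit by the sum of all logits to get probabilities? Why take exponentials at all? There are two reasons.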

1.

Observe:

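```python
# Softmax normalization (using softmax as defined above)
print(softmax([2, 4]))  # [0.119, 0.881]
print(softmax([4, 8]))  # [0.018, 0.982]

# Standard normalization: divide each value by the sum of all values
def std_norm(input_vector):
    total = sum(input_vector)
    p = [round(i / total, 3) for i in input_vector]
    return p

print(std_norm([2, 4]))  # [0.333, 0.667]
print(std_norm([4, 8]))  # [0.333, 0.667]
```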

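Notice the difference? With standard normalization, a vector and the same vector multiplied by a scalar produce the same output. Above, multiplying the first vector [2,4] by 2 gives [4,8], and both yield the same output. By the same reasoning, the pair {[8,24], [2.4,7.2]}, related by a scale factor of 0.3, also yields identical outputs. In fact, any vector scaled by a nonzero factor produces the same output as the original. Softmax, by contrast, responds to scale: going from [2,4] to [4,8] raises the probability of the larger class from 0.881 to 0.982, so a larger gap between logits translates into a more confident prediction.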
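The scale argument can be checked in two lines. Writing $\mathbf{x}$ for the input vector and $c \neq 0$ for the scale factor, and assuming $\sum_j x_j \neq 0$ (notation introduced here), the factor cancels in standard normalization but not in softmax; for two classes, softmax reduces to a logistic function of the scaled gap $x_2 - x_1$:

```latex
\frac{c\,x_i}{\sum_j c\,x_j} = \frac{c\,x_i}{c\sum_j x_j} = \frac{x_i}{\sum_j x_j}

\sigma(c\,\mathbf{x})_2 = \frac{e^{c x_2}}{e^{c x_1} + e^{c x_2}} = \frac{1}{1 + e^{-c\,(x_2 - x_1)}}
```

For $\mathbf{x} = [2, 4]$ this gives $1/(1+e^{-2}) \approx 0.881$ at $c = 1$ and $1/(1+e^{-4}) \approx 0.982$ at $c = 2$, matching the softmax outputs listed above.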

2.

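Another problem arises when the logits contain negative values: standard normalization can then produce negative "probabilities" in the output. Softmax is unaffected by negative values, because the exponential of any value, positive or negative, is always positive.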
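A quick check of this point, with both normalizations redefined here without rounding so the snippet runs on its own (the input `[-1, 2]` is an arbitrary illustrative example): standard normalization returns a negative value and a value above 1, while softmax returns two valid probabilities.

```python
from math import exp

def softmax(input_vector):
    # Exponentials are always positive, so every output lies strictly between 0 and 1
    exponents = [exp(x) for x in input_vector]
    total = sum(exponents)
    return [e / total for e in exponents]

def std_norm(input_vector):
    # Plain normalization: divide each value by the sum, which may be small or negative
    total = sum(input_vector)
    return [x / total for x in input_vector]

logits = [-1, 2]
print(std_norm(logits))  # [-1.0, 2.0]: not a valid probability distribution
print([round(p, 3) for p in softmax(logits)])  # [0.047, 0.953]
```

If the logits happen to sum to zero, for example `[-1, 1]`, `std_norm` fails outright with a `ZeroDivisionError`, whereas the softmax denominator is always positive.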
