The softmax function
Also known as the normalized exponential function, the softmax is a generalization of the logistic function that "squashes" a K-dimensional vector $\mathbf{z}$ of arbitrary real values to a K-dimensional vector $\sigma(\mathbf{z})$ of real values in the range [0, 1] that add up to 1. The function is given by
$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \dots, K.$$
Clearly, this expression maps a K-dimensional vector of real values to K numbers, each in the range [0, 1], that together sum to 1. The output can therefore be read as a probability distribution: in a multiclass classification task (for example, handwritten digit recognition over 0–9), entry j of the result is the probability that the input belongs to class j.
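As a quick sanity check of the definition, here is a minimal pure-Python sketch; the function name `softmax` and the max-subtraction (a standard stability trick that does not change the result) are my additions, not from the text:

```python
import math

def softmax(z):
    """Map a list of K real scores to K probabilities that sum to 1."""
    m = max(z)  # subtract the max so exp() never sees a huge argument
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Feeding it any score vector yields values in (0, 1) that sum to 1, with larger scores getting larger probabilities.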
In probability theory, the output of the softmax function can be used to represent a categorical distribution – that is, a probability distribution over K different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution.
The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression)[1]:206–209, multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks.[2] Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of K distinct linear functions, and the predicted probability for the j-th class given a sample vector x and a weighting vector w is:
The function below is the vector form of the softmax: unlike above, x and each w_j here are vectors, and the K inputs to the softmax are the linear scores x^T w_j.
$$P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^{\mathsf{T}} \mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^{\mathsf{T}} \mathbf{w}_k}}$$
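The vector form can be sketched as follows; the function name `class_probabilities` and the toy inputs are illustrative, assuming each weight vector w_j has the same dimension as x:

```python
import math

def class_probabilities(x, W):
    """P(y=j | x) = exp(x.w_j) / sum_k exp(x.w_k), for W = [w_1, ..., w_K]."""
    # One linear score x^T w_j per class.
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in W]
    m = max(scores)  # shift for numerical stability; result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The class with the largest score x^T w_j receives the largest probability, and the K probabilities always sum to 1.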
The log_softmax function
Applying log to the softmax maps each probability x in (0, 1] to a log-probability log(x) in (−∞, 0]. Computing log_softmax directly, rather than taking softmax first and then log, has better numerical properties: it avoids the underflow and overflow that exponentiating large or small scores can cause.
$$f_i(x) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$$
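A numerically stable way to compute this is the log-sum-exp trick; this sketch (the function name `log_softmax` is my own) never exponentiates a value larger than 0:

```python
import math

def log_softmax(z):
    """log_softmax(z)_i = z_i - log(sum_j exp(z_j)), computed stably."""
    # Subtracting max(z) inside the exp keeps it from overflowing;
    # adding max(z) back outside the log leaves the result unchanged.
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]
```

With scores like [1000, 1001, 1002], the naive route (softmax then log) would overflow in exp(), while this version returns finite log-probabilities whose exponentials still sum to 1.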