Deep Learning: Deep Feedforward Networks (Part 2)

This post details the design of the output layer in deep feedforward networks: linear units for Gaussian output distributions, sigmoid units for Bernoulli output distributions, and softmax units for multinoulli output distributions. The choice of each unit type is tightly coupled with the corresponding cost function, and different units suit different tasks. For example, sigmoid units suit binary-variable prediction, while softmax units suit multi-class classification.

Output Units

The choice of cost function is tightly coupled with the choice of output unit. Most of the time, we simply use the cross-entropy between the data distribution and the model distribution. The choice of how to represent the output then determines the form of the cross-entropy function.
We suppose that the feedforward network provides a set of hidden features defined by h=f(x;θ) . The role of the output layer is then to provide some additional transformation from the features to complete the task that the network must perform.

Linear Units for Gaussian Output Distributions

One simple kind of output unit is based on an affine transformation with no nonlinearity; these are often just called linear units.
Given features h, a layer of linear output units produces a vector ŷ = Wᵀh + b. Linear output layers are often used to produce the mean of a conditional Gaussian distribution:

p(y | x) = N(y; ŷ, I)

Maximizing the log-likelihood is then equivalent to minimizing the mean squared error.
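Concretely, with identity covariance the negative log-likelihood expands as

−log p(y | x) = ½ ‖y − ŷ‖² + (n/2) log(2π)

where n is the dimension of y. The second term is constant in the parameters, so minimizing the negative log-likelihood is exactly minimizing the squared error between ŷ and y, up to scale.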
The maximum likelihood framework makes it straightforward to learn the covariance of the Gaussian too, or to make the covariance of the Gaussian be a function of the input. However, the covariance must be constrained to be a positive definite matrix for all inputs. It is difficult to satisfy such constraints with a linear output layer, so typically other output units are used to parametrize the covariance.
Because linear units do not saturate, they pose little difficulty for gradient-based optimization algorithms and may be used with a wide variety of optimization algorithms.
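As a minimal sketch (in NumPy, with hypothetical toy dimensions rather than any particular architecture), here is a linear output layer trained by gradient descent on mean squared error, which by the equivalence above maximizes the conditional Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 4 hidden features, 2 targets, 256 examples.
H, K, N = 4, 2, 256
W_true = rng.normal(size=(H, K))
h = rng.normal(size=(N, H))        # stand-in for hidden features h = f(x; θ)
y = h @ W_true                     # noiseless linear targets for the sketch

W = np.zeros((H, K))               # parameters of the linear output layer
b = np.zeros(K)
lr = 0.1
for _ in range(200):
    y_hat = h @ W + b              # ŷ = Wᵀh + b for every example in the batch
    err = y_hat - y                # gradient of ½‖ŷ − y‖² with respect to ŷ
    W -= lr * h.T @ err / N        # backprop through the affine map
    b -= lr * err.mean(axis=0)

mse = float(np.mean((h @ W + b - y) ** 2))
```

Because the loss surface here is a simple quadratic bowl and linear units never saturate, plain gradient descent converges without any careful initialization or learning-rate tricks.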

Sigmoid Units for Bernoulli Output Distributions

Many tasks require predicting the value of a binary variable y.
The maximum-likelihood approach is to define a Bernoulli distribution over y conditioned on x. A Bernoulli distribution is defined by just a single number: the neural net needs to predict only P(y = 1 | x), and for this number to be a valid probability, it must lie in the interval [0, 1].
One could enforce this with a linear unit whose value is clipped to [0, 1], but gradient descent then stalls: the gradient is zero whenever the pre-activation strays outside the unit interval. It is better to use an approach that ensures there is always a strong gradient whenever the model has the wrong answer: sigmoid output units combined with maximum likelihood.
A sigmoid output unit is defined by

ŷ = σ(wᵀh + b)

where σ(z) = 1/(1 + e^(−z)) is the logistic sigmoid. The sigmoid output unit thus has two components: a linear layer computing z = wᵀh + b, followed by the sigmoid activation that converts z into a probability.
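The loss pairs best with this unit when written directly in terms of the pre-activation z = wᵀh + b, using the softplus identities −log σ(z) = ζ(−z) and −log(1 − σ(z)) = ζ(z), where ζ(x) = log(1 + eˣ). In that form the loss saturates only when the model already has the right answer, and its gradient stays near ±1 when the answer is confidently wrong. A minimal NumPy sketch (the function names are mine, not from any library):

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic sigmoid: exp is only ever
    # evaluated on a non-positive argument, so it cannot overflow.
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def bernoulli_nll(z, y):
    # -log P(y | x) for y in {0, 1}, written in terms of the
    # pre-activation z = w·h + b via the softplus identities:
    #   -log sigmoid(z)       = softplus(-z)
    #   -log (1 - sigmoid(z)) = softplus(z)
    # np.logaddexp(0, x) computes softplus(x) = log(1 + exp(x)) stably.
    z = np.asarray(z, dtype=float)
    return np.logaddexp(0.0, np.where(y == 1, -z, z))
```

Note how the loss behaves at the extremes: when the model is confidently right the loss is near zero, and when it is confidently wrong the loss grows linearly in |z|, so the gradient with respect to z is close to ±1 rather than vanishing as σ itself saturates.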