Deep Learning: Deep Feedforward Networks (Part 2)

This post details the design of the output layer in deep feedforward networks: linear units for Gaussian output distributions, sigmoid units for Bernoulli output distributions, and softmax units for multinoulli output distributions. The choice of each unit type is tightly coupled with the corresponding cost function, and different units suit different tasks. For example, sigmoid units suit binary-variable prediction, while softmax units suit multi-class classification.

Output Units

The choice of cost function is tightly coupled with the choice of output unit. Most of the time, we simply use the cross-entropy between the data distribution and the model distribution. The choice of how to represent the output then determines the form of the cross-entropy function.
We suppose that the feedforward network provides a set of hidden features defined by h=f(x;θ) . The role of the output layer is then to provide some additional transformation from the features to complete the task that the network must perform.

Linear Units for Gaussian Output Distributions

One simple kind of output unit is based on an affine transformation with no nonlinearity; these are often just called linear units.
Given features h, a layer of linear output units produces a vector ŷ = Wᵀh + b. Linear output layers are often used to produce the mean of a conditional Gaussian distribution:

p(y | x) = N(y; ŷ, I)

Maximizing the log-likelihood is then equivalent to minimizing the mean squared error.
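Concretely, with identity covariance the negative log-likelihood expands as

−log p(y | x) = ½ ‖y − ŷ‖² + (n/2) log(2π)

where n is the dimension of y. The second term is constant in the parameters, so minimizing the negative log-likelihood is exactly minimizing the squared error between ŷ and y, up to scale.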
The maximum likelihood framework makes it straightforward to learn the covariance of the Gaussian too, or to make the covariance of the Gaussian be a function of the input. However, the covariance must be constrained to be a positive definite matrix for all inputs. It is difficult to satisfy such constraints with a linear output layer, so typically other output units are used to parametrize the covariance.
Because linear units do not saturate, they pose little difficulty for gradient-based optimization algorithms and may be used with a wide variety of optimization algorithms.
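As a minimal sketch (in NumPy, with hypothetical toy dimensions rather than any particular architecture), here is a linear output layer trained by gradient descent on mean squared error, which by the equivalence above maximizes the conditional Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 4 hidden features, 2 targets, 256 examples.
H, K, N = 4, 2, 256
W_true = rng.normal(size=(H, K))
h = rng.normal(size=(N, H))        # stand-in for hidden features h = f(x; θ)
y = h @ W_true                     # noiseless linear targets for the sketch

W = np.zeros((H, K))               # parameters of the linear output layer
b = np.zeros(K)
lr = 0.1
for _ in range(200):
    y_hat = h @ W + b              # ŷ = Wᵀh + b for every example in the batch
    err = y_hat - y                # gradient of ½‖ŷ − y‖² with respect to ŷ
    W -= lr * h.T @ err / N        # backprop through the affine map
    b -= lr * err.mean(axis=0)

mse = float(np.mean((h @ W + b - y) ** 2))
```

Because the loss surface here is a simple quadratic bowl and linear units never saturate, plain gradient descent converges without any careful initialization or learning-rate tricks.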

Sigmoid Units for Bernoulli Output Distributions

Many tasks require predicting the value of a binary variable y.
The maximum-likelihood approach is to define a Bernoulli distribution over y conditioned on x. A Bernoulli distribution is defined by just a single number: the neural net needs to predict only P(y = 1 | x), and for this number to be a valid probability, it must lie in the interval [0, 1].
One could enforce this with a linear unit whose value is clipped to [0, 1], but gradient descent then stalls: the gradient is zero whenever the pre-activation strays outside the unit interval. It is better to use an approach that ensures there is always a strong gradient whenever the model has the wrong answer: sigmoid output units combined with maximum likelihood.
A sigmoid output unit is defined by

ŷ = σ(wᵀh + b)

where σ(z) = 1/(1 + e^(−z)) is the logistic sigmoid. The sigmoid output unit thus has two components: a linear layer computing z = wᵀh + b, followed by the sigmoid activation that converts z into a probability.
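The loss pairs best with this unit when written directly in terms of the pre-activation z = wᵀh + b, using the softplus identities −log σ(z) = ζ(−z) and −log(1 − σ(z)) = ζ(z), where ζ(x) = log(1 + eˣ). In that form the loss saturates only when the model already has the right answer, and its gradient stays near ±1 when the answer is confidently wrong. A minimal NumPy sketch (the function names are mine, not from any library):

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic sigmoid: exp is only ever
    # evaluated on a non-positive argument, so it cannot overflow.
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def bernoulli_nll(z, y):
    # -log P(y | x) for y in {0, 1}, written in terms of the
    # pre-activation z = w·h + b via the softplus identities:
    #   -log sigmoid(z)       = softplus(-z)
    #   -log (1 - sigmoid(z)) = softplus(z)
    # np.logaddexp(0, x) computes softplus(x) = log(1 + exp(x)) stably.
    z = np.asarray(z, dtype=float)
    return np.logaddexp(0.0, np.where(y == 1, -z, z))
```

Note how the loss behaves at the extremes: when the model is confidently right the loss is near zero, and when it is confidently wrong the loss grows linearly in |z|, so the gradient with respect to z is close to ±1 rather than vanishing as σ itself saturates.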