Multi-layer networks


A nonlinear problem

Consider again the best linear fit we found for the car data. Notice that the data points are not evenly distributed around the line: for low weights, we see more miles per gallon than our model predicts. In fact, it looks as if a simple curve might fit these data better than the straight line. We can enable our neural network to do such curve fitting by giving it an additional node which has a suitably curved (nonlinear) activation function. A useful function for this purpose is the S-shaped hyperbolic tangent (tanh) function (Fig. 1).
Fig. 1: The tanh function.
Fig. 2: Network with a hidden unit.
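For reference, here is a minimal sketch of tanh in Python (NumPy is an assumption of this sketch; the course page gives only the figure):

```python
import numpy as np

def tanh(x):
    # Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x).
    # S-shaped, squashing any real input into the interval (-1, 1).
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

xs = np.linspace(-3.0, 3.0, 7)
print(np.allclose(tanh(xs), np.tanh(xs)))  # True: matches NumPy's built-in
```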

Fig. 2 shows our new network: an extra node (unit 2) with a tanh activation function has been inserted between input and output. Since such a node is "hidden" inside the network, it is commonly called a hidden unit. Note that the hidden unit also has a weight from the bias unit. In general, all non-input neural network units have such a bias weight. For simplicity, the bias unit and weights are usually omitted from neural network diagrams; unless it is explicitly stated otherwise, you should always assume that they are there.

Fig. 3: The car data with the fitted curve.

When this network is trained by gradient descent on the car data, it learns to fit the tanh function to the data (Fig. 3). Each of the four weights in the network plays a particular role in this process: the two bias weights shift the tanh function in the x- and y-direction, respectively, while the other two weights scale it along those two directions. Fig. 2 gives the weight values that produced the solution shown in Fig. 3.
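To make the roles of the four weights concrete, here is a minimal gradient descent sketch in Python. NumPy and the stand-in data are assumptions of this sketch; the numbers below are illustrative, not the actual car data or the weight values from Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the car data (weight vs. mpg, rescaled);
# NOT the original data set.
x = np.linspace(-2, 2, 40)
y = 25 - 20 * np.tanh(1.2 * x + 0.3) + rng.normal(0, 1, 40)

# The four weights: w and b_h shift/scale tanh along x,
# v and b_o shift/scale it along y.
w, b_h, v, b_o = 1.0, 0.0, 1.0, 0.0
lr = 0.01

for _ in range(20000):
    h = np.tanh(w * x + b_h)        # hidden unit activation
    y_hat = v * h + b_o             # network output
    err = y_hat - y
    # gradients of 0.5 * mean(err^2) w.r.t. each weight
    dv, db_o = np.mean(err * h), np.mean(err)
    dh = err * v * (1 - h ** 2)     # backpropagate through tanh
    dw, db_h = np.mean(dh * x), np.mean(dh)
    w, b_h = w - lr * dw, b_h - lr * db_h
    v, b_o = v - lr * dv, v_o if False else b_o - lr * db_o

print(f"w={w:.2f} b_h={b_h:.2f} v={v:.2f} b_o={b_o:.2f}")
```

The model computed in the loop is y_hat = v * tanh(w * x + b_h) + b_o: the bias weights b_h and b_o shift the curve along the x- and y-direction, while w and v scale it along those directions.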


Hidden Layers

One could argue that in the example above we cheated by picking a hidden unit activation function that could fit the data well. What would we do if the data looked like this (Fig. 4)?

Fig. 4: Very nonlinear data. (Relative concentration of NO and NO2 in exhaust fumes as a function of the richness of the ethanol/air mixture burned in a car engine.)

Obviously the tanh function can't fit these data at all. We could cook up a special activation function for each data set we encounter, but that would defeat our purpose of learning to model the data. We would like a general, nonlinear function approximation method that allows us to fit any given data set, no matter what it looks like.

Fig. 5: A multi-layer network with two tanh hidden units.

Fortunately there is a very simple solution: add more hidden units! In fact, a network with just two hidden units using the tanh function (Fig. 5) can fit the data in Fig. 4 quite well - can you see how? (A sketch of such a fit follows below.) The fit can be further improved by adding yet more units to the hidden layer. Note, however, that having too large a hidden layer - or too many hidden layers - can degrade the network's performance (more on this later). In general, one shouldn't use more hidden units than necessary to solve a given problem. (One way to ensure this is to start training with a very small network. If gradient descent fails to find a satisfactory solution, grow the network by adding a hidden unit, and repeat.)
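Here is a minimal sketch of such a fit, extending the single-unit gradient descent above to a hidden layer of J tanh units. NumPy, the bump-shaped stand-in data, and all variable names are assumptions of this sketch; the original course page provides only the figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Bump-shaped stand-in for the NO/NO2 data of Fig. 4 (not the real data).
x = np.linspace(-2, 2, 60)
y = 4 * np.exp(-x ** 2) + rng.normal(0, 0.1, 60)

J = 2                                    # two tanh hidden units, as in Fig. 5
W = rng.normal(0, 1, J)                  # input -> hidden weights
B = rng.normal(0, 1, J)                  # hidden bias weights
V = rng.normal(0, 1, J)                  # hidden -> output weights
b = 0.0                                  # output bias weight
lr = 0.02

for _ in range(30000):
    H = np.tanh(np.outer(x, W) + B)      # (n, J) hidden activations
    y_hat = H @ V + b                    # network output
    err = y_hat - y
    dV, db = H.T @ err / len(x), err.mean()
    dH = np.outer(err, V) * (1 - H ** 2) # backpropagate through tanh
    dW, dB = dH.T @ x / len(x), dH.mean(axis=0)
    W -= lr * dW; B -= lr * dB; V -= lr * dV; b -= lr * db

print("final mean squared error:", np.mean(err ** 2))
```

Two units suffice here because a bump can be assembled from two shifted tanh curves of opposite sign, roughly tanh(a(x + c)) - tanh(a(x - c)); that is one answer to the "can you see how?" question.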

Theoretical results indicate that, given enough hidden units, a network like the one in Fig. 5 can approximate any reasonable function to any required degree of accuracy. In other words, any function can be expressed as a linear combination of tanh functions: tanh is a universal basis function. Many functions form a universal basis; the two classes of activation functions commonly used in neural networks are the sigmoidal (S-shaped) basis functions (to which tanh belongs) and the radial basis functions.
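Written out (with notation introduced here, not taken from the original), the claim is that for a suitable number of hidden units $J$ and suitable weights,

$$ f(x) \approx b_0 + \sum_{j=1}^{J} v_j \tanh(w_j x + b_j), $$

with the approximation error shrinking as $J$ grows.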

from: http://www.willamette.edu/~gorr/classes/cs449/multilayer.html
