Understanding the Universal Approximation Theorem with Code

This article works through the universal approximation theorem in code and explores its relevance to data science.

Universal Approximation Theorem

The universal approximation theorem states that standard multi-layer feedforward networks with a single hidden layer containing a finite number of hidden neurons are able to approximate continuous functions using arbitrary activation functions. (source)

However, the ability of the neural network to approximate any continuous function mapping the input to the output is constrained by the number of neurons, the number of hidden layers, and the many techniques used during the training process. Intuitively, you can think of this as whether there are enough computational units and operations set up to approximate a continuous function that properly maps the input to the output. The ability to approximate also depends heavily on the efficiency of the optimization routine and the loss function we use.

Suggestion: Download the script, run it yourself, and play around with the parameters. The repo is (here). If you have forgotten about neural networks, read about them (here).

The parameters that determine the setup and training of the neural network are commonly known as hyperparameters.

Examples of hyperparameters we can tune in the code:

1. Network structure (number of hidden layers, number of neurons)

import torch.nn as nn

# single hidden layer with n_neurons units (n_neurons is set elsewhere in the script)
model = nn.Sequential(
    nn.Linear(1, n_neurons),            # input -> hidden layer
    nn.ReLU(),
    # nn.Linear(n_neurons, n_neurons),  # uncomment for a second hidden layer
    # nn.ReLU(),
    nn.Linear(n_neurons, 1),            # hidden layer -> output
    nn.ReLU()
)

2. Number of epochs (the number of times we go through all the data), line 57; a minimal training-loop sketch follows this list.

3. Loss function and optimizer; there are many optimizers available, check them out [here]:

import torch.optim as optim  # (torch.nn is already imported above as nn)
optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)  # define optimizer
# optimizer = optim.SGD(model.parameters(), lr=learning_rate)    # alternative: plain SGD
criterion = nn.MSELoss()  # define loss function
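
For reference, a minimal training-loop sketch tying the epochs, loss function and optimizer together might look like the following. This is an illustration rather than the exact code from the repo; x_train, y_train, n_epochs and learning_rate are assumed to be defined elsewhere in the script.

for epoch in range(n_epochs):
    optimizer.zero_grad()               # reset gradients from the previous step
    y_pred = model(x_train)             # forward pass through the network
    loss = criterion(y_pred, y_train)   # mean squared error against the target
    loss.backward()                     # backpropagate the error
    optimizer.step()                    # update the weights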

Experiment in code

We can run some experiments in code to better understand the concept of approximation. Given that the function we are trying to approximate has the relationship y = x², we can run some experiments to gauge how many neurons in a single hidden layer are necessary to fit the y = x² curve, and tune hyperparameters in search of the best results.
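
As a minimal sketch of the setup (the variable names here are assumptions, not necessarily those used in the repo), the training data for y = x² can be generated like this:

import torch

# sample points on [-1, 1] and compute the target y = x^2
x_train = torch.linspace(-1, 1, 200).unsqueeze(1)  # shape (200, 1)
y_train = x_train ** 2                             # shape (200, 1)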

From the figure above (Fig 1), we can see that with 20 neurons in a single hidden layer, the neural network is able to approximate the function pretty well just by training on the output values. Increasing to 50 neurons in the single hidden layer provides us with better results.

As a recap, here is a simple illustration of a single-hidden-layer feedforward network architecture with 8 neurons, in case you have forgotten:

Fig 2: Architecture of a single-hidden-layer feedforward neural network

In theory, the universal approximation theorems imply that neural networks can approximate a wide variety of functions very well when given an appropriate combination of values. However, learning to construct the network with those appropriate values is not always possible, due to the constraints and challenges encountered when training the network in search of such values.

Fig 3: Feedforward neural network with a single hidden layer. Poor tuning of hyperparameters leads to bad training.

From the figure above (Fig 3), with the same single-hidden-layer architecture, the network approximates the function very poorly. This is because training the neural network does not always provide us with precise/perfect values. Therefore, we have to be aware that even though in theory the neural network could approximate a very accurate continuous function mapping, it may fail to get close to the expected continuous function, as the training process of the neural network comes with its own challenges.

Fig 4: Feedforward neural network with two hidden layers. Poor tuning of hyperparameters leads to bad training, but the results are still pretty good.

Running another experiment, we connected another hidden layer, trying 20 neurons and 50 neurons; the results can be seen in the figure above (Fig 4). It can be observed that the approximation of the predicted function is much better without spending much time tuning the training parameters, which is as expected. Increasing the number of neurons and connections in search of a better approximation is a pretty good heuristic, but we have to remember that the process of training the neurons also presents challenges that may prevent the neural network from learning the best values needed to approximate the function, even if more than enough nodes are available in theory.
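
For reference, the two-hidden-layer variant can be set up roughly as follows. This is a sketch mirroring the single-layer block above; the exact layer widths used in the experiment may differ.

import torch.nn as nn

n_neurons = 50  # assumed width; the experiment also uses 20

model_two_hidden = nn.Sequential(
    nn.Linear(1, n_neurons),          # input -> first hidden layer
    nn.ReLU(),
    nn.Linear(n_neurons, n_neurons),  # first hidden -> second hidden layer
    nn.ReLU(),
    nn.Linear(n_neurons, 1),          # second hidden -> output
    nn.ReLU()                         # y = x^2 is non-negative, so ReLU at the output is fine here
)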

Fig 5: Feedforward neural network with a single hidden layer. Excellent tuning of hyperparameters leads to good training.

Another important takeaway from the experiment is that by spending more time tuning the hyperparameters of the neural network, we can actually get a near-perfect approximation with the same architecture of 1 hidden layer with 50 neurons, as shown in the figure above (Fig 5). It can be observed that the results are even better than using 2 hidden layers with bad hyperparameters. The experiment with 2 hidden layers could definitely approximate better if we spent more time tuning the hyperparameters. This shows how important the optimization routine and certain hyperparameters are to training the network; a simple way to search for them systematically is sketched below. With 2 layers and more neurons, it does not take much tuning to get a good result because there are more connections and nodes to use. However, as we add more nodes and layers, training becomes more computationally expensive.
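
As a sketch of such a search, a small grid over learning rates could look like the following. Note that train_and_evaluate is a hypothetical helper, assumed to build the model, run the training loop shown earlier, and return the final loss; it is not part of the original repo.

# train_and_evaluate is a hypothetical helper: builds the model, trains it, returns the final MSE
best_lr, best_loss = None, float("inf")
for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    final_loss = train_and_evaluate(n_neurons=50, learning_rate=lr, n_epochs=2000)
    if final_loss < best_loss:
        best_lr, best_loss = lr, final_loss
print(f"best learning rate: {best_lr}, final loss: {best_loss}")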

Lastly, if the relationship is too complex, 1 hidden layer with 50 neurons may not even theoretically be able to approximate the input-to-output mapping well enough in the first place. y = x² is a relatively easy relationship to approximate, but we can think of the relationship for inputs such as images. The relationship between image pixel values and the classification of the image is ridiculously complex; not even the best mathematician could come up with an appropriate function. However, we can use neural networks to approximate such complex relationships by adding more hidden layers and neurons. This gave birth to the field of Deep Learning, a subset of Machine Learning focused on utilising neural networks with many layers (e.g. deep neural networks, deep convolutional networks) to learn very complex function mappings.

Recommendation

Please do try other functions such as sin(x) or cos(x) and see whether you can approximate the relationship well. You may keep failing until you get the hyperparameters right, but it will give you good insight into tuning hyperparameters. If the function is too hard to approximate, go ahead and add more layers and neurons! Try out different optimizers such as SGD and Adam, and compare the results.
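
To try this out, only a couple of lines really need to change. A sketch, assuming the same x_train setup as above:

import torch
import torch.optim as optim

# swap the target function: approximate sin(x) instead of x^2
y_train = torch.sin(x_train)

# swap the optimizer: plain SGD or Adam instead of RMSprop
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
# optimizer = optim.Adam(model.parameters(), lr=learning_rate)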

Translated from: https://towardsdatascience.com/understand-universal-approximation-theorem-with-code-774dcef55731
