Training set size for neural networks considering the curse of dimensionality

I'm learning the ropes of neural networks. Recently, I read stuff about the curse of dimensionality and how it might lead to overfitting (e.g. here).

If I understand correctly, the number of features (dimensions) d of a given dataset with n data points is very important when considering the size t of the training set.

QUESTIONS

(...not sure if all my questions are really connected to the curse of dimensionality)

  1. How do I choose the correct training size t considering d and n? Is t a function of d and n?
  2. Do I have to consider d for regularization?

 

One rule of thumb is to have at least 10× as many data points as dimensions. With some intelligent prior information (e.g. a good kernel in an SVM), you may even be able to learn a good machine with fewer data points than dimensions.
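A minimal sketch of that heuristic as a quick sanity check (the function name and the factor of 10 are just illustrative, not a hard rule):

```python
def recommended_min_samples(n_features: int, factor: int = 10) -> int:
    """Rule-of-thumb lower bound on training-set size: ~factor x the number of dimensions."""
    return factor * n_features

d, n = 50, 300                            # number of features and available data points
print(recommended_min_samples(d))         # 500
print(n >= recommended_min_samples(d))    # False: consider more data, or stronger priors/regularization
```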

The lecture on VC dimension by Yaser Abu-Mostafa [link] motivates this 10× rule with some nice charts. If you are not familiar with the concept, VC dimension measures the capacity of a learning model: the higher the VC dimension, the more complex the functions the model can fit, and the more data you need. For example, the classical perceptron on d-dimensional inputs has VC dimension d+1. Some hypothesis classes have infinite VC dimension; such problems are impossible to learn in this framework, since no finite training set guarantees generalization.
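In symbols, the heuristic and the perceptron example read roughly as follows (the factor of 10 is the rule of thumb from the lecture, not a theorem):

```latex
% Heuristic from the lecture: training-set size vs. VC dimension
\[ N \gtrsim 10 \cdot d_{\mathrm{VC}} \]
% Classical perceptron on d-dimensional inputs
\[ d_{\mathrm{VC}}(\text{perceptron in } \mathbb{R}^{d}) = d + 1
   \quad\Rightarrow\quad N \gtrsim 10\,(d + 1) \]
```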

A neural net is a linear model in derived variables. Take the regression case, because it is a little bit simpler:

$$\hat{y} = \beta_0 + \beta^{\top} h_L, \qquad h_{\ell} = \sigma\!\left(\Gamma_{\ell}\, h_{\ell-1} + \gamma_{\ell}\right), \qquad h_0 = X,$$

where $X$ is your data (i.e. your features), the $\Gamma_\ell$ are matrices of weights, the $\gamma_\ell$ are "biases", and $\beta$ holds the weights connecting the topmost hidden layer to the output. You can see that this is nothing more than a linear model, but in nonlinear functions of $X$.
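A minimal NumPy sketch of this view for a single hidden layer (the shapes, the tanh nonlinearity, and the random data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, h = 200, 5, 16               # samples, input features, hidden units
X = rng.normal(size=(n, d))

Gamma = rng.normal(size=(d, h))    # weight matrix into the hidden layer
gamma = rng.normal(size=h)         # hidden-layer "biases"
beta = rng.normal(size=h)          # weights from the topmost hidden layer to the output
beta0 = 0.5

H = np.tanh(X @ Gamma + gamma)     # derived (nonlinear) variables
y_hat = beta0 + H @ beta           # a plain linear model in the derived variables H
```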

Just like in a linear model, you can overfit when you have too many parameters. A typical strategy for avoiding overfitting is regularization. Rather than solving

$$\min_{\beta} \; \lVert y - X\beta \rVert^2,$$

you solve

$$\min_{\beta} \; \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2$$

in ridge regression, for example. Selecting $\lambda$ by cross-validation, you're effectively letting the data tell you how much use to make of your many dimensions.
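With scikit-learn this looks roughly like the following sketch (the synthetic data and the candidate grid of penalties, called alphas there, are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n, d = 100, 50                                 # few samples relative to the dimension
X = rng.normal(size=(n, d))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Cross-validation picks the penalty strength lambda (alpha in scikit-learn).
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print(model.alpha_)                            # the lambda the data "chose"
```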

This generalizes directly to neural nets, except that there is no closed-form solution to the minimization problem, as there is in ridge regression. You'll overfit if you solve

$$\min_{\theta} \; \lVert y - f(X; \theta) \rVert^2$$

rather than the penalized problem

$$\min_{\theta} \; \lVert y - f(X; \theta) \rVert^2 + \lambda \lVert \theta \rVert^2,$$

where $f(X;\theta)$ is the network output defined above and $\theta$ is a concatenated vector of all of your weights.
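For instance, scikit-learn's MLPRegressor exposes this L2 penalty on the weights as its alpha parameter; a minimal sketch (the architecture and the synthetic data are illustrative assumptions, not a recommendation):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# alpha is the L2 penalty (the lambda above) applied to all of the network's weights theta.
net = MLPRegressor(hidden_layer_sizes=(32, 16), alpha=1e-2,
                   max_iter=5000, random_state=0).fit(X, y)
print(net.score(X, y))   # in-sample R^2; judge overfitting on held-out data, not on this
```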

Note that the quadratic penalty here isn't the only form of regularization. You could also do L1 regularization, or dropout regularization.
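For illustration, dropout amounts to randomly zeroing derived variables during training; a tiny NumPy sketch of an "inverted dropout" mask (the keep probability of 0.5 and the batch shape are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))             # hidden-layer activations for a mini-batch

p_keep = 0.5                            # probability of keeping each unit
mask = rng.random(H.shape) < p_keep     # which units survive this training step
H_dropped = H * mask / p_keep           # rescale so the expected activation is unchanged

# At test time no mask is applied; the scaling above keeps train/test magnitudes comparable.
```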

But the idea is the same: build a model that is flexible enough to overfit the data, and then find a regularization parameter (by some variant of cross-validation) that constrains the variability enough that you don't overfit.
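A minimal sketch of that recipe, using cross-validated grid search over the weight-decay parameter (the grid, the architecture, and the synthetic data are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.2, size=300)

# Deliberately flexible network; cross-validation decides how hard to constrain it.
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0),
    param_grid={"alpha": np.logspace(-4, 1, 6)},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # the regularization strength chosen by the data
```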

 
