What is overfitting and how can I avoid it?

This article discusses the overfitting and underfitting problems commonly encountered in neural network training, including their causes, their potential harm, and ways to avoid them. It also introduces several techniques for avoiding overfitting and underfitting, such as model selection and early stopping.

The critical issue in developing a neural network is generalization: how well will the network make predictions for cases that are not in the training set? NNs, like other flexible nonlinear estimation methods such as kernel regression and smoothing splines, can suffer from either underfitting or overfitting. A network that is not sufficiently complex can fail to detect fully the signal in a complicated data set, leading to underfitting. A network that is too complex may fit the noise, not just the signal, leading to overfitting. Overfitting is especially dangerous because it can easily lead to predictions that are far beyond the range of the training data with many of the common types of NNs. Overfitting can also produce wild predictions in multilayer perceptrons even with noise-free data.

For an elementary discussion of overfitting, see Smith (1996). For a more rigorous approach, see the article by Geman, Bienenstock, and Doursat (1992) on the bias/variance trade-off (it's not really a dilemma). We are talking about statistical bias here: the difference between the average value of an estimator and the correct value. Underfitting produces excessive bias in the outputs, whereas overfitting produces excessive variance. There are graphical examples of overfitting and underfitting in Sarle (1995, 1999).

The best way to avoid overfitting is to use lots of training data. If you have at least 30 times as many training cases as there are weights in the network, you are unlikely to suffer from much overfitting, although you may get some slight overfitting no matter how large the training set is. For noise-free data, 5 times as many training cases as weights may be sufficient. But you can't arbitrarily reduce the number of weights for fear of underfitting.
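As a rough illustration of the case-to-weight heuristic above, here is a minimal Python sketch (the layer sizes and training-set size are assumed for illustration, not taken from the FAQ) that counts the weights of a one-hidden-layer MLP and reports the ratio of training cases to weights:

```python
# Hypothetical check of the "30 training cases per weight" heuristic
# for a fully connected MLP with one hidden layer.

def count_weights(n_inputs, n_hidden, n_outputs):
    """Number of weights (including biases) of a single-hidden-layer MLP."""
    hidden = (n_inputs + 1) * n_hidden    # input->hidden weights plus biases
    output = (n_hidden + 1) * n_outputs   # hidden->output weights plus biases
    return hidden + output

n_weights = count_weights(n_inputs=10, n_hidden=20, n_outputs=1)
n_cases = 5000  # size of the training set (assumed)

print(f"weights: {n_weights}, cases per weight: {n_cases / n_weights:.1f}")
# With noisy targets the FAQ suggests roughly 30 cases per weight;
# with noise-free targets about 5 may be sufficient.
```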

Given a fixed amount of training data, there are at least six approaches to avoiding underfitting and overfitting, and hence getting good generalization:

o Model selection
o Jittering
o Early stopping (see the sketch after this list)
o Weight decay
o Bayesian learning
o Combining networks
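As an example of one of these approaches, here is a minimal early-stopping sketch in Python. The data, the linear model, the learning rate, and the patience value are all illustrative assumptions; the FAQ itself does not prescribe any particular implementation. The idea is simply to keep the weights from the epoch with the lowest validation error and stop once the validation error has not improved for a fixed number of epochs.

```python
import numpy as np

# Minimal early-stopping sketch on a linear model trained by gradient descent.
# Data, model, learning rate, and patience are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)  # noisy targets

X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

w = np.zeros(5)
best_w, best_val = w.copy(), np.inf
patience, bad_epochs = 10, 0

for epoch in range(1000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.01 * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:
        best_val, best_w, bad_epochs = val_loss, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # stop once validation loss stops improving
            break

w = best_w  # keep the weights from the best validation epoch
```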

The first five approaches are based on well-understood theory. Methods for combining networks do not have such a sound theoretical basis but are the subject of current research. These six approaches are discussed in more detail under subsequent questions.

The complexity of a network is related to both the number of weights and the size of the weights. Model selection is concerned with the number of weights, and hence the number of hidden units and layers. The more weights there are, relative to the number of training cases, the more overfitting amplifies noise in the targets (Moody 1992). The other approaches listed above are concerned, directly or indirectly, with the size of the weights. Reducing the size of the weights reduces the "effective" number of weights--see Moody (1992) regarding weight decay and Weigend (1994) regarding early stopping. Bartlett (1997) obtained learning-theory results in which generalization error is related to the L_1 norm of the weights instead of the VC dimension.
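To make the "size of the weights" point concrete, the following sketch (the data and the decay strength are assumed for illustration, not taken from the cited papers) fits the same kind of linear model with and without an L2 weight-decay penalty and shows how the penalty pulls the estimated weights toward zero:

```python
import numpy as np

# Sketch of weight decay (an L2 penalty on the weights) in a simple
# gradient-descent setting. lambda_ is an illustrative choice.

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)

def fit(lambda_, steps=5000, lr=0.01):
    w = np.zeros(5)
    for _ in range(steps):
        # gradient of mean squared error plus gradient of lambda_ * ||w||^2
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lambda_ * w
        w -= lr * grad
    return w

print("no decay:  ", np.round(fit(0.0), 2))
print("with decay:", np.round(fit(0.1), 2))  # weights shrink toward zero
```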

Overfitting is not confined to NNs with hidden units. Overfitting can occur in generalized linear models (networks with no hidden units) if either or both of the following conditions hold:

1. The number of input variables (and hence the number of weights) is large with respect to the number of training cases. Typically you would want at least 10 times as many training cases as input variables, but with noise-free targets, twice as many training cases as input variables would be more than adequate. These requirements are smaller than those stated above for networks with hidden layers, because hidden layers are prone to creating ill-conditioning and other pathologies.

2. The input variables are highly correlated with each other. This condition is called "multicollinearity" in the statistical literature. Multicollinearity can cause the weights to become extremely large because of numerical ill-conditioning--see "How does ill-conditioning affect NN training?" Methods for dealing with these problems in the statistical literature include ridge regression (similar to weight decay), partial least squares (similar to early stopping), and various methods with even stranger names, such as the lasso and garotte.
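The following sketch illustrates the multicollinearity problem with two nearly identical inputs (the data and the ridge penalty of 0.1 are assumed for illustration): ordinary least squares typically produces huge, opposite-signed weights, while a ridge penalty, which plays a role similar to weight decay, keeps them small.

```python
import numpy as np

# Illustration (assumed data) of how multicollinearity inflates weights
# and how a ridge penalty keeps them small.

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 1e-3 * rng.normal(size=100)      # nearly identical to x1
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.normal(size=100)

ols = np.linalg.lstsq(X, y, rcond=None)[0]
ridge = np.linalg.solve(X.T @ X + 0.1 * np.eye(2), X.T @ y)

print("OLS weights:  ", np.round(ols, 1))    # typically large and opposite-signed
print("ridge weights:", np.round(ridge, 2))  # both near 0.5, summing to about 1
```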

 

From: http://www.faqs.org/faqs/ai-faq/neural-nets/part3/

### Answer 1

Overfitting and underfitting are common problems in machine learning. Overfitting means the model performs well on the training set but poorly on the test set: the model is too complex, fits the training data too closely, and therefore generalizes badly. Underfitting means the model performs poorly on both the training and test sets: the model is too simple to capture the complexity and regularities of the data. To address these problems, the model must be adjusted and tuned to achieve better generalization and predictive accuracy.

### Answer 2

In machine learning, overfitting and underfitting are two fundamental concepts. Loosely speaking, overfitting means the model fits the data too "sensitively", treating noise and chance fluctuations as regularities, and therefore performs poorly on the test set; underfitting means the model is too simple to fit even the training set well, so predictions are poor on the training set, the test set, and unseen data alike. The causes, symptoms, and remedies are outlined below.

1. Causes

(1) Overfitting. Overfitting generally arises when the model is too sensitive to the training data, pays too much attention to detail, and learns noise that has no generalizable structure, so its performance on unseen data degrades sharply. Contributing factors include:

- Too few training cases: with little data, the model may miss the underlying structure and mistake random variation for signal.
- Excessive model complexity: an overly complex model can absorb many useless features. Flexible models such as decision trees, support vector machines, and neural networks are common examples.
- Too many training iterations: training for too long makes the model attend too closely to the training data and lose the ability to generalize.

(2) Underfitting. Underfitting generally occurs because the model lacks the capacity to fit the data well, so predictions are poor. Typical causes include:

- Too few training cases: as with overfitting, too little data can make it hard for the model to learn the relevant structure.
- Insufficient model complexity: a model that is too simple may fail to capture the relationships in the training set.
- Treating a nonlinear problem linearly: fitting a nonlinear problem with a purely linear model cannot match the training data well.

2. Symptoms

(1) Overfitting. An overfitted model usually stands out on the training set but does worse on the validation and test sets:

- The gap between training error and validation error is large; training error may approach zero while validation error stays high.
- The model is overly complex: it predicts the cases it has seen very accurately but generalizes poorly to unseen data.
- The model fluctuates wildly, reacting to even tiny changes in the training data.

(2) Underfitting. Underfitting is usually obvious:

- Training error and validation error are both high.
- The model is too simple to learn enough structure from the training set.
- Predictions are inaccurate on both seen and unseen data.

3. Remedies

(1) Overfitting. Common ways to reduce overfitting include:

- Cross-validation: repeatedly training and testing on different subsets of the data makes the performance estimate more reliable and reduces the risk of overfitting (a minimal k-fold sketch follows these answers).
- More data: since a small training set is a major cause of overfitting, enlarging the training set is an effective countermeasure.
- Simplifying the model: reducing model complexity, for example by removing layers or features, avoids fitting noise and improves generalization.

(2) Underfitting. Common remedies include:

- Redesigning the features: feature engineering can raise the expressive power of the model and exploit the structure in the data better.
- More data: when underfitting is caused by a lack of data, enlarging the data set helps, as for overfitting.
- Using a more complex model: if the model is too simple to discover the more complex relationships in the data, reconsider the architecture and use a more expressive model such as a deep neural network.

### Answer 3

Overfitting means the model is too complex and tries to match the training set exactly, so it performs poorly on new data. Such a model memorizes every detail of the training set, including errors and noise, and therefore cannot generalize. Overfitting typically occurs when the model is too complex or has too many parameters. Techniques such as cross-validation, regularization, and reducing the number of features help to limit overfitting and build models that adapt better to new data.

Underfitting means the model is too simple to fit the training data or new data well. Its representational capacity is limited and it cannot capture the more complex relationships in the data. Underfitting typically occurs when the model is too simple or has too few parameters. To address it, one can increase model complexity, add features, or add more hidden layers, which helps the model capture the more complex relationships in the data. Note, however, that increasing complexity too much can in turn cause overfitting.

The goal is therefore a balance point at which the model performs well on both the training set and new data. This requires monitoring the model's behaviour during training and applying the appropriate techniques against overfitting or underfitting.
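As mentioned in the answers above, cross-validation gives a more honest estimate of generalization error than the training error alone. Here is a minimal k-fold sketch, assuming a simple linear least-squares model and synthetic data (both are illustrative, not prescribed by the answers):

```python
import numpy as np

# Minimal k-fold cross-validation sketch: the average validation error
# across folds is a less optimistic estimate of generalization error
# than the training error alone.

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.2 * rng.normal(size=120)

k = 5
folds = np.array_split(rng.permutation(len(y)), k)
errors = []
for i in range(k):
    val_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    w = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)[0]
    errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))

print("mean cross-validation error:", np.mean(errors))
```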