The Bias-Variance Tradeoff (2023 notes)

This article examines how omitted and extraneous variables affect a regression model, and the balance between bias and variance. It presents two approaches to choosing model complexity: general-to-specific model selection and m-fold cross-validation. The former builds a model by iteratively removing insignificant variables; the latter evaluates model performance on blocks of data not used for parameter estimation.

1. The Bias-Variance Tradeoff

Ideally, a model should include all variables that explain the dependent variable and exclude all that do not. In practice, a regression model typically has either

Omitted variables, or

Extraneous included variables.

1.1 Omitted Variables

An omitted variable is one that has a non-zero coefficient but is not included in the model. Omitting a variable has two effects.

Effects of omitted variable bias:

√ The remaining variables absorb the effects of the omitted variable attributable to common variation.

As a result, the regression coefficients can no longer be interpreted as the marginal effect of each variable alone.

√ The estimated residuals are larger in magnitude than the true shocks.

This is because the residuals contain both the true shock and some part of the omitted variable.
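Both effects can be seen in a small simulation. The example below is a hypothetical illustration (not from the original notes): the true model is y = 1·x1 + 2·x2 + ε with x1 and x2 positively correlated, but x2 is omitted from the regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # correlated with x1
eps = rng.normal(size=n)             # the true shocks
y = 1.0 * x1 + 2.0 * x2 + eps

# OLS of y on x1 only: the slope absorbs part of x2's effect,
# since cov(x1, x2)/var(x1) = 0.8 the slope converges to 1 + 2*0.8 = 2.6
X_short = np.column_stack([np.ones(n), x1])
beta_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# OLS on both regressors recovers the true coefficients
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

print(beta_short[1])  # near 2.6, far from the true value 1
print(beta_full[1])   # near 1

# The residuals of the short regression also contain part of x2's
# variation, so they are larger in magnitude than the true shocks.
resid_short = y - X_short @ beta_short
print(resid_short.std(), eps.std())
```

Here the biased slope (≈ 2.6) illustrates the first bullet, and the inflated residual standard deviation (≈ √5 versus 1 for the true shocks) illustrates the second.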

1.2 Extraneous Included Variables

An extraneous variable is one that is included in the model but is not needed.

Effects of including irrelevant variables:

√ Does not bias the coefficients.

In large samples, the coefficient on an extraneous variable converges to its population value of zero.

√ Increases the uncertainty of the estimated model parameters.

√ Increases R² but decreases adjusted R².
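These effects can also be checked numerically. The sketch below (a hypothetical example, not from the original notes) adds a pure-noise regressor z to a correctly specified model and compares R² and adjusted R², computed directly from their definitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
z = rng.normal(size=n)              # extraneous: its true coefficient is zero
y = 2.0 * x + rng.normal(size=n)

def fit(X, y):
    """OLS fit returning coefficients, R^2 and adjusted R^2."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    r2 = 1 - resid @ resid / tss
    k = X.shape[1] - 1              # regressors excluding the intercept
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj

X_true  = np.column_stack([np.ones(n), x])
X_extra = np.column_stack([np.ones(n), x, z])
beta_t, r2_t, adj_t = fit(X_true, y)
beta_e, r2_e, adj_e = fit(X_extra, y)

print(beta_e[2])      # coefficient on z: close to its population value of 0
print(r2_e >= r2_t)   # True: R^2 never falls when a regressor is added
print(adj_t, adj_e)   # adjusted R^2 penalizes the extra regressor
```

Note that R² is guaranteed not to fall when a regressor is added; adjusted R² falls whenever the added variable's |t|-statistic is below 1, which is the typical outcome for an extraneous variable.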

2.1 Tradeoff between Bias and Variance

On one hand, models with more explanatory variables have more estimation error but also more explanatory power. On the other hand, models with few explanatory variables have less estimation error but also less explanatory power.

2.2 Two Approaches to Finding the Appropriate Model Complexity

2.2.1 General-to-specific model selection

1. Start by including all candidate explanatory variables.

2. Remove the variable whose coefficient has the smallest absolute t-statistic (i.e., the least statistically significant).

3. Re-estimate the model using the remaining explanatory variables and again remove the least significant variable.

4. Repeat the steps above until the model contains no coefficients that are statistically insignificant.

5. Common choices for the significance level α are between 1% and 0.1% (critical t-values of at least 2.57 or 3.29, respectively).
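The steps above can be sketched as a backward-elimination loop. This is a minimal illustration (the helper `general_to_specific` and the simulated data are hypothetical, not from the original notes), assuming homoskedastic OLS standard errors and an intercept in column 0.

```python
import numpy as np

def general_to_specific(X, y, names, t_crit=2.57):
    """Backward elimination: repeatedly drop the regressor with the
    smallest |t|-statistic until every remaining coefficient has
    |t| >= t_crit (t_crit = 2.57 corresponds to alpha of about 1%)."""
    X = np.asarray(X, float).copy()
    names = list(names)
    while X.shape[1] > 1:
        n, k = X.shape
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - k)            # residual variance
        cov = s2 * np.linalg.inv(X.T @ X)       # coefficient covariance
        t = beta / np.sqrt(np.diag(cov))
        j = 1 + int(np.argmin(np.abs(t[1:])))   # never drop the intercept
        if abs(t[j]) >= t_crit:
            break                               # all remaining are significant
        X = np.delete(X, j, axis=1)
        del names[j]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return names, beta

# Example: x2 is extraneous, so it should be eliminated
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
kept, beta = general_to_specific(X, y, ["const", "x1", "x2"])
print(kept)
```

Variables with a genuine effect (here x1, whose t-statistic is very large) survive the elimination, while pure-noise regressors are usually dropped at the 1% level.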

2.2.2 M-fold cross-validation

M-fold cross-validation is designed to select a model that performs well in fitting observations not used to estimate the parameters (out-of-sample prediction).

Steps of m-fold cross-validation:

1. The first step is to determine a set of candidate models. If a dataset has n candidate explanatory variables, then there are 2^n possible model specifications.

2. Split the data into m equal-sized blocks. Parameters are estimated using m−1 blocks (the training set), and residuals are computed with the data in the excluded block (the validation set).

3. Repeat the process of estimating parameters and computing residuals a total of m times, ensuring each block is used as the validation set exactly once.

4. Compute the sum of squared errors across the m validation blocks for each candidate model, and choose the model with the smallest out-of-sample sum of squared residuals.
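The procedure above can be sketched as follows. This is a minimal illustration (the helper `cv_sse`, the two candidate regressors, and the simulated data are hypothetical): it enumerates all 2^n subsets of candidate variables and scores each by its out-of-sample sum of squared residuals.

```python
import numpy as np
from itertools import combinations

def cv_sse(X, y, m=5):
    """Out-of-sample SSE from m-fold cross-validation for one model.
    Blocks are contiguous; each block serves as the validation set once."""
    n = len(y)
    sse = 0.0
    for fold in np.array_split(np.arange(n), m):
        train = np.setdiff1d(np.arange(n), fold)
        # estimate on m-1 blocks, compute residuals on the excluded block
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[fold] - X[fold] @ beta
        sse += resid @ resid
    return sse

# Example: x1 matters, x2 is noise; 2^2 = 4 candidate specifications
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 + rng.normal(size=n)
candidates = {"x1": x1, "x2": x2}

best = None
for r in range(len(candidates) + 1):
    for subset in combinations(candidates, r):
        X = np.column_stack([np.ones(n)] + [candidates[v] for v in subset])
        sse = cv_sse(X, y)
        if best is None or sse < best[1]:
            best = (subset, sse)
print(best[0])
```

Because every specification is scored only on observations excluded from its estimation, extraneous regressors gain nothing from overfitting the training blocks, while genuinely informative variables (here x1) reliably enter the selected model.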

Summary: the above is a theoretical overview emphasizing how to select variables by trading off bias against variance; the key idea throughout is the tradeoff itself. Two approaches to variable selection were presented: general-to-specific model selection and m-fold cross-validation.
