Feature selection, regularization -- L5 for Data Science

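This post discusses how feature selection and regularization can be used in data science to control a model's bias and variance. Examples show that a high-order polynomial can lead to high variance, while an overly simple linear model can introduce high bias. It covers the importance of cross-validation and of making sure the training and validation sets come from the same distribution. It also explains how regularization works: when the penalty term is very large, the model is pushed to ignore most of the features, which reduces overfitting. Finally, it notes that features should be standardized before regularization, so that features on different scales influence the model fairly.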
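The summary's point about regularization and standardization can be illustrated with a small sketch. It assumes ridge (L2) regularization solved in closed form with NumPy on made-up data; the post does not say which penalty it uses. Ridge shrinks coefficients toward zero as the penalty grows rather than setting them exactly to zero (lasso, the L1 penalty, can do that). The helper names `standardize` and `ridge_coefficients` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^-1 X^T y.

    X is standardized and y centered first, so the intercept is not penalized
    and every feature is penalized on the same scale.
    """
    Xs = standardize(X)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

rng = np.random.default_rng(0)
n = 100
# Hypothetical data: two features on very different scales.
X = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 1000, n)])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.5, n)

for lam in [0.0, 10.0, 1e3, 1e6]:
    # Larger penalties shrink both coefficients toward zero.
    print(lam, np.round(ridge_coefficients(X, y, lam), 4))
```

As `lam` increases, the printed coefficients shrink toward zero. Without the `standardize` step, the penalty would fall mostly on the small-scale feature, whose raw coefficient has to be large for it to matter.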

* The observations being identically distributed means that every observation is drawn from the same underlying distribution: we are simply making random observations, with no systematic bias in how they are collected.

* Different sets of observations give different point estimates of the coefficients.

In each case (each panel of the figure), the red star marks the true values of beta_0 and beta_1. Every set of observations we draw ends up as one green dot: the pair of coefficient estimates fitted to that set.
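The "cloud of green dots" can be reproduced with a short simulation. This is a sketch under assumptions not in the original: a hypothetical true model y = 1 + 2x + noise, x drawn uniformly from [0, 1], and ordinary least squares as the fitting method.

```python
import numpy as np

# Hypothetical "true" model: y = beta_0 + beta_1 * x + noise.
TRUE_BETA = np.array([1.0, 2.0])  # the red star (beta_0, beta_1)

def fit_ols(x, y):
    """Least-squares estimates (beta_0_hat, beta_1_hat) for a straight line."""
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def simulate_estimates(n_datasets=500, n_points=20, noise_sd=1.0, seed=0):
    """Draw many independent datasets from the same distribution and fit each one.

    Each row of the result is one green dot: the estimates from one dataset.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty((n_datasets, 2))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = TRUE_BETA[0] + TRUE_BETA[1] * x + rng.normal(0, noise_sd, n_points)
        estimates[i] = fit_ols(x, y)
    return estimates

estimates = simulate_estimates()
print(estimates[:3])           # three different green dots
print(estimates.mean(axis=0))  # centre of the cloud
```

Plotting column 0 against column 1 of `estimates` gives a cloud like the one described; here it is centred near the red star because OLS is unbiased under this simulated model.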

If we already know the true values of the betas, we can quantify the bias and variance of our model with respect to them: the bias is how far the average of the green dots lies from the red star, and the variance is how widely the green dots scatter around their own average. This tells us which kinds of algorithms give high bias or high variance, both of which we want to avoid.
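Once the true values are known, bias and variance are just the mean offset and the spread of the repeated estimates. The sketch below assumes the same hypothetical true model y = 1 + 2x + noise and compares OLS with a made-up "shrunk" estimator that halves the slope, to show a biased estimator next to an unbiased one. It also checks the identity mean squared error = bias² + variance.

```python
import numpy as np

def bias_variance(estimates, true_value):
    """Bias, variance and mean squared error of an estimator from repeated draws.

    estimates: array of shape (n_datasets, n_params), one row per dataset.
    true_value: the true parameter vector (known only in a simulation).
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean(axis=0) - true_value
    variance = estimates.var(axis=0)
    mse = ((estimates - true_value) ** 2).mean(axis=0)
    return bias, variance, mse

rng = np.random.default_rng(1)
true_beta = np.array([1.0, 2.0])
ols, shrunk = [], []
for _ in range(1000):
    x = rng.uniform(0, 1, 20)
    y = true_beta[0] + true_beta[1] * x + rng.normal(0, 1.0, 20)
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest degree first
    ols.append([b0, b1])
    shrunk.append([b0, 0.5 * b1])     # hypothetical estimator that halves the slope

for name, est in [("OLS", ols), ("shrunk", shrunk)]:
    bias, var, mse = bias_variance(est, true_beta)
    print(name, "bias", bias.round(3), "variance", var.round(3), "mse", mse.round(3))
```

The shrunk slope has a quarter of the OLS variance but a bias of about -1, which is the kind of trade-off the rest of the post is about.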

For example, if we fit the observations with a 10th-order polynomial, the fitted model changes drastically from one set of observations to the next, and its coefficients swing widely. High-order polynomial fitting is therefore very sensitive to the particular data we happen to have, which means it has high variance.
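The sensitivity of a high-order fit can be measured by refitting it on many independent datasets and looking at how much its prediction at one point moves. This sketch assumes a hypothetical linear truth y = 1 + 2x + noise on [-1, 1] and uses NumPy's `Polynomial.fit`; the evaluation point `x0 = 0.9` is an arbitrary choice near the edge of the data.

```python
import numpy as np
from numpy.polynomial import Polynomial

def prediction_spread(degree, n_datasets=200, n_points=30, x0=0.9, seed=2):
    """Fit a polynomial of the given degree to many independent datasets
    and return the variance of its prediction at x0 across datasets."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n_points)  # hypothetical linear truth
        p = Polynomial.fit(x, y, deg=degree)
        preds[i] = p(x0)
    return preds.var()

for degree in (1, 10):
    print(degree, prediction_spread(degree))
```

Both fits are flexible enough to capture the linear truth, but the degree-10 predictions scatter far more from dataset to dataset, especially near the edge of the data where the polynomial is poorly constrained.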

At the other extreme, suppose our model is just a fixed horizontal line. No matter which observations we feed in, it always predicts the same constant, so it has no variance at all but an obvious bias: at each x, the bias is simply the gap between that constant and the true value.
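The opposite extreme can be checked the same way. In this sketch the model ignores its training data and always returns the hypothetical constant `c = 1.5`, while the hypothetical truth is f(x) = 1 + 2x; the variance across datasets comes out exactly zero and the bias at each point is c - f(x).

```python
import numpy as np

def true_f(x):
    return 1.0 + 2.0 * x  # hypothetical true relationship

def constant_model(x_train, y_train, c=1.5):
    """Ignores the training data entirely and always returns the horizontal line y = c."""
    return lambda x: np.full_like(np.asarray(x, dtype=float), c)

rng = np.random.default_rng(3)
x_eval = np.array([0.0, 0.5, 1.0])
preds = []
for _ in range(100):
    x = rng.uniform(0, 1, 20)
    y = true_f(x) + rng.normal(0, 1.0, 20)
    model = constant_model(x, y)
    preds.append(model(x_eval))
preds = np.array(preds)

variance = preds.var(axis=0)                # zero: the prediction never changes
bias = preds.mean(axis=0) - true_f(x_eval)  # c - f(x) at each evaluation point
print("variance", variance)
print("bias", bias)
```

If the horizontal line were instead fitted to the data (predicting the sample mean of y), its variance would be small but no longer exactly zero.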
