DS Wannabe Prep Study Notes: Machine Learning Algo 1

This article covers the concepts of model underfitting and overfitting, emphasizing the role of regularization in preventing overfitting, including the differences between L1 regularization, L2 regularization, and Elastic Net. It also discusses methods for handling imbalanced data such as data augmentation, oversampling, and undersampling, as well as the different application scenarios of supervised, unsupervised, and reinforcement learning.

First, a review of the basics.

Defining Model Underfitting and Overfitting

Underfitting: the model isn't able to capture the relationship between the dataset's independent variables (e.g., weight, height) and the dependent variable (e.g., price). How to reduce:

  1. Add more variables or model features to help the model learn more patterns from the training data.
  2. Increase the number of iterations the model trains for before training is stopped.

Overfitting: the model fits the training data too closely, finding patterns that happen to be in the training set but not elsewhere. How to reduce: regularization (covered next).
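
To make the two failure modes concrete, here is a minimal sketch (the synthetic dataset and scikit-learn pipeline are my own illustration, not from the original notes): a degree-1 polynomial underfits a curved relationship, while a degree-15 polynomial overfits the training noise.

```python
# Hypothetical illustration: underfitting vs. overfitting via polynomial degree.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy nonlinear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple / about right / too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          round(model.score(X_train, y_train), 2),  # fit on training data
          round(model.score(X_test, y_test), 2))    # generalization to new data
# Expect: degree 1 scores poorly on both sets (underfitting); degree 15 scores
# well on train but worse on test (overfitting); degree 4 balances the two.
```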

Regularization

Regularization in machine learning is a technique used to prevent a model from overfitting. Overfitting occurs when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the model becomes too complex, capturing patterns that may not be present in the test data or in new data it encounters after deployment.

Here are the key points about regularization:

  1. Purpose: Regularization techniques are used to simplify models without substantially decreasing their accuracy. They do this by adding some form of penalty or constraint to the model optimization process.

  2. Types of Regularization:

    • L1 Regularization (Lasso):  Adds a penalty equivalent to the absolute value of the magnitude of coefficients. This can lead to some coefficients being zero, which is useful for feature selection.
    • L2 Regularization (Ridge): Adds a penalty equivalent to the square of the magnitude of coefficients. This doesn't reduce coefficients to zero but makes them smaller, leading to a less complex model.
    • Elastic Net: Combines L1 and L2 regularization and can be used to balance between feature selection (L1) and feature shrinkage (L2).
  3. Effect on Model Complexity: Regularization typically leads to a decrease in model complexity, which can reduce overfitting. This is done by penalizing the weights of the model, thereby discouraging overly complex models that fit the noise in the training data.

  4. Choosing the Regularization Term: The strength of the regularization is controlled by a hyperparameter, often denoted as lambda (λ) or alpha. The higher the value of this hyperparameter, the stronger the regularization effect. Selecting the right value is critical and is usually done using cross-validation.

  5. Bias-Variance Tradeoff: Regularization is a key technique in managing the bias-variance tradeoff in machine learning. By adding regularization, we increase the bias but decrease the variance, hopefully leading to a better overall model performance on unseen data.

  6. Application in Different Algorithms: While regularization is most commonly talked about in the context of linear models (like linear regression and logistic regression), it's also applicable to other algorithms, including neural networks, where techniques like dropout and weight decay are forms of regularization.
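
As a minimal sketch of the three penalty types (assuming scikit-learn; the synthetic dataset and alpha values are hypothetical choices):

```python
# Sketch: L1 (Lasso), L2 (Ridge), and Elastic Net penalties in scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Synthetic data where only 5 of the 20 features actually carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

models = [("L1 / Lasso", Lasso(alpha=1.0)),
          ("L2 / Ridge", Ridge(alpha=1.0)),
          ("Elastic Net", ElasticNet(alpha=1.0, l1_ratio=0.5))]

for name, model in models:
    model.fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"{name}: {n_zero} of {model.coef_.size} coefficients exactly zero")
# Expect Lasso (and Elastic Net) to zero out irrelevant features, while Ridge
# only shrinks them. alpha plays the role of lambda and is usually tuned with
# cross-validation (e.g., LassoCV / RidgeCV / ElasticNetCV).
```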

Interview question 3-1: What is L1 versus L2 regularization?

Example answer

L1 regularization, also known as lasso regularization, adds a penalty term proportional to the absolute value of the model's coefficients; this shrinks the parameters toward zero and can set some of them exactly to zero, which effectively performs feature selection. L2 regularization (also known as ridge regularization) adds a penalty term to the objective function that is proportional to the square of the coefficients of the model. This penalty term also shrinks the coefficients toward zero, but unlike L1 (lasso) regularization, it does not make any of the coefficients exactly equal to zero.

L2 regularization can help reduce overfitting and improve the stability of the model by keeping coefficients from becoming too large. Both L1 and L2 regularization are commonly used to prevent overfitting and improve the generalization of ML models.
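
To make the two penalty terms concrete, here is a tiny numeric sketch; the coefficient vector w and the lambda value are hypothetical:

```python
# Hypothetical coefficient vector and regularization strength.
import numpy as np

w = np.array([0.0, -2.0, 0.5, 3.0])  # model coefficients
lam = 0.1                            # lambda/alpha, the regularization strength

l1_penalty = lam * np.sum(np.abs(w))  # lasso adds  lam * sum(|w_i|)
l2_penalty = lam * np.sum(w ** 2)     # ridge adds  lam * sum(w_i**2)

# The regularized objective is: data loss (e.g., MSE) + penalty.
# The L1 term's constant slope pushes small coefficients to exactly zero;
# the L2 term's slope vanishes near zero, so it only shrinks them smoothly.
print(l1_penalty, l2_penalty)  # 0.55 and 1.325
```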

Interview question 3-2: How do you deal with the challenges that come with an imbalanced dataset?

Example answer

Imbalanced datasets in ML refer to datasets in which some classes or categories outweigh others. Techniques to deal with imbalanced datasets include data augmentation, oversampling, undersampling, ensemble methods, and so on:

Data augmentation

Data augmentation involves generating more examples for the ML model to train on, such as rotating images so that the dataset includes images of humans turned upside down as well as the normal upright orientation. Without data augmentation, the model might not be able to correctly recognize images of humans who are lying sideways or doing headstands, since the data is imbalanced toward humans in an upright pose.

Data augmentation is a technique used in machine learning to increase the diversity of the training dataset. The core idea is to create new, synthetic training samples by applying a series of transformations to existing data. This is important for improving the model's generalization, i.e., its performance on new, unseen data.
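
As a minimal sketch of such augmentations using plain NumPy (the random "image" array is a stand-in; real pipelines often use libraries such as torchvision):

```python
# Sketch: simple image augmentations as array transformations.
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in photo

flipped = np.fliplr(image)       # mirror: "looking right" becomes "looking left"
sideways = np.rot90(image)       # 90-degree turn: upright becomes lying sideways
headstand = np.rot90(image, 2)   # 180-degree turn: upright becomes upside down

augmented = [image, flipped, sideways, headstand]  # 1 example became 4
```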

Oversampling

Oversampling is a technique to increase the number of data points of a minority class via synthetic generation. As an example, SMOTE (synthetic minority oversampling technique) uses the feature vectors of the minority classes to generate synthetic data points that are located between real data points and their k-nearest neighbors. This can synthetically increase the size of the minority class(es) and improve the performance of the ML model trained on a dataset with oversampling treatment.
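
A minimal sketch, assuming the third-party imbalanced-learn package (imblearn), which provides a SMOTE implementation:

```python
# Sketch: oversampling a minority class with SMOTE (requires imbalanced-learn).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Synthetic binary dataset with a roughly 9:1 class imbalance.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
print(Counter(y))  # roughly {0: 900, 1: 100}

# SMOTE interpolates between minority points and their k nearest neighbors.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))  # classes now balanced with synthetic minority samples
```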

Undersampling

Undersampling does the opposite: it reduces examples from the majority class to balance the number of data points of the majority class and minority class(es). Oversampling is generally preferred in practice since undersampling may cause useful data to be discarded, which is exacerbated when the dataset is already small.
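
Correspondingly, a minimal undersampling sketch, again assuming imbalanced-learn:

```python
# Sketch: random undersampling of the majority class (requires imbalanced-learn).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)

# Randomly drop majority-class points until the classes are balanced.
X_res, y_res = RandomUnderSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # both classes now the size of the original minority
```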

Ensemble methods

Ensemble methods can also be used to increase model performance when dealing with an imbalanced dataset. Each model in the ensemble can be trained on a different subset of the data and can help learn the nuances of each class better.

Interview question 3-3: Explain boosting and bagging and what they can help with.

Example answer

Bagging and boosting are ensemble techniques used to improve the performance of ML models:

Bagging

Bagging trains multiple models on different subsets of the training data and combines their predictions to make a final prediction.

Boosting

Boosting trains a series of models, where each model tries to correct the mistakes made by the previous one. The final prediction is made by combining the predictions of all the models. Ensemble techniques can help with a variety of issues encountered during ML training; for example, they can help with imbalanced data and reduce overfitting.
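
A minimal sketch of both techniques, assuming scikit-learn and synthetic data:

```python
# Sketch: bagging vs. boosting with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)

# Bagging: many models (decision trees by default) trained independently on
# bootstrap samples of the data; their predictions are combined by voting.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Boosting: trees trained sequentially, each focusing on the errors of the
# ensemble built so far; predictions are a weighted combination of all trees.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))
```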

Supervised, Unsupervised, and Reinforcement Learning

Defining Labeled Data

An example of unlabeled data is when you have the prices and weights of the apples but not the apple variants, yet you try to deduce commonalities within different variants of apples. Because you don’t initially have the correct or expected “label”—in this case, the apple variant—you would be using unlabeled data and conducting unsupervised learning.

Summarizing Supervised Learning 

Supervised learning is the first type of machine learning, defined by its use of labeled data. Supervised learning uses correct or expected outcomes from the past to predict the dependent variables for new or future data points.

Defining Unsupervised Learning

Unsupervised learning is training a model with unlabeled data: when you do not have the "labels" available (the labels being the correct or expected values that you are looking for). In this setting, you would likely use unsupervised learning to find patterns, commonalities, or anomalies in the dataset, without the ML model having prior knowledge of correct or expected result labels.

Common usage of unsupervised learning includes clustering and dimensionality reduction (see fig 3.6).

Summarizing Semisupervised and Self-Supervised Learning

Semisupervised learning uses a small amount of labeled data (usually manually labeled) to train a separate ML model specifically meant to machine-label previously unlabeled data. The initial labeled dataset is then combined with the machine-generated labels that have the highest confidence to create a larger labeled dataset, as illustrated in fig 3.8.
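
One way to sketch this idea is scikit-learn's self-training wrapper, which pseudo-labels the most confident predictions; the dataset and threshold below are hypothetical:

```python
# Sketch: semisupervised self-training. Unlabeled points are marked with -1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Pretend 90% of the labels were never collected: mark them as -1 (unlabeled).
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.9] = -1

# The wrapper trains on the labeled 10%, pseudo-labels unlabeled points whose
# predicted probability exceeds the threshold, then retrains on the larger set.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y_partial)
print(round((model.predict(X) == y).mean(), 3))  # accuracy vs. the full truth
```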

Summarizing Reinforcement Learning

RL learns through trial and error: an agent takes actions in an environment and adjusts its behavior based on the rewards it receives.

RL is commonly used in gaming, robotics, and self-driving cars, but RL can also be used for a growing number of applications that previously used supervised learning, such as a system that recommends videos on YouTube.
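
As a minimal sketch of trial-and-error learning, here is tabular Q-learning on a toy, hypothetical "chain" environment (my own illustration, not from the original notes):

```python
# Sketch: tabular Q-learning on a 5-state chain; the goal is the last state.
import numpy as np

n_states, n_actions = 5, 2             # actions: 0 = step left, 1 = step right
Q = np.zeros((n_states, n_actions))    # the table of action values to learn
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != n_states - 1:           # reaching the last state ends an episode
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the goal
        # Q-learning update: nudge Q toward reward + discounted future value.
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # learned values favor action 1 (right) in every state
```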

Sample Interview Questions on Supervised and Unsupervised Learning

Interview question 3-4: What are common algorithms in supervised learning?

Example answer

The regression family of algorithms includes linear regression and logistic regression (which, despite its name, is commonly used for classification, especially binary classification), among other algorithms such as generalized linear models (GLMs) and various time-series regression models such as autoregressive integrated moving average (ARIMA).

The decision tree family of algorithms can be used for both classification and regression tasks within supervised learning; these include XGBoost, LightGBM, CatBoost, and so on. Decision trees can be combined in random forest algorithms, which ensemble (combine) a multitude of decision trees to improve prediction accuracy and stability. Like individual decision trees, random forests can be used for both classification and regression tasks under supervised learning.

Neural networks can be used for supervised learning tasks as well as unsupervised learning. In supervised learning, they are widely applied to many of the tasks in this section, such as image classification, object detection, speech recognition, and natural language processing (NLP).

Other algorithms include naive Bayes, which is a supervised classification algorithm that uses Bayes' theorem. Applications of Bayes' theorem in ML include Bayesian neural networks, which predict a distribution of results (for example, a normal model might predict the price is $100, but the Bayesian model will predict the price is $100 with a standard deviation of 5).
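
A quick sketch running one representative of each family on the same synthetic task (scikit-learn stand-ins; XGBoost, LightGBM, and CatBoost are separate libraries not shown here):

```python
# Sketch: one model from each supervised family on the same dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)

models = {
    "logistic regression (regression family)": LogisticRegression(max_iter=1000),
    "random forest (decision tree family)": RandomForestClassifier(random_state=0),
    "naive Bayes (Bayes' theorem)": GaussianNB(),
}
for name, model in models.items():
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```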

Interview question 3-5: What are some common algorithms used in unsupervised learning? How do they work?

Example answer

Unsupervised learning is commonly used for clustering, anomaly detection, and dimensionality reduction. I’ll group the algorithms by those categories. 

Clustering is often done with algorithms such as k-means clustering and density-based clustering (DBSCAN algorithm). 

K-means clustering groups the data into k clusters: the algorithm iteratively assigns each data point to the nearest cluster centroid, then updates the centroids, and continues until the cluster assignments reach a stable state and no longer shift or change.

DBSCAN is a popular algorithm that groups together data points that are close to one another (high density) and separates clusters from one another depending on their distance. Because unsupervised learning algorithms can handle large class imbalances, algorithms such as DBSCAN are also commonly used for anomaly detection.
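
A minimal sketch of both clustering algorithms, assuming scikit-learn and synthetic blob data (the cluster count and eps values are hypothetical):

```python
# Sketch: k-means and DBSCAN on synthetic blob data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means: assign points to the nearest of k centroids, update, repeat.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN needs no cluster count; eps is the neighborhood radius, and points
# in low-density regions are labeled -1 (useful for anomaly detection).
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(set(dbscan_labels))  # clusters 0..k-1, plus -1 for outliers (if any)
```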

There are many algorithms that can be used for dimensionality reduction. Principal component analysis (PCA) can “flatten” datasets into a lower-dimensional space. This is useful for data preprocessing since it can reduce the number of redundant features that are used while keeping the variance in the data so that enough signals and patterns are preserved in the data.
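
A minimal PCA sketch, assuming scikit-learn; the synthetic data is built from 5 hidden factors so that most of the 50 features are redundant:

```python
# Sketch: dimensionality reduction with PCA on mostly redundant features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))            # 5 true underlying factors
X = latent @ rng.normal(size=(5, 50))         # 50 mostly redundant features
X += 0.01 * rng.normal(size=X.shape)          # a little noise

pca = PCA(n_components=0.95)     # keep enough components for 95% of variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # expect roughly (200, 50) -> (200, 5)

X_approx = pca.inverse_transform(X_reduced)  # near-lossless reconstruction here
```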

Autoencoders are a type of unsupervised learning with a broad range of applications, notably in NLP, but not limited to it. They can be used to encode a compressed representation of input text, which is also a form of dimensionality reduction, and then decode the compressed representation to generate the next chunk of text data. This is useful for text completion and text summarization tasks. As a subset of unsupervised learning, self-supervised learning is also a case where autoencoders can be used; examples include self-supervised learning to fill in missing parts of images or to fix audio and video.
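
A tiny autoencoder sketch, assuming PyTorch; the layer sizes and training loop are hypothetical choices for illustration:

```python
# Sketch: an autoencoder compresses inputs, then reconstructs them (no labels).
import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(50, 8), nn.ReLU())  # compress 50 -> 8 dims
decoder = nn.Sequential(nn.Linear(8, 50))             # reconstruct 8 -> 50 dims
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(256, 50)               # unlabeled data: the input is the target
for step in range(200):
    reconstruction = model(X)
    loss = loss_fn(reconstruction, X)  # reconstruction error, no labels needed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

codes = encoder(X)                     # 8-dim compressed representations
print(codes.shape, float(loss))
```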

Interview question 3-6: What are the differences between supervised and unsupervised learning?

Example answer

The major difference between the two types of machine learning is related to the training data that is used.

Supervised learning uses labeled data while unsupervised learning uses unlabeled data. 

Labeled data means that the correct output or result the ML model should produce is already included in the training dataset.

Supervised and unsupervised learning also differ in terms of the ML model outputs.

In supervised learning, the ML model aims to predict what the label would be.

Unsupervised learning doesn’t predict specific label(s) but rather tries to find latent patterns and groupings within the dataset, which can be used to cluster new data points.

In terms of evaluation, the two types of ML are assessed differently.

Supervised learning is evaluated by comparing its outputs with the correct output (with the test/holdout/validation datasets). 

In unsupervised learning, the model is evaluated based on how well it groups or captures patterns within the data, via metrics such as the Jaccard score or silhouette index for clustering, and receiver operating characteristic (ROC) curve / area under the curve (AUC) metrics for comparing positive rates in anomaly detection.

Finally, supervised learning and unsupervised learning are generally used for different types of tasks. Supervised learning is often used for classification (predicting the correct category) or regression (predicting the correct value) tasks while unsupervised learning is often used for clustering, anomaly detection, and dimensionality reduction tasks.
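
A minimal sketch of the two evaluation styles, assuming scikit-learn (accuracy against held-out labels for the supervised model, silhouette score for the clustering):

```python
# Sketch: evaluating a supervised vs. an unsupervised model differently.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, silhouette_score

X, y = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised evaluation: compare predictions against held-out true labels.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", round(accuracy_score(y_test, clf.predict(X_test)), 3))

# Unsupervised evaluation: no labels used; score cohesion/separation instead.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("silhouette:", round(silhouette_score(X, labels), 3))
```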

Interview question 3-7: What are scenarios where you would use supervised learning but not unsupervised learning, and vice versa? Please illustrate with some real-world examples.

Example answer

Unsupervised learning and supervised learning differ in their use of results or labels. Hence, unsupervised learning is most suitable for cases where labeled data is not available, or when the task isn't to predict a "correct" output but rather to find patterns or anomalies in the data.

As a real-world example, supervised learning can be used for classification and object detection, such as in image recognition tasks. In the training dataset I’ll have the correct objects labeled, and the algorithms will then know if they’re learning to detect objects correctly based on comparing their predictions with the ground truth.

In other words, if the algorithm isn’t correctly boxing faces in images, I’d know since I’ll have each image (with the faces correctly boxed) to compare to. Other scenarios for supervised learning could include predicting the price of a rare trading card based on its features, such as its age, its series name, and the condition of the card. Given that I have a dataset with fraudulent data already correctly labeled, fraud detection could also be an application of supervised learning. If I didn’t already have labeled data about fraudulent behavior, I might opt to use unsupervised learning instead, via detecting anomalous behaviors.

[Thought: if unsupervised learning is more suitable than supervised learning for general warning signals, such as abnormal bank transactions, which one would a trust & safety (T&S) product use?]

My guess: Clustering is an unsupervised learning task, and a real-world application could be to group customers into segments based on their features (e.g., behavior, preferences), something that businesses might use to identify how they can tailor products to users in a cluster or to target marketing campaigns. If I investigate a cluster and it shows that young professionals have similar behaviors via the clustering algorithm, then we might know that we can give them similar promotional materials in the company's next digital ad campaign.

Interview question 3-8: What is a common issue that you might run into while implementing supervised learning, and how would you address it?

Example answer

One common problem that can affect supervised learning is the lack of labeled data. For example, when I want to classify specific cartoon and anime characters in images with ML, I don't have labeled data available on the internet to download and use. There are open source datasets such as CIFAR, which are labeled for general objects and items, but when it comes to more specific use cases, I would have to acquire and label images myself (for personal use).

I had to address the issue of not having enough labeled data; in this case, hand labeling a few examples worked as a starting point. However, there still weren’t enough labeled examples, which resulted in an imbalanced dataset. To artificially increase the amount of labeled data, I used data augmentation, creating synthetic data and variations on existing data to make the ML model more robust. An example of data augmentation in image recognition is to randomly flip or rotate images. To illustrate why this can increase samples, if I flip one upright anime character looking to the right, it becomes two data points for the model to learn from: one looking right and one looking left. Rotation can also help: can the ML algorithm correctly identify anime characters who are leaning sideways or who are even upside down, doing a headstand?
