Machine Learning Yearning (Done)

Learning Objective

Teacher B asked me to read this book last weekend. It is Andrew Ng's summary of how to handle the various problems that come up in machine learning practice.

Study Schedule

6.25 C1-C12
6.26 C1-C12
6.27 C13-C17
6.28 C18-C27
6.29 C28-C32
7.2 C33-C35
7.3 C36
7.4 C37-C38
7.7 C39-C43
7.10 C44-C46

Reflections

  • The book is thin, only 89 pages; it mainly collects Ng's accumulated experience tuning models in industrial applications.
    • Its angle on error-case analysis is fairly novel: combined with learning curves, it intuitively decomposes model error into bias and variance, and explains what causes each problem and its possible solutions.
    • Another interesting point is what to do when the training set and test set come from different distributions. Remember one sentence: Choose dev and test sets to reflect data you expect to get in the future and want to do well on.

Study Notes

Basic Error Analysis

  • C13 Build your first system quickly, then iterate
  • C14 Error analysis: Look at dev set examples to evaluate ideas
  • C15 Evaluating multiple ideas in parallel during error analysis
    • By analyzing misclassified cases and tallying them by cause (ask what led each example to be misclassified), you can estimate how much each error category contributes to the dev error, and then weight how much effort each optimization direction deserves (see the tally sketch below).
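A minimal sketch of this tally in Python. The error categories and tags here are hypothetical; in practice you would record one row per misclassified Eyeball dev set example while inspecting it:

```python
from collections import Counter

# Hypothetical tags assigned while eyeballing misclassified dev examples;
# each example gets one or more tags naming the suspected cause of the error.
tags_per_example = [
    ["dog"], ["blurry"], ["great_cat"], ["dog", "blurry"],
    ["blurry"], ["great_cat", "blurry"], ["dog"],
]

counts = Counter(tag for tags in tags_per_example for tag in tags)
total = len(tags_per_example)
for tag, n in counts.most_common():
    # The fraction of misclassified examples showing each cause is an upper
    # bound on how much fixing that cause alone could cut the dev error.
    print(f"{tag}: {n}/{total} = {n / total:.0%}")
```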
  • C16 Cleaning up mislabeled dev and test set examples

    • label quality
    • Mislabeled dev set examples by human annotators: When I say "mislabeled" here, I mean that the pictures were already mislabeled by a human labeler even before the algorithm encountered it.
    • If mislabeled examples make up only a small fraction of the misclassified ones, they can be ignored; if the fraction is large, fixing them is worthwhile.
    • Apply the same cleanup to the dev and test sets: Whatever process you apply to fixing dev set labels, remember to apply it to the test set labels too so that your dev and test sets continue to be drawn from the same distribution.
  • C17 If you have a large dev set, split it into two subsets, only one of which you look at

    • Take a portion of the dev set (e.g., 10%) as an Eyeball dev set whose misclassified examples you inspect manually; use the rest (the Blackbox dev set) only to measure error rates. Tune the model against the Eyeball dev set. If the model does much better on the Eyeball dev set than on the Blackbox dev set, you have overfit the Eyeball dev set and should draw a fresh one.
  • C18 How big should the Eyeball and Blackbox dev sets be?

  • C19 Takeaways: Basic error analysis
    • If your dev set is not big enough to split this way, just use an Eyeball dev set for manual error analysis, model selection, and hyperparameter tuning.

Bias and Variance

  • C20 Bias and Variance: The two big sources of error
    • Bias: informally, the model's error rate on the training set.
    • Variance: informally, how much worse the model does on the dev/test set than on the training set.
    • Different optimization techniques target different error sources, either bias or variance.
  • C21 Examples of Bias and Variance

    • High variance: training error (bias) is small but dev error (bias + variance) is large; the model is overfitting.
    • High bias: training error and dev error are close and both large; the model is underfitting.
    • High bias and high variance: training error is large and dev error is even larger, so the model underfits the training set and still generalizes poorly; it suffers from both problems at once.
    • A good model should have both low bias and low variance.
  • C22 Comparing to the optimal error rate

    • Optimal error rate (Bayes error rate): the lowest error rate any model can reach, caused by noise and ambiguity in the data itself.
    • Bias = Optimal error rate ("unavoidable bias") + Avoidable bias
    • Total dev set error = Optimal error rate ("unavoidable bias") + Avoidable bias + Variance
    • How to estimate the optimal error rate: have humans look at the examples, i.e., use human-level performance as a proxy (see the arithmetic sketch below).
    • There are very different techniques that you should apply depending on whether your project’s current problem is high (avoidable) bias or high variance.
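To make the decomposition concrete, here is a tiny Python sketch of the C22 arithmetic with made-up error rates (the optimal error rate would typically be estimated from human-level performance):

```python
# Hypothetical error rates, expressed as fractions.
optimal_error = 0.02    # estimated Bayes / human-level error (assumption)
training_error = 0.10
dev_error = 0.15

avoidable_bias = training_error - optimal_error  # 0.08
variance = dev_error - training_error            # 0.05

print(f"avoidable bias = {avoidable_bias:.2f}, variance = {variance:.2f}")
```

With these numbers the avoidable bias dominates the variance, so bias-reducing techniques would come first.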
  • C23 Addressing Bias and Variance

    • So if you are using neural networks, the academic literature can be a great source of inspiration.
    • But the results of trying new architectures are less predictable than the simple formula of increasing the model size and adding data.
    • Increasing model complexity (adding layers/neurons) effectively reduces bias but raises the risk of overfitting; with a well-chosen regularization method in place, you can usually increase the model size without worry.
  • C24 Bias vs. Variance tradeoff

    • Increasing model complexity reduces bias but increases variance; adding regularization increases bias but reduces variance.
  • C25 Techniques for reducing avoidable bias

  • C26 Error analysis on the training set

    • Your algorithm must perform well on the training set before you can expect it to do well on the dev/test sets.
  • C27 Techniques for reducing variance

    • Add more training data: the most reliable way.
    • Add regularization: reduces variance but increases bias.
    • Add early stopping.
    • Feature selection to decrease the number/type of input features: makes little difference on large datasets, but can help a lot when the dataset is small.
    • Decrease the model size: The advantage of reducing the model size is reducing your computational cost and thus speeding up how quickly you can train models.
    • In short, lowering model complexity makes the model generalize better, reducing variance but raising bias (see the regularization sketch below).
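As an illustration of the regularization knob in this list, a small scikit-learn sketch; the synthetic dataset and the two C values are arbitrary choices for demonstration:

```python
# Stronger L2 regularization (smaller C) shrinks the train/dev gap (variance)
# at the cost of a higher training error (bias).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=0.3, random_state=0)

for C in (100.0, 0.01):  # weak vs. strong L2 regularization
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
    tr_acc, dev_acc = clf.score(X_tr, y_tr), clf.score(X_dev, y_dev)
    print(f"C={C}: train acc={tr_acc:.3f}, dev acc={dev_acc:.3f}, "
          f"gap={tr_acc - dev_acc:.3f}")  # gap is a rough proxy for variance
```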

Learning curves

  • C28 Diagnosing bias and variance: Learning curves

    • Estimating the model's errors:
      • First estimate the optimal error rate: the error that remains no matter how much you optimize, coming from the data itself.
      • Then measure the training set error and dev set error.
      • Training error minus optimal error rate = avoidable bias.
      • Dev error minus training error = variance.
    • What a learning curve means: dev set error as a function of the number of training examples.
      • A learning curve plots your dev set error against the number of training examples. To plot it, you would run your algorithm using different training set sizes.
    • Set a desired error rate: the target level of performance to draw on the curve (see the plotting sketch below).
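A minimal plotting sketch using scikit-learn's learning_curve helper; the synthetic dataset and the 0.05 desired error rate are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=3000, n_features=30, random_state=0)

# Train at several training set sizes and cross-validate, as C28 describes.
sizes, train_scores, dev_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

plt.plot(sizes, 1 - train_scores.mean(axis=1), label="training error")
plt.plot(sizes, 1 - dev_scores.mean(axis=1), label="dev error")
plt.axhline(0.05, linestyle="--", label="desired error rate")  # hypothetical target
plt.xlabel("number of training examples")
plt.ylabel("error rate")
plt.legend()
plt.show()
```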
  • C29 Plotting training error

    • As the training set grows, training error usually increases: a tiny training set is easy to fit (even memorize) perfectly, but with more examples some become impossible to fit, so more of the unavoidable error shows up in the training error.
    • training set error usually increases as the training set size grows
  • C30 Interpreting learning curves: High bias

  • C31 Interpreting learning curves: Other cases

    • Case 1 (training error low, large gap to dev error): the bias is small, but the variance is large. Adding more training data will probably help close the gap between dev error and training error.
    • Case 2 (training error high, dev error even higher): the training error is large, as it is much higher than the desired level of performance. The dev error is also much larger than the training error. Thus, you have significant bias and significant variance. You will have to find a way to reduce both bias and variance in your algorithm.
  • C32 Plotting learning curves

Comparing to human-level performance

  • C33 Why we compare to human-level performance

    • there are several reasons building an ML system is easier if you are trying to do a task that people can do well:
      • Ease of obtaining data from human labelers
      • Error analysis can draw on human intuition
      • Use human-level performance to estimate the optimal error rate and also set a “desired error rate.”
  • C34 How to define human-level performance

    • Take the best achievable human-level performance as the estimate of the optimal error rate.
  • C35 Surpassing human-level performance

    • Even if the model's overall average error rate is very low, certain subsets of examples may still have room for improvement.

Training and testing on different distributions

  • C36 When you should train and test on different distributions

    • When you have only a few training examples from the real distribution and bring in a larger pool of similar external data (not from the same distribution), how should you allocate the training and dev/test sets?
      • Wrong approach: pool everything together and sample uniformly.
        • Why it is wrong: Choose dev and test sets to reflect data you expect to get in the future and want to do well on.
      • Right approach: draw the dev/test sets entirely from the real data (that is the distribution you will predict on in the future), then merge the remaining real data with the external data to form the training set (see the allocation sketch below).
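A small Python sketch of this allocation; the toy pools stand in for real target-distribution and external images:

```python
import random

random.seed(0)
target_pool = [("target", i) for i in range(1000)]      # data you care about
external_pool = [("external", i) for i in range(5000)]  # extra, similar data

random.shuffle(target_pool)
n = len(target_pool)
dev_set = target_pool[: n // 4]               # dev/test drawn only from target data
test_set = target_pool[n // 4 : n // 2]
train_set = target_pool[n // 2 :] + external_pool  # external data: training only
random.shuffle(train_set)
print(len(train_set), len(dev_set), len(test_set))
```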
  • C37 How to decide whether to use all your data

    • If the model has enough capacity (a large neural network is appropriate), the training set can include all the cat images from both mobile and the internet, which improves generalization. If the model is small, data from a different distribution competes for its limited capacity and can hurt accuracy on the distribution you care about, so it should be removed.
    • Data with no benefit should be removed, e.g., pictures of people or scenery; using them as negative examples is pointless. Good training data helps the model learn the decision boundary, such as images that look like cats but are not. Because people look nothing like cats, adding lots of pictures of people contributes little to training the classifier.
  • C38 How to decide whether to include inconsistent data

    • If you know some samples clearly come from a different, inconsistent distribution, handle them separately during training (e.g., the regional factor in house price prediction: models for different regions are bound to differ).
  • C39 Weighting data

    • If the training data is heterogeneous (contains data from different distributions), you can assign different weights to different data in the loss function (see the weighting sketch below).
    • Re-weighting only matters when the extra training data differs from the dev data and is much larger in volume; otherwise it makes little difference. (Presumably a powerful neural network can absorb that small a discrepancy on its own.)
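A numpy sketch of the re-weighted objective; the per-example losses, pool sizes, and the choice of beta are all placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
per_example_loss = rng.random(6000)  # would come from your model; random stand-in
is_internet = np.array([False] * 1000 + [True] * 5000)  # 1k mobile, 5k internet

# beta < 1 down-weights the internet pool; here it is chosen so both pools
# contribute equally in total (1000 * 1.0 == 5000 * 0.2).
beta = 1000 / 5000
weights = np.where(is_internet, beta, 1.0)
weighted_loss = (weights * per_example_loss).sum() / weights.sum()
print(weighted_loss)
```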
  • C40 Generalizing from the training set to the dev set

    • When the training set and dev/test sets have different distributions, poor predictions can have three causes:
      • The model does poorly on the training set itself: high avoidable bias.
      • The model does well on the training set but poorly on held-out data drawn from the same distribution as the training set: high variance.
      • Data mismatch: the training set data is a poor match for the dev/test set data. The model does well on all data from the training distribution but poorly on the dev/test sets (caused by the differing distributions).
    • To isolate variance, carve out a "training dev" set: drawn from the same distribution as the training set (usually much smaller) and held out from training (see the split sketch below).
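A minimal sketch of carving out the training dev set; the pool and split sizes are hypothetical:

```python
import random

random.seed(0)
train_pool = list(range(100_000))  # stands in for all training-distribution data
random.shuffle(train_pool)

training_dev_set = train_pool[:5_000]  # same distribution as training, held out
training_set = train_pool[5_000:]
# Train error vs. training-dev error gap -> variance
# Training-dev error vs. dev error gap   -> data mismatch
```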
  • C41 Identifying Bias, Variance, and Data Mismatch Errors

    • Based on C40, lay out the table below:

                                           Distribution A (training)   Distribution B (dev/test)
        Human level                        "Human level error"
        Examples the algorithm trained on  "Training error"
        Examples it has not trained on     "Training dev error"        "Dev/Test error"
    • By understanding which types of error the algorithm suffers from the most, you will be better positioned to decide whether to focus on reducing bias, reducing variance, or reducing data mismatch.
  • C42 Addressing data mismatch

    • If the model performs well on the training set and the training dev set but badly on the dev set, you have data mismatch. What can you do?
      • i. Try to understand what properties of the data differ between the training and the dev set distributions.
      • ii. Try to find more training data that better matches the dev set examples that your algorithm has trouble with.
    • Error analysis lets you compare the training set and dev set case by case. For example, in speech recognition we might find that most training clips were recorded against quiet backgrounds, while most dev clips were recorded on the street; the fix is to add street-recorded clips to the training set.

  • C43 Artificial data synthesis

    • Car noise + audio clip recorded in a quiet environment = sound as if it were collected inside a car.
    • Problems with synthesized data:
      • If the synthesized component is not varied, e.g., the same stretch of car noise is reused for every example, the model will overfit to that clip and its error on real-world data will be large (see the synthesis sketch below).
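A numpy sketch of randomized synthesis; the arrays stand in for real waveforms sampled at a shared rate, and the snr scaling is an arbitrary illustration:

```python
import random
import numpy as np

rng = np.random.default_rng(0)
clean_clips = [rng.standard_normal(16000) for _ in range(100)]  # 1 s "speech" each
noise_clips = [rng.standard_normal(160000) for _ in range(50)]  # 10 s noise takes

def synthesize(clean, noise_clips, snr=0.3):
    noise = random.choice(noise_clips)                 # vary the noise source...
    start = random.randrange(len(noise) - len(clean))  # ...and the segment used
    return clean + snr * noise[start : start + len(clean)]

synthetic_set = [synthesize(c, noise_clips) for c in clean_clips]
```

Reusing a single fixed noise segment instead of random.choice is exactly the failure mode described above.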

Debugging inference algorithms

  • C44 The Optimization Verification test

    • When the model's output disagrees with the ground truth, two kinds of problems are possible:
      • The search problem: finding the output that maximizes the score usually means searching an enormous space, so an approximate search algorithm is used, and it may fail to find the true maximizer.
      • The objective problem: the scoring function Score(.) being maximized is itself only a learned estimate, and its estimates may be wrong.
    • The Optimization Verification test tells the two apart:
      • If the correct output scores higher than the algorithm's output, the scoring function is fine; the search/optimization algorithm failed to find the best solution.
      • If the correct output scores lower than (or equal to) the algorithm's output, the scoring function is at fault; its estimate is wrong.
  • C45 General form of Optimization Verification test

    • In general: whenever you learn a scoring function Score(x, y) and use an approximate search for the highest-scoring y, the test is to compare Score on the human-provided correct output against Score on the algorithm's output (see the sketch below).
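A Python sketch of the test itself; score(), the outputs, and the toy length-based scoring rule are all placeholders:

```python
def optimization_verification(score, y_star, y_out):
    """Compare the learned score of the correct output y_star with the
    score of the search algorithm's output y_out (C44/C45)."""
    if score(y_star) > score(y_out):
        return "search problem: the optimizer missed the higher-scoring y_star"
    return "scoring problem: Score(.) ranks the wrong answer at least as high"

# Hypothetical usage with a toy scoring function.
score = lambda y: -abs(len(y) - 20)  # stand-in for a learned Score(x, y)
print(optimization_verification(score, "the quick brown fox", "teh quick brwn fox"))
```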

  • C46 Reinforcement learning example

    • It is not easy to design good reward functions.
    • Sometimes no optimal solution is known; a human expert's solution can stand in as the "correct" output for the test.