Machine Learning Yearning (Done)

Learning Objective

Teacher B asked me to read this book last weekend. It is Andrew Ng's summary of how to handle the various problems that come up in machine learning practice.

Study Schedule

6.25 C1-C12
6.26 C1-C12
6.27 C13-C17
6.28 C18-C27
6.29 C28-C32
7.2 C33-C35
7.3 C36
7.4 C37-C38
7.7 C39-C43
7.10 C44-C46

Reflections

  • The book is thin, only 89 pages; it mainly collects Ng's accumulated experience tuning models in industrial applications.
    • Its angle on error-case analysis is fairly novel: combined with learning curves, it intuitively decomposes model error into bias and variance, and explains what causes each problem and its possible solutions.
    • Another interesting point is what to do when the training set and test set come from different distributions. Remember one sentence: Choose dev and test sets to reflect data you expect to get in the future and want to do well on.

Study Notes

Basic Error Analysis

  • C13 Build your first system quickly, then iterate
  • C14 Error analysis: Look at dev set examples to evaluate ideas
  • C15 Evaluating multiple ideas in parallel during error analysis
    • By analyzing misclassified cases and tallying them by cause (ask what led each example to be misclassified), you can estimate how much each error category contributes to the dev error, and then weight how much effort each optimization direction deserves (see the tally sketch below).
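A minimal sketch of this tally in Python. The error categories and tags here are hypothetical; in practice you would record one row per misclassified Eyeball dev set example while inspecting it:

```python
from collections import Counter

# Hypothetical tags assigned while eyeballing misclassified dev examples;
# each example gets one or more tags naming the suspected cause of the error.
tags_per_example = [
    ["dog"], ["blurry"], ["great_cat"], ["dog", "blurry"],
    ["blurry"], ["great_cat", "blurry"], ["dog"],
]

counts = Counter(tag for tags in tags_per_example for tag in tags)
total = len(tags_per_example)
for tag, n in counts.most_common():
    # The fraction of misclassified examples showing each cause is an upper
    # bound on how much fixing that cause alone could cut the dev error.
    print(f"{tag}: {n}/{total} = {n / total:.0%}")
```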
  • C16 Cleaning up mislabeled dev and test set examples

    • label quality
    • Mislabeled dev set examples by human annotators: When I say "mislabeled" here, I mean that the pictures were already mislabeled by a human labeler even before the algorithm encountered it.
    • If mislabeled examples make up only a small fraction of the misclassified ones, they can be ignored; if the fraction is large, fixing them is worthwhile.
    • Apply the same cleanup to the dev and test sets: Whatever process you apply to fixing dev set labels, remember to apply it to the test set labels too so that your dev and test sets continue to be drawn from the same distribution.
  • C17 If you have a large dev set, split it into two subsets, only one of which you look at

    • Take a portion of the dev set (e.g., 10%) as an Eyeball dev set whose misclassified examples you inspect manually; use the rest (the Blackbox dev set) only to measure error rates. Tune the model against the Eyeball dev set. If the model does much better on the Eyeball dev set than on the Blackbox dev set, you have overfit the Eyeball dev set and should draw a fresh one.
  • C18 How big should the Eyeball and Blackbox dev sets be?

  • C19 Takeaways: Basic error analysis
    • If your dev set is not big enough to split this way, just use an Eyeball dev set for manual error analysis, model selection, and hyperparameter tuning.

Bias and Variance

  • C20 Bias and Variance: The two big sources of error
    • Bias: informally, the model's error rate on the training set.
    • Variance: informally, how much worse the model does on the dev/test set than on the training set.
    • Different optimization techniques target different error sources, either bias or variance.
  • C21 Examples of Bias and Variance

    • High variance: training error (bias) is small but dev error (bias + variance) is large; the model is overfitting.
    • High bias: training error and dev error are close and both large; the model is underfitting.
    • High bias and high variance: training error is large and dev error is even larger, so the model underfits the training set and still generalizes poorly; it suffers from both problems at once.
    • A good model should have both low bias and low variance.
  • C22 Comparing to the optimal error rate

    • Optimal error rate (Bayes error rate): the lowest error rate any model can reach, caused by noise and ambiguity in the data itself.
    • Bias = Optimal error rate ("unavoidable bias") + Avoidable bias
    • Total dev set error = Optimal error rate ("unavoidable bias") + Avoidable bias + Variance
    • How to estimate the optimal error rate: have humans look at the examples, i.e., use human-level performance as a proxy (see the arithmetic sketch below).
    • There are very different techniques that you should apply depending on whether your project’s current problem is high (avoidable) bias or high variance.
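To make the decomposition concrete, here is a tiny Python sketch of the C22 arithmetic with made-up error rates (the optimal error rate would typically be estimated from human-level performance):

```python
# Hypothetical error rates, expressed as fractions.
optimal_error = 0.02    # estimated Bayes / human-level error (assumption)
training_error = 0.10
dev_error = 0.15

avoidable_bias = training_error - optimal_error  # 0.08
variance = dev_error - training_error            # 0.05

print(f"avoidable bias = {avoidable_bias:.2f}, variance = {variance:.2f}")
```

With these numbers the avoidable bias dominates the variance, so bias-reducing techniques would come first.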
  • C23 Addressing Bias and Variance

    • So if you are using neural networks, the academic literature can be a great source of inspiration.
    • But the results of trying new architectures are less predictable than the simple formula of increasing the model size and adding data.
    • Increasing model complexity (adding layers/neurons) effectively reduces bias but raises the risk of overfitting; with a well-chosen regularization method in place, you can usually increase the model size without worry.
  • C24 Bias vs. Variance tradeoff

    • Increasing model complexity reduces bias but increases variance; adding regularization increases bias but reduces variance.
  • C25 Techniques for reducing avoidable bias

  • C26 Error analysis on the training set

    • Your algorithm must perform well on the training set before you can expect it to do well on the dev/test sets.
  • C27 Techniques for reducing variance

    • Add more training data: the most reliable way.
    • Add regularization: reduces variance but increases bias.
    • Add early stopping.
    • Feature selection to decrease the number/type of input features: makes little difference on large datasets, but can help a lot when the dataset is small.
    • Decrease the model size: The advantage of reducing the model size is reducing your computational cost and thus speeding up how quickly you can train models.
    • In short, lowering model complexity makes the model generalize better, reducing variance but raising bias (see the regularization sketch below).
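As an illustration of the regularization knob in this list, a small scikit-learn sketch; the synthetic dataset and the two C values are arbitrary choices for demonstration:

```python
# Stronger L2 regularization (smaller C) shrinks the train/dev gap (variance)
# at the cost of a higher training error (bias).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=0.3, random_state=0)

for C in (100.0, 0.01):  # weak vs. strong L2 regularization
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
    tr_acc, dev_acc = clf.score(X_tr, y_tr), clf.score(X_dev, y_dev)
    print(f"C={C}: train acc={tr_acc:.3f}, dev acc={dev_acc:.3f}, "
          f"gap={tr_acc - dev_acc:.3f}")  # gap is a rough proxy for variance
```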

Learning curves

  • C28 Diagnosing bias and variance: Learning curves

    • Estimating the model's errors:
      • First estimate the optimal error rate: the error that remains no matter how much you optimize, coming from the data itself.
      • Then measure the training set error and dev set error.
      • Training error minus optimal error rate = avoidable bias.
      • Dev error minus training error = variance.
    • What a learning curve means: dev set error as a function of the number of training examples.
      • A learning curve plots your dev set error against the number of training examples. To plot it, you would run your algorithm using different training set sizes.
    • Set a desired error rate: the target level of performance to draw on the curve (see the plotting sketch below).
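A minimal plotting sketch using scikit-learn's learning_curve helper; the synthetic dataset and the 0.05 desired error rate are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=3000, n_features=30, random_state=0)

# Train at several training set sizes and cross-validate, as C28 describes.
sizes, train_scores, dev_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

plt.plot(sizes, 1 - train_scores.mean(axis=1), label="training error")
plt.plot(sizes, 1 - dev_scores.mean(axis=1), label="dev error")
plt.axhline(0.05, linestyle="--", label="desired error rate")  # hypothetical target
plt.xlabel("number of training examples")
plt.ylabel("error rate")
plt.legend()
plt.show()
```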
  • C29 Plotting training error

    • As the training set grows, training error usually increases: a tiny training set is easy to fit (even memorize) perfectly, but with more examples some become impossible to fit, so more of the unavoidable error shows up in the training error.
    • training set error usually increases as the training set size grows
  • C30 Interpreting learning curves: High bias

  • C31 Interpreting learning curves: Other cases

    • Case 1 (training error low, large gap to dev error): the bias is small, but the variance is large. Adding more training data will probably help close the gap between dev error and training error.
    • Case 2 (training error high, dev error even higher): the training error is large, as it is much higher than the desired level of performance. The dev error is also much larger than the training error. Thus, you have significant bias and significant variance. You will have to find a way to reduce both bias and variance in your algorithm.
  • C32 Plotting learning curves

Comparing to human-level performance

  • C33 Why we compare to human-level performance

    • there are several reasons building an ML system is easier if you are trying to do a task that people can do well:
      • Ease of obtaining data from human labelers
      • Error analysis can draw on human intuition
      • Use human-level performance to estimate the optimal error rate and also set a “desired error rate.”
  • C34 How to define human-level performance

    • Take the best achievable human-level performance as the estimate of the optimal error rate.
  • C35 Surpassing human-level performance

    • Even if the model's overall average error rate is very low, certain subsets of examples may still have room for improvement.

Training and testing on different distributions

  • C36 When you should train and test on different distributions

    • When you have only a few training examples from the real distribution and bring in a larger pool of similar external data (not from the same distribution), how should you allocate the training and dev/test sets?
      • Wrong approach: pool everything together and sample uniformly.
        • Why it is wrong: Choose dev and test sets to reflect data you expect to get in the future and want to do well on.
      • Right approach: draw the dev/test sets entirely from the real data (that is the distribution you will predict on in the future), then merge the remaining real data with the external data to form the training set (see the allocation sketch below).
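A small Python sketch of this allocation; the toy pools stand in for real target-distribution and external images:

```python
import random

random.seed(0)
target_pool = [("target", i) for i in range(1000)]      # data you care about
external_pool = [("external", i) for i in range(5000)]  # extra, similar data

random.shuffle(target_pool)
n = len(target_pool)
dev_set = target_pool[: n // 4]               # dev/test drawn only from target data
test_set = target_pool[n // 4 : n // 2]
train_set = target_pool[n // 2 :] + external_pool  # external data: training only
random.shuffle(train_set)
print(len(train_set), len(dev_set), len(test_set))
```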
  • C37 How to decide whether to use all your data

    • If the model has enough capacity (a large neural network is appropriate), the training set can include all the cat images from both mobile and the internet, which improves generalization. If the model is small, data from a different distribution competes for its limited capacity and can hurt accuracy on the distribution you care about, so it should be removed.
    • Data with no benefit should be removed, e.g., pictures of people or scenery; using them as negative examples is pointless. Good training data helps the model learn the decision boundary, such as images that look like cats but are not. Because people look nothing like cats, adding lots of pictures of people contributes little to training the classifier.
  • C38 How to decide whether to include inconsistent data

    • If you know some samples clearly come from a different, inconsistent distribution, handle them separately during training (e.g., the regional factor in house price prediction: models for different regions are bound to differ).
  • C39 Weighting data

    • If the training data is heterogeneous (contains data from different distributions), you can assign different weights to different data in the loss function (see the weighting sketch below).
    • Re-weighting only matters when the extra training data differs from the dev data and is much larger in volume; otherwise it makes little difference. (Presumably a powerful neural network can absorb that small a discrepancy on its own.)
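A numpy sketch of the re-weighted objective; the per-example losses, pool sizes, and the choice of beta are all placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
per_example_loss = rng.random(6000)  # would come from your model; random stand-in
is_internet = np.array([False] * 1000 + [True] * 5000)  # 1k mobile, 5k internet

# beta < 1 down-weights the internet pool; here it is chosen so both pools
# contribute equally in total (1000 * 1.0 == 5000 * 0.2).
beta = 1000 / 5000
weights = np.where(is_internet, beta, 1.0)
weighted_loss = (weights * per_example_loss).sum() / weights.sum()
print(weighted_loss)
```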
  • C40 Generalizing from the training set to the dev set

    • When the training set and dev/test sets have different distributions, poor predictions can have three causes:
      • The model does poorly on the training set itself: high avoidable bias.
      • The model does well on the training set but poorly on held-out data drawn from the same distribution as the training set: high variance.
      • Data mismatch: the training set data is a poor match for the dev/test set data. The model does well on all data from the training distribution but poorly on the dev/test sets (caused by the differing distributions).
    • To isolate variance, carve out a "training dev" set: drawn from the same distribution as the training set (usually much smaller) and held out from training (see the split sketch below).
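A minimal sketch of carving out the training dev set; the pool and split sizes are hypothetical:

```python
import random

random.seed(0)
train_pool = list(range(100_000))  # stands in for all training-distribution data
random.shuffle(train_pool)

training_dev_set = train_pool[:5_000]  # same distribution as training, held out
training_set = train_pool[5_000:]
# Train error vs. training-dev error gap -> variance
# Training-dev error vs. dev error gap   -> data mismatch
```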
  • C41 Identifying Bias, Variance, and Data Mismatch Errors

    • Based on C40, lay out the table below:

                                           Distribution A (training)   Distribution B (dev/test)
        Human level                        "Human level error"
        Examples the algorithm trained on  "Training error"
        Examples it has not trained on     "Training dev error"        "Dev/Test error"
    • By understanding which types of error the algorithm suffers from the most, you will be better positioned to decide whether to focus on reducing bias, reducing variance, or reducing data mismatch.
  • C42 Addressing data mismatch

    • If the model performs well on the training set and the training dev set but badly on the dev set, you have data mismatch. What can you do?
      • i. Try to understand what properties of the data differ between the training and the dev set distributions.
      • ii. Try to find more training data that better matches the dev set examples that your algorithm has trouble with.
    • Error analysis lets you compare the training set and dev set case by case. For example, in speech recognition we might find that most training clips were recorded against quiet backgrounds, while most dev clips were recorded on the street; the fix is to add street-recorded clips to the training set.

  • C43 Artificial data synthesis

    • Car noise + audio clip recorded in a quiet environment = sound as if it were collected inside a car.
    • Problems with synthesized data:
      • If the synthesized component is not varied, e.g., the same stretch of car noise is reused for every example, the model will overfit to that clip and its error on real-world data will be large (see the synthesis sketch below).
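A numpy sketch of randomized synthesis; the arrays stand in for real waveforms sampled at a shared rate, and the snr scaling is an arbitrary illustration:

```python
import random
import numpy as np

rng = np.random.default_rng(0)
clean_clips = [rng.standard_normal(16000) for _ in range(100)]  # 1 s "speech" each
noise_clips = [rng.standard_normal(160000) for _ in range(50)]  # 10 s noise takes

def synthesize(clean, noise_clips, snr=0.3):
    noise = random.choice(noise_clips)                 # vary the noise source...
    start = random.randrange(len(noise) - len(clean))  # ...and the segment used
    return clean + snr * noise[start : start + len(clean)]

synthetic_set = [synthesize(c, noise_clips) for c in clean_clips]
```

Reusing a single fixed noise segment instead of random.choice is exactly the failure mode described above.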

Debugging inference algorithms

  • C44 The Optimization Verification test

    • When the model's output disagrees with the ground truth, two kinds of problems are possible:
      • The search problem: finding the output that maximizes the score usually means searching an enormous space, so an approximate search algorithm is used, and it may fail to find the true maximizer.
      • The objective problem: the scoring function Score(.) being maximized is itself only a learned estimate, and its estimates may be wrong.
    • The Optimization Verification test tells the two apart:
      • If the correct output scores higher than the algorithm's output, the scoring function is fine; the search/optimization algorithm failed to find the best solution.
      • If the correct output scores lower than (or equal to) the algorithm's output, the scoring function is at fault; its estimate is wrong.
  • C45 General form of Optimization Verification test

    • In general: whenever you learn a scoring function Score(x, y) and use an approximate search for the highest-scoring y, the test is to compare Score on the human-provided correct output against Score on the algorithm's output (see the sketch below).
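A Python sketch of the test itself; score(), the outputs, and the toy length-based scoring rule are all placeholders:

```python
def optimization_verification(score, y_star, y_out):
    """Compare the learned score of the correct output y_star with the
    score of the search algorithm's output y_out (C44/C45)."""
    if score(y_star) > score(y_out):
        return "search problem: the optimizer missed the higher-scoring y_star"
    return "scoring problem: Score(.) ranks the wrong answer at least as high"

# Hypothetical usage with a toy scoring function.
score = lambda y: -abs(len(y) - 20)  # stand-in for a learned Score(x, y)
print(optimization_verification(score, "the quick brown fox", "teh quick brwn fox"))
```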

  • C46 Reinforcement learning example

    • It is not easy to design good reward functions.
    • Sometimes no optimal solution is known; a human expert's solution can stand in as the "correct" output for the test.