Notes on Andrew Ng's Deep Learning Course

1. Train / Dev / Test sets

Then traditionally you might take all the data you have and carve off some portion of it to be your training set. Some portion of it to be your hold-out cross validation set, and this is sometimes also called the development set. And for brevity I'm just going to call this the dev set, but all of these terms mean roughly the same thing. 

Andrew Ng's wording above doesn't seem entirely accurate; compare the following:

First, I think you're mistaken about what the three partitions do. You don't make any choices based on the test data. Your algorithms adjust their parameters based on the training data. You then run them on the validation data to compare your algorithms (and their trained parameters) and decide on a winner. You then run the winner on your test data to give you a forecast of how well it will do in the real world.

You don't validate on the training data because that would overfit your models. You don't stop at the validation step's winner's score because you've iteratively been adjusting things to get a winner in the validation step, and so you need an independent test (that you haven't specifically been adjusting towards) to give you an idea of how well you'll do outside of the current arena.

Source: https://stats.stackexchange.com/questions/9357/why-only-three-partitions-training-validation-test/9364#9364

The Elements of Statistical Learning says:

The training set is used to fit the models; the validation set is used to estimate prediction error for model selection; the test set is used for assessment of the generalization error of the final chosen model. 
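To make the three roles concrete, here is a minimal sketch of that workflow using scikit-learn; the dataset, candidate models, and split ratios are all made-up placeholders, not anything from the course:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Made-up data standing in for whatever dataset you actually have.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

    # Carve off a test set, then split the remainder into train and dev.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    # Fit each candidate model on the training set only.
    candidates = [LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
                  for c in (0.01, 0.1, 1.0)]

    # Pick the winner on the dev set (this is the step you iterate on) ...
    best = max(candidates, key=lambda m: m.score(X_dev, y_dev))

    # ... and touch the test set exactly once, for an unbiased estimate
    # of how well the chosen model will do in the real world.
    print("estimated real-world accuracy:", best.score(X_test, y_test))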

2. The introduction to the Course 2, Week 1 assignment says: "Recognize that a model without regularization gives you a better accuracy on the training set but not necessarily on the test set". Isn't that wrong?

Isn't the price of removing overfitting with regularization precisely that it makes the accuracy on the training set worse?
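For reference, the regularized cost in that assignment adds an L2 weight penalty on top of the cross-entropy cost, so minimizing it does give up some training-set fit, which is exactly the trade-off in question. A minimal sketch (the function and argument names here are mine, not the assignment's):

    import numpy as np

    def cost_with_l2(cross_entropy_cost, weights, lambd, m):
        # J_regularized = J_cross_entropy + (lambda / (2m)) * sum_l ||W[l]||_F^2
        # The penalty grows with the weights, so gradient descent shrinks them,
        # accepting a slightly worse fit on the training set in exchange for
        # a smoother model that should generalize better.
        l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
        return cross_entropy_cost + l2_penalty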

3. A well-chosen initialization can (see the sketch after this list):

  • Speed up the convergence of gradient descent
  • Increase the odds of gradient descent converging to a lower training (and generalization) error
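A minimal sketch of the kind of initialization that assignment recommends, assuming He initialization (its choice for ReLU networks) and a layer_dims list of layer sizes:

    import numpy as np

    def initialize_parameters_he(layer_dims):
        # He initialization: scale each weight matrix by sqrt(2 / fan_in).
        # This keeps activation variances roughly constant across layers,
        # which is what speeds up convergence for ReLU networks.
        params = {}
        for l in range(1, len(layer_dims)):
            params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                    * np.sqrt(2.0 / layer_dims[l - 1]))
            params["b" + str(l)] = np.zeros((layer_dims[l], 1))
        return params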

4. There is also some evidence that the ease of learning an identity function -- even more than skip connections helping with vanishing gradients -- accounts for ResNets' remarkable performance.

5. The skip-connections help to address the Vanishing Gradient problem. They also make it easy for a ResNet block to learn an identity function.
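A toy NumPy sketch of why the identity is easy to learn; the course itself builds ResNets with convolutional blocks in Keras, so this fully-connected block (with names of my own) is only meant to show the mechanism:

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def residual_block(a_prev, W1, b1, W2, b2):
        # Main path: two linear + ReLU transforms.
        a1 = relu(W1 @ a_prev + b1)
        z2 = W2 @ a1 + b2
        # Skip connection: add the input back before the final ReLU.
        # If W1, b1, W2, b2 are driven to zero, the block outputs
        # relu(a_prev) -- the identity on (already non-negative) ReLU
        # activations -- so adding the block can't easily hurt, and
        # gradients flow straight through the skip path.
        return relu(z2 + a_prev)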

6. In the label vector y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3], why is the x-axis the horizontal one?
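The convention is the usual image coordinate frame: b_x runs along the horizontal axis (columns) and b_y along the vertical axis (rows), with the origin at the top-left. A hypothetical label in this format (the numbers are made up):

    # p_c = 1: an object is present; the box center (b_x, b_y) and size
    # (b_h, b_w) are fractions of the image / grid cell, with b_x horizontal
    # and b_y vertical because images use (x = column, y = row) coordinates.
    y = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]   # c2 = 1: the object is class 2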

7. In the style cost, G_ij is really just the covariance (uncentered, i.e. without subtracting the means) of the activation values of filters i and j within a layer.
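A minimal NumPy sketch of that Gram matrix; the assignment computes it in TensorFlow, but the shapes below follow the course's (n_H, n_W, n_C) convention:

    import numpy as np

    def gram_matrix(a):
        # a: activations of ONE layer, shape (n_H, n_W, n_C).
        # G[i, j] = sum over all spatial positions of a_i * a_j, i.e. the
        # inner product of filter i's and filter j's activation maps --
        # an uncentered covariance between filters of the same layer.
        n_H, n_W, n_C = a.shape
        a_unrolled = a.reshape(n_H * n_W, n_C)   # rows = positions, cols = filters
        return a_unrolled.T @ a_unrolled          # shape (n_C, n_C)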
