Notes on Andrew Ng's Deep Learning Course

1. Train / Dev / Test sets

Then traditionally you might take all the data you have and carve off some portion of it to be your training set. Some portion of it to be your hold-out cross validation set, and this is sometimes also called the development set. And for brevity I'm just going to call this the dev set, but all of these terms mean roughly the same thing. 

Andrew Ng's wording above doesn't seem entirely accurate; compare the following:

First, I think you're mistaken about what the three partitions do. You don't make any choices based on the test data. Your algorithms adjust their parameters based on the training data. You then run them on the validation data to compare your algorithms (and their trained parameters) and decide on a winner. You then run the winner on your test data to give you a forecast of how well it will do in the real world.

You don't validate on the training data because that would overfit your models. You don't stop at the validation step's winner's score because you've iteratively been adjusting things to get a winner in the validation step, and so you need an independent test (that you haven't specifically been adjusting towards) to give you an idea of how well you'll do outside of the current arena.

Source: https://stats.stackexchange.com/questions/9357/why-only-three-partitions-training-validation-test/9364#9364

The Elements of Statistical Learning says:

The training set is used to fit the models; the validation set is used to estimate prediction error for model selection; the test set is used for assessment of the generalization error of the final chosen model. 
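To make the three roles concrete, here is a minimal sketch of that workflow using scikit-learn; the dataset, candidate models, and split ratios are all made-up placeholders, not anything from the course:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Made-up data standing in for whatever dataset you actually have.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

    # Carve off a test set, then split the remainder into train and dev.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    # Fit each candidate model on the training set only.
    candidates = [LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
                  for c in (0.01, 0.1, 1.0)]

    # Pick the winner on the dev set (this is the step you iterate on) ...
    best = max(candidates, key=lambda m: m.score(X_dev, y_dev))

    # ... and touch the test set exactly once, for an unbiased estimate
    # of how well the chosen model will do in the real world.
    print("estimated real-world accuracy:", best.score(X_test, y_test))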

2. The introduction to the Course 2, Week 1 assignment says: "Recognize that a model without regularization gives you a better accuracy on the training set but not necessarily on the test set". Isn't that wrong?

Isn't the price of removing overfitting with regularization precisely that it makes the accuracy on the training set worse?
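For reference, the regularized cost in that assignment adds an L2 weight penalty on top of the cross-entropy cost, so minimizing it does give up some training-set fit, which is exactly the trade-off in question. A minimal sketch (the function and argument names here are mine, not the assignment's):

    import numpy as np

    def cost_with_l2(cross_entropy_cost, weights, lambd, m):
        # J_regularized = J_cross_entropy + (lambda / (2m)) * sum_l ||W[l]||_F^2
        # The penalty grows with the weights, so gradient descent shrinks them,
        # accepting a slightly worse fit on the training set in exchange for
        # a smoother model that should generalize better.
        l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
        return cross_entropy_cost + l2_penalty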

3. A well-chosen initialization can (see the sketch after this list):

  • Speed up the convergence of gradient descent
  • Increase the odds of gradient descent converging to a lower training (and generalization) error
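A minimal sketch of the kind of initialization that assignment recommends, assuming He initialization (its choice for ReLU networks) and a layer_dims list of layer sizes:

    import numpy as np

    def initialize_parameters_he(layer_dims):
        # He initialization: scale each weight matrix by sqrt(2 / fan_in).
        # This keeps activation variances roughly constant across layers,
        # which is what speeds up convergence for ReLU networks.
        params = {}
        for l in range(1, len(layer_dims)):
            params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                    * np.sqrt(2.0 / layer_dims[l - 1]))
            params["b" + str(l)] = np.zeros((layer_dims[l], 1))
        return params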

4. There is also some evidence that the ease of learning an identity function -- even more than skip connections helping with vanishing gradients -- accounts for ResNets' remarkable performance.

5. The skip-connections help to address the Vanishing Gradient problem. They also make it easy for a ResNet block to learn an identity function.
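A toy NumPy sketch of why the identity is easy to learn; the course itself builds ResNets with convolutional blocks in Keras, so this fully-connected block (with names of my own) is only meant to show the mechanism:

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def residual_block(a_prev, W1, b1, W2, b2):
        # Main path: two linear + ReLU transforms.
        a1 = relu(W1 @ a_prev + b1)
        z2 = W2 @ a1 + b2
        # Skip connection: add the input back before the final ReLU.
        # If W1, b1, W2, b2 are driven to zero, the block outputs
        # relu(a_prev) -- the identity on (already non-negative) ReLU
        # activations -- so adding the block can't easily hurt, and
        # gradients flow straight through the skip path.
        return relu(z2 + a_prev)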

6. In the label vector y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3], why is the x-axis the horizontal one?
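The convention is the usual image coordinate frame: b_x runs along the horizontal axis (columns) and b_y along the vertical axis (rows), with the origin at the top-left. A hypothetical label in this format (the numbers are made up):

    # p_c = 1: an object is present; the box center (b_x, b_y) and size
    # (b_h, b_w) are fractions of the image / grid cell, with b_x horizontal
    # and b_y vertical because images use (x = column, y = row) coordinates.
    y = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]   # c2 = 1: the object is class 2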

7. In the style cost, G_ij is really just the covariance (uncentered, i.e. without subtracting the means) of the activation values of filters i and j within a layer.
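A minimal NumPy sketch of that Gram matrix; the assignment computes it in TensorFlow, but the shapes below follow the course's (n_H, n_W, n_C) convention:

    import numpy as np

    def gram_matrix(a):
        # a: activations of ONE layer, shape (n_H, n_W, n_C).
        # G[i, j] = sum over all spatial positions of a_i * a_j, i.e. the
        # inner product of filter i's and filter j's activation maps --
        # an uncentered covariance between filters of the same layer.
        n_H, n_W, n_C = a.shape
        a_unrolled = a.reshape(n_H * n_W, n_C)   # rows = positions, cols = filters
        return a_unrolled.T @ a_unrolled          # shape (n_C, n_C)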
