[Andrew Ng Deep Learning] 03_week1_quiz Bird recognition in the city of Peacetopia (case study)

(1)

Problem Statement

This example is adapted from a real production application, but with details disguised to protect confidentiality.
You are a famous researcher in the City of Peacetopia. The people of Peacetopia have a common characteristic: they are afraid of birds. To save them, you have to build an algorithm that will detect any bird flying over Peacetopia and alert the population.
The City Council gives you a dataset of 10,000,000 images of the sky above Peacetopia, taken from the city’s security cameras. They are labelled:

  • y=0: There is no bird on the image.
  • y=1: There is a bird on the image.

Your goal is to build an algorithm able to classify new images taken by security cameras from Peacetopia.
There are a lot of decisions to make:

  • What is the evaluation metric?
  • How do you structure your data into train/dev/test sets?

Metric of success

The City Council tells you that they want an algorithm that

  1. Has high accuracy
  2. Runs quickly and takes only a short time to classify a new image.
  3. Can fit in a small amount of memory, so that it can run in a small processor that the city will attach to many different security cameras.

Note: having three evaluation metrics makes it harder for you to quickly choose between two different algorithms, and will slow down the speed with which your team can iterate. True/False?
Answer: True
Explanation: See the example in video 1.3, Single number evaluation metric.
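To see why three separate metrics slow down model selection, here is a minimal Python sketch. The two models and their numbers are hypothetical, not taken from the quiz. When each model wins on a different metric, neither one dominates the other, so there is no automatic way to pick between them without a single combined number.

```python
from typing import NamedTuple

class Model(NamedTuple):
    name: str
    accuracy: float   # %, higher is better
    runtime_s: float  # seconds per image, lower is better
    memory_mb: float  # MB, lower is better

def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    at_least_as_good = (a.accuracy >= b.accuracy
                        and a.runtime_s <= b.runtime_s
                        and a.memory_mb <= b.memory_mb)
    strictly_better = (a.accuracy > b.accuracy
                       or a.runtime_s < b.runtime_s
                       or a.memory_mb < b.memory_mb)
    return at_least_as_good and strictly_better

# Hypothetical models: m1 is more accurate, m2 is faster and smaller.
m1 = Model("m1", accuracy=98.0, runtime_s=2.0, memory_mb=8.0)
m2 = Model("m2", accuracy=97.0, runtime_s=1.0, memory_mb=4.0)
print(dominates(m1, m2), dominates(m2, m1))  # False False
```

With no dominating model, the team has to argue about trade-offs on every comparison, which is exactly what a single-number metric (or the optimizing/satisficing split in the next question) avoids.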

(2)After further discussions, the city narrows down its criteria to:

  • "We need an algorithm that can let us know a bird is flying over Peacetopia as accurately as possible."
  • "We want the trained model to take no more than 10 sec to classify a new image."
  • "We want the model to fit in 10MB of memory."

If you had the following four models, which one would you choose?

| Option | Test Accuracy | Runtime | Memory size |
|---|---|---|---|
| [A] | 97% | 1 sec | 3MB |
| [B] | 99% | 13 sec | 9MB |
| [C] | 97% | 3 sec | 2MB |
| [D] | 98% | 9 sec | 9MB |

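Answer: D
Explanation: Test Accuracy is the optimizing metric; Runtime and Memory size are satisficing metrics. As long as Runtime and Memory size meet the requirements, it does not matter how much better than the limit they are; only Test Accuracy should be as high as possible. B is ruled out because 13 sec exceeds the 10 sec limit, and among the remaining models D has the highest accuracy.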
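The selection rule can be sketched in Python as "filter by the satisficing constraints, then maximize the optimizing metric". The dictionary layout and the `choose` helper are illustrative assumptions. The numbers are the four options from the question, and "no more than 10 sec" and "fit in 10MB" are read as inclusive limits.

```python
models = {
    "A": {"accuracy": 97.0, "runtime_s": 1.0, "memory_mb": 3.0},
    "B": {"accuracy": 99.0, "runtime_s": 13.0, "memory_mb": 9.0},
    "C": {"accuracy": 97.0, "runtime_s": 3.0, "memory_mb": 2.0},
    "D": {"accuracy": 98.0, "runtime_s": 9.0, "memory_mb": 9.0},
}

MAX_RUNTIME_S = 10.0  # satisficing: runtime only has to be within the limit
MAX_MEMORY_MB = 10.0  # satisficing: memory only has to be within the limit

def choose(models):
    """Keep models that satisfy every constraint, then take the most accurate one."""
    feasible = {name: m for name, m in models.items()
                if m["runtime_s"] <= MAX_RUNTIME_S and m["memory_mb"] <= MAX_MEMORY_MB}
    return max(feasible, key=lambda name: feasible[name]["accuracy"])

print(choose(models))  # D
```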

(3)Based on the city’s requests, which of the following would you say is true?
[A]Accuracy is an optimizing metric; running time and memory size are satisficing metrics.
[B]Accuracy is a satisficing metric; running time and memory size are optimizing metrics.
[C]Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three.
[D]Accuracy, running time and memory size are all satisficing metrics because you have to do sufficiently well on all three for your system to be acceptable.

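Answer: A
Explanation: Test Accuracy is the optimizing metric, and Runtime and Memory size are satisficing metrics. Test Accuracy should be as high as possible, while Runtime and Memory size only need to meet the requirements.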

(4)

Structuring your data

Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice?

| Option | Train | Dev | Test |
|---|---|---|---|
| [A] | 9,500,000 | 250,000 | 250,000 |
| [B] | 6,000,000 | 3,000,000 | 1,000,000 |
| [C] | 3,333,333 | 3,333,333 | 3,333,333 |
| [D] | 6,000,000 | 1,000,000 | 3,000,000 |

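Answer: A
Explanation: With 10,000,000 examples, 250,000 images (2.5%) each are plenty for the dev and test sets, so the rest should go to training.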
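A minimal sketch of how such a split could be produced: shuffle the example indices once, then carve off fixed-size dev and test sets and keep the rest for training. `split_indices` is a hypothetical helper, and the demo runs on a small n so it stays fast. The same slicing with n = 10,000,000 and 250,000 each for dev and test would yield option A.

```python
import random

def split_indices(n, dev_size, test_size, seed=0):
    """Shuffle indices 0..n-1 once and slice them into train/dev/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    dev = idx[:dev_size]
    test = idx[dev_size:dev_size + test_size]
    train = idx[dev_size + test_size:]
    return train, dev, test

# Option A proportions on a small demo: 2.5% dev, 2.5% test, 95% train.
train, dev, test = split_indices(n=1_000, dev_size=25, test_size=25)
print(len(train), len(dev), len(test))  # 950 25 25
```

Shuffling once before slicing also keeps dev and test drawn from the same distribution, which matters for question (6).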

(5)After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm.
You should not add the citizens’ data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False?

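Answer: False
Explanation: Adding this data to the training set will change the training set distribution. However, it is not a problem to have different training and dev distributions. On the contrary, it would be very problematic to have different dev and test set distributions.
Some readers may find this confusing, because the course has an example in which the training set contains professional, sharp cat images while the dev set contains blurry cat images; the training and dev sets have different distributions and the model performs poorly. Note, however, that here the new images are added on top of the existing data. In terms of the course example: if both the training and dev sets consist of professional, sharp cat images, adding some extra blurry cat images to the training set does not hurt the model and can even improve its ability to generalize.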
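The rule "add the extra data to the training set only" can be sketched as follows. Each example is tagged with a hypothetical source label, and the tiny lists stand in for the real image sets. The point is that after the merge, the dev and test sets still contain only security-camera images.

```python
def build_splits(camera_train, camera_dev, camera_test, citizens):
    """Append citizens' images to training only; dev/test stay camera-only."""
    return camera_train + citizens, camera_dev, camera_test

def sources(split):
    """Sorted list of the distinct sources present in a split."""
    return sorted({source for source, _ in split})

# Hypothetical stand-ins: (source, image_id) pairs.
camera_train = [("camera", i) for i in range(8)]
camera_dev = [("camera", 8)]
camera_test = [("camera", 9)]
citizens = [("citizen", i) for i in range(5)]

train, dev, test = build_splits(camera_train, camera_dev, camera_test, citizens)
print(sources(train), sources(dev), sources(test))
# ['camera', 'citizen'] ['camera'] ['camera']
```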

(6)One member of the City Council knows a little about machine learning, and thinks you should add the 1,000,000 citizens’ data images to the test set. You object because:
[A]The 1,000,000 citizens’ data images do not have a consistent x->y mapping as the rest of the data (similar to the New York City/Detroit housing prices example from lecture).
[B]The test set no longer reflects the distribution of data (security cameras) you most care about.
[C]A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.
[D]This would cause the dev and test set distributions to become different. This is a bad idea because you’re not aiming where you want to hit.

Answer: B, D

(7)You train a system, and its errors are as follows ($\text{error} = 100\% - \text{Accuracy}$):

| Measure | Value |
|---|---|
| Training set error | 4.0% |
| Dev set error | 4.5% |

This suggests that one good avenue for improving performance is to train a bigger network so as to drive down the 4.0% training error. Do you agree?
[A]Yes, because having 4.0% training error shows you have high bias.
[B]Yes, because this shows your bias is higher than your variance.
[C]No, because this shows your variance is higher than your bias.
[D]No, because there is insufficient information to tell.

Answer: D
Explanation: The Bayes error is missing, so there is no way to tell whether the avoidable bias or the variance is larger.
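A quick way to see why the answer is "insufficient information": plug the quiz's 4.0% and 4.5% into the avoidable-bias vs. variance comparison under two different guesses for Bayes error. Both Bayes-error values below are hypothetical, and `dominant_problem` is an illustrative helper.

```python
def dominant_problem(train_err, dev_err, bayes_err):
    """Return 'bias' if avoidable bias exceeds variance, else 'variance' (errors in %)."""
    avoidable_bias = train_err - bayes_err
    variance = dev_err - train_err
    return "bias" if avoidable_bias > variance else "variance"

# Same quiz numbers, two hypothetical Bayes-error estimates:
print(dominant_problem(4.0, 4.5, bayes_err=0.0))  # bias (4.0 vs 0.5)
print(dominant_problem(4.0, 4.5, bayes_err=3.9))  # variance (about 0.1 vs 0.5)
```

The same training and dev errors lead to opposite recommendations, so without an estimate of Bayes error you cannot say that a bigger network is the right move.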

(8)You ask a few people to label the dataset so as to find out what is human-level performance. You find the following levels of accuracy:

| Labeller | Error |
|---|---|
| Bird watching expert #1 | 0.3% |
| Bird watching expert #2 | 0.5% |
| Normal person #1 (not a bird watching expert) | 1.0% |
| Normal person #2 (not a bird watching expert) | 1.2% |

If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”?
[A] 0.0% (because it is impossible to do better than this)
[B] 0.3% (accuracy of expert #1)
[C] 0.4% (average of 0.3 and 0.5)
[D] 0.75% (average of all four numbers above)

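Answer: B
Explanation: See video 1.10, Understanding human-level performance. Bayes error is at most the lowest error achieved by any labeller, so the best expert's 0.3% is the tightest estimate available.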
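As a small sketch, the choice amounts to taking the minimum over the labellers' error rates rather than an average. The dictionary keys are hypothetical names for the four labellers in the question.

```python
# Error rates (%) from the question.
labeller_errors = {
    "expert_1": 0.3,
    "expert_2": 0.5,
    "normal_1": 1.0,
    "normal_2": 1.2,
}

# Bayes error is at most the best error any labeller achieved, so the
# tightest available proxy is the minimum, not an average.
best = min(labeller_errors, key=labeller_errors.get)
print(best, labeller_errors[best])  # expert_1 0.3
```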

(9)Which of the following statements do you agree with?
[A]A learning algorithm’s performance can be better than human-level performance but it can never be better than Bayes error.
[B]A learning algorithm’s performance can never be better than human-level performance but it can be better than Bayes error.
[C]A learning algorithm’s performance can never be better than human-level performance nor better than Bayes error.
[D]A learning algorithm’s performance can be better than human-level performance and better than Bayes error.

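Answer: A
Explanation: A machine learning algorithm can perform better than humans, but it can never do better than the Bayes error, which is the theoretical limit on performance.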

(10)You find that a team of ornithologists debating and discussing an image gets an even better 0.1% performance, so you define that as “human-level performance.” After working further on your algorithm, you end up with the following:

| Measure | Value |
|---|---|
| Human-level performance | 0.1% |
| Training set error | 2.0% |
| Dev set error | 2.1% |

Based on the evidence you have, which two of the following four options seem the most promising to try? (Check two options.)
[A]Try decreasing regularization.
[B]Get a bigger training set to reduce variance.
[C]Try increasing regularization.
[D]Train a bigger model to try to do better on the training set.

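Answer: A, D
Explanation: The given numbers indicate high bias: the avoidable bias (2.0% - 0.1% = 1.9%) is much larger than the variance (2.1% - 2.0% = 0.1%). A and D are both ways to reduce bias.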
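A minimal sketch that turns this reasoning into a lookup: use human-level error as the Bayes proxy, find the larger gap, and return the remedies from the quiz options that target it. The `REMEDIES` grouping and the `suggest` helper are illustrative assumptions, not a fixed recipe.

```python
# Remedies named in the quiz options, grouped by the problem they address.
REMEDIES = {
    "bias": ["train a bigger model", "decrease regularization"],
    "variance": ["get a bigger training set", "increase regularization"],
}

def suggest(human_err, train_err, dev_err):
    """Human-level error stands in for Bayes error; pick remedies for the larger gap."""
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    problem = "bias" if avoidable_bias > variance else "variance"
    return problem, REMEDIES[problem]

# Quiz numbers: avoidable bias 1.9 vs variance about 0.1.
print(suggest(human_err=0.1, train_err=2.0, dev_err=2.1))
```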

(11)You also evaluate your model on the test set, and find the following:

| Measure | Value |
|---|---|
| Human-level performance | 0.1% |
| Training set error | 2.0% |
| Dev set error | 2.1% |
| Test set error | 7.0% |

What does this mean?(Check the two best options.)
[A]You should get a bigger test set.
[B]You have underfit to the dev set.
[C]You should try to get a bigger dev set.
[D]You have overfit to the dev set.

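Answer: C, D
Explanation: The test error (7.0%) is far higher than the dev error (2.1%), which suggests the model has been tuned too closely to the dev set; a bigger dev set makes this kind of overfitting less likely.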
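Why does a small dev set invite overfitting? A hypothetical simulation: 200 candidate models all have the same true error of 7%, and you pick whichever scores best on the dev set. The minimum measured dev error is optimistically low, and the smaller the dev set, the more optimistic it is. This is a sketch under a simplifying assumption (each model's dev mistakes are independent random draws), not a model of the actual Peacetopia system.

```python
import random

def best_dev_error(n_models, dev_size, true_err, rng):
    """Lowest measured dev error among n_models that all have the same true error.

    Simplifying assumption: each model's mistakes on the dev set are
    independent Bernoulli(true_err) draws.
    """
    best = 1.0
    for _ in range(n_models):
        mistakes = sum(rng.random() < true_err for _ in range(dev_size))
        best = min(best, mistakes / dev_size)
    return best

rng = random.Random(0)
small = best_dev_error(n_models=200, dev_size=500, true_err=0.07, rng=rng)
large = best_dev_error(n_models=200, dev_size=5_000, true_err=0.07, rng=rng)
print(f"small dev set: {small:.3f}, large dev set: {large:.3f}, true error: 0.070")
```

The selected model's test error stays near its true 7%, so a big dev-to-test gap is the signature of selection on too small a dev set; enlarging the dev set shrinks that gap.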

(12)After working on this project for a year, you finally achieve:

| Measure | Value |
|---|---|
| Human-level performance | 0.10% |
| Training set error | 0.05% |
| Dev set error | 0.05% |

What can you conclude?(Check all that apply)
[A]This is a statistical anomaly (or must be the result of statistical noise) since it should not be possible to surpass human-level performance.
[B]With only 0.09% further progress to make, you should quickly be able to close the remaining gap to 0%.
[C]It is now harder to measure avoidable bias, thus progress will be slower going forward.
[D]If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤0.05%.

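Answer: C, D
Explanation: A model's performance can exceed human-level performance, so A is wrong. Because the Bayes error is unknown, we do not know how much room for improvement remains, so B is wrong.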
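Option D hinges on the test set being "big enough". One way to make that concrete, assuming test examples are i.i.d. so the measured error rate is binomial, is the standard error $\sqrt{p(1-p)/n}$. The two test-set sizes below are illustrative (250,000 matches option A of question (4)).

```python
import math

def error_std(p, n):
    """Binomial standard error of an error rate p measured on n i.i.d. examples."""
    return math.sqrt(p * (1 - p) / n)

p = 0.0005  # the 0.05% error from the question
for n in (10_000, 250_000):
    print(f"n={n:,}: 0.05% ± {100 * error_std(p, n):.4f}% (one standard error)")
```

On 10,000 images the estimate rests on about five mistakes and one standard error is roughly 0.022%, almost half the estimate itself; on 250,000 images it drops to roughly 0.0045%, which is precise enough to support the conclusion in D.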

(13)It turns out Peacetopia has hired one of your competitors to build a system as well. You and your competitor both deliver systems with about the same running time and memory size, and your system has higher accuracy. However, when Peacetopia tries out your and your competitor’s systems, they conclude they actually like your competitor’s system better, because even though you have higher overall accuracy, you have more false negatives (failing to raise an alarm when a bird is in the air). What should you do?
[A]Look at all the models you’ve developed during the development process and find the one with the lowest false negative error rate.
[B]Ask your team to take into account both accuracy and false negative rate during development.
[C]Rethink the appropriate metric for this task, and ask your team to tune to the new metric.
[D]Pick false negative rate as the new metric, and use this new metric to drive all further development.

Answer: C
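A sketch of how accuracy can hide the problem, and what a rethought metric might look like. The confusion counts are hypothetical, and the weighted error (counting each missed bird as 10 false alarms) is one possible metric assumed for illustration, not the one the course prescribes.

```python
def metrics(tp, fp, tn, fn, fn_weight=1.0):
    """Accuracy, false negative rate, and an FN-weighted error rate."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    fn_rate = fn / (tp + fn)  # share of birds that raised no alarm
    weighted_error = (fp + fn_weight * fn) / total
    return accuracy, fn_rate, weighted_error

# Hypothetical counts on 1,000 images, 100 of which contain a bird.
yours = metrics(tp=80, fp=5, tn=895, fn=20, fn_weight=10)
competitor = metrics(tp=95, fp=30, tn=870, fn=5, fn_weight=10)
print("yours:      acc=%.3f  fn_rate=%.2f  weighted=%.3f" % yours)
print("competitor: acc=%.3f  fn_rate=%.2f  weighted=%.3f" % competitor)
```

Your system wins on accuracy (97.5% vs 96.5%) but misses four times as many birds, and the weighted metric ranks the competitor first, matching the city's preference.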

(14)You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because it is being tested on a new type of data.

You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?
[A]Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team.
[B]Put the 1,000 images into the training set so as to try to do better on these birds.
[C]Try data augmentation/data synthesis to get more images of the new type of bird.
[D]Add the 1,000 images into your dataset and reshuffle into a new train/dev/test split.

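Answer: A
Explanation: First address the problem by defining a new evaluation metric. After that, 1,000 images are clearly not enough, so you would need to collect more images or use data augmentation and similar techniques; C is therefore not the first thing to do, and the answer is A.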

(15)The City Council thinks that having more Cats in the city would help scare off birds. They are so happy with your work on the Bird detector that they also hire you to build a Cat detector. (Wow, Cat detectors are just incredibly useful, aren’t they?) Because of years of working on Cat detectors, you have such a huge dataset of 100,000,000 cat images that training on this data takes about two weeks. Which of the statements do you agree with? (Check all that apply.)
[A]If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it’s trained on less data.
[B]Buying faster computers could speed up your team’s iteration speed and thus your team’s productivity.
[C]Needing two weeks to train will limit the speed at which you can iterate.
[D]Having built a good Bird detector, you should be able to take the same model and hyperparameters and just apply it to the Cat dataset, so there is no need to iterate.

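Answer: A, B, C
Explanation: A model and hyperparameters that work for one detector do not necessarily carry over to a different one, so D is wrong.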
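The arithmetic behind option A, as a sketch. It assumes training time scales linearly with the number of examples, which is only a rough approximation, and `estimated_days` and `subset_indices` are hypothetical helpers.

```python
import random

FULL_SIZE = 100_000_000
FULL_TRAIN_DAYS = 14  # "about two weeks"

def estimated_days(subset_size, full_size=FULL_SIZE, full_days=FULL_TRAIN_DAYS):
    """Training time, assuming it scales linearly with the number of examples."""
    return full_days * subset_size / full_size

def subset_indices(full_size, k, seed=0):
    """Draw k distinct example indices uniformly at random (no replacement)."""
    return random.Random(seed).sample(range(full_size), k)

print(estimated_days(10_000_000))  # 1.4
demo = subset_indices(1_000, 100)  # small demo; real use would pass FULL_SIZE
```

Going from about 14 days to about 1.4 days per experiment means roughly ten experiments in the time one used to take, which is the trade-off option A describes.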
