结构化机器学习项目Quiz2

问题描述:

To help you practice strategies for machine learning, in this week we’ll present another scenario and ask how you would act. We think this “simulator” of working in a machine learning project will give a task of what leading a machine learning project could be like!

You are employed by a startup building self-driving cars. You are in charge of detecting road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. As an example, the above image contains a pedestrian crossing sign and red traffic lights
这里写图片描述
Your 100,000 labeled images are taken using the front-facing camera of your car. This is also the distribution of data you care most about doing well on. You think you might be able to get a much larger dataset off the internet, that could be helpful for training even if the distribution of internet data is not the same.


Question 1
You are just getting started on this project. What is the first thing you do? Assume each of the steps below would take about an equal amount of time (a few days).

Spend a few days collecting more data using the front-facing camera of your car, to better understand how much data per unit time you can collect.

Spend a few days checking what is human-level performance for these tasks so that you can get an accurate estimate of Bayes error.

Spend a few days training a basic model and see what mistakes it makes.

Spend a few days getting the internet data, so that you understand better what data is available.
解析:搭建一个深度学习系统的一般步骤如下:

  • 设置开发、测试集和优化指标(确定方向);
  • 快速地建立基本的系统;
  • 使用偏差方差分析、误差分析去确定后面步骤的优先步骤。
    总的来说,如果我们想建立自己的深度学习系统,我们就需要做到:快速的建立自己的基本系统,并进行迭代。而不是想的太多,在一开始就建立一个非常复杂,难以入手的系统。

Question 2
Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers.

For the output layer, a softmax activation would be a good choice for the output layer because this is a multi-task learning problem. True/False?

True

False
解析:这里写图片描述
Softmax 激活函数适用于多分类任务,即一个图像可能被分为猫,狗,鸭;而多分类任务是一个图像里面可能既含有猫,又含有狗;二者有所区别


Question 3
You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time?

10,000 images on which the algorithm made a mistake

10,000 randomly chosen images

500 images on which the algorithm made a mistake

500 randomly chosen images
解析:收集错误样例,在开发集(测试集)中,获取大约100个(少量)错误标记的例子,并统计其中的错误信息。

After working on the data for several weeks, your team ends up with the following data:

100,000 labeled images taken using the front-facing camera of your car.
900,000 labeled images of roads downloaded from the internet.
Each image’s labels precisely indicate the presence of any specific road signs and traffic signals or combinations of them. For example, y(i) = [1,0,0,1,0]t means the image contains a stop sign and a red traffic light.
Because this is a multi-task learning problem, you need to have all your y(i) vectors fully labeled. If one example is equal to [0,?,1,1,?]tthen the learning algorithm will not be able to use that example. True/False?
True

False
解析:Loss function求和的时候,只对带0、1标签的 j 进行求和,不多为?的标签求和即可。


Question 5
The distribution of data you care about contains images from your car’s front-facing camera; which comes from a different distribution than the images you were able to find and download off the internet. How should you split the dataset into train/dev/test sets?

Choose the training set to be the 900,000 images from the internet along with 20,000 images from your car’s front-facing camera. The 80,000 remaining images will be split equally in dev and test sets.

Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets.

Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 600,000 for the training set, 200,000 for the dev set and 200,000 for the test set.

Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 980,000 for the training set, 10,000 for the dev set and 10,000 for the test set.
解析:首先要确保dev 和 test set中全为目标图像,即汽车摄像头图像(20,000图像对于其来说已经足够)。其次将从互联网下载的图像全部放进train set,然后再加入部分实际图像。这样做
好处:开发集全部来自手机图片,瞄准目标;
坏处:训练集和开发、测试集来自不同的分布。
但是从长期来看,这样的分布能够给我们带来更好的系统性能。


Question 6
Assume you’ve finally chosen the following split between of the data:
这里写图片描述
You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Which of the following are True? (Check all that apply).

You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error.

You have a large variance problem because your model is not generalizing well to data from the same training distribution but that it has never seen before.

Your algorithm overfits the dev set because the error of the dev and test sets are very close.

You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set

You have a large variance problem because your training error is quite higher than the human-level error.
解析:
这里写图片描述
首先训练集误差与人类误差差别太大,存在avoidble bias; 其次train-dev set与dev set的巨大差别显示了存在data mismatch; variance 不大,因为test Set 和train-dev set差别不大


Question 7
Based on table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think?

Your friend is right. (I.e., Bayes error for the training data distribution is probably lower than for the dev/test distribution.)

Your friend is wrong. (I.e., Bayes error for the training data distribution is probably higher than for the dev/test distribution.)

There’s insufficient information to tell if your friend is right or wrong.


Question 8
You decide to focus on the dev set and check by hand what are the errors due to. Here is a table summarizing your discoveries:
这里写图片描述
In this table, 4.1%, 8.0%, etc.are a fraction of the total dev set (not just examples your algorithm mislabeled). I.e. about 8.0/14.3 = 56% of your errors are due to foggy pictures.

The results from this analysis implies that the team’s highest priority should be to bring more foggy pictures into the training set so as to address the 8.0% of errors in that category. True/False?
True because it is the largest category of errors. As discussed in lecture, we should prioritize the largest category of error to avoid wasting the team’s time.

True because it is greater than the other error categories added together (8.0 > 4.1+2.2+1.0).

False because this would depend on how easy it is to add this data and how much you think your team thinks it’ll help.

False because data augmentation (synthesizing foggy images by clean/non-foggy images) is more efficient.
解析:不仅仅需要关注它占的比例有多大,还应当关注它是否好实现等


Question 9
You can buy a specially designed windshield wiper that help wipe off some of the raindrops on the front-facing camera. Based on the table from the previous question, which of the following statements do you agree with?

2.2% would be a reasonable estimate of the maximum amount this windshield wiper could improve performance.

2.2% would be a reasonable estimate of the minimum amount this windshield wiper could improve performance.

2.2% would be a reasonable estimate of how much this windshield wiper will improve performance.

2.2% would be a reasonable estimate of how much this windshield wiper could worsen performance in the worst case.
解析:最好情况下完全解决这个问题,提高2.2%


Question 10
You decide to use data augmentation to address foggy images. You find 1,000 pictures of fog off the internet, and “add” them to clean images to synthesize foggy days, like this:
这里写图片描述
Which of the following statements do you agree with?

So long as the synthesized fog looks realistic to the human eye, you can be confident that the synthesized data is accurately capturing the distribution of real foggy images, since human vision is very accurate for the problem you’re solving.

There is little risk of overfitting to the 1,000 pictures of fog so long as you are combing it with a much larger (>>1,000) of clean/non-foggy images.

Adding synthesized images that look like real foggy pictures taken from the front-facing camera of your car to training dataset won’t help the model improve because it will introduce avoidable-bias.


Question 11
After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set. Which of these statements do you agree with? (Check all that apply).

You should also correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution

You should correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.

You should not correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution

You should not correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.
解析:test set 中的错误需要修正;对于train set,深度学习算法对训练集中的随机误差具有相当的鲁棒性。只要我们标记出错的例子符合随机误差,如:做标记的人不小心错误,或按错分类键。那么像这种随机误差导致的标记错误,一般来说不管这些误差可能也没有问题。
所以对于这类误差,我们可以不去用大量的时间和精力去做修正,只要数据集足够大,实际误差不会因为这些随机误差有很大的变化。


Question 12
So far your algorithm only recognizes red and green traffic lights. One of your colleagues in the startup is starting to work on recognizing a yellow traffic light. (Some countries call it an orange light rather than a yellow light; we’ll use the US convention of calling it yellow.) Images containing yellow lights are quite rare, and she doesn’t have enough data to build a good model. She hopes you can help her out using transfer learning.

What do you tell your colleague?

She should try using weights pre-trained on your dataset, and fine-tuning further with the yellow-light dataset.

If she has (say) 10,000 images of yellow lights, randomly sample 10,000 images from your dataset and put your and her data together. This prevents your dataset from “swamping” the yellow lights dataset.

You cannot help her because the distribution of data you have is different from hers, and is also lacking the yellow label.

Recommend that she try multi-task learning instead of transfer learning using all the data.
解析:对黄灯的情况(比较少)适当做些加权会好些


Question 13
Another colleague wants to use microphones placed outside the car to better hear if there’re other vehicles around you. For example, if there is a police vehicle behind you, you would be able to hear their siren. However, they don’t have much to train this audio system. How can you help?

Transfer learning from your vision dataset could help your colleague get going faster. Multi-task learning seems significantly less promising.

Multi-task learning from your vision dataset could help your colleague get going faster. Transfer learning seems significantly less promising.

Either transfer learning or multi-task learning could help our colleague get going faster.

Neither transfer learning nor multi-task learning seems promising.
解析:
迁移学习有意义的情况:
- 任务A和任务B有着相同的输入;
- 任务A所拥有的数据要远远大于任务B(对于更有价值的任务B,任务A所拥有的数据要比B大很多);
- 任务A的低层特征学习对任务B有一定的帮助;

多任务学习有意义的情况
- 如果训练的一组任务可以共用低层特征;
- 通常,对于每个任务大量的数据具有很大的相似性;(如,在迁移学习中由任务A“100万数据”迁移到任务B“1000数据”;多任务学习中,任务A1,…,An,每个任务均有1000个数据,合起来就有1000n个数据,共同帮助任务的训练)
- 可以训练一个足够大的神经网络并同时做好所有的任务。
对于本题,一个是计算机视觉,一个是语音识别,二者之间并没有许多相似,既没有相同的输入,也没有可以共用的低层特征


Question 14
To recognize red and green lights, you have been using this approach:

(A) Input an image (x) to a neural network and have it directly learn a mapping to make a prediction as to whether there’s a red light and/or green light (y).
A teammate proposes a different, two-step approach:

(B) In this two-step approach, you would first (i) detect the traffic light in the image (if any), then (ii) determine the color of the illuminated lamp in the traffic light.
Between these two, Approach B is more of an end-to-end approach because it has distinct steps for the input end and the output end. True/False?

True

False
解析:
这里写图片描述
端到端的学习的优缺点:
优点:
- 端到端学习可以直接让数据“说话”;
- 所需手工设计的组件更少。
缺点:
- 需要大量的数据;
- 排除了可能有用的手工设计组件。
应用端到端学习的 Key question:是否有足够的数据能够直接学习到从x映射到y的足够复杂的函数。
这个很明显不是端到端的学习


Question 15
Approach A (in the question above) tends to be more promising than approach B if you have a __ (fill in the blank).

Large training set

Multi-task learning problem.

Large bias problem.

Problem with a high Bayes error.
解析:参考14


参考:http://blog.csdn.net/koala_tree/article/details/78319908

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值