[Andrew Ng Deep Learning] 02_week3_quiz: Hyperparameter tuning, Batch Normalization, Programming Frameworks

(1)If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?
Answer: False
Explanation: With many hyperparameters you don't know in advance which ones matter most, so sampling at random explores more distinct values of the important ones than a grid does.
Example from the course: in the Adam algorithm, the value of $\epsilon$ that prevents division by zero is typically $10^{-8}$, but compared with the learning rate $\alpha$ it is far less important to tune.
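As an illustration of why random sampling beats a grid here, the short NumPy sketch below (the ranges and trial count are my own assumptions, not values from the course) compares how many distinct learning-rate values each approach actually tests:

import numpy as np

np.random.seed(0)
num_trials = 25

# Random search: every trial draws a fresh value for each hyperparameter,
# so the important one (alpha) is probed at 25 distinct values.
alphas = 10 ** np.random.uniform(-4, 0, num_trials)      # learning rate, sampled on a log scale
epsilons = 10 ** np.random.uniform(-9, -7, num_trials)   # Adam's epsilon, far less important to tune

# A 5x5 grid over the same ranges tests only 5 distinct alpha values,
# spending most trials re-running the same alpha with a different epsilon.
grid_alphas = np.repeat(np.logspace(-4, 0, 5), 5)
print(len(np.unique(alphas)), "distinct alphas from random search")
print(len(np.unique(grid_alphas)), "distinct alphas from a 5x5 grid")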


(2)Every hyperparameter, if set poorly, can have a huge negative impact on training, and so all hyperparameters are about equally important to tune well. True or False?
Answer: False
Explanation: See the example in the previous question: some hyperparameters (such as the learning rate) matter far more than others.

(3)During hyperparameter search, whether you try to babysit one model (“panda” strategy) or train a lot of models in parallel (“Caviar”) is largely determined by:
[A]Whether you use batch or mini-batch optimization.
[B]The presence of local minima (and saddle points) in your neural network.
[C]The amount of computational power you can access.
[D]The number of hyperparameters you have to tune.

Answer: C
Explanation:
"Panda" strategy: pandas have very few offspring, usually one at a time, and invest great effort in raising each one to make sure it survives. Here one model is the panda cub: you babysit it and tune its parameters carefully as it trains.
"Caviar" strategy: during mating season some fish lay huge numbers of eggs and give none of them individual care, hoping that one, or a few, turn out well. Here each model is an egg: you set its hyperparameters and let it train on its own.
With little computing power you can only run a few models, so you use the "panda" strategy and tune a single model carefully; with plenty of computing power you can run many models in parallel, so you use the "caviar" strategy and try many hyperparameter settings at once.
See video 3.3, Hyperparameters tuning in practice: Pandas vs. Caviar.


(4)If you think $\beta$ (hyperparameter for momentum) is between 0.9 and 0.99, which of the following is the recommended way to sample a value for beta?
[A]

r = np.random.rand()
beta = r * 0.09 + 0.9

[B]

r = np.random.rand()
beta = 1 - 10 ** (-r - 1)

[C]

r = np.random.rand()
beta = 1 - 10 ** (-r + 1)

[D]

r = np.random.rand()
beta = r * 0.9 + 0.99

Answer: B

Explanation: Sample on a logarithmic scale.
$r \in (0,1) \Rightarrow r+1 \in (1,2) \Rightarrow -(r+1) \in (-2,-1) \Rightarrow 10^{-r-1} \in (0.01,\,0.1) \Rightarrow 1-10^{-r-1} \in (0.9,\,0.99)$
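Option B as a runnable sketch (only the import and the final assertion are my additions):

import numpy as np

# r is uniform in [0, 1), so 1 - beta = 10^(-r-1) is log-uniform in (0.01, 0.1],
# spreading beta evenly between 0.9 and 0.99 on the scale that matters for momentum.
r = np.random.rand()
beta = 1 - 10 ** (-r - 1)
assert 0.9 <= beta < 0.99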


(5)Finding good hyperparameter values is very time-consuming. So typically you should do it once at the start of the project, and try to find very good hyperparameters so that you don't ever have to revisit tuning them again. True or False?
Answer: False
Explanation: It is almost impossible to find the best hyperparameters at the start, and the best values drift as the model, the data, and the hardware change over time, so tuning has to be revisited.

(6)In batch normalization as presented in the videos, if you apply it on the $l$-th layer of your neural network, what are you normalizing?
[A] $b^{[l]}$
[B] $W^{[l]}$
[C] $a^{[l]}$
[D] $z^{[l]}$

Answer: D

(7)In the normalization formula $z_{norm}^{(i)}=\frac{z^{(i)}-\mu}{\sqrt{\sigma^2+\epsilon}}$, why do we use epsilon?
[A]To avoid division by zero.
[B]In case $\mu$ is too small.
[C]To have a more accurate normalization.
[D]To speed up convergence.

Answer: A

(8)Which of the following statements about $\gamma$ and $\beta$ in Batch Norm are true? (Check all that apply)
[A]The optimal values are $\gamma=\sqrt{\sigma^2+\epsilon}$ and $\beta=\mu$.
[B]There is one global value of $\gamma \in \mathbb{R}$ and one global value of $\beta \in \mathbb{R}$ for each layer, which applies to all the hidden units in that layer.
[C]They can be learned using Adam, Gradient descent with momentum, or RMSprop, not just with gradient descent.
[D]$\beta$ and $\gamma$ are hyperparameters of the algorithm, which we tune via random sampling.
[E]They set the mean and variance of the linear variable $z^{[l]}$ of a given layer.

Answer: C, E
Explanation: $\gamma$ and $\beta$ are learned during training; they are not fixed values and they are not hyperparameters, so A and D are wrong. Each hidden unit has its own $\gamma$ and $\beta$, so B is wrong.
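The minimal NumPy sketch below ties questions 6–8 together (the layer size and function name are illustrative, not from the course): Batch Norm normalizes $z^{[l]}$, uses $\epsilon$ only to keep the denominator away from zero, and the learnable per-unit vectors $\gamma$ and $\beta$ then set the mean and variance of the output.

import numpy as np

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    # Z: (n_units, m) pre-activations z^[l] for one mini-batch.
    # gamma, beta: (n_units, 1) learnable parameters, one pair per hidden unit.
    mu = np.mean(Z, axis=1, keepdims=True)      # per-unit mean over the mini-batch
    var = np.var(Z, axis=1, keepdims=True)      # per-unit variance
    Z_norm = (Z - mu) / np.sqrt(var + eps)      # eps guards against division by zero
    Z_tilde = gamma * Z_norm + beta             # gamma and beta set the new variance and mean
    return Z_tilde, mu, var

# Example: a layer with 4 hidden units and a mini-batch of 8 examples.
Z = np.random.randn(4, 8) * 3 + 5
Z_tilde, mu, var = batchnorm_forward(Z, np.ones((4, 1)), np.zeros((4, 1)))
print(Z_tilde.mean(axis=1))  # approximately 0 for every unit
print(Z_tilde.var(axis=1))   # approximately 1 for every unit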

(9)After training a neural network with Batch Norm, at test time, to evaluate the neural network on a new example you should:
[A]Use the most recent mini-batch's $\mu$ and $\sigma^2$ to perform the needed normalizations.
[B]Perform the needed normalizations, using $\mu$ and $\sigma^2$ estimated with an exponentially weighted average across the mini-batches seen during training.
[C]Skip the step where you normalize using $\mu$ and $\sigma^2$, since a single test example cannot be normalized.
[D]If you implemented Batch Norm on mini-batches of 256 examples, then to evaluate on one test example, duplicate that example 256 times so that you're working with a mini-batch the same size as during training.

Answer: B
Explanation: See video 3.7, Batch Norm at test time.
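A hedged sketch of that idea (the update rule, sizes, and names below are illustrative, not the course's exact code): during training, fold each mini-batch's $\mu$ and $\sigma^2$ into exponentially weighted averages, then normalize a single test example with those stored estimates.

import numpy as np

def bn_update_running_stats(Z, running_mu, running_var, momentum=0.9):
    # Training: compute this mini-batch's statistics for the forward pass ...
    mu = np.mean(Z, axis=1, keepdims=True)
    var = np.var(Z, axis=1, keepdims=True)
    # ... and fold them into exponentially weighted averages kept for test time.
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return running_mu, running_var

def bn_test_forward(z, gamma, beta, running_mu, running_var, eps=1e-8):
    # Test time: a single example is normalized with the stored estimates,
    # never with its own (degenerate) one-example statistics.
    z_norm = (z - running_mu) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta

running_mu, running_var = np.zeros((4, 1)), np.ones((4, 1))
for _ in range(100):                              # pretend these are training mini-batches
    Z_batch = np.random.randn(4, 64) * 2 + 3
    running_mu, running_var = bn_update_running_stats(Z_batch, running_mu, running_var)

z_test = np.random.randn(4, 1) * 2 + 3            # one new test example
print(bn_test_forward(z_test, np.ones((4, 1)), np.zeros((4, 1)), running_mu, running_var))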

(10)Which of these statements about deep learning programming frameworks are true? (Check all that apply)
[A]Even if a project is currently open source, good governance of the project helps ensure that it remains open even in the long term, rather than becoming closed or modified to benefit only one company.
[B]Deep learning programming frameworks require cloud-based machines to run.
[C]A programming framework allows you to code up deep learning algorithms with typically fewer lines of code than a lower-level language such as Python.

Answer: A, C
