My math isn't great, so I guessed my way through parts of this quiz; here are my notes after finishing.
First, δ(3): it is the error term computed at the output layer. The output layer is layer 3, so δ(3) should be a 3×1 column vector.
The option multiplies δ(3) by a(2) and adds the result to Δ(2). Since addition requires both terms to have the same dimensions, and δ(3) and a(2) are both n×1 column vectors, a(2) must be transposed while δ(3) is left as-is: the outer product δ(3)(a(2))^T then has the same shape as Δ(2).
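The dimension argument above can be checked numerically. A minimal sketch, assuming illustrative layer sizes (3 output units, 5 hidden units plus a bias unit; these sizes are not from the quiz):

```python
import numpy as np

# Illustrative sizes (not from the quiz): 3 output units, 5 hidden units + bias.
delta3 = np.random.randn(3, 1)   # delta(3): output-layer error term, 3x1 column vector
a2 = np.random.randn(6, 1)       # a(2): hidden-layer activations incl. bias, 6x1

Delta2 = np.zeros((3, 6))        # gradient accumulator, same shape as Theta(2)

# Delta(2) := Delta(2) + delta(3) * a(2)^T only type-checks with a(2) transposed:
# (3x1) @ (1x6) -> 3x6, matching Delta2. Transposing delta(3) instead would fail.
Delta2 += delta3 @ a2.T
print(Delta2.shape)              # (3, 6)
```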
This one is fairly simple. One thing to watch: option B's index range (15:38) is wrong, because MATLAB indexing starts at 1.
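The off-by-one is easy to see by redoing the unroll/reshape in Python, which is 0-indexed. A sketch with hypothetical matrix sizes (not the quiz's actual sizes; also note MATLAB's `(:)` is column-major, so an exact port would use `order='F'`, which doesn't affect the index arithmetic shown here):

```python
import numpy as np

# Hypothetical parameter matrices (sizes are illustrative only).
Theta1 = np.arange(15.0).reshape(5, 3)   # 5x3 -> 15 elements
Theta2 = np.arange(24.0).reshape(4, 6)   # 4x6 -> 24 elements

# Unroll both into one vector, like the MATLAB idiom [Theta1(:); Theta2(:)].
thetaVec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Theta1 fills positions 0..14, so Theta2 occupies 15..38.
# MATLAB (1-indexed): reshape(thetaVec(16:39), 4, 6)
# Python (0-indexed): the same elements are thetaVec[15:39]
Theta2_recovered = thetaVec[15:39].reshape(4, 6)
assert (Theta2_recovered == Theta2).all()
```

Forgetting the shift to 1-based indexing is exactly how a range like (15:38) goes wrong.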
Skipped.
Option B: "Gradient checking will still be useful with advanced optimization methods, as they depend on computing the gradient at given parameter settings. The difference is they use the gradient values in more sophisticated ways than gradient descent." It applies to both.
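As a quick illustration of what gradient checking actually computes, here is a minimal two-sided finite-difference sketch (the cost function and epsilon are illustrative, not from the quiz):

```python
import numpy as np

def J(theta):
    # Illustrative cost function: J(theta) = sum(theta^2), gradient = 2*theta.
    return np.sum(theta ** 2)

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite-difference approximation of dJ/dtheta,
    one component at a time: (J(theta+e) - J(theta-e)) / (2*eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

theta = np.array([1.0, -2.0, 3.0])
analytic = 2 * theta                   # exact gradient of sum(theta^2)
numeric = numerical_gradient(J, theta)
print(np.max(np.abs(analytic - numeric)))  # should be tiny
```

The check only needs the gradient at a given parameter setting, which is why it works regardless of whether that gradient is later fed to gradient descent or to a more advanced optimizer.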
Option D: "A large value of λ can be quite detrimental. If you set it too high, then the network will be underfit to the training data and give poor predictions on both training data and new, unseen test data." Too large a λ causes underfitting.