Preface
Assignment 3 has you build a neural dependency parser while getting familiar with PyTorch.
The written part covers Adam and dropout. The professor spent relatively little lecture time on these topics, but they are core pieces of neural network training, so reading the related literature to deepen your understanding is recommended.
The coding part applies the optimizer tricks from the written part to build a complete simple neural network and train the model.
Assignment Details
– Written Part –
#1. Machine Learning & Neural Networks (8 points)
Answer:
( a )
i. Because m is an exponential moving average of past gradients (each new gradient enters with weight 1−β, and β is close to 1), any single noisy minibatch gradient contributes only a small fraction of each update. This smooths the variance across minibatches, so updates point in a more consistent direction and oscillate less than plain SGD.
ii. Since v accumulates the squares of the gradients, dividing the update by √v gives parameters with small historical gradients larger effective steps and parameters with large historical gradients smaller ones. This normalization keeps the effective learning rate from being too large (exploding) in steep directions or too small (vanishing) in flat ones, which helps learning proceed at a similar pace across all parameters.
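To make the interaction of m and v concrete, here is a minimal, bias-correction-free Adam-style step on a scalar parameter; the function name and hyperparameter defaults are illustrative, not the assignment's official code:

```python
import math

def adam_step(theta, grad, m, v, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Adam update (bias correction omitted for clarity)."""
    m = beta1 * m + (1 - beta1) * grad       # momentum: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # EMA of squared gradients
    theta = theta - lr * m / (math.sqrt(v) + eps)  # adaptive step via 1/sqrt(v)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient is 2 * theta) from theta = 5.0.
theta, m, v = 5.0, 0.0, 0.0
for _ in range(200):
    theta, m, v = adam_step(theta, 2 * theta, m, v)
```

Note that because m/√v is roughly the sign of the gradient when gradients are consistent, each step has magnitude near lr regardless of how steep the loss is, which is exactly the normalization discussed above.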
( b )
i. γ = 1 / (1 − p_drop).
Since
h_drop = γ d ⊙ h
∵ E[h_drop] = γ(1 − p_drop) ⊙ h = h (taking the expectation over the dropout mask d, where each entry survives with probability 1 − p_drop)
∴ γ(1 − p_drop) = 1, i.e. γ = 1 / (1 − p_drop).
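The derivation above can be checked numerically: scaling the surviving units by γ = 1/(1 − p_drop) keeps the expected activation equal to h. The helper below is a hypothetical sketch, not part of the assignment code:

```python
import random

def inverted_dropout(h, p_drop):
    """Inverted dropout: zero each unit with probability p_drop and
    scale survivors by gamma = 1/(1 - p_drop), so E[h_drop] = h."""
    gamma = 1.0 / (1.0 - p_drop)
    return [gamma * x if random.random() >= p_drop else 0.0 for x in h]

# Empirical check: averaging many dropped-out copies of h = [1.0]
# with p_drop = 0.5 should recover the original value.
random.seed(0)
mean = sum(inverted_dropout([1.0], 0.5)[0] for _ in range(20000)) / 20000
```

Each sample is either 0 or γ = 2, so the average converges to 1.0, matching E[h_drop] = h in the derivation.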