Written Part Answers
The 2020 CS224N Assignment 5 differs quite a bit from the 2019 version, so I am recording my own answers here (still in progress, for reference only).
Problem 1.
(a) We learned in class that recurrent neural architectures can operate over variable length input (i.e., the shape of the model parameters is independent of the length of the input sentence). Is the same true of convolutional architectures? Write one sentence to explain why or why not.
Solution: Yes, this also holds for convolutional architectures: a kernel's parameters depend only on the kernel size and the number of channels, not on the input length, so the same kernel can slide over an input of any length, and a pooling step (e.g. max-pooling over time) then collapses the variable-length output to a fixed size.
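To make this concrete, here is a minimal numpy sketch (not the assignment's PyTorch code) showing that one fixed 3-parameter kernel handles inputs of different lengths, with max-pooling producing a fixed-size result either way:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation): slide a fixed-size
    kernel over an input of arbitrary length."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

kernel = np.array([1.0, 0.0, -1.0])            # 3 parameters, fixed

short = np.array([3.0, 1.0, 4.0, 1.0, 5.0])                  # length 5
long = np.array([2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0])    # length 8

# The same kernel applies to both; max-pooling over time
# reduces each variable-length output to a single number.
print(conv1d(short, kernel).max())
print(conv1d(long, kernel).max())
```

The number of parameters (3 here) never changes with the input length; only the number of windows, `len(x) - k + 1`, does.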
(b) In 1D convolutions, we do padding, i.e. we add some zeros to both sides of our input, so that the kernel sliding over the input can be applied to at least one complete window.
In this case, if we use the kernel size k = 5, what will be the size of the padding (i.e. the additional number of zeros on each side) we need for the 1-dimensional convolution, such that there exists at least one window for all possible values of mword in our dataset? Explain your reasoning.
Solution: Because the kernel size is 5, the padded input $x'_{\text{padded}}$ must have length at least 5 for at least one complete window to fit. The shortest possible word is a single character, e.g. "a"; with the start and end tokens it already has length 3, so padding 1 zero on each side yields "0" + "start" + "a" + "end" + "0", a length-5 sequence with exactly one window. Hence a padding of size 1 on each side suffices for all possible values of m_word.
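A tiny sanity check of this reasoning (a sketch, not the assignment's code — the helper name is my own): count the complete windows as a function of the word length, the kernel size, and the per-side padding.

```python
def num_windows(m_word, k=5, pad=1):
    """Number of complete kernel windows for a word of m_word characters,
    after adding start/end tokens and `pad` zeros on each side."""
    padded_len = m_word + 2 + 2 * pad   # chars + start/end tokens + padding
    return max(0, padded_len - k + 1)

# Even a 1-character word yields a complete window with pad=1,
# whereas with pad=0 its padded length (3) is too short for k=5.
print(num_windows(1, pad=1))
print(num_windows(1, pad=0))
```

Since every longer word only adds length, padding of 1 guarantees at least one window for the whole dataset.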
(c) In step 4, we introduce a Highway Network with $x_{\text{highway}} = x_{\text{gate}} \odot x_{\text{proj}} + (1 - x_{\text{gate}}) \odot x_{\text{conv\_out}}$.