Written Part Answers
The 2020 CS224N Assignment 5 differs quite a bit from the 2019 version, so I am recording my own answers here (still in progress, for reference only).
Problem 1.
(a) We learned in class that recurrent neural architectures can operate over variable length input (i.e., the shape of the model parameters is independent of the length of the input sentence). Is the same true of convolutional architectures? Write one sentence to explain why or why not.
Solution: Yes, this also holds for convolutional architectures: a kernel's parameters depend only on the kernel size and the number of channels, not on the input length, so the same kernel can slide over an input of any length, and a pooling step (e.g. max-pooling over time) then collapses the variable-length output to a fixed size.
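To make this concrete, here is a minimal numpy sketch (not the assignment's PyTorch code) showing that one fixed 3-parameter kernel handles inputs of different lengths, with max-pooling producing a fixed-size result either way:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation): slide a fixed-size
    kernel over an input of arbitrary length."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

kernel = np.array([1.0, 0.0, -1.0])            # 3 parameters, fixed

short = np.array([3.0, 1.0, 4.0, 1.0, 5.0])                  # length 5
long = np.array([2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0])    # length 8

# The same kernel applies to both; max-pooling over time
# reduces each variable-length output to a single number.
print(conv1d(short, kernel).max())
print(conv1d(long, kernel).max())
```

The number of parameters (3 here) never changes with the input length; only the number of windows, `len(x) - k + 1`, does.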
(b) In 1D convolutions, we do padding, i.e. we add some zeros to both sides of our input, so that the kernel sliding over the input can be applied to at least one complete window.
In this case, if we use the kernel size k = 5, what will be the size of the padding (i.e. the additional number of zeros on each side) we need for the 1-dimensional convolution, such that there exists at least one window for all possible values of mword in our dataset? Explain your reasoning.
Solution: Because the kernel size is 5, the padded input $x'_{\text{padded}}$ must have length at least 5 for at least one complete window to fit. The shortest possible word is a single character, e.g. "a"; with the start and end tokens it already has length 3, so padding 1 zero on each side yields "0" + "start" + "a" + "end" + "0", a length-5 sequence with exactly one window. Hence a padding of size 1 on each side suffices for all possible values of m_word.
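A tiny sanity check of this reasoning (a sketch, not the assignment's code — the helper name is my own): count the complete windows as a function of the word length, the kernel size, and the per-side padding.

```python
def num_windows(m_word, k=5, pad=1):
    """Number of complete kernel windows for a word of m_word characters,
    after adding start/end tokens and `pad` zeros on each side."""
    padded_len = m_word + 2 + 2 * pad   # chars + start/end tokens + padding
    return max(0, padded_len - k + 1)

# Even a 1-character word yields a complete window with pad=1,
# whereas with pad=0 its padded length (3) is too short for k=5.
print(num_windows(1, pad=1))
print(num_windows(1, pad=0))
```

Since every longer word only adds length, padding of 1 guarantees at least one window for the whole dataset.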
(c) In step 4, we introduce a Highway Network with $x_{\text{highway}} = x_{\text{gate}} \odot x_{\text{proj}} + (1 - x_{\text{gate}}) \odot x_{\text{conv\_out}}$.