Deep Learning Course 5, Week 4: Transformers Quiz Compilation

  1. A Transformer Network processes sentences from left to right, one word at a time.
  • False
  • True
  2. Transformer Network methodology is taken from:
  • GRUs and LSTMs
  • Attention Mechanism and RNN style of processing.
  • Attention Mechanism and CNN style of processing.
  • RNN and LSTMs
  3. What are the key inputs to computing the attention value for each word?
    (figure omitted)
  • The key inputs to computing the attention value for each word are called the query, knowledge, and vector.
  • The key inputs to computing the attention value for each word are called the query, key, and value.
  • The key inputs to computing the attention value for each word are called the quotation, key, and vector.
  • The key inputs to computing the attention value for each word are called the quotation, knowledge, and value.

Explanation: The key inputs to computing the attention value for each word are called the query, key, and value.
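
As a reference for this and the next question, here is a minimal NumPy sketch of scaled dot-product attention built from the query, key, and value matrices; the shapes and toy inputs are illustrative assumptions, not part of the course material:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of the values

# Toy example: 3 "words" with dimension 4 (values chosen only for illustration).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)
```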

  4. Which of the following correctly represents Attention?
  • $Attention(Q,K,V)=softmax(\frac{QK^{T}}{\sqrt{d_k}})V$
  • $Attention(Q,K,V)=softmax(\frac{QV^{T}}{\sqrt{d_k}})K$
  • $Attention(Q,K,V)=min(\frac{QK^{T}}{\sqrt{d_k}})V$
  • $Attention(Q,K,V)=min(\frac{QV^{T}}{\sqrt{d_k}})K$
  5. Are the following statements true regarding Query (Q), Key (K) and Value (V)?
    Q = interesting questions about the words in a sentence
    K = specific representations of words given a Q
    V = qualities of words given a Q
  • False
  • True

Explanation: Q = interesting questions about the words in a sentence, K = qualities of words given a Q, V = specific representations of words given a Q.

  6. (figure omitted)
    $i$ here represents the computed attention weight matrix associated with the $i^{th}$ "word" in a sentence.

  • False
  • True

Explanation: $i$ here represents the computed attention weight matrix associated with the $i^{th}$ "head" (sequence).
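
A minimal sketch of how a separate attention weight matrix arises for each head in multi-head attention; the head count, dimensions, and random projection weights below are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads=2, seed=1):
    """Each head i has its own W_Q^i, W_K^i, W_V^i and therefore its own attention matrix."""
    rng = np.random.default_rng(seed)
    d_model = X.shape[-1]
    d_head = d_model // num_heads
    head_outputs = []
    for i in range(num_heads):
        # Per-head projections (randomly initialized here purely for illustration).
        W_Q, W_K, W_V = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V
        A_i = softmax(Q @ K.T / np.sqrt(d_head))       # attention weight matrix of head i
        head_outputs.append(A_i @ V)
    # Concatenate the heads; a full Transformer would also apply an output projection W_O.
    return np.concatenate(head_outputs, axis=-1)

X = np.random.default_rng(2).normal(size=(5, 8))       # 5 tokens, d_model = 8
print(multi_head_attention(X).shape)                   # (5, 8)
```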

  7. The following is the architecture within a Transformer Network (without displaying positional encoding and output layer(s)).
    (figure omitted)
    What is generated from the output of the Decoder’s first block of Multi-Head Attention?
  • Q
  • K
  • V

Explanation: This first block's output is used to generate the Q matrix for the next Multi-Head Attention block.
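
A minimal sketch of the decoder's second Multi-Head Attention block (cross-attention), in which Q is derived from the output of the decoder's first attention block while K and V come from the encoder output; the single-head form and the omitted projection matrices are simplifying assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decoder_cross_attention(decoder_block1_out, encoder_out):
    """Q from the decoder's first attention block; K and V from the encoder output."""
    Q = decoder_block1_out          # queries: the target-side representations so far
    K = V = encoder_out             # keys/values: the encoded source sentence
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V

encoder_out = np.random.default_rng(3).normal(size=(6, 8))         # 6 source tokens
decoder_block1_out = np.random.default_rng(4).normal(size=(4, 8))  # 4 target tokens generated so far
print(decoder_cross_attention(decoder_block1_out, encoder_out).shape)  # (4, 8)
```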

  8. The following is the architecture within a Transformer Network (without displaying positional encoding and output layer(s)).
    (figure omitted)
    What is the output layer(s) of the Decoder? (Marked $Y$, pointed to by the independent arrow)
  • Softmax layer
  • Linear layer
  • Linear layer followed by a softmax layer.
  • Softmax layer followed by a linear layer.
  9. Which of the following statements are true about positional encoding? Select all that apply.
  • Positional encoding is important because position and word order are essential in sentence construction of any language.

Explanation: This is a correct answer, but other options are also correct. To review the concept, watch the lecture Transformer Network.

  • Positional encoding uses a combination of sine and cosine equations.

Explanation: This is a correct answer, but other options are also correct. To review the concept, watch the lecture Transformer Network.

  • Positional encoding is used in the transformer network and the attention model.
  • Positional encoding provides extra information to our model.
  10. Which of these is a good criterion for a good positional encoding algorithm?
  • The algorithm should be able to generalize to longer sentences.
  • Distance between any two time-steps should be inconsistent for all sentence lengths.
  • It must be nondeterministic.
  • It should output a common encoding for each time-step (word’s position in a sentence).
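
For reference on the last two questions, here is a minimal NumPy sketch of the sine/cosine positional encoding; the sequence length and model dimension below are illustrative assumptions. Note the properties the quiz asks about: the encoding is deterministic, gives each time-step a unique vector, and generalizes to longer sentences.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even indices: sine
    pe[:, 1::2] = np.cos(angles)                       # odd indices: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=16)       # example sizes
print(pe.shape)                                        # (50, 16)
```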