Deep Learning Course 5, Week 4: Transformers Quiz Compilation

  1. A Transformer Network processes sentences from left to right, one word at a time.
  • False
  • True
  2. Transformer Network methodology is taken from:
  • GRUs and LSTMs
  • Attention Mechanism and RNN style of processing.
  • Attention Mechanism and CNN style of processing.
  • RNN and LSTMs
  3. What are the key inputs to computing the attention value for each word?
    (figure omitted)
  • The key inputs to computing the attention value for each word are called the query, knowledge, and vector.
  • The key inputs to computing the attention value for each word are called the query, key, and value.
  • The key inputs to computing the attention value for each word are called the quotation, key, and vector.
  • The key inputs to computing the attention value for each word are called the quotation, knowledge, and value.

Explanation: The key inputs to computing the attention value for each word are called the query, key, and value.
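
As a reference for this and the next question, here is a minimal NumPy sketch of scaled dot-product attention built from the query, key, and value matrices; the shapes and toy inputs are illustrative assumptions, not part of the course material:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of the values

# Toy example: 3 "words" with dimension 4 (values chosen only for illustration).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)
```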

  4. Which of the following correctly represents Attention?
  • $Attention(Q,K,V)=softmax(\frac{QK^{T}}{\sqrt{d_k}})V$
  • $Attention(Q,K,V)=softmax(\frac{QV^{T}}{\sqrt{d_k}})K$
  • $Attention(Q,K,V)=min(\frac{QK^{T}}{\sqrt{d_k}})V$
  • $Attention(Q,K,V)=min(\frac{QV^{T}}{\sqrt{d_k}})K$
  5. Are the following statements true regarding Query (Q), Key (K) and Value (V)?
    Q = interesting questions about the words in a sentence
    K = specific representations of words given a Q
    V = qualities of words given a Q
  • False
  • True

Explanation: Q = interesting questions about the words in a sentence, K = qualities of words given a Q, V = specific representations of words given a Q.

  6. (figure omitted)
    $i$ here represents the computed attention weight matrix associated with the $i^{th}$ "word" in a sentence.

  • False
  • True

Explanation: $i$ here represents the computed attention weight matrix associated with the $i^{th}$ "head" (sequence).
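
A minimal sketch of how a separate attention weight matrix arises for each head in multi-head attention; the head count, dimensions, and random projection weights below are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads=2, seed=1):
    """Each head i has its own W_Q^i, W_K^i, W_V^i and therefore its own attention matrix."""
    rng = np.random.default_rng(seed)
    d_model = X.shape[-1]
    d_head = d_model // num_heads
    head_outputs = []
    for i in range(num_heads):
        # Per-head projections (randomly initialized here purely for illustration).
        W_Q, W_K, W_V = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V
        A_i = softmax(Q @ K.T / np.sqrt(d_head))       # attention weight matrix of head i
        head_outputs.append(A_i @ V)
    # Concatenate the heads; a full Transformer would also apply an output projection W_O.
    return np.concatenate(head_outputs, axis=-1)

X = np.random.default_rng(2).normal(size=(5, 8))       # 5 tokens, d_model = 8
print(multi_head_attention(X).shape)                   # (5, 8)
```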

  7. The following is the architecture within a Transformer Network (without displaying positional encoding and output layer(s)).
    (figure omitted)
    What is generated from the output of the Decoder’s first block of Multi-Head Attention?
  • Q
  • K
  • V

Explanation: This first block's output is used to generate the Q matrix for the next Multi-Head Attention block.
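
A minimal sketch of the decoder's second Multi-Head Attention block (cross-attention), in which Q is derived from the output of the decoder's first attention block while K and V come from the encoder output; the single-head form and the omitted projection matrices are simplifying assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decoder_cross_attention(decoder_block1_out, encoder_out):
    """Q from the decoder's first attention block; K and V from the encoder output."""
    Q = decoder_block1_out          # queries: the target-side representations so far
    K = V = encoder_out             # keys/values: the encoded source sentence
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V

encoder_out = np.random.default_rng(3).normal(size=(6, 8))         # 6 source tokens
decoder_block1_out = np.random.default_rng(4).normal(size=(4, 8))  # 4 target tokens generated so far
print(decoder_cross_attention(decoder_block1_out, encoder_out).shape)  # (4, 8)
```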

  8. The following is the architecture within a Transformer Network (without displaying positional encoding and output layer(s)).
    (figure omitted)
    What is the output layer(s) of the Decoder? (Marked $Y$, pointed to by the independent arrow)
  • Softmax layer
  • Linear layer
  • Linear layer followed by a softmax layer.
  • Softmax layer followed by a linear layer.
  9. Which of the following statements are true about positional encoding? Select all that apply.
  • Positional encoding is important because position and word order are essential in sentence construction of any language.

Explanation: This is a correct answer, but other options are also correct. To review the concept, watch the lecture Transformer Network.

  • Positional encoding uses a combination of sine and cosine equations.

Explanation: This is a correct answer, but other options are also correct. To review the concept, watch the lecture Transformer Network.

  • Positional encoding is used in the transformer network and the attention model.
  • Positional encoding provides extra information to our model.
  10. Which of these is a good criterion for a good positional encoding algorithm?
  • The algorithm should be able to generalize to longer sentences.
  • Distance between any two time-steps should be inconsistent for all sentence lengths.
  • It must be nondeterministic.
  • It should output a common encoding for each time-step (word’s position in a sentence).
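
For reference on the last two questions, here is a minimal NumPy sketch of the sine/cosine positional encoding; the sequence length and model dimension below are illustrative assumptions. Note the properties the quiz asks about: the encoding is deterministic, gives each time-step a unique vector, and generalizes to longer sentences.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even indices: sine
    pe[:, 1::2] = np.cos(angles)                       # odd indices: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=16)       # example sizes
print(pe.shape)                                        # (50, 16)
```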