1. When implementing positional encoding, which line of code uses the sine function to compute the positional encoding? (Answer: C)

```python
import torch
import torch.nn as nn
import math

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=0.1)
        # Precompute the sinusoidal table once; it is registered as a buffer, not a learned parameter.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # (1)
        pe[:, 1::2] = torch.cos(position * div_term)  # (2)
        pe = pe.unsqueeze(0).transpose(0, 1)  # (max_len, 1, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (seq_len, batch_size, d_model); add the encodings of the first seq_len positions.
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)
```

A. B. C. D.
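The module above expects input in the (seq_len, batch_size, d_model) layout, since its `pe` buffer has shape (max_len, 1, d_model). A minimal usage sketch follows; the vocabulary size, embedding size, and sequence length are arbitrary illustrative values, not part of the question:

```python
# Hypothetical usage of the PositionalEncoding module from question 1.
import torch
import torch.nn as nn

d_model, seq_len, batch_size, vocab = 512, 10, 2, 1000
pos_enc = PositionalEncoding(d_model)

tokens = torch.randint(0, vocab, (seq_len, batch_size))   # (seq_len, batch)
embedded = nn.Embedding(vocab, d_model)(tokens)           # (seq_len, batch, d_model)

out = pos_enc(embedded)
print(out.shape)  # torch.Size([10, 2, 512])
```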
2. In the Transformer's multi-head attention mechanism, which line of code concatenates the attention outputs of the different heads and applies a linear transformation? (Answer: D)

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model
        self.depth = d_model // num_heads  # dimensionality handled by each head
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        self.dense = nn.Linear(d_model, d_model)

    def split_heads(self, x, batch_size):
        # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
        x = x.view(batch_size, -1, self.num_heads, self.depth)
        return x.transpose(1, 2)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        query = self.wq(query)  # (1)
        key = self.wk(key)      # (2)
        value = self.wv(value)  # (3)
        query = self.split_heads(query, batch_size)  # (4)
        key = self.split_heads(key, batch_size)      # (5)
        value = self.split_heads(value, batch_size)  # (6)
        scores = torch.matmul(query, key.transpose(-2, -1)) / torch.sqrt(torch.tensor(self.depth, dtype=torch.float32))
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attention = torch.nn.functional.softmax(scores, dim=-1)
        x = torch.matmul(attention, value)  # (7)
        x = x.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)  # (8)
        output = self.dense(x)  # (9)
        return output
```

A. B. C. D.
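As a quick sanity check of the module above, a self-attention call in which query, key, and value are the same batch-first tensor preserves the (batch, seq_len, d_model) shape; the sizes below are illustrative only:

```python
# Illustrative self-attention call on the MultiHeadAttention module from question 2.
import torch

batch_size, seq_len, d_model, num_heads = 2, 10, 512, 8
mha = MultiHeadAttention(d_model, num_heads)

x = torch.randn(batch_size, seq_len, d_model)
out = mha(x, x, x)   # query, key and value are all the same sequence
print(out.shape)     # torch.Size([2, 10, 512])
```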
3. In the following code snippet, which line of code implements the scaled dot-product attention computation of the self-attention mechanism? (Answer: A)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledDotProductAttention(nn.Module):
    def __init__(self, d_model):
        super(ScaledDotProductAttention, self).__init__()
        self.d_model = d_model

    def forward(self, query, key, value, mask=None):
        scores = torch.matmul(query, key.transpose(-2, -1)) / torch.sqrt(torch.tensor(self.d_model, dtype=torch.float32))  # (1)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)  # (2)
        attention = F.softmax(scores, dim=-1)  # (3)
        output = torch.matmul(attention, value)  # (4)
        return output, attention
```

A. B. C. D.
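A short illustrative call of the module above (dimensions chosen arbitrarily) returns both the attended values and the attention weights; after the softmax, every row of the weight matrix sums to 1:

```python
# Illustrative call of the ScaledDotProductAttention module from question 3.
import torch

batch_size, seq_len, d_model = 2, 5, 64
attn = ScaledDotProductAttention(d_model)

q = torch.randn(batch_size, seq_len, d_model)
k = torch.randn(batch_size, seq_len, d_model)
v = torch.randn(batch_size, seq_len, d_model)

output, attention = attn(q, k, v)
print(output.shape)           # torch.Size([2, 5, 64])
print(attention.shape)        # torch.Size([2, 5, 5])
print(attention.sum(dim=-1))  # each row sums to 1
```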
4. The Transformer model's multi-head attention mechanism (Multi-Head Attention) is mainly used to: (Answer: C)
A. Improve the model's capacity for parallel computation
B. Strengthen the model's non-linear capacity
C. Capture feature representations in different subspaces
D. Reduce the number of model parameters
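Answer C can be made concrete with a small shape check. Using the sizes from the original Transformer paper (d_model = 512, 8 heads), each head operates on its own 64-dimensional projection of the representation:

```python
# Shape check: splitting d_model across heads gives each head its own subspace.
import torch

d_model, num_heads = 512, 8
depth = d_model // num_heads           # 64 dimensions per head

x = torch.randn(2, 10, d_model)        # (batch, seq_len, d_model)
per_head = x.view(2, 10, num_heads, depth).transpose(1, 2)
print(per_head.shape)                  # torch.Size([2, 8, 10, 64]) -- one 64-d subspace per head
```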
5. The encoder and the decoder of the Transformer model both contain which of the following network layers? (Answer: CD)
A. Convolutional layer
B. Recurrent layer
C. Fully connected layer
D. Self-attention layer
6. In the Transformer model, Layer Normalization is usually applied: (Answer: CD)
A. After the self-attention layer
B. Before the multi-head attention layer
C. Before the residual connection
D. In the middle of each sub-layer
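How layer normalization interacts with the residual connections, and where the fully connected and self-attention layers from question 5 sit, is easiest to see in code. Below is a minimal post-norm encoder-layer sketch following "Attention Is All You Need"; it reuses PyTorch's built-in nn.MultiheadAttention rather than the hand-written class from question 2, and all sizes are illustrative:

```python
# Minimal post-norm Transformer encoder layer sketch:
# each sub-layer (self-attention, feed-forward) is wrapped as LayerNorm(x + Sublayer(x)).
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(          # position-wise fully connected layers
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Sub-layer 1: self-attention + residual connection + LayerNorm
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Sub-layer 2: feed-forward network + residual connection + LayerNorm
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

layer = EncoderLayer()
out = layer(torch.randn(2, 10, 512))
print(out.shape)  # torch.Size([2, 10, 512])
```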
7. Which of the following methods is commonly used to train Transformer models? (Answer: C)
A. Gradient descent
B. Momentum optimization
C. Adam optimizer
D. Stochastic gradient descent
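The original Transformer paper trains with Adam (beta1 = 0.9, beta2 = 0.98, eps = 1e-9) combined with a warmup-then-decay learning-rate schedule. A hedged sketch of that setup in PyTorch, using the built-in nn.Transformer for brevity and arbitrary illustrative sizes, might look like this:

```python
# Sketch of a typical Transformer training setup with the Adam optimizer.
# The beta/eps values and the warmup schedule follow "Attention Is All You Need";
# the model, data and warmup sizes here are illustrative placeholders.
import torch
import torch.nn as nn

d_model, warmup_steps = 512, 4000
model = nn.Transformer(d_model=d_model, nhead=8, batch_first=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

def noam_lr(step):
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

src = torch.randn(2, 10, d_model)           # pre-embedded source batch
tgt = torch.randn(2, 9, d_model)            # pre-embedded target batch
loss = model(src, tgt).pow(2).mean()        # placeholder loss to show one update step
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```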
8. Sequence-to-sequence models were originally used to solve which task? (Answer: B)
A. Image classification
B. Machine translation
C. Speech recognition
D. Text classification
9. The encoder and decoder structures commonly used in sequence-to-sequence models are: (Answer: B)
A. Convolutional neural network
B. Recurrent neural network
C. Generative adversarial network
D. Autoencoder
10. In machine translation tasks, the encoder and decoder most commonly used in sequence-to-sequence models are: (Answer: B)
A. Convolutional neural network (CNN)
B. Recurrent neural network (RNN)
C. Generative adversarial network (GAN)
D. Graph convolutional network (GCN)
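Questions 9 and 10 both point at the classic RNN encoder-decoder design: the encoder compresses the source sentence into a hidden state, which initializes the decoder that generates the target sentence. A minimal GRU-based sketch, with illustrative vocabulary and hidden sizes:

```python
# Minimal RNN (GRU) encoder-decoder sketch for sequence-to-sequence modelling,
# e.g. machine translation. Vocabulary and hidden sizes are illustrative.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # The encoder compresses the source sentence into its final hidden state ...
        _, hidden = self.encoder(self.src_emb(src))
        # ... which initializes the decoder that produces the target sequence.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), hidden)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 1000])
```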