Transformer - MultiHeadSelfAttention Structure
Preface
As a piece of fundamental knowledge, this post digs a bit deeper into what MultiHeadSelfAttention actually computes (interviews may also ask you to hand-write it). Comments and corrections are welcome!
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, num_heads, heads_dim, qkv_dim):
        super().__init__()
        # define member variables
        self.num_heads = num_heads
        self.heads_dim = heads_dim
        self.dim = num_heads * heads_dim  # total hidden size across all heads
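The snippet above stops inside `__init__`, so here is a self-contained sketch of one way to finish the module, using the standard scaled dot-product attention. The layer names (`qkv_proj`, `out_proj`) and the choice of a single fused Q/K/V projection are my own assumptions, not from the original text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, num_heads, heads_dim, qkv_dim):
        super().__init__()
        self.num_heads = num_heads
        self.heads_dim = heads_dim
        self.dim = num_heads * heads_dim  # total hidden size across all heads
        # one fused linear layer producing Q, K and V in a single matmul (illustrative choice)
        self.qkv_proj = nn.Linear(qkv_dim, 3 * self.dim)
        self.out_proj = nn.Linear(self.dim, qkv_dim)

    def forward(self, x):
        # x: (batch, seq_len, qkv_dim)
        B, T, _ = x.shape
        qkv = self.qkv_proj(x)                     # (B, T, 3 * dim)
        q, k, v = qkv.chunk(3, dim=-1)             # three (B, T, dim) tensors

        def split_heads(t):
            # (B, T, dim) -> (B, num_heads, T, heads_dim)
            return t.view(B, T, self.num_heads, self.heads_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        # scaled dot-product attention: softmax(QK^T / sqrt(d)) V
        scores = q @ k.transpose(-2, -1) / (self.heads_dim ** 0.5)  # (B, H, T, T)
        attn = F.softmax(scores, dim=-1)
        out = attn @ v                             # (B, H, T, heads_dim)
        # merge heads back and project to the input size
        out = out.transpose(1, 2).reshape(B, T, self.dim)
        return self.out_proj(out)

mhsa = MultiHeadSelfAttention(num_heads=8, heads_dim=64, qkv_dim=512)
x = torch.randn(2, 10, 512)
y = mhsa(x)
print(y.shape)  # torch.Size([2, 10, 512])
```

Note that the output projection maps back to `qkv_dim`, so the module preserves the input shape and can be stacked residually, as in a standard Transformer block.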