Attention Code Study Notes

果子果实

Categories: pytorch study notes, attention study notes

Article link: https://blog.csdn.net/k411797905/article/details/102289809

http://nlp.seas.harvard.edu/2018/04/03/attention.html#applications-of-attention-in-our-model

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import math, copy, time
from torch.autograd import Variable
import matplotlib.pyplot as plt
import seaborn
seaborn.set_context(context="talk")
%matplotlib inline

def attention(query, key, value, mask=None, dropout=None):
    "Compute 'Scaled Dot Product Attention'"
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) \
             / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = F.softmax(scores, dim = -1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn
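
To make the tensor shapes concrete, here is a minimal sketch that restates the function above without dropout and runs it on random inputs. The name `attention_no_dropout`, the tensor sizes, and the causal mask are illustrative assumptions, not part of the original notes.

```python
import math
import torch
import torch.nn.functional as F

def attention_no_dropout(query, key, value, mask=None):
    # Same computation as attention() above, with the dropout step left out.
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 get a very negative score, so softmax gives them ~0 weight.
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = F.softmax(scores, dim=-1)
    return torch.matmul(p_attn, value), p_attn

torch.manual_seed(0)
q = torch.randn(2, 5, 8)  # (batch, positions, d_k); sizes chosen for illustration
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)

# Hypothetical causal mask: position i may attend only to positions <= i.
mask = torch.tril(torch.ones(5, 5)).unsqueeze(0)  # (1, 5, 5), broadcast over the batch

out, p_attn = attention_no_dropout(q, k, v, mask=mask)
print(out.shape)     # torch.Size([2, 5, 8])
print(p_attn.shape)  # torch.Size([2, 5, 5])
```

Each row of `p_attn` is a probability distribution over key positions, and the output at each query position is the corresponding weighted average of the rows of `value`.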


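def clones(module, N):
    "Produce N identical layers."
    # Helper defined in the linked source; MultiHeadedAttention below depends on it.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

class MultiHeadedAttention(nn.Module):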
    def __init__(self, h, d_model, dropout=0.1):
        "Take in model size and number of heads."
        super(MultiHeadedAttention, self).__init__()
        assert d_model % h == 0
        # We assume d_v always equals d_k
        self.d_k = d_model // h
        self.h = h
        self.linears = clones(nn.Linear(d_model, d_model), 4)  # four Linear layers with the same structure
        self.attn = None
        self.dropout = nn.Dropout(p=dropout)
        
    def forward(self, query, key, value, mask=None):
        "Implements Figure 2"
        if mask is not None:
            # Same mask applied to all h heads.
            mask = mask.unsqueeze(1)
        nbatches = query.size(0)  # batch size: number of sequences in the batch
        
        # 1) Do all the linear projections in batch from d_model => h x d_k 
        query, key, value = \
            [l(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
             for l, x in zip(self.linears, (query, key, value))]
        
        # 2) Apply attention on all the projected vectors in batch. 
        x, self.attn = attention(query, key, value, mask=mask, 
                                 dropout=self.dropout)
        
        # 3) "Concat" using a view and apply a final linear. 
        x = x.transpose(1, 2).contiguous() \
             .view(nbatches, -1, self.h * self.d_k)
        return self.linears[-1](x)
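
The `view` and `transpose` calls in steps 1 and 3 are the least obvious part of `forward`. The following sketch isolates them on a random tensor, with no projections or attention involved, to show how the `d_model` features are split into `h` heads and merged back. The sizes and the variable names `heads`, `attn_out`, and `merged` are assumptions made for illustration.

```python
import torch

nbatches, seq_len, h, d_k = 2, 5, 4, 16  # illustrative sizes (assumptions)
d_model = h * d_k

x = torch.randn(nbatches, seq_len, d_model)

# Step 1 reshaping: (nbatches, seq_len, d_model) -> (nbatches, h, seq_len, d_k)
heads = x.view(nbatches, -1, h, d_k).transpose(1, 2)
print(heads.shape)  # torch.Size([2, 4, 5, 16])

# Head i holds the i-th contiguous slice of d_k features for every position.
assert torch.equal(heads[:, 1], x[:, :, d_k:2 * d_k])

# Stand-in for the attention output, which is a freshly allocated tensor
# laid out as (nbatches, h, seq_len, d_k).
attn_out = heads.contiguous()

# Step 3 reshaping: after transpose the memory layout no longer matches the
# shape, so contiguous() is needed before view() can flatten the head axis.
print(attn_out.transpose(1, 2).is_contiguous())  # False
merged = attn_out.transpose(1, 2).contiguous().view(nbatches, -1, h * d_k)
assert torch.equal(merged, x)
```

Because splitting and merging are exact inverses, the "concat" in step 3 simply places each head's `d_k` outputs back side by side before the final linear layer mixes them.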

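class torch.nn.Linear(in_features, out_features, bias=True)
Applies a linear transformation to the input: y = xA^T + b, where A is the weight matrix of shape (out_features, in_features) and b is the bias.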
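
As a quick check of that formula, this small sketch compares the output of an `nn.Linear` layer with the same computation written out by hand. The layer sizes and input shape are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(in_features=4, out_features=3)  # weight: (3, 4), bias: (3,)
x = torch.randn(2, 5, 4)  # any leading dimensions; the last one must equal in_features

y = linear(x)
manual = x @ linear.weight.T + linear.bias  # y = x A^T + b written out by hand

print(y.shape)                               # torch.Size([2, 5, 3])
print(torch.allclose(y, manual, atol=1e-5))  # True
```

The layer acts only on the last dimension, which is why `MultiHeadedAttention` can apply it directly to inputs of shape (nbatches, seq_len, d_model).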
