The relationship between self-attention and convolutional layers

Self-attention networks are extremely popular in NLP right now, and they are also being applied widely in computer vision, especially in combination with CNNs. So what exactly is the relationship between self-attention and a convolutional layer? Are they two independent modules, or can one be expressed in terms of the other? I recently came across a paper on exactly this question.

On the Relationship between Self-Attention and Convolutional Layers is that paper, and it works through this relationship in detail. I think it is well worth reading, so I am sharing it here. If you don't feel like reading the full text, the summary below is enough.

In short, the paper's main result (Theorem 1) states that a multi-head self-attention layer with N_h heads of dimension D_h, output dimension D_out, and a relative positional encoding of dimension at least 3 can express any convolutional layer with a √N_h × √N_h kernel and min(D_h, D_out) output channels. In other words, a convolutional layer is not an unrelated module: it is a special case that self-attention can reproduce.
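To make this concrete, here is a minimal numerical sketch (my own illustration, not code from the paper): if every attention head puts all of its probability mass on one fixed relative pixel shift, then the multi-head self-attention output is exactly a K × K convolution whose kernel is assembled from the per-head value/output projections. The image size, channel counts, and variable names below are made up for the demo.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H, W, D_in, D_out, K = 6, 6, 4, 5, 3                               # image size, channels, kernel size
shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]      # one relative shift per head

X = torch.randn(H, W, D_in)                                        # input feature map (HWC layout)
W_head = [torch.randn(D_in, D_out) for _ in shifts]                # per-head value/output projection

# "Self-attention" with degenerate, one-hot attention probabilities: for head h,
# every query pixel q attends only to pixel q + shift_h (zero padding at the border).
X_pad = F.pad(X.permute(2, 0, 1), (1, 1, 1, 1)).permute(1, 2, 0)   # (H + 2, W + 2, D_in)
attn_out = torch.zeros(H, W, D_out)
for (dy, dx), Wh in zip(shifts, W_head):
    shifted = X_pad[1 + dy:1 + dy + H, 1 + dx:1 + dx + W, :]       # values gathered by this head
    attn_out += shifted @ Wh                                       # sum of head outputs

# The equivalent K x K convolution: kernel position (dy, dx) holds that head's projection.
kernel = torch.zeros(D_out, D_in, K, K)
for (dy, dx), Wh in zip(shifts, W_head):
    kernel[:, :, dy + 1, dx + 1] = Wh.T
conv_out = F.conv2d(X.permute(2, 0, 1).unsqueeze(0), kernel, padding=1)
conv_out = conv_out.squeeze(0).permute(1, 2, 0)

print(torch.allclose(attn_out, conv_out, atol=1e-5))               # True
```

In the paper this one-hot attention pattern is not hard-coded as it is here; it emerges (in the limit) from a quadratic relative positional encoding, which is what makes the result a statement about what self-attention can learn.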

Below is example code for a CNN-attention model for a fault-diagnosis classification task:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNAttention(nn.Module):
    def __init__(self, num_classes, embedding_dim=300, num_filters=32,
                 filter_sizes=(3, 4, 5), dropout_rate=0.5):
        super().__init__()
        self.num_filters = num_filters
        self.filter_sizes = filter_sizes
        self.num_classes = num_classes

        # CNN layers: one convolution per window size, each spanning the full feature dimension
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, (k, embedding_dim)) for k in filter_sizes
        ])

        # Attention mechanism: bilinear weight used to score the filter-size branches
        self.attention_weight = nn.Parameter(torch.empty(num_filters, num_filters))
        nn.init.xavier_uniform_(self.attention_weight)

        # Fully connected classification head and dropout
        self.fc = nn.Linear(num_filters, num_classes)
        self.dropout = nn.Dropout(p=dropout_rate)

    def forward(self, x):
        # x: (batch_size, seq_len, embedding_dim)
        x = x.unsqueeze(1)                                          # (batch, 1, seq_len, embedding_dim)

        # Convolution + ReLU + max-pooling over time for each window size
        branch_outputs = []
        for conv in self.convs:
            c = F.relu(conv(x)).squeeze(3)                          # (batch, num_filters, seq_len - k + 1)
            c = F.max_pool1d(c, c.size(2)).squeeze(2)               # (batch, num_filters)
            branch_outputs.append(c)
        features = torch.stack(branch_outputs, dim=1)               # (batch, num_branches, num_filters)

        # Attention over the filter-size branches:
        # score(i, j) = f_i^T W f_j, softmax over j, then a weighted sum of branch features
        transformed = features @ self.attention_weight              # (batch, num_branches, num_filters)
        scores = torch.bmm(transformed, features.transpose(1, 2))   # (batch, num_branches, num_branches)
        weights = F.softmax(scores, dim=-1)
        attended = torch.bmm(weights, features)                     # (batch, num_branches, num_filters)

        # Aggregate the attended branches, apply dropout, classify
        pooled = attended.mean(dim=1)                               # (batch, num_filters)
        pooled = self.dropout(pooled)
        logits = self.fc(pooled)                                    # (batch, num_classes)
        return logits
```

This model combines a convolutional neural network (CNN) with an attention mechanism to extract features from the input sequence and then classifies them with a fully connected layer. The CNN extracts local features, while the attention re-weights the more important ones so that the key information in the sequence is captured more effectively.
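For reference, a quick smoke test of the model above (batch size, sequence length, and class count are arbitrary made-up values):

```python
# Hypothetical usage: 8 sequences, each with 50 steps of 300-dimensional features.
model = CNNAttention(num_classes=4)
x = torch.randn(8, 50, 300)
logits = model(x)
print(logits.shape)        # torch.Size([8, 4])
```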