[PyTorch] Implementing Spatial Dropout

This article looks at the principle and use of spatial dropout in neural networks. Spatial dropout addresses the weak effect of conventional dropout after embedding and CNN layers: by deactivating units uniformly along a chosen axis, it preserves the spatial correlation of features and improves the model's generalization. A PyTorch implementation is provided, showing how to apply dropout along the timesteps or the embedding direction.


Dropout is a widely used regularization technique in neural networks: by randomly deactivating units, it reduces co-adaptation between them and thus the risk of overfitting. Experiments show, however, that applying ordinary dropout directly after an embedding layer or a CNN layer is not very effective. A likely reason is that fully random, unstructured dropout damages the spatial correlation between neighboring units and thereby weakens the network's ability to capture features. This motivated a variant that drops units uniformly along certain axes: spatial dropout.
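For reference, standard `nn.Dropout` zeroes individual elements independently and rescales the survivors by 1/(1-p) during training (a minimal sketch):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 5, 8)   # e.g. (batch, timesteps, embedding)

drop = nn.Dropout(p=0.5)
drop.train()               # dropout is only active in training mode
y = drop(x)

# Surviving elements are scaled by 1/(1-p); dropped positions are scattered
# independently across every axis, with no spatial structure.
kept = y != 0
print(torch.allclose(y[kept], (x / 0.5)[kept]))  # True
```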

Take the dropout applied after an embedding layer, whose output has shape batch × timesteps × embedding. Ordinary dropout selects elements at random across all dimensions. With spatial dropout we can instead drop uniformly along the timesteps or the embedding direction: the former drops entire embedding channels (the same channel at every timestep), while the latter drops entire tokens.
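The two directions can be sketched directly with a broadcast Bernoulli mask (the mask shapes below are the "noise shapes" in Keras terms; names are illustrative):

```python
import torch

torch.manual_seed(0)
batch, timesteps, emb = 2, 5, 8
x = torch.randn(batch, timesteps, emb)
p = 0.5

# Mask shared along the timesteps axis -> drops whole embedding channels.
chan_mask = torch.bernoulli(torch.full((batch, 1, emb), 1 - p)) / (1 - p)
y_chan = x * chan_mask   # a dropped channel is zero at every timestep

# Mask shared along the embedding axis -> drops whole tokens.
tok_mask = torch.bernoulli(torch.full((batch, timesteps, 1), 1 - p)) / (1 - p)
y_tok = x * tok_mask     # a dropped token is zero in every embedding dim

# In y_chan, each (batch, channel) column is either fully kept or fully zero.
print(((y_chan == 0).any(dim=1) == (y_chan == 0).all(dim=1)).all().item())  # True
```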

PyTorch does not provide a spatial dropout interface directly; following the dropout layers in Keras, it can be implemented as below:

import torch.nn as nn
from itertools import repeat

class SpatialDropout(nn.Module):
    """
    Spatial dropout: dropout applied uniformly along chosen axes, commonly
    used after embedding and CNN layers. For an input of shape
    (batch, timesteps, embedding), dropping along axis=1 zeroes some
    embedding channels as a whole, while dropping along axis=2 zeroes
    some tokens as a whole.
    """

    def __init__(self, drop=0.5):
        super().__init__()
        self.drop = drop

    def forward(self, inputs, noise_shape=None):
        # Default noise shape shares the mask across the middle axes,
        # i.e. drops whole channels along the last (embedding) dimension.
        if noise_shape is None:
            noise_shape = (inputs.shape[0], *repeat(1, inputs.dim() - 2), inputs.shape[-1])
        if not self.training or self.drop == 0:
            return inputs
        # Bernoulli mask with inverted-dropout scaling; broadcasting over
        # the axes where noise_shape is 1 makes the drop "spatial".
        mask = inputs.new_empty(noise_shape).bernoulli_(1 - self.drop).div_(1 - self.drop)
        return inputs * mask
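As an aside, recent PyTorch versions (1.12+) ship `nn.Dropout1d`, which zeroes entire channels of an (N, C, L) input; permuting the embedding dimension into the channel position reproduces the channel-wise variant (a sketch under that assumption):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 5, 8)   # (batch, timesteps, embedding)

drop = nn.Dropout1d(p=0.5)
drop.train()
# Move embedding into the channel position, drop channels, move it back.
y = drop(x.permute(0, 2, 1)).permute(0, 2, 1)

# Each embedding channel is now either zeroed at every timestep or kept
# (scaled by 1/(1-p)) at every timestep.
```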
    