Torch7 Getting Started, Continued (4) ---- Building Flexible Networks with Table Layers

Overview

To start, here are a few useful methods.

replace(function)

This applies the given function to every module in the net, replacing each module with whatever the function returns. For example, we can replace every nn.Dropout in the model with nn.Identity(). Note that the parameter name (module here) can be anything; whatever you call it, it refers to the layer currently being visited.

model:replace(function(module)
   if torch.typename(module) == 'nn.Dropout' then
      return nn.Identity()
   else
      return module
   end
end)
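
To see the effect end to end, here is a minimal, self-contained sketch; the model below is made up purely for illustration:

require 'nn'

-- a toy model containing a Dropout layer (hypothetical, for illustration only)
model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Dropout(0.5))
model:add(nn.Linear(20, 2))

-- swap every nn.Dropout for nn.Identity, leave everything else untouched
model:replace(function(module)
   if torch.typename(module) == 'nn.Dropout' then
      return nn.Identity()
   else
      return module
   end
end)

print(model)  -- slot (2) is now nn.Identity instead of nn.Dropout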

apply(function)

This is similar to the above: it also calls the function on every layer, except that the return value is not substituted back into the net. A common use is custom weight initialization:

local function weights_init(m)
   local name = torch.type(m)
   if name:find('Convolution') then
      m.weight:normal(0.0, 0.02)
      m.bias:fill(0)
   elseif name:find('BatchNormalization') then
      if m.weight then m.weight:normal(1.0, 0.02) end
      if m.bias then m.bias:fill(0) end
   end
end

-- define net
...
net:apply(weights_init)  -- this applies the custom initialization above to every layer of net
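
To make the snippet concrete, here is a minimal sketch with a made-up net; the layers and sizes are arbitrary, and the point is that torch.type() returns names such as 'nn.SpatialConvolution' and 'nn.SpatialBatchNormalization', which is what the string matching in weights_init relies on:

require 'nn'

net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
net:add(nn.SpatialBatchNormalization(16))
net:add(nn.ReLU(true))
net:add(nn.SpatialConvolution(16, 1, 3, 3, 1, 1, 1, 1))

net:apply(weights_init)  -- conv weights ~ N(0, 0.02), BN weights ~ N(1, 0.02), biases zeroed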

remove and insert

Sometimes we want to remove a layer outright, or insert a new one in the middle.

model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Linear(20, 20))
model:add(nn.Linear(20, 30))

-- just pass the index of the layer to remove
model:remove(2)
> model
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.Linear(10 -> 20)
  (2): nn.Linear(20 -> 30)
}

For insert:

model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Linear(20, 30))
-- insert Linear(20, 20) so that it becomes the second layer of the model
model:insert(nn.Linear(20, 20), 2)
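
Printing the model afterwards should show the new layer in slot (2), analogous to the remove example above:

> model
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> output]
  (1): nn.Linear(10 -> 20)
  (2): nn.Linear(20 -> 20)
  (3): nn.Linear(20 -> 30)
}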