Transformer

CLASS torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6,
 num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, 
 activation=<function relu>, custom_encoder=None, custom_decoder=None,
 layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None,
 dtype=None)

Parameters

  • d_model – the number of expected features in the encoder/decoder inputs (default=512).
  • nhead – the number of heads in the multi-head attention models (default=8).
  • num_encoder_layers – the number of sub-encoder-layers in the encoder (default=6).
  • num_decoder_layers – the number of sub-decoder-layers in the decoder (default=6).
  • dim_feedforward – the dimension of the feedforward network model (default=2048).
  • dropout – the dropout value (default=0.1).
  • activation – the activation function of the encoder/decoder intermediate layer; can be a string (“relu” or “gelu”) or a unary callable. Default: relu
  • custom_encoder – custom encoder (default=None).
  • custom_decoder – custom decoder (default=None).
  • layer_norm_eps – the eps value in layer normalization components (default=1e-5).
  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
  • norm_first – if True, encoder and decoder layers will perform LayerNorms before other attention and feedforward operations, otherwise after. Default: False (after).
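
A minimal construction sketch follows; the values simply restate the defaults except for activation and batch_first, which are set here only for illustration:

    import torch.nn as nn

    model = nn.Transformer(
        d_model=512,            # feature size of encoder/decoder inputs
        nhead=8,                # d_model must be divisible by nhead
        num_encoder_layers=6,
        num_decoder_layers=6,
        dim_feedforward=2048,
        dropout=0.1,
        activation="gelu",      # a string ("relu"/"gelu") or a unary callable
        batch_first=True,       # tensors as (batch, seq, feature)
        norm_first=False,       # LayerNorm after attention/feedforward (post-norm)
    )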
forward(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)

Parameters

  • src – the sequence to the encoder (required).
  • tgt – the sequence to the decoder (required).
  • src_mask – the additive mask for the src sequence (optional).
  • tgt_mask – the additive mask for the tgt sequence (optional).
  • memory_mask – the additive mask for the encoder output (optional).
  • src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).
  • tgt_key_padding_mask – the ByteTensor mask for tgt keys per batch (optional).
  • memory_key_padding_mask – the ByteTensor mask for memory keys per batch (optional).
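
For illustration, a usage sketch of forward; it assumes a PyTorch version that accepts boolean key-padding masks and uses the batch_first=True layout:

    import torch
    import torch.nn as nn

    model = nn.Transformer(d_model=512, nhead=8, batch_first=True)

    src = torch.rand(32, 10, 512)   # (N, S, E)
    tgt = torch.rand(32, 20, 512)   # (N, T, E)

    # Causal mask so each target position attends only to earlier positions.
    tgt_mask = model.generate_square_subsequent_mask(20)           # (T, T)

    # Boolean padding masks: True marks positions to ignore
    # (assumes a PyTorch version that accepts bool padding masks).
    src_key_padding_mask = torch.zeros(32, 10, dtype=torch.bool)   # (N, S)
    tgt_key_padding_mask = torch.zeros(32, 20, dtype=torch.bool)   # (N, T)

    out = model(src, tgt,
                tgt_mask=tgt_mask,
                src_key_padding_mask=src_key_padding_mask,
                tgt_key_padding_mask=tgt_key_padding_mask)
    # out: (N, T, E) = (32, 20, 512)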

Shape

  • src: (S,E) for unbatched input, (S,N,E) if batch_first=False or (N,S,E) if batch_first=True.
  • tgt: (T,E) for unbatched input, (T,N,E) if batch_first=False or (N,T,E) if batch_first=True.
  • src_mask: (S,S) or (N⋅num_heads,S,S).
  • tgt_mask: (T,T) or (N⋅num_heads,T,T).
  • memory_mask: (T,S).
  • src_key_padding_mask: (S) for unbatched input otherwise (N,S).
  • tgt_key_padding_mask: (T) for unbatched input otherwise (N,T).
  • memory_key_padding_mask: (S) for unbatched input otherwise (N,S).
  • output: (T,E) for unbatched input, (T,N,E) if batch_first=False or (N,T,E) if batch_first=True.
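
As a quick shape check, a sketch using the default batch_first=False layout and the unbatched shapes listed above (assuming a PyTorch version that supports unbatched input):

    import torch
    import torch.nn as nn

    model = nn.Transformer(d_model=512, nhead=8)    # batch_first=False (default)

    # Default layout: (seq, batch, feature).
    src = torch.rand(10, 32, 512)   # (S, N, E)
    tgt = torch.rand(20, 32, 512)   # (T, N, E)
    out = model(src, tgt)
    print(out.shape)                # torch.Size([20, 32, 512]) -> (T, N, E)

    # Unbatched input: (S, E) / (T, E) in, (T, E) out.
    out_unbatched = model(torch.rand(10, 512), torch.rand(20, 512))
    print(out_unbatched.shape)      # torch.Size([20, 512])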