Introduction
nn.Parameter() appears in many classic network architectures, so this post takes a closer look at what it does.
The official docstring, in English:
Parameters are Tensor subclasses that have a very special property when used with Modules: when they are assigned as Module attributes, they are automatically added to the list of the module's parameters and will appear in the parameters() iterator. Assigning a plain Tensor does not have this effect. This is because one might want to cache some temporary state in the model, such as the last hidden state of an RNN. If there were no class like Parameter, these temporaries would get registered too.
Signature:
torch.nn.parameter.Parameter(data=None, requires_grad=True)
"""
1、data (Tensor) – parameter tensor. —— 输入得是一个tensor。data为传入Tensor类型参数
2、requires_grad (bool, optional) – if the parameter requires gradient. See Locally disabling gradient computation for more details。Default: True —— 这个不用解释,需要注意的是nn.Parameter()默认有梯度。requires_grad默认值为True,表示可训练,False表示不可训练
"""
Purpose: torch.nn.Parameter subclasses torch.Tensor. Wrapping a Tensor in Parameter turns it into a trainable parameter (requires_grad=True by default) and, once assigned as an attribute of a Module, registers it in that module's parameter list so that optimizers can update it.
—————————————————————————————————————————————————
Another common explanation:
torch.nn.Parameter() converts a tensor that would otherwise just be data into a trainable Parameter and binds it to the module, so that when you define the network this tensor is already a learnable parameter. The point of using it is to let certain variables be updated continuously during training toward an optimum.
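The registration behaviour described above is easy to observe. In this minimal sketch (the class and attribute names are made up for the demo), only the Parameter attribute shows up in named_parameters():

```python
import torch
from torch import nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigned as a Parameter: automatically registered.
        self.w = nn.Parameter(torch.zeros(2))
        # Assigned as a plain tensor: treated as cached state, not registered.
        self.state = torch.zeros(2)

m = Demo()
print([name for name, _ in m.named_parameters()])  # → ['w']
```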
Take nn.Linear as an example:
# Imports needed by this snippet (as in torch/nn/modules/linear.py):
# import math
# import torch
# from torch.nn import functional as F, init
# from torch.nn.parameter import Parameter
# from torch.nn.modules.module import Module

class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(N, *, H_{in})` where :math:`*` means any number of
          additional dimensions and :math:`H_{in} = \text{in\_features}`
        - Output: :math:`(N, *, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias: the learnable bias of the module of shape :math:`(\text{out\_features})`.
            If :attr:`bias` is ``True``, the values are initialized from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{1}{\text{in\_features}}`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        # __init__ creates the two learnable parameters: self.weight and self.bias.
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # A trainable parameter of shape (out_features, in_features).
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))  # same idea for the bias
        else:
            # Registers 'bias' as None so the attribute still exists.
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
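Because weight and bias are created through Parameter in __init__, they are registered on the module automatically. A small check (assuming a standard PyTorch install):

```python
import torch
from torch import nn

m = nn.Linear(20, 30)
# Both were wrapped in Parameter, so both are registered.
print(type(m.weight).__name__)                     # → Parameter
print(sorted(n for n, _ in m.named_parameters()))  # → ['bias', 'weight']
print(m(torch.randn(128, 20)).shape)               # → torch.Size([128, 30])
```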
Another example: SGE (Spatial Group Enhance)
import torch
from torch import nn

class SpatialGroupEnhance(nn.Module):
    def __init__(self, groups):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # torch.zeros(1, groups, 1, 1) by itself is a plain tensor with
        # requires_grad=False, so it would not be trained. After wrapping it as
        # self.weight = nn.Parameter(torch.zeros(1, groups, 1, 1)), self.weight
        # is registered on the module with requires_grad=True, so it takes part
        # in forward() and is updated by backpropagation.
        self.weight = nn.Parameter(torch.zeros(1, groups, 1, 1))  # [1, G, 1, 1]
        self.bias = nn.Parameter(torch.zeros(1, groups, 1, 1))    # [1, G, 1, 1]
        self.sig = nn.Sigmoid()
        # Note: the original SGE code also calls self.init_weights() here; that
        # method is not part of this snippet, so the call is omitted.

    def forward(self, x):
        b, c, h, w = x.shape                     # [BS, C, H, W]
        x = x.view(b * self.groups, -1, h, w)    # -> [BS*G, C//G, H, W]
        xn = x * self.avg_pool(x)                # [BS*G, C//G, H, W] * [BS*G, C//G, 1, 1]
        xn = xn.sum(dim=1, keepdim=True)         # -> [BS*G, 1, H, W]
        t = xn.view(b * self.groups, -1)         # -> [BS*G, H*W]
        t = t - t.mean(dim=1, keepdim=True)      # subtract per-group mean
        std = t.std(dim=1, keepdim=True) + 1e-5  # [BS*G, 1]
        t = t / std                              # normalize: [BS*G, H*W]
        t = t.view(b, self.groups, h, w)         # -> [BS, G, H, W]
        # self.weight and self.bias are tensors registered via nn.Parameter(),
        # so this affine step is learnable.
        t = t * self.weight + self.bias          # broadcast with [1, G, 1, 1] -> [BS, G, H, W]
        t = t.view(b * self.groups, 1, h, w)     # -> [BS*G, 1, H, W]
        x = x * self.sig(t)                      # [BS*G, C//G, H, W] * [BS*G, 1, H, W]
        x = x.view(b, c, h, w)                   # -> [BS, C, H, W]
        return x
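The key step above is the learnable per-group affine transform t * weight + bias, which relies on broadcasting the [1, G, 1, 1] parameters over a [BS, G, H, W] map. A standalone sketch of just that step (the group count G and tensor sizes are chosen arbitrarily for the demo):

```python
import torch
from torch import nn

G = 4  # group count, chosen arbitrarily for this demo
weight = nn.Parameter(torch.zeros(1, G, 1, 1))
bias = nn.Parameter(torch.zeros(1, G, 1, 1))

t = torch.randn(2, G, 8, 8)    # stands in for the normalized map [BS, G, H, W]
out = t * weight + bias        # [1, G, 1, 1] broadcasts over batch and spatial dims
print(out.shape)               # → torch.Size([2, 4, 8, 8])
# weight and bias start at zero, so out is all zeros before any training;
# gradients flow into both parameters during backprop.
print(out.abs().sum().item())  # → 0.0
```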