DETR Code

小王五年毕业

Article tags: python, artificial intelligence, deep learning

Article link: https://blog.csdn.net/qq_43805437/article/details/125905602

position_encoding module

import math
import torch
from torch import nn

from util.misc import NestedTensor

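1. NestedTensor has two members, tensors and mask. tensors is the batch of input images, and mask has the same height and width as tensors but only one channel: tensors has shape (batch_size, channel, h, w) and mask has shape (batch_size, h, w). Because the images in a batch are zero-padded to a common size, mask is True on padded pixels and False on real image pixels.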
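To make the tensors/mask pair concrete, here is a minimal sketch that pads differently sized images into one batch and builds the matching mask. The helper `pad_to_batch` is hypothetical; it only mimics the idea behind DETR's own padding helper (zero padding at the bottom and right, True marking padded pixels) and is not DETR's code.

```python
import torch

def pad_to_batch(images):
    """Hypothetical helper: zero-pad a list of (C, H, W) images to a common size.

    Returns (tensors, mask): tensors is (B, C, H_max, W_max) and
    mask is (B, H_max, W_max) with True on padded pixels.
    """
    c = images[0].shape[0]
    h_max = max(img.shape[1] for img in images)
    w_max = max(img.shape[2] for img in images)
    tensors = torch.zeros(len(images), c, h_max, w_max)
    mask = torch.ones(len(images), h_max, w_max, dtype=torch.bool)
    for i, img in enumerate(images):
        _, h, w = img.shape
        tensors[i, :, :h, :w] = img   # real pixels go to the top-left corner
        mask[i, :h, :w] = False       # real pixels are not padding
    return tensors, mask

tensors, mask = pad_to_batch([torch.rand(3, 4, 5), torch.rand(3, 2, 3)])
print(tensors.shape)  # torch.Size([2, 3, 4, 5])
print(mask.shape)     # torch.Size([2, 4, 5])
```

The first image fills the whole 4×5 canvas, so its mask is all False; the second only covers the top-left 2×3 corner.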

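from typing import Optional
from torch import Tensor

# Simplified excerpt of NestedTensor from util/misc.py
class NestedTensor(object):
    def __init__(self, tensors, mask: Optional[Tensor]):
        self.tensors = tensors  # (batch_size, channel, h, w) padded images
        self.mask = mask        # (batch_size, h, w), True on padded pixels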

class PositionEmbeddingSine(nn.Module):
    """
    This is a more standard version of the position embedding, very similar to the one
    used by the Attention is all you need paper, generalized to work on images.
    """
    def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):
        super().__init__()
        self.num_pos_feats = num_pos_feats
        self.temperature = temperature
        self.normalize = normalize
        if scale is not None and normalize is False:
            raise ValueError("normalize should be True if scale is passed")
        if scale is None:
            scale = 2 * math.pi
        self.scale = scale

    def forward(self, tensor_list: NestedTensor):
        x = tensor_list.tensors
        mask = tensor_list.mask
        assert mask is not None
        not_mask = ~mask
        y_embed = not_mask.cumsum(1, dtype=torch.float32)
        x_embed = not_mask.cumsum(2, dtype=torch.float32)
        if self.normalize:
            eps = 1e-6 # 0.000001
            y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale
            x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale

        dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
        dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)

        pos_x = x_embed[:, :, :, None] / dim_t  # indexing with None adds a trailing dimension
        pos_y = y_embed[:, :, :, None] / dim_t
        pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)
        pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)
        pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
        return pos

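1. raise ValueError("normalize should be True if scale is passed")
Can code throw an exception on purpose? Yes: Python lets a program raise exceptions itself with the raise statement, whose basic syntax is raise [exceptionName [(reason)]]. ValueError means "Inappropriate argument value (of correct type)". Here it fires when scale is passed while normalize is False, because scale is only used inside the normalize branch of forward.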
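As a small illustration, the same argument check can be exercised on its own. `check_scale_args` is a hypothetical standalone function that copies the logic of PositionEmbeddingSine.__init__; it is not part of DETR.

```python
import math

def check_scale_args(normalize=False, scale=None):
    """Hypothetical copy of the argument check in PositionEmbeddingSine.__init__."""
    if scale is not None and normalize is False:
        # scale would be silently ignored without normalize, so reject it
        raise ValueError("normalize should be True if scale is passed")
    if scale is None:
        scale = 2 * math.pi  # default scale when none is given
    return scale

print(check_scale_args(normalize=True))  # 6.283185307179586
try:
    check_scale_args(normalize=False, scale=1.0)
except ValueError as e:
    print("ValueError:", e)  # ValueError: normalize should be True if scale is passed
```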

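2. assert mask is not None
assert states a condition that must hold, here that mask is not None. If the condition is false, Python raises an AssertionError. Note that assert statements are skipped when Python runs with the -O flag.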
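A minimal sketch of the same kind of check; `needs_mask` is a hypothetical function written only to show what happens when the assertion fails.

```python
import torch

def needs_mask(mask):
    # Same kind of check as in PositionEmbeddingSine.forward
    assert mask is not None, "mask must be provided"
    return ~mask

print(needs_mask(torch.tensor([True, False])).tolist())  # [False, True]
try:
    needs_mask(None)
except AssertionError as e:
    print("AssertionError:", e)  # AssertionError: mask must be provided
```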

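3. not_mask = ~mask
~ is the bitwise NOT operator: it flips every bit of its operand, turning 1 into 0 and 0 into 1. For Python integers, ~x equals -x-1. On a bool tensor such as mask, ~ acts as logical NOT, so not_mask is True exactly on the real (unpadded) pixels.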
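The two behaviours of ~ can be checked directly; the small mask below is made-up example data.

```python
import torch

# For Python ints, ~ is bitwise NOT, which equals -x - 1
print(~5)  # -6
print(~0)  # -1

# For a bool tensor, ~ is logical NOT: True <-> False
mask = torch.tensor([[False, False, True],
                     [False, False, True]])  # True marks padding
not_mask = ~mask
print(not_mask.tolist())  # [[True, True, False], [True, True, False]]

# For an integer tensor, ~ is bitwise NOT, just like Python ints
print((~torch.tensor([0, 1, 5])).tolist())  # [-1, -2, -6]
```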

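4. y_embed = not_mask.cumsum(1, dtype=torch.float32) accumulates along dimension 1 (the height), running down each column.
x_embed = not_mask.cumsum(2, dtype=torch.float32) accumulates along dimension 2 (the width), running along each row.
cumsum computes a cumulative (running) sum along the given dimension, so each real pixel receives its 1-based row index in y_embed and its 1-based column index in x_embed, while padded pixels do not increase the count.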
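A worked example on a tiny hand-made mask (one 3×4 image whose last row and last column are padding) shows what the two cumulative sums produce.

```python
import torch

# One image of height 3 and width 4; the last column and last row are padding
mask = torch.tensor([[[False, False, False, True],
                      [False, False, False, True],
                      [True,  True,  True,  True]]])  # shape (1, 3, 4)
not_mask = ~mask
y_embed = not_mask.cumsum(1, dtype=torch.float32)  # running count down each column
x_embed = not_mask.cumsum(2, dtype=torch.float32)  # running count along each row
print(y_embed[0].tolist())
# [[1.0, 1.0, 1.0, 0.0], [2.0, 2.0, 2.0, 0.0], [2.0, 2.0, 2.0, 0.0]]
print(x_embed[0].tolist())
# [[1.0, 2.0, 3.0, 3.0], [1.0, 2.0, 3.0, 3.0], [0.0, 0.0, 0.0, 0.0]]
```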

5. if self.normalize: if normalization is enabled, the following is done:

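In MATLAB, ./ and .* denote element-wise division and multiplication of matrices (often called dot division and dot multiplication).

In Python (NumPy and PyTorch), / and * are already element-wise, unlike in MATLAB.

For matrix multiplication in Python, use NumPy's dot() function (or the @ operator); there is no operator counterpart to MATLAB's matrix division.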
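A quick check of the difference, using PyTorch tensors; here the `@` operator (torch.matmul) plays the role of numpy.dot for 2-D matrices.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[2., 4.], [6., 8.]])
print((a * b).tolist())  # element-wise product: [[2.0, 8.0], [18.0, 32.0]]
print((a / b).tolist())  # element-wise quotient: [[0.5, 0.5], [0.5, 0.5]]
print((a @ b).tolist())  # matrix product: [[14.0, 20.0], [30.0, 44.0]]
```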

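Index -1 counts from the end, so it selects the last element. y_embed[:, -1:, :] takes the last row of every image in the batch and keeps that dimension, giving shape (batch, 1, width); likewise x_embed[:, :, -1:] takes the last column, with shape (batch, height, 1). Since the cumulative sums end with the number of real rows (or columns), dividing by them maps the coordinates of real pixels into (0, 1], and multiplying by self.scale (2π by default) maps them into (0, 2π]. eps avoids division by zero in fully padded rows or columns.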
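The normalization step in isolation, on a made-up 2×3 mask whose last column is padding, using the default eps and scale from PositionEmbeddingSine:

```python
import math
import torch

mask = torch.tensor([[[False, False, True],
                      [False, False, True]]])   # (1, 2, 3); last column is padding
not_mask = ~mask
y_embed = not_mask.cumsum(1, dtype=torch.float32)
x_embed = not_mask.cumsum(2, dtype=torch.float32)

eps, scale = 1e-6, 2 * math.pi
# y_embed[:, -1:, :] keeps shape (1, 1, 3), so it broadcasts over the height
y_norm = y_embed / (y_embed[:, -1:, :] + eps) * scale
# x_embed[:, :, -1:] keeps shape (1, 2, 1), so it broadcasts over the width
x_norm = x_embed / (x_embed[:, :, -1:] + eps) * scale
print(y_embed[:, -1:, :].shape)  # torch.Size([1, 1, 3])
print(x_embed[:, :, -1:].shape)  # torch.Size([1, 2, 1])
```

Real pixels end up with y in {π, 2π} and x in {π, 2π}; the fully padded column gets y = 0 thanks to eps.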

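| Code | Meaning | Original shape | Resulting shape |
|---|---|---|---|
| `a[:,-1,:]` | Takes the last entry of the second dimension; an integer index drops that dimension | `[a,b,c]` | `[a,c]` |
| `a[:,-1:,:]` | Takes the last entry of the second dimension; a slice keeps the dimension | `[a,b,c]` | `[a,1,c]` |
| `a[:,:-1,:]` | Takes every entry of the second dimension except the last | `[a,b,c]` | `[a,b-1,c]` |
| `a[:,::-1,:]` | Reverses the second dimension (NumPy only; PyTorch needs `torch.flip`) | `[a,b,c]` | `[a,b,c]` |
| `a[:,2:-1,:]` | Takes entries 3 through b-1 (1-based) of the second dimension | `[a,b,c]` | `[a,b-3,c]` |
| `a[:,2::-1,:]` | Takes entries 3, 2, 1 (1-based) of the second dimension in reverse order (NumPy only) | `[a,b,c]` | `[a,3,c]` (for b ≥ 3) |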
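The shapes in the table can be verified in PyTorch. Since PyTorch slicing does not accept a negative step, the reversed rows are produced with torch.flip instead:

```python
import torch

a = torch.arange(2 * 5 * 3).reshape(2, 5, 3)   # shape [a, b, c] = [2, 5, 3]
print(a[:, -1, :].shape)    # torch.Size([2, 3])     integer index drops the dim
print(a[:, -1:, :].shape)   # torch.Size([2, 1, 3])  slice keeps the dim
print(a[:, :-1, :].shape)   # torch.Size([2, 4, 3])
print(a[:, 2:-1, :].shape)  # torch.Size([2, 2, 3])  indices 2 and 3
# Equivalents of NumPy's a[:, ::-1, :] and a[:, 2::-1, :]
print(torch.flip(a, dims=[1]).shape)            # torch.Size([2, 5, 3])
print(torch.flip(a[:, :3, :], dims=[1]).shape)  # torch.Size([2, 3, 3])
```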

6. dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) returns a 1-D tensor of size $\lceil (end - start) / step \rceil$ with values in [start, end): it starts at start and increases by step until end, which is excluded. Here, with the default num_pos_feats=64, it gives 0, 1, ..., 63.
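A few calls that show the half-open interval and the size formula:

```python
import math
import torch

print(torch.arange(5).tolist())         # [0, 1, 2, 3, 4]
print(torch.arange(1, 10, 3).tolist())  # [1, 4, 7]  (10 is excluded)
dim_t = torch.arange(64, dtype=torch.float32)
print(dim_t[0].item(), dim_t[-1].item(), dim_t.numel())  # 0.0 63.0 64
# The length is ceil((end - start) / step)
print(torch.arange(1, 10, 3).numel() == math.ceil((10 - 1) / 3))  # True
```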

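7. dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
** is exponentiation and // is floor division. The result is $dim\_t_i = T^{2\lfloor i/2 \rfloor / d}$ with $T$ = temperature and $d$ = num_pos_feats, so channels 2k and 2k+1 share the same frequency. This matches the sin/cos pairs $PE(pos, 2k) = \sin(pos / 10000^{2k/d})$ and $PE(pos, 2k+1) = \cos(pos / 10000^{2k/d})$ from Attention is all you need.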
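With a deliberately small num_pos_feats (4 here, chosen only for readability; DETR uses larger values) the pairing of frequencies is easy to see:

```python
import torch

num_pos_feats, temperature = 4, 10000                  # small d for readability
i = torch.arange(num_pos_feats, dtype=torch.float32)   # [0, 1, 2, 3]
print((i // 2).tolist())                               # [0.0, 0.0, 1.0, 1.0]
dim_t = temperature ** (2 * (i // 2) / num_pos_feats)  # T^(2*floor(i/2)/d)
print(dim_t.tolist())  # approximately [1, 1, 100, 100]
```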

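8. pos_x = x_embed[:, :, :, None] / dim_t
Indexing with None appends a new axis, so x_embed goes from [batch, height, width] to [batch, height, width, 1], which broadcasts against dim_t to give [batch, height, width, num_pos_feats]. Without None the trailing sizes (width vs. num_pos_feats) do not match; with NumPy arrays the error reads: ValueError: operands could not be broadcast together with shapes (3,3,3) (4,). PyTorch raises a RuntimeError about mismatched sizes instead.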
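A shape check with random data (the sizes are arbitrary example values) shows both the successful broadcast and the failure without None:

```python
import torch

x_embed = torch.rand(2, 3, 5)           # [batch, height, width]
dim_t = torch.rand(4) + 1               # [num_pos_feats], kept away from zero
pos_x = x_embed[:, :, :, None] / dim_t  # [2, 3, 5, 1] / [4] -> [2, 3, 5, 4]
print(pos_x.shape)                      # torch.Size([2, 3, 5, 4])
try:
    x_embed / dim_t                     # trailing sizes 5 and 4 do not match
except RuntimeError as e:
    print("RuntimeError:", e)
```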

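9. pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)
torch.stack joins tensors along a new dimension: outputs = torch.stack(inputs, dim=?). inputs is the sequence of tensors to join, all of the same shape; dim is the index of the new dimension and must lie between 0 and the number of dimensions of each input (inclusive), since the output has one more dimension than each input.
torch.sin() interprets its input as radians, not degrees; cos() does the same.
0::2 means start at index 0 and take every second element to the end (1::2 starts at index 1). This way the even-indexed feature channels are encoded with sin and the odd-indexed ones with cos.
Each slice has num_pos_feats/2 channels, so after stack the shape is [batch, height, width, num_pos_feats/2, 2]; flatten(3) merges dimensions 3 and 4, giving [batch, height, width, num_pos_feats] with the sin and cos channels interleaved.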
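The interleaving produced by stack followed by flatten can be seen on a single pixel with four made-up channel values:

```python
import torch

pos = torch.tensor([[[[0.0, 1.0, 2.0, 3.0]]]])  # [batch=1, h=1, w=1, num_pos_feats=4]
even = pos[:, :, :, 0::2].sin()                 # channels 0, 2 -> [1, 1, 1, 2]
odd = pos[:, :, :, 1::2].cos()                  # channels 1, 3 -> [1, 1, 1, 2]
stacked = torch.stack((even, odd), dim=4)       # [1, 1, 1, 2, 2]
out = stacked.flatten(3)                        # [1, 1, 1, 4]
print(stacked.shape, out.shape)

# The channels come out interleaved: sin(p0), cos(p1), sin(p2), cos(p3)
expected = torch.stack([pos[..., 0].sin(), pos[..., 1].cos(),
                        pos[..., 2].sin(), pos[..., 3].cos()], dim=-1)
print(torch.allclose(out, expected))            # True
```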

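10. pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
torch.cat joins tensors along an existing dimension: the number of dimensions stays the same and that dimension grows. torch.stack, by contrast, inserts a new dimension. Concatenating pos_y and pos_x along dim 3 gives [batch, height, width, num_pos_feats*2], and permute(0, 3, 1, 2) moves the channels forward to [batch, num_pos_feats*2, height, width].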
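A side-by-side shape comparison of cat and stack, with arbitrary example sizes:

```python
import torch

batch, h, w, d = 2, 3, 5, 4
pos_y = torch.rand(batch, h, w, d)
pos_x = torch.rand(batch, h, w, d)
print(torch.cat((pos_y, pos_x), dim=3).shape)    # torch.Size([2, 3, 5, 8])     existing dim grows
print(torch.stack((pos_y, pos_x), dim=3).shape)  # torch.Size([2, 3, 5, 2, 4])  new dim inserted
pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
print(pos.shape)                                 # torch.Size([2, 8, 3, 5])
```

The first d channels of pos come from pos_y and the last d from pos_x.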

class PositionEmbeddingLearned(nn.Module):
    """
    Absolute pos embedding, learned.
    """
    def __init__(self, num_pos_feats=256):
        super().__init__()
        self.row_embed = nn.Embedding(50, num_pos_feats)
        self.col_embed = nn.Embedding(50, num_pos_feats)
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.uniform_(self.row_embed.weight)
        nn.init.uniform_(self.col_embed.weight)

    def forward(self, tensor_list: NestedTensor):
        x = tensor_list.tensors
        h, w = x.shape[-2:] # height and width
        i = torch.arange(w, device=x.device) # i:(w)
        j = torch.arange(h, device=x.device) # j:(h)
        x_emb = self.col_embed(i) # x:(w,256)
        y_emb = self.row_embed(j) # y:(h,256)
        pos = torch.cat([
            x_emb.unsqueeze(0).repeat(h, 1, 1), # (h,w,256)
            y_emb.unsqueeze(1).repeat(1, w, 1), # (h,w,256)
        ], dim=-1).permute(2, 0, 1).unsqueeze(0).repeat(x.shape[0], 1, 1, 1) #  (h,w,512) (512,h,w) (batch_size,512,h,w)
        return pos

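1. nn.Embedding
torch.nn.Embedding(num_embeddings, embedding_dim)
num_embeddings (int): the size of the dictionary. For example, if 5000 distinct words occur, pass 5000; valid indices are then 0 to 4999.
embedding_dim (int): the dimension of each embedding vector, i.e., how many numbers represent one symbol.
The input must be an integer index tensor (LongTensor; recent PyTorch versions also accept IntTensor). Convert a FloatTensor with tensor.long() first.
The weight (embedding.weight) is randomly initialized (from N(0, 1) by default) and has requires_grad=True, so it is updated by backpropagation. In PositionEmbeddingLearned, num_embeddings=50 means the feature map can be at most 50 rows high and 50 columns wide.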
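A short sketch of what an embedding lookup does; the sizes (50 entries of dimension 8) are example values, with 50 chosen to match the table size used in PositionEmbeddingLearned.

```python
import torch
from torch import nn

emb = nn.Embedding(50, 8)                # 50 positions, an 8-dim vector each
print(emb.weight.shape)                  # torch.Size([50, 8])
print(emb.weight.requires_grad)          # True
idx = torch.arange(5)                    # int64 indices 0..4
out = emb(idx)
print(out.shape)                         # torch.Size([5, 8])
print(torch.equal(out, emb.weight[:5]))  # True: a lookup just selects rows
# Float indices must be converted to integers first
out2 = emb(torch.tensor([1.0, 3.0]).long())
```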

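2. nn.init.uniform_()
reset_parameters re-initializes both embedding tables from U(0, 1). Common initializers in torch.nn.init: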

| Distribution | Code | Values drawn from |
|---|---|---|
| Uniform | `torch.nn.init.uniform_(tensor, a=0, b=1)` | $U(a,b)$ |
| Normal | `torch.nn.init.normal_(tensor, mean=0, std=1)` | $N(mean,std^2)$ |
| Constant | `torch.nn.init.constant_(tensor, val)` | $val$ |
| `Xavier_uniform` | `torch.nn.init.xavier_uniform_(tensor, gain=1)` | $U(-a,a)$ |
| `Xavier_normal` | `torch.nn.init.xavier_normal_(tensor, gain=1)` | $N(0,std^2)$ |
| `kaiming_uniform` | `torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')` | $U(-bound,bound)$ |
| `kaiming_normal` | `torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')` | $N(0,std^2)$ |

For Xavier_uniform, $a$ is:
$a=gain\times \sqrt{\frac{6}{fan\_in+fan\_out} }$
For Xavier_normal, $std$ is:
$std=gain\times \sqrt{\frac{2}{fan\_in+fan\_out} }$
For kaiming_uniform, $bound$ is given below, where $fan\_mode$ is fan_in by default and, for leaky_relu with negative slope $a$, $gain=\sqrt{\frac{2}{1+a^{2}}}$:
$bound=gain\times \sqrt{\frac{3}{fan\_mode} }=\sqrt{\frac{6}{(1+a^{2})\times fan\_mode} }$
For kaiming_normal, $std$ is:
$std=\frac{gain}{\sqrt{fan\_mode} }=\sqrt{\frac{2}{(1+a^{2})\times fan\_mode} }$
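The formulas can be sanity-checked empirically on a 64×128 weight (an arbitrary example size). For a 2-D tensor PyTorch takes fan_in = size(1) and fan_out = size(0). The small tolerance only absorbs float32 rounding, and the standard-deviation check is statistical, so it is compared loosely.

```python
import math
import torch
from torch import nn

w = torch.empty(64, 128)     # 2-D weight: fan_out = size(0) = 64, fan_in = size(1) = 128
fan_out, fan_in = w.shape
tol = 1e-6                   # allowance for float32 rounding of the bound

nn.init.xavier_uniform_(w, gain=1.0)
xavier_a = 1.0 * math.sqrt(6 / (fan_in + fan_out))
xavier_ok = w.abs().max().item() <= xavier_a + tol

neg_slope = 0.0              # the `a` argument of kaiming_*_
nn.init.kaiming_uniform_(w, a=neg_slope, mode='fan_in', nonlinearity='leaky_relu')
kaiming_bound = math.sqrt(6 / ((1 + neg_slope ** 2) * fan_in))
kaiming_ok = w.abs().max().item() <= kaiming_bound + tol

nn.init.kaiming_normal_(w, a=neg_slope, mode='fan_in', nonlinearity='leaky_relu')
kaiming_std = math.sqrt(2 / ((1 + neg_slope ** 2) * fan_in))
std_ratio = w.std().item() / kaiming_std   # should be close to 1 for 8192 samples

nn.init.uniform_(w)          # defaults a=0, b=1
uniform_ok = w.min().item() >= 0 and w.max().item() <= 1

print(xavier_ok, kaiming_ok, round(std_ratio, 2), uniform_ok)
```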

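3. h, w = x.shape[-2:]
x.shape[-2:] takes the last two entries of the shape, i.e., the height and width of the feature map.

4. unsqueeze inserts a dimension of size 1, permute reorders several dimensions at once, and repeat tiles the tensor along the given dimensions. In addition, squeeze removes dimensions of size 1; in PyTorch, squeezing a specific dimension whose size is not 1 leaves the tensor unchanged rather than raising an error (NumPy's np.squeeze does raise an error in that case).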
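The shape bookkeeping in PositionEmbeddingLearned.forward can be replayed step by step with small example sizes (h=3, w=4, an 8-dim embedding, batch of 2; all chosen only for illustration), followed by the squeeze behaviour described above.

```python
import torch
from torch import nn

h, w, d, batch = 3, 4, 8, 2
col_embed = nn.Embedding(50, d)
row_embed = nn.Embedding(50, d)
x_emb = col_embed(torch.arange(w))           # (w, d)
y_emb = row_embed(torch.arange(h))           # (h, d)
x_part = x_emb.unsqueeze(0).repeat(h, 1, 1)  # (1, w, d) -> (h, w, d)
y_part = y_emb.unsqueeze(1).repeat(1, w, 1)  # (h, 1, d) -> (h, w, d)
pos = torch.cat([x_part, y_part], dim=-1)    # (h, w, 2d)
pos = pos.permute(2, 0, 1).unsqueeze(0).repeat(batch, 1, 1, 1)  # (batch, 2d, h, w)
print(pos.shape)  # torch.Size([2, 16, 3, 4])

# Every pixel in column j shares x_emb[j]; every pixel in row i shares y_emb[i]
print(torch.equal(pos[0, :d, 1, 2], x_emb[2]), torch.equal(pos[0, d:, 1, 2], y_emb[1]))  # True True

t = torch.zeros(2, 1, 3)
print(t.squeeze(1).shape)  # torch.Size([2, 3])
print(t.squeeze(0).shape)  # torch.Size([2, 1, 3]): size is not 1, so unchanged
```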

def build_position_encoding(args):
    N_steps = args.hidden_dim // 2
    if args.position_embedding in ('v2', 'sine'):
        # TODO find a better way of exposing other arguments
        position_embedding = PositionEmbeddingSine(N_steps, normalize=True)
    elif args.position_embedding in ('v3', 'learned'):
        position_embedding = PositionEmbeddingLearned(N_steps)
    else:
        raise ValueError(f"not supported {args.position_embedding}")

    return position_embedding