[Model Learning] Notes on model weight files

While loading pretrained weights I ran into problems with the names of the weight parameters, so I am writing these notes down for my own later review.

Naming rules for the weights in a model file

First of all, thanks to P导 for the article Swin-Transformer网络结构详解 (a detailed walkthrough of the Swin Transformer architecture).
After following P导's tutorial I swapped his Swin Transformer implementation into a project I was studying and trained it. During fine-tuning the weights failed to match; printing them showed that the layer names in P导's checkpoint are:

patch_embed.proj.weight
patch_embed.proj.bias
patch_embed.norm.weight
patch_embed.norm.bias
layers.0.blocks.0.norm1.weight
layers.0.blocks.0.norm1.bias
layers.0.blocks.0.attn.relative_position_bias_table
layers.0.blocks.0.attn.relative_position_index
layers.0.blocks.0.attn.qkv.weight
layers.0.blocks.0.attn.qkv.bias
layers.0.blocks.0.attn.proj.weight
layers.0.blocks.0.attn.proj.bias
layers.0.blocks.0.norm2.weight
layers.0.blocks.0.norm2.bias
layers.0.blocks.0.mlp.fc1.weight
layers.0.blocks.0.mlp.fc1.bias
layers.0.blocks.0.mlp.fc2.weight
layers.0.blocks.0.mlp.fc2.bias
layers.0.blocks.1.norm1.weight
layers.0.blocks.1.norm1.bias
layers.0.blocks.1.attn.relative_position_bias_table
layers.0.blocks.1.attn.relative_position_index
layers.0.blocks.1.attn.qkv.weight
layers.0.blocks.1.attn.qkv.bias
layers.0.blocks.1.attn.proj.weight
layers.0.blocks.1.attn.proj.bias
layers.0.blocks.1.norm2.weight
layers.0.blocks.1.norm2.bias
layers.0.blocks.1.mlp.fc1.weight
layers.0.blocks.1.mlp.fc1.bias
layers.0.blocks.1.mlp.fc2.weight
layers.0.blocks.1.mlp.fc2.bias
layers.0.downsample.reduction.weight
layers.0.downsample.norm.weight
layers.0.downsample.norm.bias
layers.1.blocks.0.norm1.weight
layers.1.blocks.0.norm1.bias
layers.1.blocks.0.attn.relative_position_bias_table
layers.1.blocks.0.attn.relative_position_index
layers.1.blocks.0.attn.qkv.weight
layers.1.blocks.0.attn.qkv.bias
layers.1.blocks.0.attn.proj.weight
layers.1.blocks.0.attn.proj.bias
layers.1.blocks.0.norm2.weight
layers.1.blocks.0.norm2.bias
layers.1.blocks.0.mlp.fc1.weight
layers.1.blocks.0.mlp.fc1.bias
layers.1.blocks.0.mlp.fc2.weight
layers.1.blocks.0.mlp.fc2.bias
layers.1.blocks.1.norm1.weight
layers.1.blocks.1.norm1.bias
layers.1.blocks.1.attn.relative_position_bias_table
layers.1.blocks.1.attn.relative_position_index
layers.1.blocks.1.attn.qkv.weight
layers.1.blocks.1.attn.qkv.bias
layers.1.blocks.1.attn.proj.weight
layers.1.blocks.1.attn.proj.bias
layers.1.blocks.1.norm2.weight
layers.1.blocks.1.norm2.bias
layers.1.blocks.1.mlp.fc1.weight
layers.1.blocks.1.mlp.fc1.bias
layers.1.blocks.1.mlp.fc2.weight
layers.1.blocks.1.mlp.fc2.bias
layers.1.downsample.reduction.weight
layers.1.downsample.norm.weight
layers.1.downsample.norm.bias
layers.2.blocks.0.norm1.weight
layers.2.blocks.0.norm1.bias
layers.2.blocks.0.attn.relative_position_bias_table
layers.2.blocks.0.attn.relative_position_index
layers.2.blocks.0.attn.qkv.weight
layers.2.blocks.0.attn.qkv.bias
layers.2.blocks.0.attn.proj.weight
layers.2.blocks.0.attn.proj.bias
layers.2.blocks.0.norm2.weight
layers.2.blocks.0.norm2.bias
layers.2.blocks.0.mlp.fc1.weight
layers.2.blocks.0.mlp.fc1.bias
layers.2.blocks.0.mlp.fc2.weight
layers.2.blocks.0.mlp.fc2.bias
layers.2.blocks.1.norm1.weight
layers.2.blocks.1.norm1.bias
layers.2.blocks.1.attn.relative_position_bias_table
layers.2.blocks.1.attn.relative_position_index
layers.2.blocks.1.attn.qkv.weight
layers.2.blocks.1.attn.qkv.bias
layers.2.blocks.1.attn.proj.weight
layers.2.blocks.1.attn.proj.bias
layers.2.blocks.1.norm2.weight
layers.2.blocks.1.norm2.bias
layers.2.blocks.1.mlp.fc1.weight
layers.2.blocks.1.mlp.fc1.bias
layers.2.blocks.1.mlp.fc2.weight
layers.2.blocks.1.mlp.fc2.bias
layers.2.blocks.2.norm1.weight
layers.2.blocks.2.norm1.bias
layers.2.blocks.2.attn.relative_position_bias_table
layers.2.blocks.2.attn.relative_position_index
layers.2.blocks.2.attn.qkv.weight
layers.2.blocks.2.attn.qkv.bias
layers.2.blocks.2.attn.proj.weight
layers.2.blocks.2.attn.proj.bias
layers.2.blocks.2.norm2.weight
layers.2.blocks.2.norm2.bias
layers.2.blocks.2.mlp.fc1.weight
layers.2.blocks.2.mlp.fc1.bias
layers.2.blocks.2.mlp.fc2.weight
layers.2.blocks.2.mlp.fc2.bias
layers.2.blocks.3.norm1.weight
layers.2.blocks.3.norm1.bias
layers.2.blocks.3.attn.relative_position_bias_table
layers.2.blocks.3.attn.relative_position_index
layers.2.blocks.3.attn.qkv.weight
layers.2.blocks.3.attn.qkv.bias
layers.2.blocks.3.attn.proj.weight
layers.2.blocks.3.attn.proj.bias
layers.2.blocks.3.norm2.weight
layers.2.blocks.3.norm2.bias
layers.2.blocks.3.mlp.fc1.weight
layers.2.blocks.3.mlp.fc1.bias
layers.2.blocks.3.mlp.fc2.weight
layers.2.blocks.3.mlp.fc2.bias
layers.2.blocks.4.norm1.weight
layers.2.blocks.4.norm1.bias
layers.2.blocks.4.attn.relative_position_bias_table
layers.2.blocks.4.attn.relative_position_index
layers.2.blocks.4.attn.qkv.weight
layers.2.blocks.4.attn.qkv.bias
layers.2.blocks.4.attn.proj.weight
layers.2.blocks.4.attn.proj.bias
layers.2.blocks.4.norm2.weight
layers.2.blocks.4.norm2.bias
layers.2.blocks.4.mlp.fc1.weight
layers.2.blocks.4.mlp.fc1.bias
layers.2.blocks.4.mlp.fc2.weight
layers.2.blocks.4.mlp.fc2.bias
layers.2.blocks.5.norm1.weight
layers.2.blocks.5.norm1.bias
layers.2.blocks.5.attn.relative_position_bias_table
layers.2.blocks.5.attn.relative_position_index
layers.2.blocks.5.attn.qkv.weight
layers.2.blocks.5.attn.qkv.bias
layers.2.blocks.5.attn.proj.weight
layers.2.blocks.5.attn.proj.bias
layers.2.blocks.5.norm2.weight
layers.2.blocks.5.norm2.bias
layers.2.blocks.5.mlp.fc1.weight
layers.2.blocks.5.mlp.fc1.bias
layers.2.blocks.5.mlp.fc2.weight
layers.2.blocks.5.mlp.fc2.bias
layers.2.downsample.reduction.weight
layers.2.downsample.norm.weight
layers.2.downsample.norm.bias
layers.3.blocks.0.norm1.weight
layers.3.blocks.0.norm1.bias
layers.3.blocks.0.attn.relative_position_bias_table
layers.3.blocks.0.attn.relative_position_index
layers.3.blocks.0.attn.qkv.weight
layers.3.blocks.0.attn.qkv.bias
layers.3.blocks.0.attn.proj.weight
layers.3.blocks.0.attn.proj.bias
layers.3.blocks.0.norm2.weight
layers.3.blocks.0.norm2.bias
layers.3.blocks.0.mlp.fc1.weight
layers.3.blocks.0.mlp.fc1.bias
layers.3.blocks.0.mlp.fc2.weight
layers.3.blocks.0.mlp.fc2.bias
layers.3.blocks.1.norm1.weight
layers.3.blocks.1.norm1.bias
layers.3.blocks.1.attn.relative_position_bias_table
layers.3.blocks.1.attn.relative_position_index
layers.3.blocks.1.attn.qkv.weight
layers.3.blocks.1.attn.qkv.bias
layers.3.blocks.1.attn.proj.weight
layers.3.blocks.1.attn.proj.bias
layers.3.blocks.1.norm2.weight
layers.3.blocks.1.norm2.bias
layers.3.blocks.1.mlp.fc1.weight
layers.3.blocks.1.mlp.fc1.bias
layers.3.blocks.1.mlp.fc2.weight
layers.3.blocks.1.mlp.fc2.bias
norm.weight
norm.bias
head.weight
head.bias
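
For reference, a key list like the one above can be printed with a few lines of PyTorch. This is only a minimal sketch: the checkpoint filename is a placeholder, and the "model" entry is an assumption about how the checkpoint is wrapped (the official Swin checkpoints store their weights under that key).

import torch

# Load the pretrained checkpoint onto the CPU; the path is a placeholder.
ckpt = torch.load("swin_pretrained.pth", map_location="cpu")

# Some checkpoints wrap the weights in a "model" entry; fall back to the dict itself.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Print every parameter / buffer name stored in the file.
for name in state_dict.keys():
    print(name)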

The layer names of the model that ships with my project, after training, are:

stage1.patch_partition.linear.weight
stage1.patch_partition.linear.bias
stage1.layers.0.0.attention_block.fn.norm.weight
stage1.layers.0.0.attention_block.fn.norm.bias
stage1.layers.0.0.attention_block.fn.fn.pos_embedding
stage1.layers.0.0.attention_block.fn.fn.to_qkv.weight
stage1.layers.0.0.attention_block.fn.fn.to_out.weight
stage1.layers.0.0.attention_block.fn.fn.to_out.bias
stage1.layers.0.0.mlp_block.fn.norm.weight
stage1.layers.0.0.mlp_block.fn.norm.bias
stage1.layers.0.0.mlp_block.fn.fn.net.0.weight
stage1.layers.0.0.mlp_block.fn.fn.net.0.bias
stage1.layers.0.0.mlp_block.fn.fn.net.2.weight
stage1.layers.0.0.mlp_block.fn.fn.net.2.bias
stage1.layers.0.1.attention_block.fn.norm.weight
stage1.layers.0.1.attention_block.fn.norm.bias
stage1.layers.0.1.attention_block.fn.fn.upper_lower_mask
stage1.layers.0.1.attention_block.fn.fn.left_right_mask
stage1.layers.0.1.attention_block.fn.fn.pos_embedding
stage1.layers.0.1.attention_block.fn.fn.to_qkv.weight
stage1.layers.0.1.attention_block.fn.fn.to_out.weight
stage1.layers.0.1.attention_block.fn.fn.to_out.bias
stage1.layers.0.1.mlp_block.fn.norm.weight
stage1.layers.0.1.mlp_block.fn.norm.bias
stage1.layers.0.1.mlp_block.fn.fn.net.0.weight
stage1.layers.0.1.mlp_block.fn.fn.net.0.bias
stage1.layers.0.1.mlp_block.fn.fn.net.2.weight
stage1.layers.0.1.mlp_block.fn.fn.net.2.bias
stage2.patch_partition.linear.weight
stage2.patch_partition.linear.bias
stage2.layers.0.0.attention_block.fn.norm.weight
stage2.layers.0.0.attention_block.fn.norm.bias
stage2.layers.0.0.attention_block.fn.fn.pos_embedding
stage2.layers.0.0.attention_block.fn.fn.to_qkv.weight
stage2.layers.0.0.attention_block.fn.fn.to_out.weight
stage2.layers.0.0.attention_block.fn.fn.to_out.bias
stage2.layers.0.0.mlp_block.fn.norm.weight
stage2.layers.0.0.mlp_block.fn.norm.bias
stage2.layers.0.0.mlp_block.fn.fn.net.0.weight
stage2.layers.0.0.mlp_block.fn.fn.net.0.bias
stage2.layers.0.0.mlp_block.fn.fn.net.2.weight
stage2.layers.0.0.mlp_block.fn.fn.net.2.bias
stage2.layers.0.1.attention_block.fn.norm.weight
stage2.layers.0.1.attention_block.fn.norm.bias
stage2.layers.0.1.attention_block.fn.fn.upper_lower_mask
stage2.layers.0.1.attention_block.fn.fn.left_right_mask
stage2.layers.0.1.attention_block.fn.fn.pos_embedding
stage2.layers.0.1.attention_block.fn.fn.to_qkv.weight
stage2.layers.0.1.attention_block.fn.fn.to_out.weight
stage2.layers.0.1.attention_block.fn.fn.to_out.bias
stage2.layers.0.1.mlp_block.fn.norm.weight
stage2.layers.0.1.mlp_block.fn.norm.bias
stage2.layers.0.1.mlp_block.fn.fn.net.0.weight
stage2.layers.0.1.mlp_block.fn.fn.net.0.bias
stage2.layers.0.1.mlp_block.fn.fn.net.2.weight
stage2.layers.0.1.mlp_block.fn.fn.net.2.bias
stage3.patch_partition.linear.weight
stage3.patch_partition.linear.bias
stage3.layers.0.0.attention_block.fn.norm.weight
stage3.layers.0.0.attention_block.fn.norm.bias
stage3.layers.0.0.attention_block.fn.fn.pos_embedding
stage3.layers.0.0.attention_block.fn.fn.to_qkv.weight
stage3.layers.0.0.attention_block.fn.fn.to_out.weight
stage3.layers.0.0.attention_block.fn.fn.to_out.bias
stage3.layers.0.0.mlp_block.fn.norm.weight
stage3.layers.0.0.mlp_block.fn.norm.bias
stage3.layers.0.0.mlp_block.fn.fn.net.0.weight
stage3.layers.0.0.mlp_block.fn.fn.net.0.bias
stage3.layers.0.0.mlp_block.fn.fn.net.2.weight
stage3.layers.0.0.mlp_block.fn.fn.net.2.bias
stage3.layers.0.1.attention_block.fn.norm.weight
stage3.layers.0.1.attention_block.fn.norm.bias
stage3.layers.0.1.attention_block.fn.fn.upper_lower_mask
stage3.layers.0.1.attention_block.fn.fn.left_right_mask
stage3.layers.0.1.attention_block.fn.fn.pos_embedding
stage3.layers.0.1.attention_block.fn.fn.to_qkv.weight
stage3.layers.0.1.attention_block.fn.fn.to_out.weight
stage3.layers.0.1.attention_block.fn.fn.to_out.bias
stage3.layers.0.1.mlp_block.fn.norm.weight
stage3.layers.0.1.mlp_block.fn.norm.bias
stage3.layers.0.1.mlp_block.fn.fn.net.0.weight
stage3.layers.0.1.mlp_block.fn.fn.net.0.bias
stage3.layers.0.1.mlp_block.fn.fn.net.2.weight
stage3.layers.0.1.mlp_block.fn.fn.net.2.bias
stage3.layers.1.0.attention_block.fn.norm.weight
stage3.layers.1.0.attention_block.fn.norm.bias
stage3.layers.1.0.attention_block.fn.fn.pos_embedding
stage3.layers.1.0.attention_block.fn.fn.to_qkv.weight
stage3.layers.1.0.attention_block.fn.fn.to_out.weight
stage3.layers.1.0.attention_block.fn.fn.to_out.bias
stage3.layers.1.0.mlp_block.fn.norm.weight
stage3.layers.1.0.mlp_block.fn.norm.bias
stage3.layers.1.0.mlp_block.fn.fn.net.0.weight
stage3.layers.1.0.mlp_block.fn.fn.net.0.bias
stage3.layers.1.0.mlp_block.fn.fn.net.2.weight
stage3.layers.1.0.mlp_block.fn.fn.net.2.bias
stage3.layers.1.1.attention_block.fn.norm.weight
stage3.layers.1.1.attention_block.fn.norm.bias
stage3.layers.1.1.attention_block.fn.fn.upper_lower_mask
stage3.layers.1.1.attention_block.fn.fn.left_right_mask
stage3.layers.1.1.attention_block.fn.fn.pos_embedding
stage3.layers.1.1.attention_block.fn.fn.to_qkv.weight
stage3.layers.1.1.attention_block.fn.fn.to_out.weight
stage3.layers.1.1.attention_block.fn.fn.to_out.bias
stage3.layers.1.1.mlp_block.fn.norm.weight
stage3.layers.1.1.mlp_block.fn.norm.bias
stage3.layers.1.1.mlp_block.fn.fn.net.0.weight
stage3.layers.1.1.mlp_block.fn.fn.net.0.bias
stage3.layers.1.1.mlp_block.fn.fn.net.2.weight
stage3.layers.1.1.mlp_block.fn.fn.net.2.bias
stage3.layers.2.0.attention_block.fn.norm.weight
stage3.layers.2.0.attention_block.fn.norm.bias
stage3.layers.2.0.attention_block.fn.fn.pos_embedding
stage3.layers.2.0.attention_block.fn.fn.to_qkv.weight
stage3.layers.2.0.attention_block.fn.fn.to_out.weight
stage3.layers.2.0.attention_block.fn.fn.to_out.bias
stage3.layers.2.0.mlp_block.fn.norm.weight
stage3.layers.2.0.mlp_block.fn.norm.bias
stage3.layers.2.0.mlp_block.fn.fn.net.0.weight
stage3.layers.2.0.mlp_block.fn.fn.net.0.bias
stage3.layers.2.0.mlp_block.fn.fn.net.2.weight
stage3.layers.2.0.mlp_block.fn.fn.net.2.bias
stage3.layers.2.1.attention_block.fn.norm.weight
stage3.layers.2.1.attention_block.fn.norm.bias
stage3.layers.2.1.attention_block.fn.fn.upper_lower_mask
stage3.layers.2.1.attention_block.fn.fn.left_right_mask
stage3.layers.2.1.attention_block.fn.fn.pos_embedding
stage3.layers.2.1.attention_block.fn.fn.to_qkv.weight
stage3.layers.2.1.attention_block.fn.fn.to_out.weight
stage3.layers.2.1.attention_block.fn.fn.to_out.bias
stage3.layers.2.1.mlp_block.fn.norm.weight
stage3.layers.2.1.mlp_block.fn.norm.bias
stage3.layers.2.1.mlp_block.fn.fn.net.0.weight
stage3.layers.2.1.mlp_block.fn.fn.net.0.bias
stage3.layers.2.1.mlp_block.fn.fn.net.2.weight
stage3.layers.2.1.mlp_block.fn.fn.net.2.bias
stage4.patch_partition.linear.weight
stage4.patch_partition.linear.bias
stage4.layers.0.0.attention_block.fn.norm.weight
stage4.layers.0.0.attention_block.fn.norm.bias
stage4.layers.0.0.attention_block.fn.fn.pos_embedding
stage4.layers.0.0.attention_block.fn.fn.to_qkv.weight
stage4.layers.0.0.attention_block.fn.fn.to_out.weight
stage4.layers.0.0.attention_block.fn.fn.to_out.bias
stage4.layers.0.0.mlp_block.fn.norm.weight
stage4.layers.0.0.mlp_block.fn.norm.bias
stage4.layers.0.0.mlp_block.fn.fn.net.0.weight
stage4.layers.0.0.mlp_block.fn.fn.net.0.bias
stage4.layers.0.0.mlp_block.fn.fn.net.2.weight
stage4.layers.0.0.mlp_block.fn.fn.net.2.bias
stage4.layers.0.1.attention_block.fn.norm.weight
stage4.layers.0.1.attention_block.fn.norm.bias
stage4.layers.0.1.attention_block.fn.fn.upper_lower_mask
stage4.layers.0.1.attention_block.fn.fn.left_right_mask
stage4.layers.0.1.attention_block.fn.fn.pos_embedding
stage4.layers.0.1.attention_block.fn.fn.to_qkv.weight
stage4.layers.0.1.attention_block.fn.fn.to_out.weight
stage4.layers.0.1.attention_block.fn.fn.to_out.bias
stage4.layers.0.1.mlp_block.fn.norm.weight
stage4.layers.0.1.mlp_block.fn.norm.bias
stage4.layers.0.1.mlp_block.fn.fn.net.0.weight
stage4.layers.0.1.mlp_block.fn.fn.net.0.bias
stage4.layers.0.1.mlp_block.fn.fn.net.2.weight
stage4.layers.0.1.mlp_block.fn.fn.net.2.bias
mlp_head.0.weight
mlp_head.0.bias
mlp_head.1.weight
mlp_head.1.bias
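
Since the two naming schemes (and indeed the two architectures) are completely different, calling model.load_state_dict(pretrained_dict) directly fails with missing/unexpected key errors. A common fine-tuning workaround is to keep only the entries whose names and shapes both match and load with strict=False. This is only a sketch of that pattern: model stands for the target network, the path is a placeholder, and with names this different almost nothing will match unless the keys are renamed first.

import torch

ckpt = torch.load("pretrained.pth", map_location="cpu")  # placeholder path
pretrained_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

model_dict = model.state_dict()
# Keep only the parameters whose name AND shape match the target model.
matched = {k: v for k, v in pretrained_dict.items()
           if k in model_dict and v.shape == model_dict[k].shape}
print("matched {} / {} parameters".format(len(matched), len(model_dict)))

model_dict.update(matched)
model.load_state_dict(model_dict, strict=False)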

It was only after reading the article pytorch中存储各层权重参数时的命名规则,为什么有些层的名字中带module. (on the naming rules PyTorch uses when saving each layer's weight parameters, and why some layer names carry a module. prefix) that I understood why this happens.
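
As a quick illustration of the module. point from that article: wrapping a model in nn.DataParallel prefixes every key in its state_dict with module., which then has to be stripped before loading into an unwrapped model. A minimal, self-contained sketch:

import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4))
print(list(net.state_dict().keys()))       # ['0.weight', '0.bias']

wrapped = nn.DataParallel(net)
print(list(wrapped.state_dict().keys()))   # ['module.0.weight', 'module.0.bias']

# Strip the "module." prefix so the weights load into the unwrapped model again.
cleaned = {k.replace("module.", "", 1): v for k, v in wrapped.state_dict().items()}
net.load_state_dict(cleaned)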
Inside the model, take the code block below as an example:

# __init__ of the Swin Transformer block (the module whose parameters show up as layers.*.blocks.*)
def __init__(self, dim, num_heads, window_size=7, shift_size=0,
             mlp_ratio=4., qkv_bias=True, drop=0., attn_drop=0., drop_path=0.,
             act_layer=nn.GELU, norm_layer=nn.LayerNorm):
    super().__init__()
    self.dim = dim
    self.num_heads = num_heads
    self.window_size = window_size
    self.shift_size = shift_size
    self.mlp_ratio = mlp_ratio
    assert 0 <= self.shift_size < self.window_size, "shift_size must be in [0, window_size)"
    self.norm1 = norm_layer(dim)  # first LayerNorm of the encoder block; it owns learnable parameters, hence norm1.* in the block's weights
    self.attn = WindowAttention(  # W-MSA / SW-MSA, analogous to multi-head attention in a plain encoder block; its parameters are saved under attn.*
        dim, window_size=(self.window_size, self.window_size), num_heads=num_heads, qkv_bias=qkv_bias,
        attn_drop=attn_drop, proj_drop=drop)
    self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
    self.norm2 = norm_layer(dim)  # second LayerNorm, hence norm2.* in the weights
    mlp_hidden_dim = int(dim * mlp_ratio)
    self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)  # MLP sub-module; its weights are saved under mlp.*

Of the attributes set in __init__, those that only store a plain value, such as self.dim = dim, contribute nothing to the saved weights. Attributes like self.norm1 = norm_layer(dim) and self.attn = WindowAttention(…), on the other hand, are themselves sub-modules with learnable parameters, so PyTorch registers them and their parameters end up in the state_dict. As the article above explains, each saved entry is named after the variable name used in the self.xxx assignment, which is exactly why we see:

layers.0.blocks.0.norm1.weight
layers.0.blocks.0.norm1.bias

layers.0.blocks.0.attn.relative_position_bias_table
layers.0.blocks.0.attn.relative_position_index
layers.0.blocks.0.attn.qkv.weight
layers.0.blocks.0.attn.qkv.bias
layers.0.blocks.0.attn.proj.weight
layers.0.blocks.0.attn.proj.bias
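
A tiny toy module confirms the same behaviour: plain Python attributes never show up in the state_dict, while sub-modules assigned through self do, under their attribute names. A minimal sketch (not part of the Swin code):

import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim                  # plain value: not registered, never saved
        self.norm1 = nn.LayerNorm(dim)  # sub-module: saved as norm1.weight / norm1.bias
        self.mlp = nn.Linear(dim, dim)  # sub-module: saved as mlp.weight / mlp.bias

print(list(ToyBlock(8).state_dict().keys()))
# ['norm1.weight', 'norm1.bias', 'mlp.weight', 'mlp.bias']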

If anything above is wrong, corrections are very welcome.
