Loading Only Some Layers' Weights in PyTorch

汐梦聆海

Article link: https://blog.csdn.net/jackzhang11/article/details/108047586

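Transfer learning is common practice: we pretrain on a fairly general pretext task and then fine-tune for different downstream tasks. During fine-tuning, the last few layers of the network usually have to change. For example, suppose the pretext task is image classification on ImageNet and the downstream task is semantic segmentation. For fine-tuning, the final fully connected layers of the classification network are removed and the network is turned into an FCN. We then need to load the pretrained weights of the earlier layers into the new model.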

Once the model structure has changed, naively calling net.load_state_dict(torch.load('xxx.pth')) raises an error. So what should we do instead? Read on.

First, we define a simple image classification model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class model1(nn.Module):
    def __init__(self, img_size):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1, 1)
        self.conv2 = nn.Conv2d(16, 64, 3, 1, 1)
        self.fc1 = nn.Linear(self.num_feature_pixel(img_size), 1024)
        self.fc2 = nn.Linear(1024, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))

        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x

    def num_feature_pixel(self, img_size):
        # Features after conv2: 64 channels, with H and W each halved by the two 2x2 poolings
        res = 1
        for i in img_size[2:]:
            res *= i
        res = int(res * 64 / (4**2))
        return res
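
The arithmetic behind num_feature_pixel can be checked against a real forward pass. The snippet below is a minimal sketch rather than part of the original model: the standalone helper, its `channels` and `num_pools` parameters, and the 32x32 input are assumptions chosen to keep the check fast.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def num_feature_pixel(img_size, channels=64, num_pools=2):
    # Hypothetical standalone variant of the helper above:
    # each 2x2 max-pool halves H and W (rounding down).
    h, w = img_size[2:]
    return channels * (h // 2 ** num_pools) * (w // 2 ** num_pools)

# Small input so the check runs quickly (32x32 is an arbitrary choice).
img = torch.rand(1, 3, 32, 32)
conv1 = nn.Conv2d(3, 16, 3, 1, 1)
conv2 = nn.Conv2d(16, 64, 3, 1, 1)

with torch.no_grad():
    x = F.max_pool2d(F.relu(conv1(img)), (2, 2))
    x = F.max_pool2d(F.relu(conv2(x)), (2, 2))
    flat = torch.flatten(x, 1)  # keep the batch dimension

expected = num_feature_pixel(img.shape)
print(flat.shape, expected)
```

If the second dimension of `flat` matches `expected`, the input size passed to the first fully connected layer is consistent with what the convolutional part produces.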

We now test the model and save its parameters to "pretext.pth":

img = torch.rand([1, 3, 224, 224])
img_size = img.shape
net = model1(img_size)
res = net(img)
torch.save(net.state_dict(), 'pretext.pth')
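
To see what actually ends up in such a file, it helps to look at a state_dict directly: it is an ordered mapping from parameter names to tensors. The sketch below uses a tiny nn.Sequential stand-in and an in-memory buffer instead of a file on disk; both are assumptions made purely for illustration.

```python
import io
import torch
import torch.nn as nn

# Tiny stand-in network (not the model from the article).
net = nn.Sequential(
    nn.Conv2d(3, 4, 3, 1, 1),
    nn.ReLU(),
    nn.Conv2d(4, 8, 3, 1, 1),
)

# Save to an in-memory buffer instead of a .pth file, then read it back.
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)
state = torch.load(buffer)

# Each entry maps "<layer name>.<parameter name>" to a tensor.
shapes = {name: tuple(t.shape) for name, t in state.items()}
for name, shape in shapes.items():
    print(name, shape)
```

The ReLU layer has no parameters, so it contributes no keys; only layers with weights or buffers appear in the dictionary.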

Now suppose we remove all the final fully connected layers and add a new conv3. The network is then defined as follows:

class model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1, 1)
        self.conv2 = nn.Conv2d(16, 64, 3, 1, 1)
        self.conv3 = nn.Conv2d(64, 64, 3, 1, 1)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2,2))
        return x

If we now run the code below, i.e. load the parameters from "pretext.pth" into an instance of the new model, an error is raised:

net = model2()
net.load_state_dict(torch.load('pretext.pth'))

"""
RuntimeError: Error(s) in loading state_dict for model2:
	Missing key(s) in state_dict: "conv3.weight", "conv3.bias".
	Unexpected key(s) in state_dict: "fc1.weight", "fc1.bias", "fc2.weight", "fc2.bias".
"""

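Clearly, the old model's parameters in "pretext.pth" contain no conv3 parameters for the new model, while the fc1 and fc2 parameters are unexpected from the new model's point of view. That is the root of the problem: the keys of the original parameters do not fully match the keys of the modified model. The fix is to extract from "pretext.pth" only the key-value pairs whose keys also exist in the new model.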
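
The mismatch can be made explicit by comparing the two key sets before loading anything. The following is a minimal sketch: the two nn.ModuleDict objects are small stand-ins for the old and new models (the fully connected layer sizes are arbitrary), used only because they produce the same key names.

```python
import torch.nn as nn

# Stand-in for the pretrained model (fc sizes are illustrative only).
old = nn.ModuleDict({
    "conv1": nn.Conv2d(3, 16, 3, 1, 1),
    "conv2": nn.Conv2d(16, 64, 3, 1, 1),
    "fc1": nn.Linear(8, 4),
    "fc2": nn.Linear(4, 2),
})

# Stand-in for the modified model.
new = nn.ModuleDict({
    "conv1": nn.Conv2d(3, 16, 3, 1, 1),
    "conv2": nn.Conv2d(16, 64, 3, 1, 1),
    "conv3": nn.Conv2d(64, 64, 3, 1, 1),
})

old_keys = set(old.state_dict().keys())
new_keys = set(new.state_dict().keys())

shared = sorted(old_keys & new_keys)      # can be loaded
missing = sorted(new_keys - old_keys)     # in the new model, absent from the checkpoint
unexpected = sorted(old_keys - new_keys)  # in the checkpoint, unused by the new model

print("shared:", shared)
print("missing:", missing)
print("unexpected:", unexpected)
```

The `missing` and `unexpected` lists correspond to the two lines of the error message, and `shared` is exactly the set of entries worth keeping.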

The following code solves the problem:

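net = model2()
pretext_model = torch.load('pretext.pth')  # parameters of the old model, as a dict
model2_dict = net.state_dict()             # parameters of the new model
# keep only the entries whose key also exists in the new model
state_dict = {k:v for k,v in pretext_model.items() if k in model2_dict.keys()}
model2_dict.update(state_dict)             # overwrite the shared keys with the pretrained values
net.load_state_dict(model2_dict)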

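Here pretext_model holds the old model's parameters, read in as a dictionary; model2_dict is the new model's parameter dictionary; and state_dict contains the key-value pairs shared by both models. model2_dict is then updated on the shared keys, which pulls in every parameter of the old model that the new one can use, and finally net loads the updated dictionary. Note that the filter compares keys only, so it assumes that layers with the same name also have the same shape.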
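
A variation on the same idea, offered as a sketch and not as the method used above: filter on both key and tensor shape, then pass the result to load_state_dict with strict=False, which tolerates missing or unexpected keys and reports them in its return value. The stand-in models below are assumptions; conv2 is deliberately given a different shape in the new model to show why the shape check matters, since strict=False does not skip entries whose sizes disagree.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained model (layer sizes are illustrative only).
old = nn.ModuleDict({
    "conv1": nn.Conv2d(3, 16, 3, 1, 1),
    "conv2": nn.Conv2d(16, 64, 3, 1, 1),
    "fc1": nn.Linear(8, 4),
})

# Stand-in for the new model: conv2 keeps its name but changes shape.
new = nn.ModuleDict({
    "conv1": nn.Conv2d(3, 16, 3, 1, 1),
    "conv2": nn.Conv2d(16, 32, 3, 1, 1),
    "conv3": nn.Conv2d(32, 32, 3, 1, 1),
})

checkpoint = old.state_dict()
new_state = new.state_dict()

# Keep an entry only if the key exists in the new model AND the shapes agree.
compatible = {
    k: v for k, v in checkpoint.items()
    if k in new_state and v.shape == new_state[k].shape
}

# strict=False accepts a partial dict and returns what was not covered.
result = new.load_state_dict(compatible, strict=False)

print("loaded:", sorted(compatible))
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

Printing `result.missing_keys` after loading is a cheap way to confirm that only the layers we intended to train from scratch were left uninitialized by the checkpoint.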