MXNet Study Notes (2): Modifying a Pretrained Model Before Fine-tuning

Fine-tuning retrains a pretrained model on a new dataset. Since it is only a "fine" adjustment, in the common case only the final fully connected layer needs to be trained, so the model has to be modified first.

import mxnet as mx
from mxnet.gluon import model_zoo as models

if __name__ == '__main__':
    # get_args, try_gpu, load_model and write_txt are helper functions defined elsewhere
    args = get_args()
    ctx = try_gpu()
    model1 = models.vision.resnet18_v1(pretrained=True, ctx=ctx)
    # load the symbol and weights saved under the checkpoint prefix 'resnet-18/resnet-18'
    model, sym, arg_params, aux_params = load_model('resnet-18/resnet-18')
    all_layers = sym.get_internals()          # grouped Symbol holding every internal node
    print(all_layers)
    net = all_layers['flatten0' + '_output']  # cut the graph right after the flatten layer
    net = mx.symbol.FullyConnected(data=net, num_hidden=256, name='fca')
    net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
    net_list = net.get_internals()
    net_list = net_list.list_outputs()        # convert the grouped Symbol to a list of names
    write_txt('symbol.txt')                   # dump the layer list to a text file

The model's computation graph is obtained with sym.get_internals(). Note that its return value is a Symbol (a grouped Symbol); list_outputs() converts it to a list of names. The print output is:

<Symbol group [data, bn_data_gamma, bn_data_beta, bn_data_moving_mean, bn_data_moving_var, bn_data, conv0_weight, conv0, bn0_gamma, bn0_beta, bn0_moving_mean, bn0_moving_var, bn0, relu0, pooling0, stage1_unit1_bn1_gamma, stage1_unit1_bn1_beta, stage1_unit1_bn1_moving_mean, stage1_unit1_bn1_moving_var, stage1_unit1_bn1, stage1_unit1_relu1, stage1_unit1_conv1_weight, stage1_unit1_conv1, stage1_unit1_bn2_gamma, stage1_unit1_bn2_beta, stage1_unit1_bn2_moving_mean, stage1_unit1_bn2_moving_var, stage1_unit1_bn2, stage1_unit1_relu2, stage1_unit1_conv2_weight, stage1_unit1_conv2, stage1_unit1_sc_weight, stage1_unit1_sc, _plus0, stage1_unit2_bn1_gamma, stage1_unit2_bn1_beta, stage1_unit2_bn1_moving_mean, stage1_unit2_bn1_moving_var, stage1_unit2_bn1, stage1_unit2_relu1, stage1_unit2_conv1_weight, stage1_unit2_conv1, stage1_unit2_bn2_gamma, stage1_unit2_bn2_beta, stage1_unit2_bn2_moving_mean, stage1_unit2_bn2_moving_var, stage1_unit2_bn2, stage1_unit2_relu2, stage1_unit2_conv2_weight, stage1_unit2_conv2, _plus1, stage2_unit1_bn1_gamma, stage2_unit1_bn1_beta, stage2_unit1_bn1_moving_mean, stage2_unit1_bn1_moving_var, stage2_unit1_bn1, stage2_unit1_relu1, stage2_unit1_conv1_weight, stage2_unit1_conv1, stage2_unit1_bn2_gamma, stage2_unit1_bn2_beta, stage2_unit1_bn2_moving_mean, stage2_unit1_bn2_moving_var, stage2_unit1_bn2, stage2_unit1_relu2, stage2_unit1_conv2_weight, stage2_unit1_conv2, stage2_unit1_sc_weight, stage2_unit1_sc, _plus2, stage2_unit2_bn1_gamma, stage2_unit2_bn1_beta, stage2_unit2_bn1_moving_mean, stage2_unit2_bn1_moving_var, stage2_unit2_bn1, stage2_unit2_relu1, stage2_unit2_conv1_weight, stage2_unit2_conv1, stage2_unit2_bn2_gamma, stage2_unit2_bn2_beta, stage2_unit2_bn2_moving_mean, stage2_unit2_bn2_moving_var, stage2_unit2_bn2, stage2_unit2_relu2, stage2_unit2_conv2_weight, stage2_unit2_conv2, _plus3, stage3_unit1_bn1_gamma, stage3_unit1_bn1_beta, stage3_unit1_bn1_moving_mean, stage3_unit1_bn1_moving_var, stage3_unit1_bn1, stage3_unit1_relu1, stage3_unit1_conv1_weight, stage3_unit1_conv1, stage3_unit1_bn2_gamma, stage3_unit1_bn2_beta, stage3_unit1_bn2_moving_mean, stage3_unit1_bn2_moving_var, stage3_unit1_bn2, stage3_unit1_relu2, stage3_unit1_conv2_weight, stage3_unit1_conv2, stage3_unit1_sc_weight, stage3_unit1_sc, _plus4, stage3_unit2_bn1_gamma, stage3_unit2_bn1_beta, stage3_unit2_bn1_moving_mean, stage3_unit2_bn1_moving_var, stage3_unit2_bn1, stage3_unit2_relu1, stage3_unit2_conv1_weight, stage3_unit2_conv1, stage3_unit2_bn2_gamma, stage3_unit2_bn2_beta, stage3_unit2_bn2_moving_mean, stage3_unit2_bn2_moving_var, stage3_unit2_bn2, stage3_unit2_relu2, stage3_unit2_conv2_weight, stage3_unit2_conv2, _plus5, stage4_unit1_bn1_gamma, stage4_unit1_bn1_beta, stage4_unit1_bn1_moving_mean, stage4_unit1_bn1_moving_var, stage4_unit1_bn1, stage4_unit1_relu1, stage4_unit1_conv1_weight, stage4_unit1_conv1, stage4_unit1_bn2_gamma, stage4_unit1_bn2_beta, stage4_unit1_bn2_moving_mean, stage4_unit1_bn2_moving_var, stage4_unit1_bn2, stage4_unit1_relu2, stage4_unit1_conv2_weight, stage4_unit1_conv2, stage4_unit1_sc_weight, stage4_unit1_sc, _plus6, stage4_unit2_bn1_gamma, stage4_unit2_bn1_beta, stage4_unit2_bn1_moving_mean, stage4_unit2_bn1_moving_var, stage4_unit2_bn1, stage4_unit2_relu1, stage4_unit2_conv1_weight, stage4_unit2_conv1, stage4_unit2_bn2_gamma, stage4_unit2_bn2_beta, stage4_unit2_bn2_moving_mean, stage4_unit2_bn2_moving_var, stage4_unit2_bn2, stage4_unit2_relu2, stage4_unit2_conv2_weight, stage4_unit2_conv2, _plus7, bn1_gamma, bn1_beta, bn1_moving_mean, bn1_moving_var, bn1, relu1, pool1, 
flatten0, fc1_weight, fc1_bias, fc1, softmax_label, softmax]>

The output above is the Symbol group, i.e. every node of the computation graph. The first node is the input node, and the data argument in the code below refers to this input node.

net = mx.symbol.FullyConnected(data=net, num_hidden=256, name='fca')

The following statement modifies the original network indirectly, dropping the original model's fully connected layer. Apart from the 'data' node, the name of an output node must have '_output' appended, as shown below:

net = all_layers['flatten0'+'_output']
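
If the exact node name is unclear, the internal output names can be filtered first; this small sketch is an addition of mine, and the 'flatten' filter string is only an example. Operator outputs carry the '_output' suffix:

all_names = sym.get_internals().list_outputs()             # every internal output name
print([name for name in all_names if 'flatten' in name])   # e.g. ['flatten0_output']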

The new fully connected layer and softmax output are created with:

net = mx.symbol.FullyConnected(data=net, num_hidden=256, name='fca')
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
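
After the symbol has been rebuilt, the pretrained weights still have to be loaded into it. The sketch below shows the usual pattern rather than code from this article: the parameters of the old fully connected layer (named fc1 in this checkpoint) are dropped, and the remaining ones are bound to a Module; the input shape (1, 3, 224, 224) is only an illustrative assumption:

# keep every pretrained parameter except those of the old fully connected layer 'fc1'
new_args = {k: v for k, v in arg_params.items() if 'fc1' not in k}

mod = mx.mod.Module(symbol=net, context=ctx, label_names=['softmax_label'])
mod.bind(data_shapes=[('data', (1, 3, 224, 224))],
         label_shapes=[('softmax_label', (1,))])
# pretrained weights fill the reused layers; the new 'fca' layer gets Xavier-initialized
mod.init_params(initializer=mx.init.Xavier(), arg_params=new_args,
                aux_params=aux_params, allow_missing=True)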

2. Freezing the model parameters:

Reference article

import mxnet as mx

# build an AlexNet and load locally downloaded pretrained weights onto the GPU
alexnet = mx.gluon.model_zoo.vision.alexnet()
alexnet.load_params('alexnet-44335d1f.params', ctx=mx.gpu())

print(alexnet)

Output:

AlexNet(
  (features): HybridSequential(
    (0): Conv2D(64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (2): Conv2D(192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (4): Conv2D(384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Conv2D(256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): Conv2D(256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (8): Flatten
  )
  (classifier): HybridSequential(
    (0): Dense(4096, Activation(relu))
    (1): Dropout(p = 0.5)
    (2): Dense(4096, Activation(relu))
    (3): Dropout(p = 0.5)
    (4): Dense(1000, linear)
  )
)

The feature extractor's parameters are frozen as follows:

featuresnet = alexnet.features
for _, w in featuresnet.collect_params().items():
    w.grad_req = 'null'

Notes on grad_req:

grad_req has three possible values:

1. write: the default value. The gradient computed in each backward pass is written directly into the gradient array.

2. add: the gradient computed in each backward pass is added to the gradient array. You must then call zero_grad() manually to clear the gradient buffer before each new accumulation cycle, usually right after trainer.step() (see the sketch after this list).

3. null: no gradient is computed or stored for the parameter at all, which is exactly what freezing a layer requires and what the loop above relies on.
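
As an illustration of the add mode, here is a minimal gradient-accumulation sketch of mine; some_net, train_data, loss_fn, trainer, batch_size and accum_steps are assumed to be defined elsewhere:

from mxnet import autograd

# accumulate gradients over several batches before each optimizer step
for p in some_net.collect_params().values():
    p.grad_req = 'add'

for i, (data, label) in enumerate(train_data):
    with autograd.record():
        loss = loss_fn(some_net(data), label)
    loss.backward()
    if (i + 1) % accum_steps == 0:
        trainer.step(batch_size * accum_steps)
        for p in some_net.collect_params().values():
            p.zero_grad()   # clear the accumulated gradients by hand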

The second part, the classifier, is then rebuilt and initialized as follows:

from mxnet.gluon import nn

def Classifier():
    # new classifier head: same layout as AlexNet's, but with 10 output classes
    net = nn.HybridSequential()
    net.add(nn.Dense(4096, activation="relu"))
    net.add(nn.Dropout(.5))
    net.add(nn.Dense(4096, activation="relu"))
    net.add(nn.Dropout(.5))
    net.add(nn.Dense(10))
    return net

net = nn.HybridSequential()
with net.name_scope():
    net.add(featuresnet)    # net[0]: the frozen pretrained feature extractor
    net.add(Classifier())   # net[1]: the new, trainable classifier
    # only the classifier needs initializing; the features already hold pretrained weights
    net[1].collect_params().initialize(init=mx.init.Xavier(), ctx=mx.gpu())
net.hybridize()
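
The network can then be trained as usual; since the feature extractor's grad_req is 'null', only the classifier is updated. A minimal training-loop sketch, assuming train_data yields (data, label) batches already placed on mx.gpu() and using purely illustrative hyperparameters:

from mxnet import autograd, gluon

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
# hand only the new classifier's parameters to the trainer; the frozen features
# would receive no gradient updates either way
trainer = gluon.Trainer(net[1].collect_params(), 'sgd', {'learning_rate': 0.01})

for epoch in range(5):
    for data, label in train_data:
        with autograd.record():
            output = net(data)
            loss = loss_fn(output, label)
        loss.backward()
        trainer.step(data.shape[0])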
