Caffe in Practice with the Python Interface (3): Fine-tuning a Pretrained Network

Introduction

These notes record important statements and problems encountered while working through the official example notebooks. The content is scattered, so reading the series in order is recommended.
The focus is on calling Caffe's Python interface (pycaffe).
The source notebooks live in {caffe_root}/examples; after installing Jupyter with sudo pip install jupyter they can be opened and run directly. Beginners should keep them in the default directory, otherwise many paths have to be changed.
Note: the examples are written as Jupyter notebooks, and some cells use special syntax:
1. Exclamation mark '!': runs a shell command, e.g. !pwd
2. Percent sign '%': magic commands with many uses, e.g. %matplotlib inline displays plots inline; see the Jupyter Notebook Viewer documentation for details.
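
Later cells refer to names such as caffe_root, weights, imagenet_labels, style_labels, and NUM_STYLE_LABELS that are defined in earlier cells of the original notebook (data and weights download, label lists) and are not all reproduced in this post. Below is a minimal setup sketch; the paths and GPU settings are assumptions and may differ on your machine:

    import os
    import sys
    import tempfile

    import numpy as np
    import matplotlib.pyplot as plt

    # Assumed layout: this notebook lives in {caffe_root}/examples.
    caffe_root = '../'
    sys.path.insert(0, caffe_root + 'python')
    import caffe

    caffe.set_device(0)   # use the GPU if one is available
    caffe.set_mode_gpu()

    # Pretrained ImageNet weights, downloaded beforehand with
    # scripts/download_model_binary.py models/bvlc_reference_caffenet
    weights = os.path.join(caffe_root,
                           'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
    assert os.path.exists(weights), 'download the CaffeNet weights first'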

Contents

Fine-tuning a pretrained network for style recognition

1. Set up the network and download the dataset

# Helper function for deprocessing preprocessed images, e.g., for display.
def deprocess_net_image(image):
    image = image.copy()              # don't modify destructively
    image = image[::-1]               # BGR -> RGB
    image = image.transpose(1, 2, 0)  # CHW -> HWC
    image += [123, 117, 104]          # (approximately) undo mean subtraction

    # clamp values in [0, 255]
    image[image < 0], image[image > 255] = 0, 255

    # round and cast from float32 to uint8
    image = np.round(image)
    image = np.require(image, dtype=np.uint8)

    return image
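
As a quick sanity check, the helper can be applied to a fake preprocessed blob; the array below is made up purely for illustration and assumes the usual (3, 227, 227) BGR, mean-subtracted, float32 layout:

# Fake "preprocessed" image: channel-first, mean-subtracted, float32.
fake = (np.random.randn(3, 227, 227) * 50).astype(np.float32)
restored = deprocess_net_image(fake)
print restored.shape, restored.dtype  # (227, 227, 3) uint8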

2. Define and run the network

  • This network definition differs in style from the previous post: common pycaffe layer patterns are first wrapped into helper functions, which are then called with different arguments so that layers of the same type can take different parameters. This makes it easy to build a multi-layer network quickly.
from caffe import layers as L
from caffe import params as P

weight_param = dict(lr_mult=1, decay_mult=1)
bias_param   = dict(lr_mult=2, decay_mult=0)
learned_param = [weight_param, bias_param]

frozen_param = [dict(lr_mult=0)] * 2  # freeze a layer's parameters by setting lr_mult to 0

def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1,
              param=learned_param,
              weight_filler=dict(type='gaussian', std=0.01),
              bias_filler=dict(type='constant', value=0.1)):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, group=group,
                         param=param, weight_filler=weight_filler,
                         bias_filler=bias_filler)
    return conv, L.ReLU(conv, in_place=True)

def fc_relu(bottom, nout, param=learned_param,
            weight_filler=dict(type='gaussian', std=0.005),
            bias_filler=dict(type='constant', value=0.1)):
    fc = L.InnerProduct(bottom, num_output=nout, param=param,
                        weight_filler=weight_filler,
                        bias_filler=bias_filler)
    return fc, L.ReLU(fc, in_place=True)

def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

def caffenet(data, label=None, train=True, num_classes=1000,
             classifier_name='fc8', learn_all=False):
    """Returns a NetSpec specifying CaffeNet, following the original proto text
       specification (./models/bvlc_reference_caffenet/train_val.prototxt)."""
    n = caffe.NetSpec()
    n.data = data
    param = learned_param if learn_all else frozen_param
    n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4, param=param)
    n.pool1 = max_pool(n.relu1, 3, stride=2)
    n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
    n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2, param=param)
    n.pool2 = max_pool(n.relu2, 3, stride=2)
    n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
    n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1, param=param)
    n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2, param=param)
    n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2, param=param)
    n.pool5 = max_pool(n.relu5, 3, stride=2)
    n.fc6, n.relu6 = fc_relu(n.pool5, 4096, param=param)
    if train:  # the TEST-phase net omits the two dropout layers, drop6 and drop7
        n.drop6 = fc7input = L.Dropout(n.relu6, in_place=True)
    else:
        fc7input = n.relu6
    n.fc7, n.relu7 = fc_relu(fc7input, 4096, param=param)
    if train:
        n.drop7 = fc8input = L.Dropout(n.relu7, in_place=True)
    else:
        fc8input = n.relu7
    # always learn fc8 (param=learned_param)
    fc8 = L.InnerProduct(fc8input, num_output=num_classes, param=learned_param)
    # give fc8 the name specified by argument `classifier_name`; because the
    # layer is renamed, its weights are not loaded from the pretrained model
    # and it is instead filled as specified in the prototxt
    n.__setattr__(classifier_name, fc8)
    if not train:
        n.probs = L.Softmax(fc8)
    if label is not None:
        n.label = label
        n.loss = L.SoftmaxWithLoss(fc8, n.label)
        n.acc = L.Accuracy(fc8, n.label)
    # write the net to a temporary file and return its filename
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(n.to_proto()))
        return f.name
  • Build a net with dummy data:

    dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
    imagenet_net_filename = caffenet(data=dummy_data, train=False)
    imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
  • Define a style_net function that calls caffenet on the Flickr style dataset downloaded earlier.
  • It differs from the plain caffenet in that:

    1. An ImageData layer feeds the Flickr style dataset as the network input
    2. It outputs 20 style classes instead of the original 1,000 ImageNet classes
    3. The final classification layer is renamed from fc8 to fc8_flickr so that Caffe does not load that layer's weights from the pretrained model

      def style_net(train=True, learn_all=False, subset=None):
          if subset is None:
              subset = 'train' if train else 'test'
          source = caffe_root + 'data/flickr_style/%s.txt' % subset
          transform_param = dict(mirror=train, crop_size=227,
              mean_file=caffe_root + 'data/ilsvrc12/imagenet_mean.binaryproto')
          style_data, style_label = L.ImageData(
              transform_param=transform_param, source=source,
              batch_size=50, new_height=256, new_width=256, ntop=2)
          return caffenet(data=style_data, label=style_label, train=train,
                          num_classes=NUM_STYLE_LABELS,
                          classifier_name='fc8_flickr',
                          learn_all=learn_all)
  • Use the style_net function defined above to initialize untrained_style_net: the downloaded dataset is the image input, and the pretrained model provides the weights.
  • Call untrained_style_net.forward() to get one batch of training data:

    untrained_style_net = caffe.Net(style_net(train=False, subset='train'),
                                    weights, caffe.TEST)
    untrained_style_net.forward()
    style_data_batch = untrained_style_net.blobs['data'].data.copy()
    style_label_batch = np.array(untrained_style_net.blobs['label'].data, dtype=np.int32)
  • Pick one image from the untrained_style_net training batch (index 8 is chosen here), display it, and print its label:

    batch_index = 8
    image = style_data_batch[batch_index]
    plt.imshow(deprocess_net_image(image))
    print 'actual label =', style_labels[style_label_batch[batch_index]]
  • Feed this image to imagenet_net, run a forward pass, and display the top-5 predicted classes: disp_imagenet_preds(imagenet_net, image); the helper functions are listed below.
  • Feed the same image to untrained_style_net and display its top-5 predictions: disp_style_preds(untrained_style_net, image).

    def disp_preds(net, image, labels, k=5, name='ImageNet'):
        input_blob = net.blobs['data']
        net.blobs['data'].data[0, ...] = image
        probs = net.forward(start='conv1')['probs'][0]
        top_k = (-probs).argsort()[:k]
        print 'top %d predicted %s labels =' % (k, name)
        print '\n'.join('\t(%d) %5.2f%% %s' % (i+1, 100*probs[p], labels[p])
                        for i, p in enumerate(top_k))
    
    def disp_imagenet_preds(net, image):
        disp_preds(net, image, imagenet_labels, name='ImageNet')
    
    def disp_style_preds(net, image):
        disp_preds(net, image, style_labels, name='style')

    Changing batch_index to look at different images shows that imagenet_net outputs varying probabilities, presumably because it recognizes objects in the style images, whereas untrained_style_net assigns the same probability to every class. When that net was built, the final classifier layer was given the style labels (5 in the downloaded subset), its pretrained weights were not loaded, and no weight filler was specified, so its weights are all zero. The softmax therefore sees an all-zero input and outputs 1/N for each of the N labels, as the sketch below illustrates.
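
    As a quick illustration of the 1/N behaviour, here is a small NumPy-only sketch (not from the original notebook) showing that a softmax over an all-zero input is uniform:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())  # numerically stable softmax
        return e / e.sum()

    # With zero weights and biases, the classifier output is all zeros,
    # so every one of the N labels receives probability 1/N.
    N = 5  # number of style labels in the downloaded subset
    print softmax(np.zeros(N))  # [ 0.2  0.2  0.2  0.2  0.2]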

  • You can also verify that the fc7 activations of imagenet_net and untrained_style_net are identical, since both load the same pretrained weights:

    diff = untrained_style_net.blobs['fc7'].data[0] - imagenet_net.blobs['fc7'].data[0]
    error = (diff ** 2).sum()
    assert error < 1e-8
  • Delete untrained_style_net to save memory: del untrained_style_net

3. Train the style classifier

  • Define a solver function that creates the Caffe solver prototxt and returns its filename:

    from caffe.proto import caffe_pb2
    
    def solver(train_net_path, test_net_path=None, base_lr=0.001):
        s = caffe_pb2.SolverParameter()
    
        # Specify locations of the train and (maybe) test networks.
        s.train_net = train_net_path
        if test_net_path is not None:
            s.test_net.append(test_net_path)
            s.test_interval = 1000  # Test after every 1000 training iterations.
            s.test_iter.append(100) # Test on 100 batches each time we test.
    
        # The number of iterations over which to average the gradient.
        # Effectively boosts the training batch size by the given factor, without
        # affecting memory utilization.
        s.iter_size = 1
    
        s.max_iter = 100000     # # of times to update the net (training iterations)
    
        # Solve using the stochastic gradient descent (SGD) algorithm.
        # Other choices include 'Adam' and 'RMSProp'.
        s.type = 'SGD'
    
        # Set the initial learning rate for SGD.
        s.base_lr = base_lr
    
        # Set `lr_policy` to define how the learning rate changes during training.
        # Here, we 'step' the learning rate by multiplying it by a factor `gamma`
        # every `stepsize` iterations.
        s.lr_policy = 'step'
        s.gamma = 0.1
        s.stepsize = 20000
    
        # Set other SGD hyperparameters. Setting a non-zero `momentum` takes a
        # weighted average of the current gradient and previous gradients to make
        # learning more stable. L2 weight decay regularizes learning, to help prevent
        # the model from overfitting.
        s.momentum = 0.9
        s.weight_decay = 5e-4
    
        # Display the current training loss and accuracy every 1000 iterations.
        s.display = 1000
    
        # Snapshots are files used to store networks we've trained.  Here, we'll
        # snapshot every 10K iterations -- ten times during training.
        s.snapshot = 10000
        s.snapshot_prefix = caffe_root + 'models/finetune_flickr_style/finetune_flickr_style'
    
        # Train on the GPU.  Using the CPU to train large networks is very slow.
        s.solver_mode = caffe_pb2.SolverParameter.GPU
    
        # Write the solver to a temporary file and return its filename.
        with tempfile.NamedTemporaryFile(delete=False) as f:
            f.write(str(s))
            return f.name
  • Define run_solvers, which takes a list of (name, solver) pairs, steps each solver in turn in round-robin fashion while recording the loss and accuracy at every iteration, and finally saves the learned weights. It returns the loss, accuracy, and saved-weight filenames.
    def run_solvers(niter, solvers, disp_interval=10):
        """Run solvers for niter iterations,
           returning the loss and accuracy recorded each iteration.
           `solvers` is a list of (name, solver) tuples."""
        blobs = ('loss', 'acc')
        loss, acc = ({name: np.zeros(niter) for name, _ in solvers}
                     for _ in blobs)
        for it in range(niter):
            for name, s in solvers:
                s.step(1)  # run a single SGD step in Caffe
                loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy()
                                                 for b in blobs)
            if it % disp_interval == 0 or it + 1 == niter:
                loss_disp = '; '.join('%s: loss=%.3f, acc=%2d%%' %
                                      (n, loss[n][it], np.round(100*acc[n][it]))
                                      for n, _ in solvers)
                print '%3d) %s' % (it, loss_disp)     
        # Save the learned weights from both nets.
        weight_dir = tempfile.mkdtemp()
        weights = {}
        for name, s in solvers:
            filename = 'weights.%s.caffemodel' % name
            weights[name] = os.path.join(weight_dir, filename)
            s.net.save(weights[name])
        return loss, acc, weights
  • Create two solvers: style_solver loads the pretrained weights via copy_from, while scratch_style_solver starts from a randomly initialized net. Pass both as a list to run_solvers to train the networks:

    niter = 200  # number of iterations to train

    # Reset style_solver as before.
    style_solver_filename = solver(style_net(train=True))
    style_solver = caffe.get_solver(style_solver_filename)
    style_solver.net.copy_from(weights)

    # For reference, we also create a solver that isn't initialized from
    # the pretrained ImageNet weights.
    scratch_style_solver_filename = solver(style_net(train=True))
    scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)

    print 'Running solvers for %d iterations...' % niter
    solvers = [('pretrained', style_solver),
               ('scratch', scratch_style_solver)]
    loss, acc, weights = run_solvers(niter, solvers)
    print 'Done.'

    train_loss, scratch_train_loss = loss['pretrained'], loss['scratch']
    train_acc, scratch_train_acc = acc['pretrained'], acc['scratch']
    style_weights, scratch_style_weights = weights['pretrained'], weights['scratch']

    # Delete solvers to save memory.
    del style_solver, scratch_style_solver, solvers
  • Plot the training loss of the two networks for comparison (the accuracy curves can be plotted the same way; see the sketch after the code):

    plot(np.vstack([train_loss, scratch_train_loss]).T)
    xlabel('Iteration #')
    ylabel('Loss')
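
    The accuracy values recorded by run_solvers can be plotted the same way; a minimal sketch, assuming the same pylab-style plot/xlabel/ylabel helpers used above:

    plot(np.vstack([train_acc, scratch_train_acc]).T)
    xlabel('Iteration #')
    ylabel('Accuracy')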
  • Define an evaluation function eval_style_net that builds a fresh TEST-phase net with the trained weights and reports the average accuracy over several test batches:

    def eval_style_net(weights, test_iters=10):
        test_net = caffe.Net(style_net(train=False), weights, caffe.TEST)
        accuracy = 0
        for it in xrange(test_iters):
            accuracy += test_net.forward()['acc']
        accuracy /= test_iters
        return test_net, accuracy
    test_net, accuracy = eval_style_net(style_weights)
    print 'Accuracy, trained from ImageNet initialization: %3.1f%%' % (100*accuracy, )
    scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights)
    print 'Accuracy, trained from   random initialization: %3.1f%%' % (100*scratch_accuracy, )

4. End-to-end fine-tuning

  • Set learn_all=True in style_net to train all layers (the default learn_all=False freezes every layer from conv1 through fc7):

    end_to_end_net = style_net(train=True, learn_all=True)

    # Set base_lr to 1e-3, the same as last time when learning only the classifier.
    # You may want to play around with different values of this or other
    # optimization parameters when fine-tuning.  For example, if learning diverges
    # (e.g., the loss gets very large or goes to infinity/NaN), you should try
    # decreasing base_lr (e.g., to 1e-4, then 1e-5, etc., until you find a value
    # for which learning does not diverge).
    base_lr = 0.001

    style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
    style_solver = caffe.get_solver(style_solver_filename)
    style_solver.net.copy_from(style_weights)

    scratch_style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
    scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)
    scratch_style_solver.net.copy_from(scratch_style_weights)

    print 'Running solvers for %d iterations...' % niter
    solvers = [('pretrained, end-to-end', style_solver),
               ('scratch, end-to-end', scratch_style_solver)]
    _, _, finetuned_weights = run_solvers(niter, solvers)
    print 'Done.'

    style_weights_ft = finetuned_weights['pretrained, end-to-end']
    scratch_style_weights_ft = finetuned_weights['scratch, end-to-end']

    # Delete solvers to save memory.
    del style_solver, scratch_style_solver, solvers
  • After end-to-end training, call the eval_style_net function defined above to check the average accuracy again:

    test_net, accuracy = eval_style_net(style_weights_ft)
    print 'Accuracy, finetuned from ImageNet initialization: %3.1f%%' % (100*accuracy, )
    scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights_ft)
    print 'Accuracy, finetuned from   random initialization: %3.1f%%' % (100*scratch_accuracy, )
  • Take the image tested earlier and look at the top-5 predictions of the end-to-end trained network:

    plt.imshow(deprocess_net_image(image))
    disp_style_preds(test_net, image)
  • Finally, pick an image from a test_net batch (i.e., from the test set), display it, and print the top-5 predictions:

    batch_index = 1
    image = test_net.blobs['data'].data[batch_index]
    plt.imshow(deprocess_net_image(image))  # display the chosen image
    print 'actual label =', style_labels[int(test_net.blobs['label'].data[batch_index])]
    disp_style_preds(test_net, image)  # top-5 style predictions for this image
    disp_imagenet_preds(imagenet_net, image)  # compare with the output of the unmodified ImageNet model

Previous post: Caffe in Practice with the Python Interface (2): Learning-LeNet

Next post: Caffe in Practice with the Python Interface (4): Brewing Logistic Regression then Going Deeper
