02-net-surgery.ipynb

http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb


This official notebook is mainly worth studying for its second part: converting fully connected layers into fully convolutional ones.

Caffe networks can be transformed to your particular needs by editing the model parameters. The data, diffs, and parameters of a net are all exposed in pycaffe.

Roll up your sleeves for net surgery with pycaffe!

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Make sure that caffe is on the python path:
caffe_root = '../'  # this file is expected to be in {caffe_root}/examples
import sys
sys.path.insert(0, caffe_root + 'python')

import caffe

# configure plotting
plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

Designer Filters

To show how to load, manipulate, and save parameters we'll design our own filters into a simple network that's only a single convolution layer. This net has two blobs, data for the input and conv for the convolution output, and one parameter conv for the convolution filter weights and biases.

In [2]:
# Load the net, list its data and params, and filter an example image.
caffe.set_mode_cpu()
net = caffe.Net('net_surgery/conv.prototxt', caffe.TEST)
print("blobs {}\nparams {}".format(net.blobs.keys(), net.params.keys()))

# load image and prepare as a single input batch for Caffe
im = np.array(caffe.io.load_image('images/cat_gray.jpg', color=False)).squeeze()
plt.title("original image")
plt.imshow(im)
plt.axis('off')

im_input = im[np.newaxis, np.newaxis, :, :]
net.blobs['data'].reshape(*im_input.shape)
net.blobs['data'].data[...] = im_input
blobs ['data', 'conv']
params ['conv']

The convolution weights are initialized from Gaussian noise while the biases are initialized to zero. These random filters give output somewhat like edge detections.

In [3]:
# helper show filter outputs
def show_filters(net):
    net.forward()
    plt.figure()
    filt_min, filt_max = net.blobs['conv'].data.min(), net.blobs['conv'].data.max()
    for i in range(3):
        plt.subplot(1,4,i+2)
        plt.title("filter #{} output".format(i))
        plt.imshow(net.blobs['conv'].data[0, i], vmin=filt_min, vmax=filt_max)
        plt.tight_layout()
        plt.axis('off')

# filter the image with the initial random weights
show_filters(net)


Raising the bias of a filter will correspondingly raise its output:

In [4]:
# pick first filter output
conv0 = net.blobs['conv'].data[0, 0]
print("pre-surgery output mean {:.2f}".format(conv0.mean()))
# set first filter bias to 1
net.params['conv'][1].data[0] = 1.
net.forward()
print("post-surgery output mean {:.2f}".format(conv0.mean()))
pre-surgery output mean -0.02
post-surgery output mean 0.98

Altering the filter weights is more exciting since we can assign any kernel like Gaussian blur, the Sobel operator for edges, and so on. The following surgery turns the 0th filter into a Gaussian blur and the 1st and 2nd filters into the horizontal and vertical gradient parts of the Sobel operator.

See how the 0th output is blurred, the 1st picks up horizontal edges, and the 2nd picks up vertical edges.

In [5]:
ksize = net.params['conv'][0].data.shape[2:]
# make Gaussian blur
sigma = 1.
y, x = np.mgrid[-ksize[0]//2 + 1:ksize[0]//2 + 1, -ksize[1]//2 + 1:ksize[1]//2 + 1]
g = np.exp(-((x**2 + y**2)/(2.0*sigma**2)))
gaussian = (g / g.sum()).astype(np.float32)
net.params['conv'][0].data[0] = gaussian
# make Sobel operator for edge detection
net.params['conv'][0].data[1:] = 0.
sobel = np.array((-1, -2, -1, 0, 0, 0, 1, 2, 1), dtype=np.float32).reshape((3,3))
net.params['conv'][0].data[1, 0, 1:-1, 1:-1] = sobel  # horizontal
net.params['conv'][0].data[2, 0, 1:-1, 1:-1] = sobel.T  # vertical
show_filters(net)


With net surgery, parameters can be transplanted across nets, regularized by custom per-parameter operations, and transformed according to your schemes.
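
As a small illustration of such a custom per-parameter operation, here is a hedged sketch (using only the toy net loaded above) that scales every parameter blob toward zero in place, a crude form of weight decay:

# Sketch of a custom per-parameter operation: shrink all weights and
# biases by 10%, editing each blob in place like the bias surgery above.
for name in net.params:
    for blob in net.params[name]:  # [0] is weights, [1] is biases
        blob.data[...] *= 0.9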

Casting a Classifier into a Fully Convolutional Network

Let's take the standard Caffe Reference ImageNet model "CaffeNet" and transform it into a fully convolutional net for efficient, dense inference on large inputs. This model generates a classification map that covers a given input size instead of a single classification. In particular an 8 × 8 classification map on a 451 × 451 input gives 64x the output in only 3x the time. The computation exploits a natural efficiency of convolutional network (convnet) structure by amortizing the computation of overlapping receptive fields.

To do so we translate the InnerProduct matrix multiplication layers of CaffeNet into Convolution layers. This is the only change: the other layer types are agnostic to spatial size. Convolution is translation-invariant, activations are elementwise operations, and so on. The fc6 inner product, when carried out as convolution by fc6-conv, turns into a 6 × 6 filter with stride 1 on pool5. Back in image space this gives a classification for each 227 × 227 box with stride 32 in pixels. Remember the equation for output map / receptive field size, output = (input - kernel_size) / stride + 1, and work out the indexing details for a clear understanding.
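
As a sanity check on that arithmetic, here is a short standalone sketch (plain Python; the per-layer hyperparameters are read off the full-conv prototxt listed later) that chains the size formula, extended with padding to output = (input + 2*pad - kernel_size) / stride + 1, through every layer of CaffeNet that changes spatial resolution:

# Chain the output-size formula through the CaffeNet layers that change
# spatial resolution. Caffe's pooling layers actually round this division
# up rather than down, but for these particular sizes the two agree.
def out_size(inp, kernel, stride=1, pad=0):
    return (inp + 2 * pad - kernel) // stride + 1

# (name, kernel_size, stride, pad) taken from the prototxt below
layers = [('conv1', 11, 4, 0), ('pool1', 3, 2, 0),
          ('conv2', 5, 1, 2), ('pool2', 3, 2, 0),
          ('conv3', 3, 1, 1), ('conv4', 3, 1, 1), ('conv5', 3, 1, 1),
          ('pool5', 3, 2, 0), ('fc6-conv', 6, 1, 0)]

for size in (227, 451):
    s = size
    for name, k, stride, pad in layers:
        s = out_size(s, k, stride, pad)
    print('{0}x{0} input -> {1}x{1} fc6-conv output'.format(size, s))

The 451 × 451 input indeed yields the 8 × 8 map, while a 227 × 227 input collapses to a single 1 × 1 classification.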

In [6]:
!diff net_surgery/bvlc_caffenet_full_conv.prototxt ../models/bvlc_reference_caffenet/deploy.prototxt
1,2c1
< # Fully convolutional network version of CaffeNet.
< name: "CaffeNetConv"
---
> name: "CaffeNet"
7,11c6
<   input_param {
<     # initial shape for a fully convolutional network:
<     # the shape can be set for each input by reshape.
<     shape: { dim: 1 dim: 3 dim: 451 dim: 451 }
<   }
---
>   input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }
157,158c152,153
<   name: "fc6-conv"
<   type: "Convolution"
---
>   name: "fc6"
>   type: "InnerProduct"
160,161c155,156
<   top: "fc6-conv"
<   convolution_param {
---
>   top: "fc6"
>   inner_product_param {
163d157
<     kernel_size: 6
169,170c163,164
<   bottom: "fc6-conv"
<   top: "fc6-conv"
---
>   bottom: "fc6"
>   top: "fc6"
175,176c169,170
<   bottom: "fc6-conv"
<   top: "fc6-conv"
---
>   bottom: "fc6"
>   top: "fc6"
182,186c176,180
<   name: "fc7-conv"
<   type: "Convolution"
<   bottom: "fc6-conv"
<   top: "fc7-conv"
<   convolution_param {
---
>   name: "fc7"
>   type: "InnerProduct"
>   bottom: "fc6"
>   top: "fc7"
>   inner_product_param {
188d181
<     kernel_size: 1
194,195c187,188
<   bottom: "fc7-conv"
<   top: "fc7-conv"
---
>   bottom: "fc7"
>   top: "fc7"
200,201c193,194
<   bottom: "fc7-conv"
<   top: "fc7-conv"
---
>   bottom: "fc7"
>   top: "fc7"
207,211c200,204
<   name: "fc8-conv"
<   type: "Convolution"
<   bottom: "fc7-conv"
<   top: "fc8-conv"
<   convolution_param {
---
>   name: "fc8"
>   type: "InnerProduct"
>   bottom: "fc7"
>   top: "fc8"
>   inner_product_param {
213d205
<     kernel_size: 1
219c211
<   bottom: "fc8-conv"
---
>   bottom: "fc8"

The only differences needed in the architecture are to change the fully connected classifier inner product layers into convolutional layers with the right filter size -- 6 x 6, since the reference model classifiers take the 36 elements of pool5 as input -- and stride 1 for dense classification. Note that the layers are renamed so that Caffe does not try to blindly load the old parameters when it maps layer names to the pretrained model.

In [7]: load the original network definition and the trained model
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/bvlc_reference_caffenet/deploy.prototxt', 
                '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel', 
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8']
# fc_params = {name: (weights, biases)}
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}

for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(fc, fc_params[fc][0].shape, fc_params[fc][1].shape))
The shapes of the original model's fully connected weights and biases:
fc6 weights are (4096, 9216) dimensional and biases are (4096,) dimensional
fc7 weights are (4096, 4096) dimensional and biases are (4096,) dimensional
fc8 weights are (1000, 4096) dimensional and biases are (1000,) dimensional

Consider the shapes of the inner product parameters. The weight dimensions are the output and input sizes while the bias dimension is the output size.

In [8]: load the edited fully convolutional definition together with the original weights
# Load the fully convolutional network to transplant the parameters.
net_full_conv = caffe.Net('net_surgery/bvlc_caffenet_full_conv.prototxt', 
                          '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                          caffe.TEST)
params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
# conv_params = {name: (weights, biases)}
conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data) for pr in params_full_conv}

for conv in params_full_conv:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(conv, conv_params[conv][0].shape, conv_params[conv][1].shape))
The shapes of the fully convolutional net's converted layers' weights and biases:

fc6-conv weights are (4096, 256, 6, 6) dimensional and biases are (4096,) dimensional
fc7-conv weights are (4096, 4096, 1, 1) dimensional and biases are (4096,) dimensional
fc8-conv weights are (1000, 4096, 1, 1) dimensional and biases are (1000,) dimensional


The convolution weights are arranged in output × input × height × width dimensions. To map the inner product weights to convolution filters, we could roll the flat inner product vectors into channel × height × width filter matrices, but actually these are identical in memory (as row major arrays) so we can assign them directly.

The biases are identical to those of the inner product.
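
A toy-scale numpy illustration of that row-major identity (hypothetical 2-input-channel, 3 × 3 shapes, not the real 4096 × 9216 matrices):

import numpy as np

# An inner product weight matrix of shape (out, in) and a conv filter
# bank of shape (out, channels, height, width) with in == channels*h*w
# are the same row-major buffer; reshaping makes no copy.
fc_w = np.arange(4 * 2 * 3 * 3, dtype=np.float32).reshape(4, 18)
conv_w = fc_w.reshape(4, 2, 3, 3)
assert np.may_share_memory(fc_w, conv_w)       # a view, not a copy
assert (conv_w.reshape(4, 18) == fc_w).all()   # and the same values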

Let's transplant!

In [9]: do the transplant
for pr, pr_conv in zip(params, params_full_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat  # flat unrolls the arrays
    conv_params[pr_conv][1][...] = fc_params[pr][1]

Next, save the new model weights.

In [10]: save the new model weights
net_full_conv.save('net_surgery/bvlc_caffenet_full_conv.caffemodel')

To conclude, let's make a classification map from the example cat image and visualize the confidence of "tiger cat" as a probability heatmap. This gives an 8-by-8 prediction on overlapping regions of the 451 × 451 input.
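
Concretely, output location i of that map corresponds to a 227 × 227 window starting at pixel 32·i in the input; a quick sketch of the mapping:

# Map classification-map coordinates back to their receptive fields:
# a 227x227 box sliding over the 451x451 input at stride 32.
receptive, stride = 227, 32
for i in (0, 7):  # first and last of the 8 positions per axis
    print('output {} covers input pixels [{}, {})'.format(
        i, i * stride, i * stride + receptive))
# output 0 covers input pixels [0, 227)
# output 7 covers input pixels [224, 451)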

In [11]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# load input and configure preprocessing
im = caffe.io.load_image('images/cat.jpg')
transformer = caffe.io.Transformer({'data': net_full_conv.blobs['data'].data.shape})
transformer.set_mean('data', np.load('../python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# make classification map by forward and print prediction indices at each location
out = net_full_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
print(out['prob'][0].argmax(axis=0))
# show net input and confidence map (probability of the top prediction at each location)
plt.subplot(1, 2, 1)
plt.imshow(transformer.deprocess('data', net_full_conv.blobs['data'].data[0]))
plt.subplot(1, 2, 2)
plt.imshow(out['prob'][0,281])
[[282 282 281 281 281 281 277 282]
 [281 283 283 281 281 281 281 282]
 [283 283 283 283 283 283 287 282]
 [283 283 283 281 283 283 283 259]
 [283 283 283 283 283 283 283 259]
 [283 283 283 283 283 283 259 259]
 [283 283 283 283 259 259 259 277]
 [335 335 283 259 263 263 263 277]]
Out[11]:
<matplotlib.image.AxesImage at 0x12379a690>

The classifications include various cats -- 282 = tiger cat, 281 = tabby, 283 = persian -- and foxes and other mammals.

In this way the fully connected layers can be extracted as dense features across an image (see net_full_conv.blobs['fc6-conv'].data for instance), which is perhaps more useful than the classification map itself.
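
For instance, a minimal sketch of grabbing those dense features after the forward pass above (blob name as in the full-conv prototxt):

# Dense fc6 features across the image: one 4096-vector per spatial
# location of the 8x8 grid left after the 6x6 fc6-conv filter.
feat = net_full_conv.blobs['fc6-conv'].data
print(feat.shape)  # (1, 4096, 8, 8)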

Note that this model isn't totally appropriate for sliding-window detection since it was trained for whole-image classification. Nevertheless it can work just fine. Sliding-window training and finetuning can be done by defining a sliding-window ground truth and loss such that a loss map is made for every location and solving as usual. (This is an exercise for the reader.)


Below are the configuration files: first the single-layer convolution net, then the fully convolutional CaffeNet.

https://github.com/BVLC/caffe/blob/master/examples/net_surgery/conv.prototxt

# Simple single-layer network to showcase editing model parameters.
name: "convolution"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 1 dim: 100 dim: 100 } }
}
layer {
  name: "conv"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  convolution_param {
    num_output: 3
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

https://github.com/BVLC/caffe/blob/master/examples/net_surgery/bvlc_caffenet_full_conv.prototxt

# Fully convolutional network version of CaffeNet.
name: "CaffeNetConv"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    # initial shape for a fully convolutional network:
    # the shape can be set for each input by reshape.
    shape: { dim: 1 dim: 3 dim: 451 dim: 451 }
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 6
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6-conv"
  top: "fc6-conv"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6-conv"
  top: "fc6-conv"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7-conv"
  type: "Convolution"
  bottom: "fc6-conv"
  top: "fc7-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7-conv"
  top: "fc7-conv"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7-conv"
  top: "fc7-conv"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8-conv"
  type: "Convolution"
  bottom: "fc7-conv"
  top: "fc8-conv"
  convolution_param {
    num_output: 1000
    kernel_size: 1
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8-conv"
  top: "prob"
}


Original post: http://blog.csdn.net/zy3381/article/details/50470382
Per-layer parameter calculations: http://blog.csdn.net/l494926429/article/details/51479672

How net_surgery converts fully connected layers into convolutional layers

I've recently been reading the paper Fully Convolutional Networks for Semantic Segmentation, and quite a few parts of it were unclear to me, so I had to dig through it bit by bit. This net_surgery example is a very instructive starting point for how to convert a classifier into a fully convolutional network.

 

Original input layer definition:

https://github.com/BVLC/caffe/blob/master/models/bvlc_reference_caffenet/deploy.prototxt

name: "Alex"  # (calling the original model "Alex" for this discussion)
input_shape {
  dim: 10
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}

 

 

Modified input definition:

https://github.com/BVLC/caffe/blob/master/examples/net_surgery/bvlc_caffenet_full_conv.prototxt

name: "FCN"  # (calling the converted model "FCN" for this discussion)
input_shape {
  dim: 1   # the actual file uses batch size 1, not 10
  dim: 3
  dim: 451
  dim: 451
}
layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 6
  }
}

 

 

The most important change is to the first fully connected layer after pool5. It took some analysis before I understood it, so I'm writing it down in case I forget later.

 

By analyzing each layer's data (blobs) and parameters (params), we can work out the following.

With a 1x3x227x227 input, Alex's pool5 blob has shape 1x256x6x6. The next layer is the fully connected fc6 with num_output 4096, so the fc6 params have shape 4096x9216 (256*6*6 = 9216).

 

Now enlarge the input to 1x3x451x451 and keep everything else unchanged. Redoing the arithmetic, FCN's pool5 blob has shape 1x256x13x13. Replacing the next layer, the fully connected fc6, with the convolutional fc6-conv (num_output 4096, kernel_size 6), that layer's params have shape 4096x256x6x6, which works out to exactly 4096x9216 (256*6*6 = 9216).

 

At this point the fc6 parameters (4096x9216) of Alex under a 3x227x227 input can be assigned directly to the fc6-conv layer (4096x9216) of FCN under a 3x451x451 input.


At first I kept assuming the conversion was derived at the same input size. Analyzing with a 451x451 input, the fc layer after pool5 would need parameters of shape 4096x256x13x13, i.e. 4096x43264, while the parameters in models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel are 4096x9216, so there is no way to copy them over.
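
Both shape claims reduce to quick arithmetic:

# Why the derivation must use the 227-sized net: fc6's stored weights
# are 4096 x 9216, matching 256 channels * 6 * 6 from pool5.
print(256 * 6 * 6)    # 9216  -- matches the caffemodel
# An InnerProduct kept on the 13x13 pool5 of a 451 input would need
print(256 * 13 * 13)  # 43264 -- no match possible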


If you leave the input size alone and simply turn fc6 into a convolutional layer, the parameter sizes still match: pool5 is 1x256x6x6, and setting fc6-conv to num_output: 4096 and kernel_size: 6 again gives parameters of 4096x256x6x6 = 4096x9216. But look at the data size after fc6-conv: the resulting feature map has side length (6-6)/1 + 1 = 1, leaving only 1x4096x1x1. There is nothing wrong with doing it this way; it's just that a 1x1 feature map can't show the heat-map effect.


