前言
很多地方我们都需要用到多标签分类,比如一张图片,上面有只蓝猫,另一张图片上面有一只黄狗,那么我们要识别的时候,就可以采用多标签分类这一思想了。任务一是识别出这个到底是猫还是狗?(类型)任务二是识别出这是蓝还是黄?(颜色)
网上看了几篇教程,有讲的非常好的,也有出bug飞上了天的(吐槽啊喂!)这里还是主要讲讲这篇:http://chuansong.me/n/494753151240。我自己已经测试了,可行,给薛大牛一个赞!但是遗憾的是这篇文章的内容严重不足啊(连lmdb生成的命令行格式都没有,还是我自己看代码琢磨了一下…)我就给这篇文章补充补充,给一些例子。
任务
我这里给出一个具体的任务咯,要求在以下图片中,识别出汽车品牌和车辆外形。汽车品牌分为:Benz/BMW/Audi 车辆外形分为:Sedan/SUV。这是一个只有72张图片的小数据库,包括了测试和训练集:
其中标注是这样的,Audi=0,BMW=1,Benz=2. Sedan =0, SUV=1。所以如果这辆车是奥迪的SUV,标注就是: xx.jpg 0 1。在数据库中,标注已经做好了。数据集的下载方式在文章的最后。
定义我们的网络结构
我们这里采用的是上述文章中薛大牛的方法,两个data层,一个data只放图片,另一个data放label,label通过slice layer切开。然后我们开始定义网络!修改AlexNet!这是我的网络:
name: "ZnNet"
layer {
name: "data"
type: "Data"
top: "data"
transform_param {
mirror: true
crop_size: 227
mean_file: "models/bvlc_alexnet/ZnCarTrainMean.binaryproto"
}
include {
phase: TRAIN
}
data_param {
source: "models/bvlc_alexnet/ZnCarTrainImage"
batch_size: 10
backend: LMDB
}
}
layer {
name: "labels"
type: "Data"
top: "labels"
include {
phase: TRAIN
}
data_param {
source: "models/bvlc_alexnet/ZnCarTrainLabel"
batch_size: 10
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
transform_param {
crop_size: 227
mean_file: "models/bvlc_alexnet/ZnCarTestMean.binaryproto"
}
include {
phase: TEST
}
data_param {
source: "models/bvlc_alexnet/ZnCarTestImage"
batch_size: 12
backend: LMDB
}
}
layer {
name: "labels"
type: "Data"
top: "labels"
include {
phase: TEST
}
data_param {
source: "models/bvlc_alexnet/ZnCarTestLabel"
batch_size: 12
backend: LMDB
}
}
layer {
name: "slice"
type: "Slice"
bottom: "labels"
top: "type" #汽车品牌
top: "surface" #车的外形
slice_param {
axis: 1
slice_point: 1
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"