Training and testing your own dataset with Caffe: running the bundled examples is a good start, but the point of learning Caffe is to use it in your own projects or research, not just to finish a few exercises. This post walks through the whole pipeline, from raw images, to lmdb data, to training and testing a model.
I. Prepare the data
1) We borrow a dataset someone shared online: 1,000 images of 10 Taobao product categories, 100 images per category. You can of course collect your own images for whatever classes you want to recognize.
2) Under the caffe/examples folder, create a new folder myfile4, and inside it a folder data to hold the dataset. Copy the train and val folders from the downloaded dataset into data, giving:
caffe/examples/myfile4/data/train
caffe/examples/myfile4/data/val
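As an optional sanity check, the commands below (a minimal sketch, assuming the layout above and the class prefixes used later, e.g. 15001) count how many images landed in each split:

```sh
# Run from the caffe root; counts images per split (illustrative check only).
find examples/myfile4/data/train -name "*.jpg" | wc -l
find examples/myfile4/data/val   -name "*.jpg" | wc -l
# Per-class count for one prefix, e.g. 15001 (adjust to your own prefixes):
find examples/myfile4/data/train -name "15001*.jpg" | wc -l
```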
II. Convert to lmdb format
1) In the myfile4 folder, create create_filelist.sh. It writes train.txt and val.txt: for each of the ten class prefixes (15001, 15059, ...) it lists the matching images and appends the numeric label 0-9. The `cut -d '/' -f4-5` keeps only the `train/<image>.jpg` part of the path, relative to the data root, and `sed` appends the label.
```sh
#!/usr/bin/env sh
DATA=examples/myfile4/data
MY=examples/myfile4/data

echo "Create train.txt..."
rm -rf $MY/train.txt
find $DATA/train -name "15001*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 0/" >> $MY/train.txt
find $DATA/train -name "15059*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 1/" >> $MY/train.txt
find $DATA/train -name "62047*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 2/" >> $MY/train.txt
find $DATA/train -name "68021*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 3/" >> $MY/train.txt
find $DATA/train -name "73018*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 4/" >> $MY/train.txt
find $DATA/train -name "73063*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 5/" >> $MY/train.txt
find $DATA/train -name "80012*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 6/" >> $MY/train.txt
find $DATA/train -name "92002*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 7/" >> $MY/train.txt
find $DATA/train -name "92017*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 8/" >> $MY/train.txt
find $DATA/train -name "95005*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 9/" >> $MY/train.txt

echo "Create test.txt..."
rm -rf $MY/val.txt
find $DATA/val -name "15001*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 0/" >> $MY/val.txt
find $DATA/val -name "15059*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 1/" >> $MY/val.txt
find $DATA/val -name "62047*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 2/" >> $MY/val.txt
find $DATA/val -name "68021*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 3/" >> $MY/val.txt
find $DATA/val -name "73018*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 4/" >> $MY/val.txt
find $DATA/val -name "73063*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 5/" >> $MY/val.txt
find $DATA/val -name "80012*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 6/" >> $MY/val.txt
find $DATA/val -name "92002*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 7/" >> $MY/val.txt
find $DATA/val -name "92017*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 8/" >> $MY/val.txt
find $DATA/val -name "95005*.jpg" | cut -d '/' -f4-5 | sed "s/$/ 9/" >> $MY/val.txt

echo "All done"
```
Run it from the caffe root directory:

```sh
sh examples/myfile4/create_filelist.sh
```
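You can peek at the first few lines of the generated list to confirm the format; the file names below are only placeholders for whatever your dataset contains:

```sh
head -n 3 examples/myfile4/data/train.txt
# Expected format: <path relative to the data root> <label>, e.g.
# train/<image name>.jpg 0
```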
2) In the myfile4 folder, create create_lmdb.sh. It calls convert_imageset to shuffle the file lists, resize every image to 32x32, and write the train and val lmdb databases (adjust TRAIN_DATA_ROOT and VAL_DATA_ROOT to your own caffe path):
```sh
#!/usr/bin/env sh
MY=examples/myfile4

TRAIN_DATA_ROOT=/home/ghz/caffe/examples/myfile4/data/
VAL_DATA_ROOT=/home/ghz/caffe/examples/myfile4/data/

echo "Create train lmdb.."
rm -rf $MY/img_train_lmdb
build/tools/convert_imageset \
    --shuffle \
    --resize_height=32 \
    --resize_width=32 \
    $TRAIN_DATA_ROOT \
    $MY/data/train.txt \
    $MY/img_train_lmdb

echo "Create test lmdb.."
rm -rf $MY/img_val_lmdb
build/tools/convert_imageset \
    --shuffle \
    --resize_height=32 \
    --resize_width=32 \
    $VAL_DATA_ROOT \
    $MY/data/val.txt \
    $MY/img_val_lmdb

echo "All Done.."
```
Run it from the caffe root directory:

```sh
sh examples/myfile4/create_lmdb.sh
```
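If the conversion succeeds, two lmdb directories appear under examples/myfile4; a quick listing (illustrative) confirms they are non-empty:

```sh
ls -lh examples/myfile4/img_train_lmdb examples/myfile4/img_val_lmdb
# Each directory should contain data.mdb and lock.mdb files.
```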
III. Compute and save the image mean
In myfile4, create create_meanfile.sh, which runs compute_image_mean over the training lmdb and writes mean.binaryproto:
```sh
EXAMPLE=examples/myfile4
DATA=examples/myfile4
TOOLS=build/tools

$TOOLS/compute_image_mean $EXAMPLE/img_train_lmdb $DATA/mean.binaryproto

echo "Done."
```
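Run it from the caffe root, like the other scripts (the command below simply mirrors them):

```sh
sh examples/myfile4/create_meanfile.sh
# Afterwards examples/myfile4/mean.binaryproto should exist.
```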
IV. Create the model and write the configuration files
Create myfile4_train_test.prototxt in myfile4. The network follows Caffe's CIFAR-10 quick model: three convolution + pooling stages followed by two fully connected layers. Only the data layers are changed to read our lmdb databases and mean file, and the final InnerProduct layer outputs our 10 classes:
```
name: "myfile4"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/myfile4/mean.binaryproto"
  }
  data_param {
    source: "examples/myfile4/img_train_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "examples/myfile4/mean.binaryproto"
  }
  data_param {
    source: "examples/myfile4/img_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
```
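Optionally, you can render the layer graph to double-check the definition. The sketch below uses Caffe's bundled draw_net.py and assumes pydot and graphviz are installed:

```sh
# Draw the train/test net to a PNG (optional sanity check).
python python/draw_net.py examples/myfile4/myfile4_train_test.prototxt examples/myfile4/net.png
```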
Next, create the solver file myfile4_solver.prototxt in myfile4 (the training command below refers to it by this name). With a batch_size of 50, test_iter: 2 means each test pass evaluates 100 validation images; training runs for 2000 iterations on the CPU and snapshots the final model under the prefix examples/myfile4/my:
```
net: "examples/myfile4/myfile4_train_test.prototxt"
test_iter: 2
test_interval: 50
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 400
momentum: 0.9
weight_decay: 0.004
display: 10
max_iter: 2000
snapshot: 2000
snapshot_prefix: "examples/myfile4/my"
solver_mode: CPU
```
V. Train the model
From the caffe root directory, run:
```sh
build/tools/caffe train -solver examples/myfile4/myfile4_solver.prototxt
```
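With these settings, training leaves examples/myfile4/my_iter_2000.caffemodel (weights) and my_iter_2000.solverstate (solver state) behind. If a run is interrupted after an intermediate snapshot, or you later raise max_iter, Caffe's standard -snapshot flag resumes from a saved state; a hedged example, assuming the solverstate file exists:

```sh
# Resume training from a saved solver state.
build/tools/caffe train \
    -solver examples/myfile4/myfile4_solver.prototxt \
    -snapshot examples/myfile4/my_iter_2000.solverstate
```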
VI. Classify with the trained model
1) In myfile4, create synset_words.txt with one class name per line, in the same order as the labels 0-9 assigned in create_filelist.sh:
```
biao
fajia
kuzi
xiangzi
yizi
dianshi
suannai
xiangshui
hufupin
xiezi
```
2) In myfile4, create deploy.prototxt, which is used at classification time. It is the same network as myfile4_train_test.prototxt, but the lmdb data layers are replaced by a single Input layer taking one 3x32x32 image, the weight/bias fillers are dropped, and the accuracy/loss layers are replaced by a Softmax layer named "prob":

```
name: "myfile4"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 32 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
```
3) In myfile4, create a folder images and put the pictures you want to classify in it. Mine is images/111.jpg; test images can come from the downloaded image set.
4) In myfile4, create demo.sh, which calls Caffe's C++ classification example with the deploy net, the trained weights, the mean file, the label list, and the test image:
```sh
./build/examples/cpp_classification/classification.bin \
    examples/myfile4/deploy.prototxt \
    examples/myfile4/my_iter_2000.caffemodel \
    examples/myfile4/mean.binaryproto \
    examples/myfile4/synset_words.txt \
    examples/myfile4/images/111.jpg
```
Run it from the caffe root directory:

```sh
sh examples/myfile4/demo.sh
```
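classification.bin prints the top predictions with their probabilities against the labels in synset_words.txt; the values below are purely illustrative and will differ for your model and image:

```
---------- Prediction for examples/myfile4/images/111.jpg ----------
0.9231 - "yizi"
0.0312 - "kuzi"
...
```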
That completes training and testing Caffe on our own dataset. Questions and discussion are welcome.