1. Preparing the sample data
Download the MNIST data files by running get_mnist.sh. This version of the dataset consists of four files.
learning@learning-virtual-machine:~/caffe/data/mnist$ ls
get_mnist.sh
learning@learning-virtual-machine:~/caffe/data/mnist$ ./get_mnist.sh
Downloading...
--2016-05-11 17:35:10-- http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 128.122.47.89
Connecting to yann.lecun.com (yann.lecun.com)|128.122.47.89|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9912422 (9.5M) [application/x-gzip]
Saving to: ‘train-images-idx3-ubyte.gz’
train-images-idx 100%[===========>] 9.45M 225KB/s in 73s
2016-05-11 17:36:24 (133 KB/s) - ‘train-images-idx3-ubyte.gz’ saved [9912422/9912422]
--2016-05-11 17:36:38-- http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 128.122.47.89
Connecting to yann.lecun.com (yann.lecun.com)|128.122.47.89|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28881 (28K) [application/x-gzip]
Saving to: ‘train-labels-idx1-ubyte.gz’
train-labels-idx 100%[===========>] 28.20K 1.41KB/s in 7.5s
2016-05-11 17:36:49 (3.75 KB/s) - ‘train-labels-idx1-ubyte.gz’ saved [28881/28881]
--2016-05-11 17:36:49-- http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 128.122.47.89
Connecting to yann.lecun.com (yann.lecun.com)|128.122.47.89|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1648877 (1.6M) [application/x-gzip]
Saving to: ‘t10k-images-idx3-ubyte.gz’
t10k-images-idx3 100%[===========>] 1.57M 71.9KB/s in 19s
2016-05-11 17:37:08 (85.5 KB/s) - ‘t10k-images-idx3-ubyte.gz’ saved [1648877/1648877]
--2016-05-11 17:37:09-- http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Resolving yann.lecun.com (yann.lecun.com)... 128.122.47.89
Connecting to yann.lecun.com (yann.lecun.com)|128.122.47.89|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4542 (4.4K) [application/x-gzip]
Saving to: ‘t10k-labels-idx1-ubyte.gz’
t10k-labels-idx1 100%[===========>] 4.44K --.-KB/s in 0s
2016-05-11 17:37:09 (31.5 MB/s) - ‘t10k-labels-idx1-ubyte.gz’ saved [4542/4542]
learning@learning-virtual-machine:~/caffe/data/mnist$
Contents of get_mnist.sh:
#!/usr/bin/env sh
# This script downloads the mnist data and unzips it.
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
echo "Downloading..."
for fname in train-images-idx3-ubyte train-labels-idx1-ubyte t10k-images-idx3-ubyte t10k-labels-idx1-ubyte
do
    if [ ! -e "$fname" ]; then
        wget --no-check-certificate "http://yann.lecun.com/exdb/mnist/${fname}.gz"
        gunzip "${fname}.gz"
    fi
done
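After the script finishes, a quick sanity check on the unpacked files can catch a truncated or corrupted download. The sketch below reads the first four bytes of an IDX file, which hold a magic number: 00000803 for image files and 00000801 for label files. The helper name idx_magic is made up for this example and is not part of Caffe.

```shell
# idx_magic is a hypothetical helper, not part of Caffe.
# It prints the first four bytes of a file as eight hex digits.
idx_magic() {
    od -An -tx1 -N4 "$1" | tr -d ' \n'
}

# Example, run from data/mnist after get_mnist.sh has finished:
#   idx_magic train-images-idx3-ubyte    # image files start with 00000803
#   idx_magic train-labels-idx1-ubyte    # label files start with 00000801
```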
Next, run ./examples/mnist/create_mnist.sh from the Caffe root directory (the script uses relative paths).
Contents of create_mnist.sh:
#!/usr/bin/env sh
# This script converts the mnist data into lmdb/leveldb format,
# depending on the value assigned to $BACKEND.
EXAMPLE=examples/mnist
DATA=data/mnist
BUILD=build/examples/mnist
BACKEND="lmdb"
echo "Creating ${BACKEND}..."
rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}
$BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte \
    $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
$BUILD/convert_mnist_data.bin $DATA/t10k-images-idx3-ubyte \
    $DATA/t10k-labels-idx1-ubyte $EXAMPLE/mnist_test_${BACKEND} --backend=${BACKEND}
echo "Done."
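create_mnist.sh uses the convert_mnist_data.bin tool in caffe/build/examples/mnist/
to convert the MNIST data into LMDB databases that Caffe can read.
The two new databases, mnist_train_lmdb and mnist_test_lmdb, are written to examples/mnist/, the same directory as create_mnist.sh.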
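To confirm that the conversion produced both databases, one can look for the data.mdb and lock.mdb files that LMDB keeps inside each database directory. This is a minimal sketch; the function name check_lmdb is invented for illustration.

```shell
# check_lmdb is a hypothetical helper, not part of Caffe.
# It succeeds if the given directory contains the two LMDB files.
check_lmdb() {
    [ -f "$1/data.mdb" ] && [ -f "$1/lock.mdb" ]
}

# Example, run from the Caffe root after create_mnist.sh:
#   check_lmdb examples/mnist/mnist_train_lmdb && echo "train db ok"
#   check_lmdb examples/mnist/mnist_test_lmdb  && echo "test db ok"
```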
learning@learning-virtual-machine:~/caffe$ ./examples/mnist/create_mnist.sh
Creating lmdb...
I0511 17:50:53.378334 63891 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0511 17:50:53.382064 63891 convert_mnist_data.cpp:88] A total of 60000 items.
I0511 17:50:53.382319 63891 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0511 17:51:26.376051 63891 convert_mnist_data.cpp:108] Processed 60000 files.
I0511 17:51:26.533220 63894 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_test_lmdb
I0511 17:51:26.534319 63894 convert_mnist_data.cpp:88] A total of 10000 items.
I0511 17:51:26.534453 63894 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0511 17:51:31.699584 63894 convert_mnist_data.cpp:108] Processed 10000 files.
Done.
learning@learning-virtual-machine:~/caffe$
2. Training
learning@learning-virtual-machine:~/caffe$ ./examples/mnist/train_lenet.sh
The first attempt fails with the following error:
I0511 17:52:25.115056 63914 caffe.cpp:185] Using GPUs 0
F0511 17:52:25.116345 63914 common.cpp:66] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
@ 0x7f6c824c65cd google::LogMessage::Fail()
@ 0x7f6c824c8433 google::LogMessage::SendToLog()
@ 0x7f6c824c615b google::LogMessage::Flush()
@ 0x7f6c824c8e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f6c8284c7f0 caffe::Caffe::SetDevice()
@ 0x40a1f3 train()
@ 0x406e80 main
@ 0x7f6c81753a40 __libc_start_main
@ 0x407539 _start
@ (nil) (unknown)
Aborted (core dumped)
learning@learning-virtual-machine:~/caffe$
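Fix: this Caffe build is CPU-only, so open lenet_solver.prototxt, set solver_mode to CPU, and run the training script again: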
learning@learning-virtual-machine:~/caffe/examples/mnist$ sudo gedit lenet_solver.prototxt
learning@learning-virtual-machine:~/caffe$ ./examples/mnist/train_lenet.sh
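The edit can also be made without opening an editor. Below is a minimal sketch using GNU sed, assuming the solver file still contains the line solver_mode: GPU. The function name set_solver_cpu is made up for this example.

```shell
# set_solver_cpu is a hypothetical helper, not part of Caffe.
# It rewrites "solver_mode: GPU" to "solver_mode: CPU" in place (GNU sed).
set_solver_cpu() {
    sed -i 's/^solver_mode:[[:space:]]*GPU[[:space:]]*$/solver_mode: CPU/' "$1"
}

# Example, run from the Caffe root:
#   set_solver_cpu examples/mnist/lenet_solver.prototxt
```

Lines other than solver_mode are left untouched, so running it on a file that is already set to CPU changes nothing.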
Contents of train_lenet.sh:
#!/usr/bin/env sh
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
Contents of lenet_solver.prototxt (after the edit):
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy. Options include a fixed rate ("fixed") and
# stepwise decay ("step"); this file uses "inv".
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000  # save a snapshot every 5000 iterations, using the path prefix below
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: CPU
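With lr_policy "inv", Caffe computes the learning rate at a given iteration as base_lr * (1 + gamma * iter)^(-power). The sketch below evaluates that formula with awk for the values in this solver file; inv_lr is a name chosen for this example. At iteration 9900 it gives 0.00596843, which matches the lr value reported in the training log.

```shell
# inv_lr is a hypothetical helper: inv_lr BASE_LR GAMMA POWER ITER
# Evaluates base_lr * (1 + gamma * iter)^(-power), the "inv" policy formula.
inv_lr() {
    awk -v base="$1" -v gamma="$2" -v power="$3" -v iter="$4" \
        'BEGIN { printf "%.8f\n", base * (1 + gamma * iter) ^ (-power) }'
}

inv_lr 0.01 0.0001 0.75 0       # learning rate at the start of training
inv_lr 0.01 0.0001 0.75 9900    # compare with the lr reported in the log
```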
Training finished:
loss = 0.0069052 (* 1 = 0.0069052 loss)
I0512 08:48:30.908164 2545 sgd_solver.cpp:106] Iteration 9900, lr = 0.00596843
I0512 08:48:50.392801 2545 solver.cpp:454] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I0512 08:48:50.468799 2545 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I0512 08:48:50.574422 2545 solver.cpp:317] Iteration 10000, loss = 0.00475874
I0512 08:48:50.574723 2545 solver.cpp:337] Iteration 10000, Testing net (#0)
I0512 08:49:05.222337 2545 solver.cpp:404] Test net output #0: accuracy = 0.9912
I0512 08:49:05.223168 2545 solver.cpp:404] Test net output #1: loss = 0.0288119 (* 1 = 0.0288119 loss)
I0512 08:49:05.223314 2545 solver.cpp:322] Optimization Done.
I0512 08:49:05.223942 2545 caffe.cpp:222] Optimization Done.
learning@learning-virtual-machine:~/caffe$
3. Testing
./build/tools/caffe.bin test -model=examples/mnist/lenet_train_test.prototxt -weights=examples/mnist/lenet_iter_10000.caffemodel
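What the command arguments mean:
test: evaluate the trained model instead of training it. The other available actions are train, time, and device_query.
-model=XXX: the model prototxt file, a text file that describes the network structure and the data sources in detail.
-weights=XXX: the trained weights (.caffemodel file) to load into the network.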
I0512 09:18:41.455747 3503 caffe.cpp:275] Batch 44, loss = 0.0137619
I0512 09:18:41.671058 3503 caffe.cpp:275] Batch 45, accuracy = 0.99
I0512 09:18:41.671362 3503 caffe.cpp:275] Batch 45, loss = 0.0446652
I0512 09:18:41.910468 3503 caffe.cpp:275] Batch 46, accuracy = 1
I0512 09:18:41.910781 3503 caffe.cpp:275] Batch 46, loss = 0.00462838
I0512 09:18:42.082020 3503 caffe.cpp:275] Batch 47, accuracy = 0.99
I0512 09:18:42.082260 3503 caffe.cpp:275] Batch 47, loss = 0.0215265
I0512 09:18:42.297307 3503 caffe.cpp:275] Batch 48, accuracy = 0.96
I0512 09:18:42.301200 3503 caffe.cpp:275] Batch 48, loss = 0.0964929
I0512 09:18:42.576354 3503 caffe.cpp:275] Batch 49, accuracy = 1
I0512 09:18:42.576627 3503 caffe.cpp:275] Batch 49, loss = 0.00345927
I0512 09:18:42.576732 3503 caffe.cpp:280] Loss: 0.0427004
I0512 09:18:42.576843 3503 caffe.cpp:292] accuracy = 0.9872
I0512 09:18:42.576954 3503 caffe.cpp:292] loss = 0.0427004 (* 1 = 0.0427004 loss)
learning@learning-virtual-machine:~/caffe$
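The log above ends at Batch 49 because caffe test runs 50 iterations unless the -iterations flag says otherwise. With the test batch size of 100 mentioned in the solver comments, that covers only 5,000 of the 10,000 test images, which likely explains why the accuracy here (0.9872) differs from the 0.9912 reported at the end of training. The sketch below builds a full-coverage command, assuming the TEST-phase batch size in lenet_train_test.prototxt is indeed 100. It only prints the command so it can be checked before running.

```shell
TEST_IMAGES=10000    # size of the MNIST test set
BATCH_SIZE=100       # assumed TEST-phase batch size in lenet_train_test.prototxt
ITERATIONS=$(( TEST_IMAGES / BATCH_SIZE ))

# Print the command; run it from the Caffe root once it looks right.
echo "./build/tools/caffe test" \
     "-model=examples/mnist/lenet_train_test.prototxt" \
     "-weights=examples/mnist/lenet_iter_10000.caffemodel" \
     "-iterations=$ITERATIONS"
```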