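I. Basic Components of a Model
To train a Caffe model you need to configure two files covering two parts, the network model and the solver parameters, which correspond to *.prototxt and *_solver.prototxt respectively.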
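Caffe model files explained:
Building a leveldb from preprocessed images
Input: a set of images and their labels (the train/ folder and label.txt below)
Output: a leveldb (the output folder named below)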
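The command takes the following:
- convert_imageset (the executable that builds the leveldb)
- train/ (the directory holding the images to process, JPEG or other formats)
- label.txt (image file names and their labels)
- the name of the output leveldb folder
- CPU/GPU (whether the code runs on the CPU or the GPU)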
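convert_imageset reads a list file in which each line is an image path relative to the root folder, followed by an integer label. Below is a minimal sketch that builds such a label.txt. It assumes, hypothetically, that the images are sorted into one subfolder per class under train/. The helper name write_label_file is illustrative and is not part of Caffe.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def write_label_file(root, out_path):
    """Write 'relative/path label' lines for convert_imageset (hypothetical helper).

    Assumes each immediate subfolder of `root` is one class; classes are
    numbered 0, 1, 2, ... in sorted folder-name order.
    """
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    lines = []
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(root, cls))):
            # skip anything that is not an image file
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                # forward slashes keep the list usable on Windows and Linux
                lines.append("%s/%s %d" % (cls, name, label))
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return classes, lines
```

The resulting file is then passed as LISTFILE, as in `convert_imageset train/ label.txt train_leveldb`. In recent Caffe versions the default output backend is lmdb, and `--backend=leveldb` selects leveldb.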
CNN network configuration files
- Imagenet_solver.prototxt (global solver parameters)
- Imagenet.prototxt (the training network definition)
- Imagenet_val.prototxt (the test network definition)
Network model:
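DATA: the input layer, usually in two variants, a training data layer and a test data layer. Key parameters: source is the data path, batch_size is the number of samples per batch, and scale rescales the data into [0, 1); 0.00390625 is 1/256.
Training data layer: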
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
Test data layer:
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
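The `transform_param { scale: 0.00390625 }` line multiplies every input value by scale. As a result, 8-bit pixels in [0, 255] end up in [0, 255/256]. The plain-Python sketch below shows that effect. The function name is illustrative, and mean subtraction is left out because these layers do not use it.

```python
SCALE = 0.00390625  # value from transform_param in the data layers above


def apply_scale(pixels, scale=SCALE):
    """Multiply each raw 8-bit pixel value by `scale`, as the Data layer's transform does."""
    return [p * scale for p in pixels]


print(apply_scale([0, 128, 255]))  # [0.0, 0.5, 0.99609375]
```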
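CONVOLUTION: the convolution layer. The two param blocks, lr_mult: 1 and lr_mult: 2 (written blobs_lr in older Caffe versions), set the learning-rate multipliers for the weights and the bias respectively. The weights therefore learn at the base learning rate defined in the solver.prototxt file, while the bias learns at twice that rate, which in practice often gives good convergence speed.
num_output is the number of filters, kernel_size the filter size, stride the step size, and weight_filler the weight initialization method (here xavier); bias_filler likewise initializes the bias (constant, which defaults to 0).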
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1 # learning-rate multiplier for the weights
}
param {
lr_mult: 2 # learning-rate multiplier for the bias, usually twice the weights'
}
convolution_param {
num_output: 20 # number of filters
kernel_size: 5
stride: 1 # stride
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
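The output size and parameter count of conv1 follow directly from these settings. The sketch below uses Caffe's convolution output formula without dilation, (size + 2*pad - kernel) / stride + 1 with floor division. The 1x28x28 MNIST input shape is an assumption, because the prototxt above does not state it. It also shows that conv1 owns 500 weights (updated with lr_mult 1) and 20 biases (updated with lr_mult 2).

```python
def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a Caffe Convolution layer (floor division, no dilation)."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(in_channels, num_output, kernel_size, bias=True):
    """Learnable parameters: one kernel per (output, input) channel pair plus one bias per output."""
    weights = num_output * in_channels * kernel_size * kernel_size
    return weights + (num_output if bias else 0)


# conv1 on a 1x28x28 MNIST image (input shape assumed, not stated in the prototxt)
side = conv_output_size(28, kernel_size=5, stride=1)
print(side, conv_param_count(1, 20, 5))  # 24 520
```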
POOLING: the pooling layer (here 2x2 max pooling with stride 2)
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
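Pooling sizes are computed slightly differently from convolution. Caffe's Pooling layer rounds up instead of down. When padding is used, it then drops a final window that would start inside the padding. The sketch below follows that rule. Under the same assumed 28x28 input, pool1 turns conv1's 24x24 maps into 12x12.

```python
import math


def pool_output_size(in_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a Caffe Pooling layer.

    Unlike Convolution, Caffe rounds *up* here, then (only when padding is
    used) removes a last window that would start in the padding.
    """
    out = int(math.ceil((in_size + 2 * pad - kernel_size) / float(stride))) + 1
    if pad and (out - 1) * stride >= in_size + pad:
        out -= 1
    return out


print(pool_output_size(24, 2, 2))  # 12
print(pool_output_size(25, 2, 2))  # 13 (a floor-based formula would give 12)
```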
INNER_PRODUCT: despite the name, this is a fully connected layer.
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
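A fully connected layer flattens its C x H x W input into one vector and computes y = Wx + b, where W has num_output rows. The sketch below shows that computation on nested lists. The shape example is an assumption: if pool2 outputs 50x4x4 = 800 values, as in Caffe's stock LeNet example, then ip1 holds 800 x 500 weights plus 500 biases.

```python
def flatten(blob):
    """Flatten a nested C x H x W list into one vector (row-major, like Caffe)."""
    return [v for channel in blob for row in channel for v in row]


def inner_product(x, weights, bias):
    """y = W x + b for one flattened input vector.

    `weights` holds num_output rows, each as long as the flattened input.
    """
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]


x = flatten([[[1, 2], [3, 4]]])  # a 1x2x2 blob -> [1, 2, 3, 4]
print(inner_product(x, [[1, 0, 0, 0], [1, 1, 1, 1]], [0.5, -1]))  # [1.5, 9]
```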
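RELU: the activation function, a nonlinearity computing max(0, x). It usually comes paired with a CONVOLUTION layer. Here it follows the fully connected layer ip1 and runs in place, since bottom and top are the same blob.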
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
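The sketch below shows what the in-place ReLU computes. It overwrites its input rather than allocating a second buffer, mirroring bottom and top both being "ip1". The negative_slope argument corresponds to the option of the same name in Caffe's ReLU layer; its default of 0 gives plain max(0, x). The function name is illustrative.

```python
def relu_inplace(blob, negative_slope=0.0):
    """Element-wise max(x, 0) + negative_slope * min(x, 0), overwriting `blob`."""
    for i, v in enumerate(blob):
        blob[i] = max(v, 0.0) + negative_slope * min(v, 0.0)
    return blob


data = [-1.0, 0.0, 2.5]
relu_inplace(data)
print(data)  # [0.0, 0.0, 2.5]
```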
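SOFTMAX: SoftmaxWithLoss applies softmax to the scores and computes the multinomial logistic (cross-entropy) loss against the labels in a single layer.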
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
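For a single batch, the sketch below computes the same quantity: a numerically stable softmax followed by the mean of -log p[label]. Averaging over the batch matches SoftmaxWithLoss's default normalization when no ignore_label is set. This is a plain-Python illustration, not Caffe's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_loss(batch_scores, labels):
    """Mean cross-entropy over the batch: average of -log(softmax(scores)[label])."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


# all-equal scores over 10 classes give loss log(10), about 2.3026
print(softmax_with_loss([[0.0] * 10], [3]))
```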
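Solver configuration file:
The ***_solver.prototxt file defines the parameters used during training, such as the learning rate, the weight decay, the number of iterations, and whether to use the GPU or the CPU.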
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
device_id: 0 # with the caffe command-line interface GPU indices start at 0, so with a single GPU use device_id: 0
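With lr_policy "inv", Caffe decays the learning rate as base_lr * (1 + gamma * iter) ^ (-power). The sketch below evaluates that schedule with the values from this solver. By the final iteration the rate has fallen to 0.01 * 2^-0.75, roughly 0.0059. The function name is illustrative.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under Caffe's "inv" policy: base_lr * (1 + gamma * iter) ** (-power)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


# print the rate at the start, at the first snapshot and at max_iter
for it in (0, 5000, 10000):
    print(it, round(inv_lr(it), 6))
```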
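The trained model is saved as a *.caffemodel file for later use. With the settings above, Caffe writes snapshots such as examples/mnist/lenet_iter_5000.caffemodel, each with a matching .solverstate.
A complete workflow looks like this:
Steps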
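- Data preparation
Prepare three datasets:
- Training Set: used to train the network
- Validation Set: used to measure accuracy during training
- Test Set: used to measure the final accuracy once training is complete
- Build lmdb/leveldb files. Caffe accepts three input data formats: images, leveldb, and lmdb.
lmdb uses about 1.1 times as much memory as leveldb, but it is 10% to 15% faster. More importantly, lmdb allows multiple training processes to read the same dataset at the same time.
For these reasons lmdb replaced leveldb as Caffe's default dataset format.
- Define the name.prototxt and name_solver.prototxt files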
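The three datasets from the data-preparation step can be carved out of one labeled list before building the lmdb files. The sketch below shuffles (file, label) pairs once with a fixed seed and cuts them into train, validation and test lists. The 10%/10% fractions and the seed are illustrative defaults, not Caffe settings. Each list would then be written as its own label file and converted separately.

```python
import random


def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (file, label) pairs once and cut them into train/val/test lists.

    The fractions and the fixed seed are illustrative defaults, not Caffe settings.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test
```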
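Training the model
Training on Windows is a real hassle, because you need a way to run .sh files there.
Running .sh files on Windows
Install Cygwin.
Packages can be installed from within its setup program. If a package is reported missing, reopen setup and download it. Usually this goes smoothly; fix problems as they come up.
Testing with .bat files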
- Download the MNIST dataset from the official site http://yann.lecun.com/exdb/mnist/ and extract it to C:\caffe-master\data\mnist
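In the Caffe root directory, create create_mnist.bat and put the following script in it. This step is error-prone: the script refers to train-images.idx3-ubyte, but after extraction the file may be named train-images-idx3-ubyte. Adjust the names to match, and check that the input paths point to where you extracted the files.
.\Build\x64\Release\convert_mnist_data.exe .\data\mnist\mnist_train_lmdb\train-images.idx3-ubyte .\data\mnist\mnist_train_lmdb\train-labels.idx1-ubyte .\examples\mnist\mnist_train_lmdb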
echo.
.\Build\x64\Release\convert_mnist_data.exe .\data\mnist\mnist_test_lmdb\t10k-images.idx3-ubyte .\data\mnist\mnist_test_lmdb\t10k-labels.idx1-ubyte .\examples\mnist\mnist_test_lmdb
pause
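Then double-click the script to run it. The lmdb data folders are generated under examples\mnist in the Caffe root (E:\caffe\examples\mnist on my machine).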
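A mismatched file name or a non-MNIST file only shows up as a failure inside convert_mnist_data.exe. Checking the files first catches both problems. The sketch below accepts either spelling of the name and reads the IDX header, which is a big-endian magic number (2051 for images, 2049 for labels) followed by the item count and, for images, rows and columns. The helper names are hypothetical.

```python
import os
import struct

IDX_MAGIC = {2049: "labels", 2051: "images"}  # 0x00000801 / 0x00000803


def read_idx_header(path):
    """Return (kind, dims) from an MNIST IDX file header (big-endian int32 fields)."""
    with open(path, "rb") as f:
        magic, count = struct.unpack(">ii", f.read(8))
        if magic not in IDX_MAGIC:
            raise ValueError("%s is not an MNIST IDX file (magic %d)" % (path, magic))
        dims = [count]
        if magic == 2051:
            dims += list(struct.unpack(">ii", f.read(8)))  # rows, cols
    return IDX_MAGIC[magic], dims


def find_idx_file(folder, dotted_name):
    """Accept both 'train-images.idx3-ubyte' and 'train-images-idx3-ubyte'."""
    for name in (dotted_name, dotted_name.replace(".idx", "-idx")):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            return path
    return None
```

For the real training images the header should read ("images", [60000, 28, 28]).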
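- In the Caffe root directory, create train_mnist.bat with the following script:
.\Build\x64\Release\caffe.exe train --solver=.\examples\mnist\lenet_solver.prototxt
pause
Double-click it to start training. When training finishes, it reports the final accuracy and loss.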
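Next, install DIGITS.
Just follow the instructions here:
https://github.com/NVIDIA/DIGITS/blob/digits-5.0/docs/BuildDigitsWindows.md
Finally, run python -m digits in the DIGITS directory
and that's it.
I ran into quite a few bugs along the way.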
Bug 1
pycaffe cannot be found.
This usually means Python cannot import the caffe package. Copy the CAFFE_ROOT\Build\x64\Release\pycaffe\caffe folder into Anaconda's site-packages and it works.
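The manual copy above can be scripted. The sketch below copies `<release_dir>\pycaffe\caffe` into the site-packages ("purelib") directory of the interpreter that runs it. It must therefore be run with the same Anaconda Python that runs DIGITS. The helper name is hypothetical. It replaces any existing caffe folder at the destination, so check the target before running it.

```python
import os
import shutil
import sysconfig


def install_pycaffe(release_dir, site_packages=None):
    """Copy <release_dir>/pycaffe/caffe into site-packages (hypothetical helper).

    `release_dir` is the Build\\x64\\Release folder under CAFFE_ROOT.
    Defaults to the running interpreter's site-packages ("purelib").
    """
    src = os.path.join(release_dir, "pycaffe", "caffe")
    if not os.path.isdir(src):
        raise OSError("pycaffe package not found at %s" % src)
    if site_packages is None:
        site_packages = sysconfig.get_paths()["purelib"]
    dst = os.path.join(site_packages, "caffe")
    if os.path.exists(dst):
        shutil.rmtree(dst)  # remove an older copy before replacing it
    shutil.copytree(src, dst)
    return dst
```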
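Bug 2
The error pkg_resources._vendor.packaging.version.InvalidVersion: Invalid version: 'CAFFE_VERSION' appears. This Caffe build reports its version as the literal string 'CAFFE_VERSION', which DIGITS cannot parse.
Open caffe.py under \DIGITS-master\digits\config
and modify it as marked by the comments below.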
from __future__ import absolute_import
import imp
import os
import platform
import re
import subprocess
import sys
from . import option_list
from digits import device_query
from digits.utils import parse_version
def load_from_envvar(envvar):
    """
    Load information from an installation indicated by an environment variable
    """
    value = os.environ[envvar].strip().strip("\"' ")
    # Modified: the paths here must match the directory CAFFE_HOME/CAFFE_ROOT points to
    if platform.system() == 'Windows':
        #executable_dir = os.path.join(value, 'install', 'bin')
        executable_dir = os.path.join(value)
        #python_dir = os.path.join(value, 'install', 'python')
        python_dir = os.path.join(value, 'pycaffe')
    else:
        executable_dir = os.path.join(value, 'build', 'tools')
        python_dir = os.path.join(value, 'python')
    try:
        executable = find_executable_in_dir(executable_dir)
        if executable is None:
            raise ValueError('Caffe executable not found at "%s"'
                             % executable_dir)
        if not is_pycaffe_in_dir(python_dir):
            raise ValueError('Pycaffe not found in "%s"'
                             % python_dir)
        import_pycaffe(python_dir)
        version, flavor = get_version_and_flavor(executable)
    except:
        print ('"%s" from %s does not point to a valid installation of Caffe.'
               % (value, envvar))
        print 'Use the envvar CAFFE_ROOT to indicate a valid installation.'
        raise
    return executable, version, flavor
def load_from_path():
    """
    Load information from an installation on standard paths (PATH and PYTHONPATH)
    """
    try:
        executable = find_executable_in_dir()
        if executable is None:
            raise ValueError('Caffe executable not found in PATH')
        if not is_pycaffe_in_dir():
            raise ValueError('Pycaffe not found in PYTHONPATH')
        import_pycaffe()
        version, flavor = get_version_and_flavor(executable)
    except:
        print 'A valid Caffe installation was not found on your system.'
        print 'Use the envvar CAFFE_ROOT to indicate a valid installation.'
        raise
    return executable, version, flavor
def find_executable_in_dir(dirname=None):
    """
    Returns the path to the caffe executable at dirname
    If dirname is None, search all directories in sys.path
    Returns None if not found
    """
    if platform.system() == 'Windows':
        exe_name = 'caffe.exe'
    else:
        exe_name = 'caffe'
    if dirname is None:
        dirnames = [path.strip("\"' ") for path in os.environ['PATH'].split(os.pathsep)]
    else:
        dirnames = [dirname]
    for dirname in dirnames:
        path = os.path.join(dirname, exe_name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
def is_pycaffe_in_dir(dirname=None):
    """
    Returns True if you can "import caffe" from dirname
    If dirname is None, search all directories in sys.path
    """
    old_path = sys.path
    if dirname is not None:
        sys.path = [dirname]  # temporarily replace sys.path
    try:
        imp.find_module('caffe')
    except ImportError:
        return False
    finally:
        sys.path = old_path
    return True
def import_pycaffe(dirname=None):
    """
    Imports caffe
    If dirname is not None, prepend it to sys.path first
    """
    if dirname is not None:
        sys.path.insert(0, dirname)
        # Add to PYTHONPATH so that build/tools/caffe is aware of python layers there
        os.environ['PYTHONPATH'] = '%s%s%s' % (
            dirname, os.pathsep, os.environ.get('PYTHONPATH'))
    # Suppress GLOG output for python bindings
    GLOG_minloglevel = os.environ.pop('GLOG_minloglevel', None)
    # Show only "ERROR" and "FATAL"
    os.environ['GLOG_minloglevel'] = '2'
    # for Windows environment, loading h5py before caffe solves the issue mentioned in
    # https://github.com/NVIDIA/DIGITS/issues/47#issuecomment-206292824
    import h5py  # noqa
    try:
        import caffe
    except ImportError:
        print 'Did you forget to "make pycaffe"?'
        raise
    # Strange issue with protocol buffers and pickle - see issue #32
    sys.path.insert(0, os.path.join(
        os.path.dirname(caffe.__file__), 'proto'))
    # Turn GLOG output back on for subprocess calls
    if GLOG_minloglevel is None:
        del os.environ['GLOG_minloglevel']
    else:
        os.environ['GLOG_minloglevel'] = GLOG_minloglevel
def get_version_and_flavor(executable):
    """
    Returns (version, flavor)
    Should be called after import_pycaffe()
    """
    version_string = get_version_from_pycaffe()
    if version_string is None:
        version_string = get_version_from_cmdline(executable)
    if version_string is None:
        version_string = get_version_from_soname(executable)
    if version_string is None:
        raise ValueError('Could not find version information for Caffe build ' +
                         'at "%s". Upgrade your installation' % executable)
    # This block is not needed here but triggers the InvalidVersion bug, so I commented it out
    #version = parse_version(version_string)
    #if parse_version(0, 99, 0) > version > parse_version(0, 9, 0):
    #    flavor = 'NVIDIA'
    #    minimum_version = '0.11.0'
    #    if version < parse_version(minimum_version):
    #        raise ValueError(
    #            'Required version "%s" is greater than "%s". Upgrade your installation.'
    #            % (minimum_version, version_string))
    #else:
    #    flavor = 'BVLC'
    flavor = 'BVLC'
    return version_string, flavor
def get_version_from_pycaffe():
    try:
        from caffe import __version__ as version
        return version
    except ImportError:
        return None
def get_version_from_cmdline(executable):
    command = [executable, '-version']
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if p.wait():
        print p.stderr.read().strip()
        raise RuntimeError('"%s" returned error code %s' % (command, p.returncode))
    pattern = 'version'
    for line in p.stdout:
        if pattern in line:
            return line[line.find(pattern) + len(pattern) + 1:].strip()
    return None
def get_version_from_soname(executable):
    command = ['ldd', executable]
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if p.wait():
        print p.stderr.read().strip()
        raise RuntimeError('"%s" returned error code %s' % (command, p.returncode))
    # Search output for caffe library
    libname = 'libcaffe'
    caffe_line = None
    for line in p.stdout:
        if libname in line:
            caffe_line = line
            break
    if caffe_line is None:
        raise ValueError('libcaffe not found in linked libraries for "%s"'
                         % executable)
    # Read the symlink for libcaffe from ldd output
    symlink = caffe_line.split()[2]
    filename = os.path.basename(os.path.realpath(symlink))
    # parse the version string
    match = re.match(r'%s(.*)\.so\.(\S+)$' % (libname), filename)
    if match:
        return match.group(2)
    else:
        return None
# Look here, look here: this is a path issue.
# Declare CAFFE_ROOT or CAFFE_HOME (either works) as an environment variable
# pointing to Caffe's build output directory ./Build/x64/Release
if 'CAFFE_ROOT' in os.environ:
    executable, version, flavor = load_from_envvar('CAFFE_ROOT')
elif 'CAFFE_HOME' in os.environ:
    executable, version, flavor = load_from_envvar('CAFFE_HOME')
else:
    executable, version, flavor = load_from_path()
option_list['caffe'] = {
    'executable': executable,
    'version': version,
    'flavor': flavor,
    'multi_gpu': (flavor == 'BVLC' or parse_version(version) >= parse_version(0, 12)),
    'cuda_enabled': (len(device_query.get_devices()) > 0),
}
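Commenting out the version check works, but it also means NVIDIA builds are never detected. An alternative is to make the check tolerant of unparseable strings. The sketch below is hypothetical and is not DIGITS code. It reproduces the commented-out decision using a small regex in place of parse_version. A placeholder such as 'CAFFE_VERSION' falls back to 'BVLC', while 0.x versions between 0.9 and 0.99 are still treated as NVIDIA and checked against the 0.11.0 minimum.

```python
import re


def parse_version_tuple(version_string):
    """Return the leading numeric components, e.g. '0.15.13' -> (0, 15, 13); None if there are none."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", version_string)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def guess_flavor(version_string, minimum_nvidia=(0, 11, 0)):
    """Same flavor decision as the commented-out block, without crashing on placeholders."""
    version = parse_version_tuple(version_string)
    if version is None:
        return "BVLC"  # e.g. the literal 'CAFFE_VERSION' reported by this Windows build
    if (0, 9, 0) < version < (0, 99, 0):
        if version < minimum_nvidia:
            raise ValueError('Required version "%s" is greater than "%s". Upgrade your installation.'
                             % (".".join(map(str, minimum_nvidia)), version_string))
        return "NVIDIA"
    return "BVLC"
```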
Running it again:
Training:
Official tutorial
Works perfectly~