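A junior labmate of mine was puzzled: while working on a project he could not figure out how LayerSetUp and Reshape actually get called. With that question in mind I read through the source code, and below I walk through the call flow.
This post sorts out three questions: how the SGDSolver class is registered and instantiated, how each layer's LayerSetUp and Reshape are executed, and how the solver performs the forward and backward passes.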
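1. How the Solver class is registered and invoked
First, a look at how the SGDSolver solver is defined (readers can consult the corresponding .cpp and .hpp files in the Caffe source tree, under include\caffe and src\caffe).
We start from the definition of a solver, taking sgd_solver as the example.
Note these two macros: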
INSTANTIATE_CLASS(SGDSolver);
REGISTER_SOLVER_CLASS(SGD);
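INSTANTIATE_CLASS is defined in common.hpp, and REGISTER_SOLVER_CLASS is defined in solver_factory.hpp.
The first macro: INSTANTIATE_CLASS(SGDSolver);
As the name says, this macro instantiates a class template. Its argument is a class name, here SGDSolver, so it instantiates SGDSolver.
It defines a char variable named gInstantiationGuardSGDSolver
and emits two explicit template instantiations. (The last line of the macro has no semicolon because the call site supplies it: INSTANTIATE_CLASS(SGDSolver); ends with one.)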
template class SGDSolver<float>;
template class SGDSolver<double>
In effect SGDSolver is instantiated for two types, float and double:
// Instantiate a class with float and double specifications.
#define INSTANTIATE_CLASS(classname) \
char gInstantiationGuard##classname; \
template class classname<float>; \
template class classname<double>
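To make the missing semicolon less mysterious, here is a minimal sketch of the same pattern outside Caffe. DemoScaler, INSTANTIATE_CLASS_DEMO and CheckScalerInstantiations are hypothetical names made up for this illustration; only the shape of the macro follows INSTANTIATE_CLASS.

```cpp
#include <cassert>

// Hypothetical class template, used only for this illustration.
template <typename Dtype>
class DemoScaler {
 public:
  explicit DemoScaler(Dtype factor) : factor_(factor) {}
  Dtype Apply(Dtype x) const { return factor_ * x; }

 private:
  Dtype factor_;
};

// Same shape as INSTANTIATE_CLASS: the last line deliberately has no
// semicolon, because the call site provides it.
#define INSTANTIATE_CLASS_DEMO(classname) \
  char gDemoInstantiationGuard##classname; \
  template class classname<float>; \
  template class classname<double>

// Expands to a char definition plus two explicit instantiations; the
// trailing semicolon written here terminates the last one.
INSTANTIATE_CLASS_DEMO(DemoScaler);

// Exercises both instantiations.
inline void CheckScalerInstantiations() {
  assert(DemoScaler<float>(2.0f).Apply(3.0f) == 6.0f);
  assert(DemoScaler<double>(0.5).Apply(4.0) == 2.0);
}
```

Explicit instantiation forces the compiler to generate all members of the float and double versions in this translation unit, which is why the member definitions can live in a .cpp file.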
The second macro, REGISTER_SOLVER_CLASS(SGD);, is built from the following two macros:
#define REGISTER_SOLVER_CREATOR(type, creator) \
static SolverRegisterer<float> g_creator_f_##type(#type, creator<float>); \
static SolverRegisterer<double> g_creator_d_##type(#type, creator<double>)
#define REGISTER_SOLVER_CLASS(type) \
template <typename Dtype> \
Solver<Dtype>* Creator_##type##Solver( \
const SolverParameter& param) \
{ \
return new type##Solver<Dtype>(param); \
} \
REGISTER_SOLVER_CREATOR(type, Creator_##type##Solver)
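Next, what the REGISTER_SOLVER_CREATOR(type, creator) macro means:
With SGD passed as type, #type turns it into the string literal "SGD" and ##type pastes it into the variable names, so the macro defines two SolverRegisterer instances:
static SolverRegisterer<float> g_creator_f_SGD("SGD", creator<float>);
static SolverRegisterer<double> g_creator_d_SGD("SGD", creator<double>);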
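The two preprocessor operators at work here can be tried in isolation. The sketch below uses a hypothetical macro DEFINE_SOLVER_NAME (not part of Caffe) to show that #type yields a string literal while ##type becomes part of an identifier.

```cpp
#include <cassert>
#include <string>

// Hypothetical macro, for illustration only (not part of Caffe).
// #type  -> the argument as a string literal
// ##type -> the argument pasted into an identifier
#define DEFINE_SOLVER_NAME(type) \
  static const std::string g_solver_name_##type(#type)

DEFINE_SOLVER_NAME(SGD);   // static const std::string g_solver_name_SGD("SGD");
DEFINE_SOLVER_NAME(Demo);  // static const std::string g_solver_name_Demo("Demo");

// Checks that the pasted identifiers exist and hold the stringified names.
inline void CheckSolverNames() {
  assert(g_solver_name_SGD == "SGD");
  assert(g_solver_name_Demo == "Demo");
}
```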
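Next, what the REGISTER_SOLVER_CLASS(type) macro means:
It defines a function template named Creator_##type##Solver, which creates a type##Solver<Dtype> object with new and returns a pointer to it.
It then passes that function (not an object) to REGISTER_SOLVER_CREATOR(type, creator), where a pointer to it is handed to the SolverRegisterer<float> and SolverRegisterer<double> instances defined there.
Concretely,
if type is SGD, the first part of the macro expands to: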
template <typename Dtype>
Solver<Dtype>* Creator_SGDSolver(
const SolverParameter& param)
{
return new SGDSolver<Dtype>(param);
}
This defines a function template that creates an SGDSolver instance.
Right after it comes the macro call REGISTER_SOLVER_CREATOR(SGD, Creator_SGDSolver),
whose expansion produces two static SolverRegisterer instances, g_creator_f_SGD and g_creator_d_SGD:
static SolverRegisterer<float> g_creator_f_SGD("SGD", Creator_SGDSolver<float>);
static SolverRegisterer<double> g_creator_d_SGD("SGD", Creator_SGDSolver<double>);
Since these lines construct SolverRegisterer instances directly, passing "SGD" and Creator_SGDSolver<float/double> to the constructor, let us look at that constructor:
template <typename Dtype>
class SolverRegisterer {
public:
SolverRegisterer(const string& type,
Solver<Dtype>* (*creator)(const SolverParameter&)) {
// LOG(INFO) << "Registering solver type: " << type;
SolverRegistry<Dtype>::AddCreator(type, creator);
}
};
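As you can see, the only thing this class does is call the static function AddCreator(type, creator) of SolverRegistry.
Substituting the arguments gives AddCreator("SGD", Creator_SGDSolver<float/double>).
Now for the SolverRegistry class:
template <typename Dtype>
class SolverRegistry {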
public:
// Defines Creator as a function pointer type
typedef Solver<Dtype>* (*Creator)(const SolverParameter&);
// Defines a map type whose key is a string and whose value is a Creator function pointer
typedef std::map<string, Creator> CreatorRegistry;
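// Note that Registry() is a static member function and that g_registry_ inside it is a
// function-local static variable.
// A static member function runs whenever it is called, like any other function.
// The local static is initialized only the first time execution reaches its declaration;
// every later call reuses the same map.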
static CreatorRegistry& Registry() {
static CreatorRegistry* g_registry_ = new CreatorRegistry();
return *g_registry_;
}
// Stores the Creator function pointer in the map, so that a later call to CreateSolver can
// look up the Creator for a given string and call it
// Adds a creator.
static void AddCreator(const string& type, Creator creator) {
CreatorRegistry& registry = Registry();
CHECK_EQ(registry.count(type), 0)
<< "Solver type " << type << " already registered.";
registry[type] = creator;
}
// Calls the function that the registered Creator pointer points to
// Get a solver using a SolverParameter.
static Solver<Dtype>* CreateSolver(const SolverParameter& param) {
const string& type = param.type();
CreatorRegistry& registry = Registry();
CHECK_EQ(registry.count(type), 1) << "Unknown solver type: " << type
<< " (known types: " << SolverTypeListString() << ")";
return registry[type](param);
}
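AddCreator makes it clear what is going on: the string "SGD" is used as the key and the corresponding function pointer as the value, and the pair is stored in the static map.
That explains how a Solver class gets registered.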
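The whole registration mechanism fits in a few dozen lines. Below is a minimal sketch of it, assuming a simplified setting: no Dtype template parameter, no SolverParameter, and assert in place of CHECK_EQ. All Toy* names and REGISTER_TOY_SOLVER_CLASS are hypothetical stand-ins, not Caffe classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for Caffe's Solver hierarchy; not the real classes.
class ToySolver {
 public:
  virtual ~ToySolver() {}
  virtual std::string Name() const = 0;
};

class ToySGDSolver : public ToySolver {
 public:
  std::string Name() const override { return "SGD"; }
};

class ToySolverRegistry {
 public:
  typedef ToySolver* (*Creator)();
  typedef std::map<std::string, Creator> CreatorRegistry;

  // The map is created on the first call and reused afterwards, so it
  // already exists when the static registerer objects are constructed.
  static CreatorRegistry& Registry() {
    static CreatorRegistry* g_registry = new CreatorRegistry();
    return *g_registry;
  }

  static void AddCreator(const std::string& type, Creator creator) {
    CreatorRegistry& registry = Registry();
    assert(registry.count(type) == 0);  // each type is registered once
    registry[type] = creator;
  }

  // Looks up the creator by name and calls it; the caller owns the result.
  static ToySolver* CreateSolver(const std::string& type) {
    CreatorRegistry& registry = Registry();
    assert(registry.count(type) == 1);  // otherwise: unknown solver type
    return registry[type]();
  }
};

// The constructor runs during static initialization, before main().
class ToySolverRegisterer {
 public:
  ToySolverRegisterer(const std::string& type,
                      ToySolverRegistry::Creator creator) {
    ToySolverRegistry::AddCreator(type, creator);
  }
};

// Defines a creator function and a static registerer object for it.
#define REGISTER_TOY_SOLVER_CLASS(type) \
  ToySolver* Creator_Toy##type##Solver() { return new Toy##type##Solver(); } \
  static ToySolverRegisterer g_toy_creator_##type(#type, \
                                                  Creator_Toy##type##Solver)

REGISTER_TOY_SOLVER_CLASS(SGD);
```

Calling ToySolverRegistry::CreateSolver("SGD") then returns a new ToySGDSolver without the caller ever naming that class, which is the point of the design: the string from the configuration file selects the class.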
Next, how the solver gets invoked:
(1) Start in the function int train() in caffe.cpp.
(2) Inside train there is:
shared_ptr<caffe::Solver<float> >
solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));
(3) CreateSolver in turn calls registry[type](param);
registry is the static map, and registry[type](param) calls the registered Creator_SGDSolver function, which invokes the SGDSolver constructor:
new SGDSolver<Dtype>(param);
This completes the creation of the solver instance. The later steps, i.e. using the solver to start training, can be seen in the full train function:
// Train / Finetune a model.
int train() {
CHECK_GT(FLAGS_solver.size(), 0) << "Need a solver definition to train.";
CHECK(!FLAGS_snapshot.size() || !FLAGS_weights.size())
<< "Give a snapshot to resume training or weights to finetune "
"but not both.";
vector<string> stages = get_stages_from_flags();
caffe::SolverParameter solver_param;
caffe::ReadSolverParamsFromTextFileOrDie(FLAGS_solver, &solver_param);
solver_param.mutable_train_state()->set_level(FLAGS_level);
for (int i = 0; i < stages.size(); i++) {
solver_param.mutable_train_state()->add_stage(stages[i]);
}
// If the gpus flag is not provided, allow the mode and device to be set
// in the solver prototxt.
if (FLAGS_gpu.size() == 0
&& solver_param.solver_mode() == caffe::SolverParameter_SolverMode_GPU) {
if (solver_param.has_device_id()) {
FLAGS_gpu = "" +
boost::lexical_cast<string>(solver_param.device_id());
} else { // Set default GPU if unspecified
FLAGS_gpu = "" + boost::lexical_cast<string>(0);
}
}
vector<int> gpus;
get_gpus(&gpus);
if (gpus.size() == 0) {
LOG(INFO) << "Use CPU.";
Caffe::set_mode(Caffe::CPU);
} else {
ostringstream s;
for (int i = 0; i < gpus.size(); ++i) {
s << (i ? ", " : "") << gpus[i];
}
LOG(INFO) << "Using GPUs " << s.str();
#ifndef CPU_ONLY
cudaDeviceProp device_prop;
for (int i = 0; i < gpus.size(); ++i) {
cudaGetDeviceProperties(&device_prop, gpus[i]);
LOG(INFO) << "GPU " << gpus[i] << ": " << device_prop.name;
}
#endif
solver_param.set_device_id(gpus[0]);
Caffe::SetDevice(gpus[0]);
Caffe::set_mode(Caffe::GPU);
Caffe::set_solver_count(gpus.size());
}
// Set up the signal handler (e.g. for Ctrl+C)
caffe::SignalHandler signal_handler(
GetRequestedAction(FLAGS_sigint_effect),
GetRequestedAction(FLAGS_sighup_effect));
// Create an instance of the Solver class that matches the solver type given in solver.prototxt
shared_ptr<caffe::Solver<float> >
solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));
// Pass the handler's action function to the solver
solver->SetActionFunction(signal_handler.GetActionFunction());
if (FLAGS_snapshot.size()) {
LOG(INFO) << "Resuming from " << FLAGS_snapshot;
solver->Restore(FLAGS_snapshot.c_str());
} else if (FLAGS_weights.size()) {
CopyLayers(solver.get(), FLAGS_weights);
}
if (gpus.size() > 1) {
caffe::P2PSync<float> sync(solver, NULL, solver->param());
sync.Run(gpus);
} else {
LOG(INFO) << "Starting Optimization";
solver->Solve();
}
LOG(INFO) << "Optimization Done.";
return 0;
}
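2. How LayerSetUp and Reshape are called
Now for how LayerSetUp and Reshape actually get called.
(4) SGDSolver inherits from Solver, so the Solver constructor runs as well,
and the Solver constructor calls Init. The overload that takes a file name is shown here; the overload that takes a SolverParameter, which is the one reached from Creator_SGDSolver, calls Init(param) in the same way.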
template <typename Dtype>
Solver<Dtype>::Solver(const string& param_file, const Solver* root_solver)
: net_(), callbacks_(), root_solver_(root_solver),
requested_early_exit_(false) {
SolverParameter param;
ReadSolverParamsFromTextFileOrDie(param_file, &param);
Init(param);
}
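Step (4) relies on a basic C++ rule: the base class constructor runs to completion before the body of the derived class constructor. Here is a minimal sketch of that chain, assuming toy classes whose names only mirror Caffe (ToyNet, ToyBaseSolver, ToyDerivedSolver and BuildToySolverLog are hypothetical); each step just appends a line to a log.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature of the chain; the names mirror Caffe, the bodies
// do not.
class ToyNet {
 public:
  explicit ToyNet(std::vector<std::string>* log) {
    assert(log != nullptr);
    log->push_back("Net::Init");
  }
};

class ToyBaseSolver {
 public:
  explicit ToyBaseSolver(std::vector<std::string>* log) : log_(log) {
    assert(log != nullptr);
    Init();  // the base constructor does the heavy lifting
  }
  virtual ~ToyBaseSolver() {}

 protected:
  std::vector<std::string>* log_;

 private:
  void Init() {
    log_->push_back("Solver::Init");
    InitTrainNet();
  }
  void InitTrainNet() { net_.reset(new ToyNet(log_)); }

  std::unique_ptr<ToyNet> net_;
};

class ToyDerivedSolver : public ToyBaseSolver {
 public:
  explicit ToyDerivedSolver(std::vector<std::string>* log)
      : ToyBaseSolver(log) {
    // Runs only after ToyBaseSolver's constructor has finished.
    log_->push_back("SGDSolver body");
  }
};

// Constructs a derived solver and returns the order in which things ran.
inline std::vector<std::string> BuildToySolverLog() {
  std::vector<std::string> log;
  ToyDerivedSolver solver(&log);
  return log;
}
```

So by the time the derived constructor body starts, the net has already been built, which in Caffe means every layer has already been set up.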
The Init function shows that it calls InitTrainNet:
template <typename Dtype>
void Solver<Dtype>::Init(const SolverParameter& param) {
CHECK(Caffe::root_solver() || root_solver_)
<< "root_solver_ needs to be set for all non-root solvers";
LOG_IF(INFO, Caffe::root_solver()) << "Initializing solver from parameters: "
<< std::endl << param.DebugString();
param_ = param;
CHECK_GE(param_.average_loss(), 1) << "average_loss should be non-negative.";
CheckSnapshotWritePermissions();
if (Caffe::root_solver() && param_.random_seed() >= 0) {
Caffe::set_random_seed(param_.random_seed());
}
// Scaffolding code
InitTrainNet();
if (Caffe::root_solver()) {
InitTestNets();
LOG(INFO) << "Solver scaffolding done.";
}
iter_ = 0;
current_step_ = 0;
}
The InitTrainNet function shows that it calls the Net constructor:
template <typename Dtype>
void Solver<Dtype>::InitTrainNet() {
const int num_train_nets = param_.has_net() + param_.has_net_param() +
param_.has_train_net() + param_.has_train_net_param();
const string& field_names = "net, net_param, train_net, train_net_param";
CHECK_GE(num_train_nets, 1) << "SolverParameter must specify a train net "
<< "using one of these fields: " << field_names;
CHECK_LE(num_train_nets, 1) << "SolverParameter must not contain more than "
<< "one of these fields specifying a train_net: " << field_names;
NetParameter net_param;
if (param_.has_train_net_param()) {
LOG_IF(INFO, Caffe::root_solver())
<< "Creating training net specified in train_net_param.";
net_param.CopyFrom(param_.train_net_param());
} else if (param_.has_train_net()) {
LOG_IF(INFO, Caffe::root_solver())
<< "Creating training net from train_net file: " << param_.train_net();
ReadNetParamsFromTextFileOrDie(param_.train_net(), &net_param);
}
if (param_.has_net_param()) {
LOG_IF(INFO, Caffe::root_solver())
<< "Creating training net specified in net_param.";
net_param.CopyFrom(param_.net_param());
}
if (param_.has_net()) {
LOG_IF(INFO, Caffe::root_solver())
<< "Creating training net from net file: " << param_.net();
ReadNetParamsFromTextFileOrDie(param_.net(), &net_param);
}
// Set the correct NetState. We start with the solver defaults (lowest
// precedence); then, merge in any NetState specified by the net_param itself;
// finally, merge in any NetState specified by the train_state (highest
// precedence).
NetState net_state;
net_state.set_phase(TRAIN);
net_state.MergeFrom(net_param.state());
net_state.MergeFrom(param_.train_state());
net_param.mutable_state()->CopyFrom(net_state);
if (Caffe::root_solver()) {
net_.reset(new Net<Dtype>(net_param));
} else {
net_.reset(new Net<Dtype>(net_param, root_solver_->net_.get()));
}
}
The Net constructor in turn calls Net's own Init:
template <typename Dtype>
Net<Dtype>::Net(const NetParameter& param, const Net* root_net)
: root_net_(root_net) {
Init(param);
}
template <typename Dtype>
Net<Dtype>::Net(const string& param_file, Phase phase, const Net* root_net)
: root_net_(root_net) {
NetParameter param;
ReadNetParamsFromTextFileOrDie(param_file, &param);
param.mutable_state()->set_phase(phase);
Init(param);
}
And Net's own Init function calls the SetUp function of every layer (the excerpt below stops at that call):
template <typename Dtype>
void Net<Dtype>::Init(const NetParameter& in_param) {
CHECK(Caffe::root_solver() || root_net_)
<< "root_net_ needs to be set for all non-root solvers";
// Set phase from the state.
phase_ = in_param.state().phase();
// Filter layers based on their include/exclude rules and
// the current NetState.
NetParameter filtered_param;
FilterNet(in_param, &filtered_param);
LOG_IF(INFO, Caffe::root_solver())
<< "Initializing net from parameters: " << std::endl
<< filtered_param.DebugString();
// Create a copy of filtered_param with splits added where necessary.
NetParameter param;
InsertSplits(filtered_param, &param);
// Basically, build all the layers and set up their connections.
name_ = param.name();
map<string, int> blob_name_to_idx;
set<string> available_blobs;
CHECK(param.input_dim_size() == 0 || param.input_shape_size() == 0)
<< "Must specify either input_shape OR deprecated input_dim, not both.";
if (param.input_dim_size() > 0) {
// Deprecated 4D dimensions.
CHECK_EQ(param.input_size() * 4, param.input_dim_size())
<< "Incorrect input blob dimension specifications.";
} else {
CHECK_EQ(param.input_size(), param.input_shape_size())
<< "Exactly one input_shape must be specified per input.";
}
memory_used_ = 0;
// set the input blobs
for (int input_id = 0; input_id < param.input_size(); ++input_id) {
const int layer_id = -1; // inputs have fake layer ID -1
AppendTop(param, layer_id, input_id, &available_blobs, &blob_name_to_idx);
}
// For each layer, set up its input and output
bottom_vecs_.resize(param.layer_size());
top_vecs_.resize(param.layer_size());
bottom_id_vecs_.resize(param.layer_size());
param_id_vecs_.resize(param.layer_size());
top_id_vecs_.resize(param.layer_size());
bottom_need_backward_.resize(param.layer_size());
for (int layer_id = 0; layer_id < param.layer_size(); ++layer_id) {
// For non-root solvers, whether this layer is shared from root_net_.
bool share_from_root = !Caffe::root_solver()
&& root_net_->layers_[layer_id]->ShareInParallel();
// Inherit phase from net if unset.
if (!param.layer(layer_id).has_phase()) {
param.mutable_layer(layer_id)->set_phase(phase_);
}
// Setup layer.
const LayerParameter& layer_param = param.layer(layer_id);
if (layer_param.propagate_down_size() > 0) {
CHECK_EQ(layer_param.propagate_down_size(),
layer_param.bottom_size())
<< "propagate_down param must be specified "
<< "either 0 or bottom_size times ";
}
if (share_from_root) {
LOG(INFO) << "Sharing layer " << layer_param.name() << " from root net";
layers_.push_back(root_net_->layers_[layer_id]);
layers_[layer_id]->SetShared(true);
} else {
layers_.push_back(LayerRegistry<Dtype>::CreateLayer(layer_param));
}
layer_names_.push_back(layer_param.name());
LOG_IF(INFO, Caffe::root_solver())
<< "Creating Layer " << layer_param.name();
bool need_backward = false;
// Figure out this layer's input and output
for (int bottom_id = 0; bottom_id < layer_param.bottom_size();
++bottom_id) {
const int blob_id = AppendBottom(param, layer_id, bottom_id,
&available_blobs, &blob_name_to_idx);
// If a blob needs backward, this layer should provide it.
need_backward |= blob_need_backward_[blob_id];
}
int num_top = layer_param.top_size();
for (int top_id = 0; top_id < num_top; ++top_id) {
AppendTop(param, layer_id, top_id, &available_blobs, &blob_name_to_idx);
}
// If the layer specifies that AutoTopBlobs() -> true and the LayerParameter
// specified fewer than the required number (as specified by
// ExactNumTopBlobs() or MinTopBlobs()), allocate them here.
Layer<Dtype>* layer = layers_[layer_id].get();
if (layer->AutoTopBlobs()) {
const int needed_num_top =
std::max(layer->MinTopBlobs(), layer->ExactNumTopBlobs());
for (; num_top < needed_num_top; ++num_top) {
// Add "anonymous" top blobs -- do not modify available_blobs or
// blob_name_to_idx as we don't want these blobs to be usable as input
// to other layers.
AppendTop(param, layer_id, num_top, NULL, NULL);
}
}
// After this layer is connected, set it up.
if (share_from_root) {
// Set up size of top blobs using root_net_
const vector<Blob<Dtype>*>& base_top = root_net_->top_vecs_[layer_id];
const vector<Blob<Dtype>*>& this_top = this->top_vecs_[layer_id];
for (int top_id = 0; top_id < base_top.size(); ++top_id) {
this_top[top_id]->ReshapeLike(*base_top[top_id]);
LOG(INFO) << "Created top blob " << top_id << " (shape: "
<< this_top[top_id]->shape_string() << ") for shared layer "
<< layer_param.name();
}
} else {
layers_[layer_id]->SetUp(bottom_vecs_[layer_id], top_vecs_[layer_id]);
}
So what does the SetUp function do?
In layer.hpp, SetUp first calls LayerSetUp and then calls Reshape.
/**
* @brief Implements common layer setup functionality.
*
* @param bottom the preshaped input blobs
* @param top
* the allocated but unshaped output blobs, to be shaped by Reshape
*
* Checks that the number of bottom and top blobs is correct.
* Calls LayerSetUp to do special layer setup for individual layer types,
* followed by Reshape to set up sizes of top blobs and internal buffers.
* Sets up the loss weight multiplier blobs for any non-zero loss weights.
* This method may not be overridden.
*/
void SetUp(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
InitMutex();
CheckBlobCounts(bottom, top);
LayerSetUp(bottom, top);
Reshape(bottom, top);
SetLossWeights(top);
}
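LayerSetUp is a virtual function with an empty default body, so a concrete layer overrides it only when it needs one-time setup.
Reshape is pure virtual, so every concrete layer has to implement it.
That completes the walk-through of how LayerSetUp and Reshape are called.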
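The structure of SetUp is the classic template method pattern: a non-virtual function in the base class fixes the call order, and the virtual hooks are filled in by each layer. Here is a minimal sketch under heavy simplification: blobs are replaced by plain element counts, and ToyLayer and ToyPoolingLayer are hypothetical classes, not Caffe's.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of Layer; a "blob" is just an element count here.
class ToyLayer {
 public:
  virtual ~ToyLayer() {}

  // Non-virtual on purpose: the base class fixes the call order.
  void SetUp(int bottom_count, int* top_count) {
    assert(top_count != nullptr);
    LayerSetUp(bottom_count);
    Reshape(bottom_count, top_count);
  }

  const std::vector<std::string>& calls() const { return calls_; }

 protected:
  // Optional hook with an empty default body.
  virtual void LayerSetUp(int /*bottom_count*/) {}
  // Mandatory hook: every layer must size its output.
  virtual void Reshape(int bottom_count, int* top_count) = 0;

  std::vector<std::string> calls_;
};

class ToyPoolingLayer : public ToyLayer {
 public:
  explicit ToyPoolingLayer(int stride) : stride_(stride) {}

 protected:
  void LayerSetUp(int /*bottom_count*/) override {
    calls_.push_back("LayerSetUp");
    assert(stride_ > 0);  // one-time parameter checks belong here
  }
  void Reshape(int bottom_count, int* top_count) override {
    calls_.push_back("Reshape");
    *top_count = bottom_count / stride_;  // output size follows input size
  }

 private:
  int stride_;
};
```

For example, calling SetUp(8, &top_count) on a ToyPoolingLayer with stride 2 records LayerSetUp first, then Reshape, and leaves top_count at 4.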
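3. How the solver performs the forward and backward passes
The sections above covered how the network is set up and how the blobs passed between layers get their shapes.
The forward and backward passes of the network start from
the end of the train function in caffe.cpp, which contains: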
solver->Solve();
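This calls the Solve function, which is defined in the base class Solver.
Inside that function,
Step(param_.max_iter() - iter_); is executed, which runs the forward and backward passes and keeps updating the weights.
template <typename Dtype>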
void Solver<Dtype>::Solve(const char* resume_file) {
CHECK(Caffe::root_solver());
LOG(INFO) << "Solving " << net_->name();
LOG(INFO) << "Learning Rate Policy: " << param_.lr_policy();
// Initialize to false every time we start solving.
requested_early_exit_ = false;
if (resume_file) {
LOG(INFO) << "Restoring previous solver status from " << resume_file;
Restore(resume_file);
}
// For a network that is trained by the solver, no bottom or top vecs
// should be given, and we will just provide dummy vecs.
int start_iter = iter_;
Step(param_.max_iter() - iter_);
// If we haven't already, save a snapshot after optimization, unless
// overridden by setting snapshot_after_train := false
if (param_.snapshot_after_train()
&& (!param_.snapshot() || iter_ % param_.snapshot() != 0)) {
Snapshot();
}
if (requested_early_exit_) {
LOG(INFO) << "Optimization stopped early.";
return;
}
// After the optimization is done, run an additional train and test pass to
// display the train and test loss/outputs if appropriate (based on the
// display and test_interval settings, respectively). Unlike in the rest of
// training, for the train net we only run a forward pass as we've already
// updated the parameters "max_iter" times -- this final pass is only done to
// display the loss, which is computed in the forward pass.
if (param_.display() && iter_ % param_.display() == 0) {
int average_loss = this->param_.average_loss();
Dtype loss;
net_->ForwardPrefilled(&loss);
UpdateSmoothedLoss(loss, start_iter, average_loss);
LOG(INFO) << "Iteration " << iter_ << ", loss = " << smoothed_loss_;
}
if (param_.test_interval() && iter_ % param_.test_interval() == 0) {
TestAll();
}
LOG(INFO) << "Optimization Done.";
}
Let us look inside the Step function:
template <typename Dtype>
void Solver<Dtype>::Step(int iters) {
vector<Blob<Dtype>*> bottom_vec;
const int start_iter = iter_;
const int stop_iter = iter_ + iters;
int average_loss = this->param_.average_loss();
losses_.clear();
smoothed_loss_ = 0;
while (iter_ < stop_iter) {
// zero-init the params
net_->ClearParamDiffs();
if (param_.test_interval() && iter_ % param_.test_interval() == 0
&& (iter_ > 0 || param_.test_initialization())
&& Caffe::root_solver()) {
TestAll();
if (requested_early_exit_) {
// Break out of the while loop because stop was requested while testing.
break;
}
}
for (int i = 0; i < callbacks_.size(); ++i) {
callbacks_[i]->on_start();
}
const bool display = param_.display() && iter_ % param_.display() == 0;
net_->set_debug_info(display && param_.debug_info());
// accumulate the loss and gradient
Dtype loss = 0;
// Run iter_size forward/backward passes, accumulating the loss and the gradients
for (int i = 0; i < param_.iter_size(); ++i) {
// forward pass + backward pass
loss += net_->ForwardBackward(bottom_vec);
}
loss /= param_.iter_size();
// average the loss across iterations for smoothed reporting
UpdateSmoothedLoss(loss, start_iter, average_loss);
if (display) {
LOG_IF(INFO, Caffe::root_solver()) << "Iteration " << iter_
<< ", loss = " << smoothed_loss_;
const vector<Blob<Dtype>*>& result = net_->output_blobs();
int score_index = 0;
for (int j = 0; j < result.size(); ++j) {
const Dtype* result_vec = result[j]->cpu_data();
const string& output_name =
net_->blob_names()[net_->output_blob_indices()[j]];
const Dtype loss_weight =
net_->blob_loss_weights()[net_->output_blob_indices()[j]];
for (int k = 0; k < result[j]->count(); ++k) {
ostringstream loss_msg_stream;
if (loss_weight) {
loss_msg_stream << " (* " << loss_weight
<< " = " << loss_weight * result_vec[k] << " loss)";
}
LOG_IF(INFO, Caffe::root_solver()) << " Train net output #"
<< score_index++ << ": " << output_name << " = "
<< result_vec[k] << loss_msg_stream.str();
}
}
}
for (int i = 0; i < callbacks_.size(); ++i) {
callbacks_[i]->on_gradients_ready();
}
ApplyUpdate();  // update the weights
// Increment the internal iter_ counter -- its value should always indicate
// the number of times the weights have been updated.
++iter_;
SolverAction::Enum request = GetRequestedAction();
// Save a snapshot if needed.
if ((param_.snapshot()
&& iter_ % param_.snapshot() == 0
&& Caffe::root_solver()) ||
(request == SolverAction::SNAPSHOT)) {
Snapshot();
}
if (SolverAction::STOP == request) {
requested_early_exit_ = true;
// Break out of training loop.
break;
}
}
}
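Stripped of logging, testing and snapshots, the loop in Step has a very small core: clear the gradients, accumulate them over iter_size forward/backward passes, average the loss, then update the weights. The sketch below reproduces only that core on a hypothetical one-parameter problem with loss (w - target)^2. ToyNet1D and ToyStep are made-up names, and the plain SGD update divided by iter_size is an assumption standing in for ApplyUpdate().

```cpp
#include <cassert>
#include <cmath>

// Hypothetical one-parameter "net": loss = (w - target)^2.
struct ToyNet1D {
  double w = 0.0;
  double diff = 0.0;  // accumulated gradient, like a parameter blob's diff
  double target = 3.0;

  void ClearParamDiffs() { diff = 0.0; }

  // Forward computes the loss; backward adds d(loss)/dw to diff.
  double ForwardBackward() {
    const double error = w - target;
    diff += 2.0 * error;
    return error * error;
  }
};

// Mirrors the skeleton of Solver::Step with plain SGD as the update rule.
// Returns the averaged loss of the last iteration.
inline double ToyStep(ToyNet1D* net, int iters, int iter_size, double lr) {
  assert(net != nullptr && iter_size > 0);
  double loss = 0.0;
  for (int iter = 0; iter < iters; ++iter) {
    net->ClearParamDiffs();
    loss = 0.0;
    for (int i = 0; i < iter_size; ++i) {
      // Gradients accumulate across the inner loop.
      loss += net->ForwardBackward();
    }
    loss /= iter_size;
    // Assumed stand-in for ApplyUpdate(): averaged-gradient SGD step.
    net->w -= lr * net->diff / iter_size;
  }
  return loss;
}
```

With a learning rate of 0.1 each iteration shrinks the error w - target by a factor of 0.8, so w approaches the target geometrically. Changing iter_size does not change the update in this toy because every pass sees the same data.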
That is all.
If you repost this article, please credit the source: http://blog.csdn.net/xizero00