Dependencies:
horovod
Horovod is another deep learning tool open-sourced by Uber. Its design builds on ideas from Facebook's "Training ImageNet In 1 Hour" and Baidu's "Ring Allreduce".
pip install horovod --no-cache-dir
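To make the "Ring Allreduce" idea concrete, here is a minimal single-process simulation in plain Python. It is only a sketch of the textbook algorithm (a scatter-reduce pass followed by an allgather pass), not Horovod's implementation; the function name `ring_allreduce` and the list-of-chunks data layout are assumptions made for illustration.

```python
def ring_allreduce(worker_chunks):
    """Simulate a sum ring allreduce.

    worker_chunks[r][c] is chunk c held by worker r. Each worker must hold
    exactly n chunks, where n is the number of workers (illustrative layout).
    Returns a new nested list in which every worker holds the full sum.
    """
    n = len(worker_chunks)
    data = [[list(chunk) for chunk in chunks] for chunks in worker_chunks]
    for chunks in data:
        assert len(chunks) == n, "each worker needs one chunk per worker"

    # Phase 1, scatter-reduce: after n-1 steps worker r holds the complete
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends happen "simultaneously".
        messages = []
        for rank in range(n):
            idx = (rank - step) % n
            messages.append((idx, list(data[rank][idx])))
        for rank, (idx, payload) in enumerate(messages):
            dst = (rank + 1) % n
            data[dst][idx] = [x + y for x, y in zip(data[dst][idx], payload)]

    # Phase 2, allgather: pass the finished chunks around the ring.
    for step in range(n - 1):
        messages = []
        for rank in range(n):
            idx = (rank + 1 - step) % n
            messages.append((idx, list(data[rank][idx])))
        for rank, (idx, payload) in enumerate(messages):
            dst = (rank + 1) % n
            data[dst][idx] = payload

    return data


workers = [
    [[1, 2], [3, 4], [5, 6]],
    [[10, 20], [30, 40], [50, 60]],
    [[100, 200], [300, 400], [500, 600]],
]
reduced = ring_allreduce(workers)
```

Each worker only ever talks to its right-hand neighbour, and every step moves one chunk per worker, which is why the per-worker traffic stays roughly constant as the number of workers grows.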
A version that does not depend on horovod:
https://github.com/bleakie/MaskInsightface
Data file: train.rec
Training entry point:
recognition/partial_fc/mxnet/train_memory.py
Config file (the dataset path is also set here):
recognition/partial_fc/mxnet/default.py
Network and data arguments:
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description='Train parall face network')
    # general
    parser.add_argument('--dataset', default='emore', help='dataset config')
    parser.add_argument('--network', default='r100', help='network config')
    parser.add_argument('--loss', default='cosface', help='loss config')
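As a quick check of how these three options behave, the sketch below rebuilds the same parser and parses an explicit argument list instead of `sys.argv`. The helper name `build_parser` is made up for this example, and the override values `r50` and `arcface` are illustrative only; whether default.py actually defines configs with those names is an assumption.

```python
import argparse


def build_parser():
    # Same three options as in the excerpt above.
    parser = argparse.ArgumentParser(description='Train parall face network')
    parser.add_argument('--dataset', default='emore', help='dataset config')
    parser.add_argument('--network', default='r100', help='network config')
    parser.add_argument('--loss', default='cosface', help='loss config')
    return parser


# No command-line flags: every option falls back to its default.
defaults = build_parser().parse_args([])

# Override two options; 'r50' and 'arcface' are illustrative values.
overridden = build_parser().parse_args(['--network', 'r50', '--loss', 'arcface'])

print(defaults.dataset, defaults.network, defaults.loss)
print(overridden.dataset, overridden.network, overridden.loss)
```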
Getting the network:
embedding = eval(config.net_name).get_symbol()
Function:
import mxnet as mx

def get_symbol_embedding():
    # Look up the network module by name and build its symbol
    embedding = eval(config.net_name).get_symbol()
    all_label = mx.symbol.Variable('softmax_label')
    # Block gradients from flowing back through the label
    all_label = mx.symbol.BlockGrad(all_label)
    out_list = [embedding, all_label]
    out = mx.symbol.Group(out_list)
    return out, embedding
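The `eval(config.net_name)` lookup resolves a module from a string. A possible alternative, shown here only as a sketch and not as what the repository does, is an explicit name-to-module registry, which fails with a clear error on an unknown name. `FakeNet`, `NETWORKS`, and `get_network` are hypothetical stand-ins so that the example runs without MXNet; the network names are illustrative.

```python
class FakeNet:
    """Hypothetical stand-in for a network module exposing get_symbol()."""

    def __init__(self, name):
        self.name = name

    def get_symbol(self):
        # A real module would return an MXNet symbol here.
        return "symbol:" + self.name


# Hypothetical registry; the keys are illustrative network names.
NETWORKS = {
    "fresnet": FakeNet("fresnet"),
    "fmobilefacenet": FakeNet("fmobilefacenet"),
}


def get_network(net_name):
    """Return the registered network, or raise a readable error."""
    try:
        return NETWORKS[net_name]
    except KeyError:
        known = ", ".join(sorted(NETWORKS))
        raise ValueError("unknown network %r, expected one of: %s" % (net_name, known))


embedding = get_network("fresnet").get_symbol()
```

Unlike `eval`, the lookup cannot execute arbitrary text that ends up in the config.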
Loading pretrained weights:
recognition/partial_fc/mxnet/memory_module.py
sym, arg_params, aux_params = mx.model.load_checkpoint(r"model", 0)
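`mx.model.load_checkpoint(prefix, epoch)` reads two files derived from its arguments: `prefix-symbol.json` and `prefix-NNNN.params`, where `NNNN` is the epoch zero-padded to four digits. The small helper below (the name `checkpoint_files` is made up for this example) only computes those file names, which can be handy for checking that the pretrained files are in place before training starts.

```python
import os


def checkpoint_files(prefix, epoch):
    """Return the (symbol, params) file names that load_checkpoint expects."""
    return "%s-symbol.json" % prefix, "%s-%04d.params" % (prefix, epoch)


def missing_checkpoint_files(prefix, epoch):
    """List the expected checkpoint files that do not exist on disk."""
    return [path for path in checkpoint_files(prefix, epoch) if not os.path.isfile(path)]


symbol_file, params_file = checkpoint_files("model", 0)
print(symbol_file, params_file)
```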
def fit(self,
        train_data,
        optimizer_params,
        batch_end_callback,
        initializer,
        arg_params=None,
        aux_params=None):
    # Bind -> Init_params -> Init_optimizers
    self.bind(train_data.provide_data, train_data.provide_label, True)
    self.init_params(initializer, arg_params, aux_params, False)
    self.init_optimizer(optimizer_params=optimizer_params)
    # Sync init: broadcast the backbone parameters so that every worker
    # starts from the values held by rank 0
    _arg_params, _aux_params = self.backbone_module.get_params()
    _arg_params_rank_0 = self.broadcast_parameters(_arg_params)
    _aux_params_rank_0 = self.broadcast_parameters(_aux_params)
    self.backbone_module.set_params(_arg_params_rank_0, _aux_params_rank_0)
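The "Sync init" step can be pictured with a small single-process simulation: every rank starts with its own parameter dict, and after the broadcast all ranks hold a copy of rank 0's values. This is only a sketch of the idea, assuming that `broadcast_parameters` distributes rank 0's parameters as the variable names suggest; `simulate_broadcast` is a hypothetical helper, not the module's method.

```python
import copy


def simulate_broadcast(params_per_rank, root_rank=0):
    """Return a new list in which every rank holds a copy of the root's params."""
    root_params = params_per_rank[root_rank]
    return [copy.deepcopy(root_params) for _ in params_per_rank]


# Three simulated workers whose initial weights differ (illustrative values).
params_per_rank = [
    {"fc1_weight": [0.1, 0.2], "bn1_gamma": [1.0]},
    {"fc1_weight": [0.9, 0.8], "bn1_gamma": [0.5]},
    {"fc1_weight": [0.4, 0.4], "bn1_gamma": [2.0]},
]

synced = simulate_broadcast(params_per_rank)
```

Without this step each worker could start from different weights, for example when only rank 0 loaded the pretrained checkpoint, and the averaged gradients would then be applied to models that do not match.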