1. train_parameters
# build model
model = builder.build_classifier(cfg.model)
logger.info(model)
if global_rank == 0:
    for param in model.train_parameters():  # level 0 is a list; level 1 elements are dicts
        logger.info(param.keys())           # train_parameters returns a list
        print("*******************************")
        for name, value in param.items():
            print(name, "\t", value)
        print("-------------------------------")
After the model is assembled from the backbone, the head, and the classifier layer, build_classifier registers and instantiates it as an nn.Module instance, model.
The trainable parameters of each part are collected by that part's own train_parameters method, defined as follows:
1.1 backbone
In the backbone, train_parameters returns a list whose elements are dicts of the form {"params": [ ]}; the value under "params" is a list holding the parameters of every compute layer, appended in order:
def train_parameters(self):
    params = []  # flat list that collects the parameter tensors
    for name, module in self.named_modules():
        if isinstance(module, nn.Conv2d):  # append each layer's parameters in order
            params.append(module.weight)
            if module.bias is not None:
                params.append(module.bias)
        elif isinstance(module, nn.Linear):
            params.append(module.weight)
            if module.bias is not None:
                params.append(module.bias)
        elif isinstance(module, nn.BatchNorm2d):
            params.append(module.weight)
            if module.bias is not None:
                params.append(module.bias)
        elif isinstance(module, nn.PReLU):
            params.append(module.weight)
    params = [  # the method returns a list whose elements are dicts
        dict(params=params)  # key "params" maps to the list built above
    ]
    return params
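A minimal, self-contained sketch of this pattern; TinyBackbone is a hypothetical module invented here purely to show the structure the method returns:
import torch.nn as nn

class TinyBackbone(nn.Module):
    # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)  # weight + bias -> 2 tensors
        self.bn = nn.BatchNorm2d(8)     # weight + bias -> 2 tensors
        self.act = nn.PReLU(8)          # weight        -> 1 tensor

    def train_parameters(self):
        params = []
        for name, module in self.named_modules():
            if isinstance(module, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)):
                params.append(module.weight)
                if module.bias is not None:
                    params.append(module.bias)
            elif isinstance(module, nn.PReLU):
                params.append(module.weight)
        return [dict(params=params)]

groups = TinyBackbone().train_parameters()
print(type(groups), len(groups))  # <class 'list'> 1
print(groups[0].keys())           # dict_keys(['params'])
print(len(groups[0]['params']))   # 5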
1.2 head
Taking fc.py as an example: only an FC layer and a BN layer are defined here, so their parameters are added directly. The return value is again a list whose elements are dicts of the form {"params": [ ]}, the value being the layer parameters collected in a list.
The value returned here is params = [{"params": [self.fc.weight, self.fc.bias, self.bn.weight, self.bn.bias]}], a single four-tensor group matching the logged output in 1.4. (Note that the snippet below, taken literally, would instead yield two separate groups when with_bn is set.)
def train_parameters(self):
    params = [
        dict(params=[self.fc.weight, self.fc.bias]),
    ]
    if self.with_bn:
        params += [
            dict(params=[self.bn.weight, self.bn.bias])
        ]
    return params
1.3 classifier
The base class of the classifier is set up as follows. It mainly pulls in backbone.train_parameters() and feature.train_parameters(), i.e. the parameters of the backbone and the head:
The returned value is params = [{"params": [group 1]}, {"params": [group 2]}]
def train_parameters(self):
    params = self.backbone.train_parameters()
    if self.feature is not None:
        params += self.feature.train_parameters()
    return params
The concrete classifier .py then adds this layer's own parameter group on top:
params = [{"params": [self.weight], "lr_mult": 10}]
so the final return value is params = [{"params": [group 1]}, {"params": [group 2]}, {"params": [group 3], "lr_mult": 10}]
def train_parameters(self):
    params = super(CombinedClassifier, self).train_parameters()
    params += [dict(params=[self.weight], lr_mult=10)]
    return params
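A quick, hypothetical sanity check of the composed result (assuming model was built as in the snippet at the top of this section):
params = model.train_parameters()
assert isinstance(params, list)
assert all(isinstance(group, dict) and 'params' in group for group in params)
assert params[-1]['lr_mult'] == 10  # only the final classifier group carries lr_mult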
1.4 train_parameters of the whole model
if global_rank == 0:
    for param in model.train_parameters():  # level 0 is a list; level 1 elements are dicts
        logger.info(param.keys())
        print("*******************************")
        for name, value in param.items():
            print(name, "\t", type(value))
            if "params" == name:  # number of tensors in this group (weights, biases, ...)
                print(len(value))
        print("-------------------------------")
Output:
2020-12-29 09:50:53,242 - INFO - dict_keys(['params'])
*******************************
params   <class 'list'>
88
-------------------------------
2020-12-29 09:50:53,243 - INFO - dict_keys(['params'])
*******************************
params   <class 'list'>
4
-------------------------------
2020-12-29 09:50:53,243 - INFO - dict_keys(['params', 'lr_mult'])
*******************************
params   <class 'list'>
1
lr_mult  <class 'int'>
-------------------------------
2. optimizer
2.1 Get cfg.optimizer
Read the optimizer settings from the config file; the value is a dict:
# optimizer
optimizer = cfg.optimizer
origin_model = model
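The exact config file is not shown here, but given what obj_from_dict requires (a 'type' key, see 2.3) and the printed optimizer in 2.4, cfg.optimizer presumably looks something like this reconstruction:
optimizer = dict(
    type='SGD',            # class name, looked up in torch.optim by obj_from_dict
    lr=0.1,
    momentum=0.9,
    weight_decay=0.0005,
)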
2.2 Normalize the optimizer parameters
Make the per-group settings in the parameter list consistent:
# optimizer
optimizer = cfg.optimizer
origin_model = model
if hasattr(origin_model, 'train_parameters'):
    params = origin_model.train_parameters()
    if isinstance(params, list) and isinstance(params[0], dict):
        init_lr = optimizer['lr']
        weight_decay = optimizer['weight_decay']
        # adjust each parameter group in turn
        for idx in range(len(params)):
            assert isinstance(params[idx], dict)
            # if the group carries an 'lr_mult' multiplier, scale its learning rate directly
            if 'lr_mult' in params[idx]:
                lr_mult = float(params[idx].pop('lr_mult'))
                params[idx]['lr'] = lr_mult * init_lr
            # if the group carries a 'decay_mult' multiplier, scale its weight decay directly
            if 'decay_mult' in params[idx]:
                decay_mult = float(params[idx].pop('decay_mult'))
                params[idx]['weight_decay'] = decay_mult * weight_decay
        if not is_dist or (is_dist and is_dist == 0):  # log that multipliers were applied
            logger.info('lr & decay multipy enable')
else:
    params = model.parameters()
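A toy, self-contained illustration (values invented) of what this normalization does to a group carrying the multipliers:
import torch

w = torch.nn.Parameter(torch.randn(2, 2))
params = [dict(params=[w], lr_mult=10, decay_mult=0)]
init_lr, weight_decay = 0.1, 0.0005

for group in params:
    if 'lr_mult' in group:
        group['lr'] = float(group.pop('lr_mult')) * init_lr
    if 'decay_mult' in group:
        group['weight_decay'] = float(group.pop('decay_mult')) * weight_decay

print(params[0].keys())           # dict_keys(['params', 'lr', 'weight_decay'])
print(params[0]['lr'])            # 1.0  (10 * 0.1)
print(params[0]['weight_decay'])  # 0.0  (0 * 0.0005)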
2.3 Build the optimizer from the dict
The obj_from_dict helper instantiates a torch.optim class from the settings in the optimizer dict; here it returns an object of <class 'torch.optim.sgd.SGD'>:
optimizer = obj_from_dict(
    optimizer, torch.optim, dict(params=params))
Definition of obj_from_dict. info: a dict containing the object type and its arguments; parent: the module expected to contain the target class; default_args: optional dict of default arguments for initializing the object.
def obj_from_dict(info, parent=None, default_args=None):
    '''
    Args:
        info (dict): Object types and arguments.
        parent (:class:`module`): Module which may contain the expected
            object classes.
        default_args (dict, optional): Default arguments for initializing
            the object.
    '''
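The body is omitted in the original; the following sketch is consistent with the mmcv-style obj_from_dict this appears to be (illustrative, not necessarily the exact source):
def obj_from_dict(info, parent=None, default_args=None):
    assert isinstance(info, dict) and 'type' in info
    args = info.copy()
    obj_type = args.pop('type')                 # e.g. 'SGD'
    if isinstance(obj_type, str) and parent is not None:
        obj_type = getattr(parent, obj_type)    # -> torch.optim.SGD
    if default_args is not None:
        for name, value in default_args.items():
            args.setdefault(name, value)        # inject params=... without overriding
    return obj_type(**args)                     # SGD(params=..., lr=0.1, momentum=0.9, ...)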
2.4 Contents of the optimizer
if 0 == global_rank:
    print(optimizer)
    print(type(optimizer.param_groups), len(optimizer.param_groups))  # list
    print("optim.param_groups[0]: ", type(optimizer.param_groups[0]), optimizer.param_groups[0].keys())  # dict
    print("optim.param_groups[1]: ", optimizer.param_groups[1].keys())
    print("optim.param_groups[2]: ", optimizer.param_groups[2].keys())
    print(type(optimizer.param_groups[0]['params']))
Output:
print(optimizer)  # print the optimizer itself
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.1
    momentum: 0.9
    nesterov: False
    weight_decay: 0.0005

Parameter Group 1
    dampening: 0
    lr: 0.1
    momentum: 0.9
    nesterov: False
    weight_decay: 0.0005

Parameter Group 2
    dampening: 0
    lr: 1.0
    momentum: 0.9
    nesterov: False
    weight_decay: 0.0005
)
Print each parameter group and its contents:
print(type(optimizer.param_groups), len(optimizer.param_groups))  # list
print("optim.param_groups[0]: ", type(optimizer.param_groups[0]), optimizer.param_groups[0].keys())
print("optim.param_groups[1]: ", optimizer.param_groups[1].keys())
print("optim.param_groups[2]: ", optimizer.param_groups[2].keys())
print(type(optimizer.param_groups[0]['params']))
<class 'list'> 3
optim.param_groups[0]:  <class 'dict'> dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])
optim.param_groups[1]:  dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])
optim.param_groups[2]:  dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])
<class 'list'>
Analysis:
The optimizer's parameter list has length 3, one group each for the backbone, the head, and the classifier.
Each group is a dict that holds the actual parameter tensors under 'params', plus the optimization settings 'lr', 'momentum', 'dampening', 'weight_decay', and 'nesterov'. Note that Parameter Group 2's lr is 1.0 = 10 x 0.1, the effect of the lr_mult handled in 2.2.
Within each group, 'params' is itself a list made up of the weights, biases, and other parameters of that module's compute layers.
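Since each group is an independent dict, per-group hyperparameters can also be adjusted after construction; a generic sketch (not from this codebase):
# decay every group's learning rate by 10x, preserving the relative lr_mult ratios
for group in optimizer.param_groups:
    group['lr'] *= 0.1
print([group['lr'] for group in optimizer.param_groups])  # e.g. [0.01, 0.01, 0.1]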