It's been a while since I built a model from scratch, and there are few examples of applying hyperopt to torch, so I'm recording some of the problems I ran into while debugging.
——————
1. hyperopt fails to generate the search space properly
Reading the library's source code: the function fn passed into fmin must take the point sampled from the search space as its first, and preferably only, argument. From fmin's docstring:
fn : callable (trial point -> loss)
This function will be called with a value generated from `space`
as the first and possibly only argument. It can return either
a scalar-valued loss, or a dictionary. A returned dictionary must
contain a 'status' key with a value from `STATUS_STRINGS`, must
contain a 'loss' key if the status is `STATUS_OK`. Particular
optimization algorithms may look for other keys as well. An
optional sub-dictionary associated with an 'attachments' key will
be removed by fmin its contents will be available via
`trials.trial_attachments`. The rest (usually all) of the returned
dictionary will be stored and available later as some 'result'
sub-dictionary within `trials.trials`.
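A minimal sketch of an objective in that shape (the quadratic is just a stand-in for real model training; only the calling contract matters here):

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(params):
    # params is the point sampled from `space`, the first (and here only)
    # argument that fmin passes to fn
    val_loss = (params['learning_rate'] - 0.01) ** 2  # stand-in for training the torch model
    return {'loss': val_loss, 'status': STATUS_OK}

space = {'learning_rate': hp.loguniform('learning_rate', -7, 0)}
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=Trials())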
2. Input data dimensions
The model's input_dim and output_dim have to match the dataset: input_dim is the number of feature columns, and output_dim the number of target columns.
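A quick sketch with made-up shapes:

import torch
import torch.nn as nn

X = torch.randn(500, 13)   # 500 samples, 13 features
y = torch.randn(500, 1)    # one regression target per sample

input_dim = X.shape[1]     # 13, the feature count
output_dim = y.shape[1]    # 1, the target count
model = nn.Linear(input_dim, output_dim)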
3. Loss function outputs NaN
The dataset contained null values, which made the loss evaluate to NaN during backpropagation, so the missing values have to be handled.
Example fix: df = df.dropna()
Also: the problem can be tracked down with: with torch.autograd.detect_anomaly():
One more note: the run now completes, but the loss still looks a bit high; normalization is probably worth considering later.
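A small sketch of both checks on toy data (the column names are made up; detect_anomaly wraps the forward and backward pass and raises at the op that first produces NaN):

import pandas as pd
import torch

df = pd.DataFrame({'x': [1.0, None, 3.0], 'y': [2.0, 4.0, 6.0]})
df = df.dropna()  # drop the rows containing missing values

x = torch.tensor(df['x'].values, dtype=torch.float32, requires_grad=True)
y = torch.tensor(df['y'].values, dtype=torch.float32)
with torch.autograd.detect_anomaly():
    loss = ((x * 2 - y) ** 2).mean()  # stand-in for the real forward pass
    loss.backward()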
4. outputs and labels shapes don't match
Error message, something like: UserWarning: Using a target size (torch.Size([64,1])) that is different to the input size (torch.Size([64,])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
Reference: https://blog.csdn.net/xll_bit/article/details/123906121
Fix: before the forward function returns its output, apply torch.squeeze() to outputs ([64,1]) to align it with the shape of labels ([64]); otherwise torch broadcasts automatically and the loss computation goes wrong. (To add a dimension instead, use torch.unsqueeze.)
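A minimal sketch (squeeze(-1) drops only the trailing size-1 dimension, which is safer than a bare squeeze() when the batch size might be 1):

import torch
import torch.nn as nn

outputs = torch.randn(64, 1)  # what the last Linear layer produces
labels = torch.randn(64)      # what the DataLoader yields

criterion = nn.MSELoss()
loss = criterion(outputs.squeeze(-1), labels)  # [64] vs [64], no broadcasting

Inside a model, the same fix goes at the end of forward, i.e. return out.squeeze(-1).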
5. hp.choice
hp.choice results come back from fmin as indices into the options list;
but the examples all use params['label'] directly
to feed the parameter into the model, and in practice applying them in the model produced values that don't exist in the options list (0, 1, etc.).
Error output:
{'activation': 0, 'batch_size': 0, 'layers': 1, 'learning_rate': 2, 'regularization_rate': 2, 'units1': 2, 'units2': 0, 'units3': 1, 'units4': 7}
/.../torch/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
The search space:
from hyperopt import hp

space = {'units1': hp.choice('units1', [4, 8, 16, 32, 64, 128, 256, 512]),
         'units2': hp.choice('units2', [4, 8, 16, 32, 64, 128, 256, 512]),
         'units3': hp.choice('units3', [4, 8, 16, 32, 64, 128, 256, 512]),
         'units4': hp.choice('units4', [4, 8, 16, 32, 64, 128, 256, 512]),
         'layers': hp.choice('layers', [2, 3, 4]),
         'batch_size': hp.choice('batch_size', [32, 64, 128, 256, 500]),
         'learning_rate': hp.choice('learning_rate', [0.001, 0.01, 0.1]),
         'regularization_rate': hp.choice('regularization_rate', [0, 0.001, 0.01, 0.1]),  # weight_decay on the optimizer, for regularization
         'activation': hp.choice('activation', ['relu', 'softplus'])}
Reference: https://docs.azure.cn/zh-cn/databricks/machine-learning/automl-hyperparam-tuning/hyperopt-best-practices
When the search finishes, fmin returns the best parameters in index form; setting return_argmin=False in the fmin call makes it return the actual values.
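A sketch of both routes, reusing the space above and assuming an objective like the one sketched in item 1:

from hyperopt import fmin, tpe, space_eval

# option 1: have fmin return the actual option values directly
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, return_argmin=False)

# option 2: keep the default index form and map it back through the space
best_idx = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
best = space_eval(space, best_idx)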
6. early_stop_fn
The search stopped before finishing the configured number of evaluations; after some digging, it turned out the reference code had set early_stop_fn.
Reference: https://zhuanlan.zhihu.com/p/629690012
from hyperopt.early_stop import no_progress_loss
early_stop_fn is fmin's early-stopping parameter; it usually takes no_progress_loss() imported from hyperopt, which accepts a number n meaning: stop the search early once the loss has failed to improve for n consecutive trials. Since Bayesian optimization is fairly stochastic, it can take many iterations to find the optimum when the number of trials is small, so a small n can cut the search off too early.
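For example (again reusing objective and space from above):

from hyperopt import fmin, tpe
from hyperopt.early_stop import no_progress_loss

# stop only after 20 consecutive trials without improvement in the loss
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=500, early_stop_fn=no_progress_loss(20))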