To make model training reproducible, you need to set the random seeds before training. Concretely:

Main program
import random

import numpy as np
import torch

def torch_seed(seed):
    torch.manual_seed(seed)                    # seed the CPU RNG
    torch.cuda.manual_seed(seed)               # seed the current GPU RNG
    torch.cuda.manual_seed_all(seed)           # seed all GPU RNGs (multi-GPU)
    np.random.seed(seed)                       # NumPy module
    random.seed(seed)                          # Python random module
    torch.backends.cudnn.benchmark = False     # autotuning picks kernels nondeterministically
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels

if __name__ == "__main__":
    torch_seed(369)
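As a quick sanity check, reseeding should make every RNG stream repeat exactly. A minimal sketch (it reuses the torch_seed helper above; the shapes are arbitrary):

torch_seed(369)
a = (torch.randn(3), np.random.rand(3), random.random())
torch_seed(369)
b = (torch.randn(3), np.random.rand(3), random.random())
assert torch.equal(a[0], b[0])  # torch stream repeats
assert (a[1] == b[1]).all()     # NumPy stream repeats
assert a[2] == b[2]             # Python random stream repeats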
torch.utils.data.DataLoader

If num_workers in the DataLoader is not 0, you also need to set worker_init_fn so that the sampling inside each worker process stays controllable (a self-contained example follows the snippet below).
def worker_init(worker_id):
    # Give every worker a distinct but reproducible NumPy seed.
    seed = 369
    np.random.seed(int(seed) + worker_id)

# db, bz and num_workers are placeholders for your dataset, batch size and worker count.
DataLoader(db, batch_size=bz, worker_init_fn=worker_init, shuffle=True,
           num_workers=num_workers, pin_memory=True)
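Putting the pieces together, here is a minimal runnable sketch. The TensorDataset is a toy stand-in for a real dataset, and the generator argument (a documented DataLoader parameter) additionally pins the shuffle order:

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init(worker_id):
    np.random.seed(369 + worker_id)  # per-worker, reproducible

if __name__ == "__main__":
    db = TensorDataset(torch.arange(8).float())  # toy stand-in for a real dataset
    g = torch.Generator().manual_seed(369)       # fixes the shuffle order across runs
    loader = DataLoader(db, batch_size=2, shuffle=True, num_workers=2,
                        worker_init_fn=worker_init, generator=g)
    for (batch,) in loader:
        print(batch)  # the same sequence of batches on every run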
Reference
https://blog.csdn.net/john_bh/article/details/107731443
Uncontrollable factors

Normally, as long as the dataset, the network structure and the batch_size are kept fixed, every training run produces the same result. But there are always uncontrollable factors!
PyTorch's upsampling and interpolation functions introduce nondeterminism that the random seed does not control, so if the model contains either of them, results may still differ between runs even with all seeds set. I ran into this myself: a model that used interpolation never reproduced exactly the same results, but after setting the random seeds as above, the run-to-run spread under identical hyperparameters was noticeably reduced.
Reference
PyTorch modules can follow non-deterministic behavior, and most of the time you cannot get rid of it by following the steps above. For example, the backward pass of the upsampling and interpolation functionals/classes is non-deterministic (see the issue below). This means that if you use such modules in the training graph, you will never obtain deterministic results no matter what you do. torch.nn.ConvTranspose2d is likewise not deterministic unless you set torch.backends.cudnn.deterministic = True ("you can try to make the operation deterministic … by setting torch.backends.cudnn.deterministic = True").
https://github.com/pytorch/pytorch/issues/7068
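Newer PyTorch versions can at least surface these cases instead of letting runs silently diverge. A minimal sketch, assuming PyTorch 1.8+ (which provides torch.use_deterministic_algorithms) and a CUDA device, since the CPU backward of interpolate is deterministic:

import torch
import torch.nn.functional as F

torch.use_deterministic_algorithms(True)  # raise on known nondeterministic ops

x = torch.randn(1, 3, 8, 8, device="cuda", requires_grad=True)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
try:
    y.sum().backward()  # bilinear upsampling backward on CUDA is nondeterministic
except RuntimeError as e:
    print(e)  # PyTorch names the offending operation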