Reproducing PyTorch Model Results


try-agaaain

Tags: pytorch, artificial intelligence, python, deep learning, model reproducibility

Article link: https://blog.csdn.net/GodNotAMen/article/details/134445554


In general, setting the same random seed under the same parameters lets a PyTorch model reproduce the same results. The seed can be set as follows:

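import os
import random

import numpy as np
import torch

def get_random_seed(seed):
    # Seed the Python, NumPy, and PyTorch (CPU and current GPU) generators.
    random.seed(seed)
    # Read at interpreter start-up, so this only affects child processes.
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # Ask cuDNN for deterministic kernels and turn off its autotuner.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

get_random_seed(42)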

For background, see "PyTorch系列 | 随机种子与可复现性" (PyTorch series: random seeds and reproducibility) or the PyTorch reproducibility notes.

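When I tried this approach, however, it did not work. From related PyTorch issues I learned that some algorithms are nondeterministic: if your model uses one of them, you cannot fully reproduce results even with the same random seed. torch.backends.cudnn.deterministic = True makes cuDNN choose deterministic algorithms, but some operations only have a nondeterministic implementation.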

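Adding torch.use_deterministic_algorithms(True) to the code forces PyTorch to use deterministic algorithms; if an operation without a deterministic implementation is used, a RuntimeError is raised. After adding it, I got the following message: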

RuntimeError: Deterministic behavior was enabled with either torch.use_deterministic_algorithms(True) or at::Context::setDeterministicAlgorithms(true), but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility

The message says that deterministic behavior can be enabled by setting the environment variable CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8.
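A minimal sketch of one way to follow that hint from inside a script. It assumes the variable has to be in place before the first CUDA call, so it is set at the very top of the entry script, ahead of importing torch; the use of `setdefault` (which keeps a value already exported in the shell) is a choice made for this example, not something the error message requires.

```python
import os

# Assumption: this runs before any CUDA work, so cuBLAS sees the setting.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch  # imported only after the variable is set

# Ask PyTorch to raise on operations without a deterministic implementation.
torch.use_deterministic_algorithms(True)
```

Exporting the variable in the shell before launching Python has the same effect and avoids any dependence on import order.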

So I added os.environ['CUBLAS_WORKSPACE_CONFIG']=":4096:8" to the code. The following error then appeared:

RuntimeError: reflection_pad2d_backward_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

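The message says that reflection_pad2d_backward_cuda, which is used during backpropagation, only has a nondeterministic implementation, so enforcing torch.use_deterministic_algorithms(True) raises an error. (The error can be downgraded to a warning with torch.use_deterministic_algorithms(True, warn_only=True).)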
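The two modes can be compared with a short sketch. It only toggles and queries the global flags, so it runs without a GPU; the query functions are standard PyTorch calls, available in versions that support `warn_only`.

```python
import torch

# Strict mode: operations without a deterministic implementation
# raise RuntimeError.
torch.use_deterministic_algorithms(True)
strict_enabled = torch.are_deterministic_algorithms_enabled()

# Lenient mode: the same operations only emit a warning, so training continues.
torch.use_deterministic_algorithms(True, warn_only=True)
warn_only_enabled = torch.is_deterministic_algorithms_warn_only_enabled()

print(strict_enabled, warn_only_enabled)
```

With `warn_only=True` the run is no longer guaranteed to be reproducible; the warnings only show which operations are responsible.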

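The problem is now clear: the nondeterministic algorithms cannot be avoided (although it may be possible to modify the code and replace the nondeterministic operations in the model). Still, the settings above reduce the nondeterminism as far as possible, so reproduced results stay close to each other and do not deviate much.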
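As an illustration of what replacing such an operation might look like, here is a sketch of 2D reflection padding built only from slicing, `flip`, and `cat`. The function name `reflect_pad2d` is hypothetical, and this is not the author's fix. The intent is to stay off the dedicated reflection-padding backward kernel; whether the result is fully deterministic on a given PyTorch and CUDA version is an assumption that should be checked with torch.use_deterministic_algorithms(True).

```python
import torch
import torch.nn.functional as F


def reflect_pad2d(x, pad):
    """Reflection-pad the last two dims; pad = (left, right, top, bottom)."""
    left, right, top, bottom = pad
    # Pad the width: mirror the columns next to each border (border excluded).
    cols = []
    if left > 0:
        cols.append(x[..., 1:left + 1].flip(-1))
    cols.append(x)
    if right > 0:
        cols.append(x[..., -right - 1:-1].flip(-1))
    x = torch.cat(cols, dim=-1)
    # Pad the height the same way on the already width-padded tensor.
    rows = []
    if top > 0:
        rows.append(x[..., 1:top + 1, :].flip(-2))
    rows.append(x)
    if bottom > 0:
        rows.append(x[..., -bottom - 1:-1, :].flip(-2))
    return torch.cat(rows, dim=-2)


# Compare against the built-in reflection padding on a small CPU tensor.
sample = torch.arange(20.0).reshape(1, 1, 4, 5)
same = torch.equal(reflect_pad2d(sample, (2, 1, 1, 2)),
                   F.pad(sample, (2, 1, 1, 2), mode="reflect"))
print(same)
```

The trade-off is extra memory traffic from the concatenations, which may or may not matter for a given model.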

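To check the effect of fixing the random seed, I ran three groups of experiments: ① fixed random seed plus deterministic algorithms wherever possible (torch.backends.cudnn.deterministic = True); ② fixed random seed only; ③ no restrictions.
Under each condition I ran 100 experiments, each with 10 backward passes, and recorded the output (the first value of the output tensor):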

Fix seed & deterministic	Fix seed	No restrictions
0.315594494	0.312797427	0.500693619
0.30812043	0.313226044	0.451529086
0.314348996	0.31047523	-0.18644166
0.310744405	0.31160301	-0.123521239
…	…	…
0.314643502	0.313079745	0.147185132
0.313363969	0.313408256	-0.154002443
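The experiment could be set up along the following lines. This is a sketch under stated assumptions, not the author's code: the model, input size, learning rate, mode names, and the helpers `configure`, `run_once`, and `spread` are all hypothetical. On a CPU repeated seeded runs will normally agree; the small differences in the table are the kind of behavior that arises from nondeterministic GPU kernels.

```python
import random

import numpy as np
import torch
import torch.nn as nn


def configure(mode, seed=42):
    """Apply one condition: 'seed+det', 'seed', or 'none' (hypothetical names)."""
    if mode in ("seed+det", "seed"):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)  # silently ignored without CUDA
    # cuDNN flags only matter on GPU; setting them on CPU is harmless.
    torch.backends.cudnn.deterministic = (mode == "seed+det")
    torch.backends.cudnn.benchmark = False


def run_once(mode, steps=10, device="cpu"):
    """Train a tiny stand-in model for `steps` backward passes."""
    configure(mode)
    model = nn.Sequential(
        nn.Conv2d(1, 4, 3),
        nn.ReflectionPad2d(1),  # placed mid-network so its backward is used
        nn.Conv2d(4, 1, 3),
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(2, 1, 8, 8, device=device)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        # Record the first value of the output tensor, as in the table.
        return model(x).flatten()[0].item()


def spread(values):
    """Range (max - min) as a simple measure of run-to-run variation."""
    return max(values) - min(values)


# Three repetitions per condition to keep the sketch quick (the post used 100).
for mode in ("seed+det", "seed", "none"):
    print(mode, spread([run_once(mode) for _ in range(3)]))

# Spread of the six rows visible above for the first column.
visible_first_column = [0.315594494, 0.30812043, 0.314348996,
                        0.310744405, 0.314643502, 0.313363969]
print(spread(visible_first_column))
```

Passing `device="cuda"` to `run_once` is where differences between the first two conditions would be expected to show up, if at all.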