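PyTorch Reproducibility: Setting the Random Seed
To make neural network training in PyTorch reproducible, the following steps are needed:
Setting the random seed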
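import os
import random

import numpy as np
import torch


def setup_seed(seed):
    # Seed every random number generator the training code may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # covers every visible GPU
    # Only takes effect for Python interpreters started after this point.
    os.environ['PYTHONHASHSEED'] = str(seed)
    # os.environ['CUDA_LAUNCH_BLOCKING'] = str(1)
    # cuBLAS needs this workspace setting to behave deterministically.
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
    torch.backends.cudnn.enabled = False
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Raise an error whenever a nondeterministic operation is used.
    torch.use_deterministic_algorithms(True)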
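Here, torch.use_deterministic_algorithms(True) forces PyTorch to use deterministic implementations. If the model calls an operation that has no deterministic implementation, PyTorch raises an error that names the offending operation.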
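One caveat about the PYTHONHASHSEED line in setup_seed: Python reads this variable when the interpreter starts, so assigning it inside an already running process changes hashing only in child processes, not in the current one. The sketch below illustrates this using only the standard library. The helper name hash_in_child is hypothetical and not part of the original code.

```python
import os
import subprocess
import sys


def hash_in_child(text, hash_seed):
    """Return hash(text) computed by a fresh interpreter started with PYTHONHASHSEED=hash_seed."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate interpreters started with the same seed agree on the hash.
first = hash_in_child("pytorch", 123)
second = hash_in_child("pytorch", 123)
print(first == second)
```

In practice this means the variable is best exported in the shell (or the launcher script) before the training process starts, if string hashing order matters to the run.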
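Loading data
The key settings are shuffle=False and num_workers=0, so that batches arrive in a fixed order and all loading happens in the main process.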
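# Import paths assume a recent PyG release; older versions expose DataLoader from torch_geometric.data.
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

train_dataset = []  # assumed to start empty
for i, x_val in enumerate(x_train):
    x_val = torch.tensor(x_val, dtype=torch.float)
    # Targets are float when link_load is set, int64 otherwise.
    y_dtype = torch.float if link_load else torch.int64
    y_val = torch.tensor(y_train[i].flatten(), dtype=y_dtype)
    data = Data(x=x_val, y=y_val, edge_index=edge_index)
    train_dataset.append(data)
# sampler is defined elsewhere; it must itself be deterministic for the batch order to be fixed.
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False, sampler=sampler, num_workers=0)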
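If shuffling is still wanted, a commonly used alternative is to give the DataLoader its own seeded torch.Generator, plus a worker_init_fn that seeds NumPy and random in each worker. The following is a minimal sketch of that pattern, not the approach used in the code above. It uses torch.utils.data.DataLoader with a toy TensorDataset as placeholder data, and the names seed_worker and make_loader are hypothetical.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Derive the NumPy/random seed of each worker from the seed PyTorch gave it.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, batch_size, seed):
    # A dedicated generator fixes the shuffle order independently of global RNG state.
    generator = torch.Generator()
    generator.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=0,  # worker_init_fn only matters once this is > 0
        worker_init_fn=seed_worker,
        generator=generator,
    )


toy_dataset = TensorDataset(torch.arange(10))  # placeholder data
order_a = [batch[0].tolist() for batch in make_loader(toy_dataset, 4, seed=0)]
order_b = [batch[0].tolist() for batch in make_loader(toy_dataset, 4, seed=0)]
print(order_a == order_b)
```

Two loaders built with the same seed visit the samples in the same shuffled order, which keeps the benefit of shuffling without giving up repeatable runs.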
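Reproducibility of graph neural networks
When writing graph neural networks with PyG, I found that passing data.edge_index into a convolution layer triggers a scatter_add operation, which is not reproducible. Passing a torch_sparse.SparseTensor in place of edge_index solves the problem.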
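def forward(self, data):
    x, edge_index = data.x, data.edge_index
    # Replace the [2, num_edges] index tensor with a SparseTensor adjacency.
    # PyG layers treat a SparseTensor as the transposed adjacency, so directed graphs may need .t().
    edge_index = torch_sparse.SparseTensor(row=edge_index[0], col=edge_index[1])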
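A likely reason scatter_add is not reproducible on the GPU is that it accumulates neighbor messages with atomic additions whose order can vary from run to run, and floating-point addition is not associative. The sketch below shows the effect in plain Python, with no GPU involved. The accumulate helper and the message values are illustrative assumptions.

```python
def accumulate(values, order):
    """Sum values in the given index order, the way a scatter-style reduction would."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


messages = [0.1, 0.2, 0.3]  # stand-ins for messages arriving at one node

forward = accumulate(messages, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = accumulate(messages, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(forward == backward)
print(abs(forward - backward))
```

The two sums differ in the last bit. Over many training steps such differences can compound, which is why an aggregation with a fixed summation order helps reproducibility.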