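I've hit this error twice now. The first time the cause was a linear layer in the network: I had miscalculated a dimension without noticing.
This time it came up again. With linear layers firmly in mind, I stared at the network for ages and recomputed the dimensions over and over, and they were all correct!!! The same network even trained fine on other data. Then I noticed that my debug print for the first convolution layer never appeared, so I went to the first Block and found that it was set up for 3-dimensional edge features... but after switching datasets I was passing in 2-dimensional ones. Fixed that and it worked.
Sure enough, you have to read the code line by line. Don't just assume where the error must be and then conclude that everything looks fine.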
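A mismatch like this is easier to catch with an explicit shape check placed before the first convolution. Below is a minimal sketch, not the code from my project: the helper name `check_last_dim` is made up for illustration, and the expected dimension of 3 is just the value from the case described above.

```python
def check_last_dim(name, shape, expected_dim):
    """Raise a readable error when the feature (last) dimension is not what the layer expects."""
    actual_dim = shape[-1]
    if actual_dim != expected_dim:
        raise ValueError(
            f"{name}: expected last dimension {expected_dim}, "
            f"got {actual_dim} (shape {tuple(shape)})"
        )
    return actual_dim


# Hypothetical usage inside the first block's forward(), before self.conv0(...):
#     check_last_dim("edge_attr", edge_attr.shape, 3)
# Plain tuples work as well, so the check can be tried without a GPU:
check_last_dim("edge_attr", (1000, 3), 3)      # passes silently
try:
    check_last_dim("edge_attr", (1000, 2), 3)  # the mismatch described above
except ValueError as err:
    print(err)
```

With a check like this the failure names the offending tensor and its shape directly, instead of surfacing as a cuBLAS error deep inside the library.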
The original error output was:
********** start build dataloader **********
train data number:50000,eval data number:10000
********** start training
** On entry to SGEMM parameter number 10 had an illegal value
** On entry to SGEMM parameter number 10 had an illegal value
** On entry to SGEMM parameter number 10 had an illegal value
** On entry to SGEMM parameter number 10 had an illegal value
Traceback (most recent call last):
File "/home/ubuntu/lxd-workplace/ldfde/new_new/Graph/cifar10/bas1/train_me.py", line 447, in <module>
train(args)
File "/home/ubuntu/lxd-workplace/ldfde/new_new/Graph/cifar10/bas1/train_me.py", line 334, in train
outputs=model(sample_batched)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch_geometric/nn/data_parallel.py", line 70, in forward
outputs = self.parallel_apply(replicas, inputs, None)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/lxd-workplace/ldfde/new_new/Graph/cifar10/bas1/model_eq.py", line 62, in forward
x = self.block1(x,adj,edge_attr)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/lxd-workplace/ldfde/new_new/Graph/cifar10/bas1/model_eq.py", line 28, in forward
x = self.conv0(x, adj,edge_attr=edge_attr)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch_geometric/nn/conv/gat_conv.py", line 238, in forward
alpha = self.edge_updater(edge_index, alpha=alpha, edge_attr=edge_attr)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch_geometric/nn/conv/message_passing.py", line 501, in edge_updater
out = self.edge_update(**edge_kwargs)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch_geometric/nn/conv/gat_conv.py", line 269, in edge_update
edge_attr = self.lin_edge(edge_attr)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch_geometric/nn/dense/linear.py", line 136, in forward
return F.linear(x, self.weight, self.bias)
File "/home/ubuntu/miniconda3/envs/pt_xt/lib/python3.9/site-packages/torch/nn/functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Passing on the good luck: may all your code run smoothly.