Some Deep Learning Problems and Their Solutions

I. Learning and Using DeepVelo

A brief rundown of the errors I ran into and how I fixed them.

1. Error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Full traceback:

File ~/anaconda3/envs/deepvelo/lib/python3.8/site-packages/dgl/nn/pytorch/conv/graphconv.py:403, in GraphConv.forward(self, graph, feat, weight, edge_weight)

401 shp = norm.shape + (1,) * (feat_src.dim() - 1)

402 norm = th.reshape(norm, shp)

--> 403 feat_src = feat_src * norm

405 if weight is not None:

406 if self.weight is not None:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

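The multiplication on line 403 fails because feat_src is on cuda:0 while norm is on the CPU. Note that reshape itself does not move a tensor between devices; norm is most likely on the CPU because it is computed from the node degrees of a graph that is itself still on the CPU. One fix is to add the following at a suitable place in graphconv.py:

device = th.device('cuda' if th.cuda.is_available() else 'cpu')

Then change line 402 to:

norm = th.reshape(norm, shp).to(device)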
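As a variation on the edit above, the sketch below takes the target device from feat_src instead of from a global device variable, so the two operands cannot end up on different devices. The function name scale_by_norm and the toy tensors are made up for illustration; this is not the DGL source, only a minimal reproduction of lines 401 to 403 under that assumption.

```python
import torch as th


def scale_by_norm(feat_src, norm):
    """Multiply features by a per-node norm, regardless of where norm lives.

    Hypothetical helper mirroring lines 401-403 of graphconv.py.
    """
    # Give norm one trailing axis per extra feature dimension so it broadcasts.
    shp = norm.shape + (1,) * (feat_src.dim() - 1)
    # Move norm to the device of feat_src before multiplying.
    norm = th.reshape(norm, shp).to(feat_src.device)
    return feat_src * norm


# Features go on the GPU when one is available, otherwise on the CPU.
device = th.device('cuda' if th.cuda.is_available() else 'cpu')
feat = th.ones(3, 2, device=device)
# norm is deliberately created on the CPU, as in the failing case.
norm = th.tensor([1.0, 0.5, 0.25])

out = scale_by_norm(feat, norm)
```

Using feat_src.device keeps the patch correct even if the model is later run on a different GPU or on the CPU.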

2. Error: DGLError: Cannot assign node feature "h" on device cuda:0 to a graph on device cpu. Call DGLGraph.to() to copy the graph to the same device.

Full traceback:

File ~/anaconda3/envs/deepvelo/lib/python3.8/site-packages/dgl/nn/pytorch/conv/graphconv.py:423, in GraphConv.forward(self, graph, feat, weight, edge_weight)

421 print(feat_src.is_cuda)

422 #print(graph.is_cuda)

--> 423 graph.srcdata['h'] = feat_src

424 graph.update_all(aggregate_fn, fn.sum(msg='m', out='h'))

425 rst = graph.dstdata['h']

File ~/anaconda3/envs/deepvelo/lib/python3.8/site-packages/dgl/view.py:81, in HeteroNodeDataView.__setitem__(self, key, val)

77 else:

78 assert isinstance(val, dict) is False, \

79 'The HeteroNodeDataView has only one node type. ' \

80 'please pass a tensor directly'

---> 81 self._graph._set_n_repr(self._ntid, self._nodes, {key : val})

File ~/anaconda3/envs/deepvelo/lib/python3.8/site-packages/dgl/heterograph.py:3995, in DGLHeteroGraph._set_n_repr(self, ntid, u, data)

3992 raise DGLError('Expect number of features to match number of nodes (len(u)).'

3993 ' Got %d and %d instead.' % (nfeats, num_nodes))

3994 if F.context(val) != self.device:

-> 3995 raise DGLError('Cannot assign node feature "{}" on device {} to a graph on'

3996 ' device {}. Call DGLGraph.to() to copy the graph to the'

3997 ' same device.'.format(key, F.context(val), self.device))

3999 if is_all(u):

4000 self._node_frames[ntid].update(data)

DGLError: Cannot assign node feature "h" on device cuda:0 to a graph on device cpu. Call DGLGraph.to() to copy the graph to the same device.

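This error is raised from the same file as the first one and has a similar cause: some of the objects involved in the computation are on the CPU and others are on the GPU. This time it is the graph object that is still on the CPU. Adding the following at a suitable place in the file, before the feature is assigned on line 423 and with device defined as in the first fix, resolves it:

graph = graph.to(device)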
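Rather than patching graphconv.py inside site-packages, it is often enough to move the graph and the features to one device before calling the model. The sketch below is a duck-typed helper that calls .to(device) on anything that exposes it, which covers both torch tensors and DGLGraph objects (the error message itself points to DGLGraph.to()). The name to_device is an assumption for illustration, not part of DeepVelo or DGL.

```python
def to_device(obj, device):
    """Recursively move objects that expose .to(device) onto device.

    Hypothetical helper: handles dicts, lists and tuples of such objects and
    returns anything without a .to method unchanged.
    """
    if isinstance(obj, dict):
        return {key: to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with the moved elements.
        return type(obj)(to_device(value, device) for value in obj)
    if hasattr(obj, "to"):
        # Tensors and DGL graphs both return a copy on the target device.
        return obj.to(device)
    return obj
```

A typical call would be graph, feat = to_device((graph, feat), device) just before the forward pass. Both .to() methods return a new object instead of modifying the original in place, so the result has to be reassigned.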

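3. Killing a specific process on the GPU

First list the running processes and their PIDs: nvidia-smi

Then pick the PID of the process you want to kill and run: kill -9 {PID}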
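The same two steps can be scripted. The sketch below assumes nvidia-smi is on the PATH and uses its CSV query mode to list compute processes; the function names are made up for illustration. It sends SIGTERM by default and SIGKILL (the equivalent of kill -9) only when asked, since a process that is allowed to exit cleanly is more likely to release its GPU memory properly.

```python
import os
import signal
import subprocess

# CSV query mode of nvidia-smi: one line per compute process, no header row.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, memory' lines into a list of dicts."""
    procs = []
    for line in csv_text.strip().splitlines():
        parts = line.split(",")
        if len(parts) < 3 or not parts[0].strip().isdigit():
            continue  # skip blank or unexpected lines
        procs.append({
            "pid": int(parts[0].strip()),
            # Re-join the middle fields in case the process name has a comma.
            "name": ",".join(parts[1:-1]).strip(),
            "memory": parts[-1].strip(),
        })
    return procs


def list_gpu_processes():
    """Run nvidia-smi and return the processes currently using the GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


def terminate(pid, force=False):
    """Send SIGTERM, or SIGKILL (like kill -9) when force is True."""
    os.kill(pid, signal.SIGKILL if force else signal.SIGTERM)
```

For example, list_gpu_processes() shows what is running, and terminate(pid, force=True) then has the same effect as kill -9 {PID}.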
