Error message:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
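Possible cause:
During training it is common to run validation alongside. A concise way to do this is to deep-copy the in-training model with copy.deepcopy() and run validation on the copy. For example, my validation.py contains: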
val_model = copy.deepcopy(train_model)
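As a rough illustration of that pattern (not the actual code from validation.py), the sketch below copies a small stand-in model and validates on the copy. The helper name make_val_model and the toy nn.Sequential model are assumptions made only for this example.

```python
import copy

import torch
from torch import nn


def make_val_model(train_model: nn.Module) -> nn.Module:
    """Return an independent copy of the model, switched to eval mode.

    Hypothetical helper: the name and behaviour are assumptions for this sketch.
    """
    val_model = copy.deepcopy(train_model)
    val_model.eval()  # only the copy changes mode; the training model is untouched
    return val_model


# Toy stand-in for the real training model.
train_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
train_model.train()

val_model = make_val_model(train_model)
with torch.no_grad():  # validation does not need gradients
    val_out = val_model(torch.randn(3, 4))
```

Because the copy owns its own parameters, switching it to eval mode or moving it to another device does not disturb the model that is still being trained.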
However, because of a limitation in how tensors support copy.deepcopy(), calling copy.deepcopy(model) can fail with this error: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment. The full traceback is:
File "/home/users/xinxin.li/HAT-dev-toolchain/hat/engine/ddp_trainer.py", line 359, in _with_exception
fn(*args)
File "/home/users/xinxin.li/HAT-dev-toolchain/tools/train.py", line 186, in train_entrance
trainer.fit()
File "/home/users/xinxin.li/HAT-dev-toolchain/hat/engine/loop_base.py", line 523, in fit
storage=self.storage,
File "/home/users/xinxin.li/HAT-dev-toolchain/hat/engine/loop_base.py", line 73, in on_epoch_end
cb.on_epoch_end(**kwargs)
File "/home/users/xinxin.li/HAT-dev-toolchain/hat/callbacks/validation.py", line 207, in on_epoch_end
self._do_val(epoch_id, model, ema_model, device, val_metrics)
File "/home/users/xinxin.li/HAT-dev-toolchain/hat/callbacks/validation.py", line 163, in _do_val
val_model = self._select_and_init_val_model(train_model=eval_model)
File "/home/users/xinxin.li/HAT-dev-toolchain/hat/callbacks/validation.py", line 147, in _select_and_init_val_model
val_model = copy.deepcopy(train_model)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 306, in _reconstruct
value = deepcopy(value, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 306, in _reconstruct
value = deepcopy(value, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 161, in deepcopy
y = copier(memo)
File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/site-packages/torch/_tensor.py", line 85, in __deepcopy__
raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
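How to track it down:
1. Open /home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py, set a breakpoint in _deepcopy_dict (line 240 in the traceback above, y[deepcopy(key, memo)] = deepcopy(value, memo)), and print the key and value being copied.
2. Re-run the program and look at the last key and value printed before the error. Find where that attribute is set in your network definition; that is the place causing the error.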
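An alternative to editing copy.py is to scan the model's attributes for non-leaf tensors directly, since those are exactly what Tensor.__deepcopy__ rejects. The helper find_non_leaf_tensors and the Cached module below are hypothetical names for this sketch. It only looks inside lists, tuples, and dicts, so tensors hidden in other containers would be missed.

```python
import torch
from torch import nn


def find_non_leaf_tensors(model: nn.Module):
    """List attribute paths that hold non-leaf tensors (hypothetical helper)."""
    found = []

    def walk(value, path):
        if isinstance(value, torch.Tensor):
            if not value.is_leaf:  # non-leaf tensors carry autograd history
                found.append(path)
        elif isinstance(value, (list, tuple)):
            for i, item in enumerate(value):
                walk(item, f"{path}[{i}]")
        elif isinstance(value, dict):
            for key, item in value.items():
                walk(item, f"{path}[{key!r}]")

    # named_modules() visits every submodule, so child modules are not
    # followed again inside walk().
    for name, module in model.named_modules():
        prefix = name or "<root>"
        for attr, value in vars(module).items():
            walk(value, f"{prefix}.{attr}")
    return found


class Cached(nn.Module):
    """Toy module that caches its output on self, like the failing model."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        self.last = [self.fc(x)]
        return self.last[0]


net = nn.Sequential(Cached())
before = find_non_leaf_tensors(net)  # nothing cached yet
net(torch.randn(1, 4))
after = find_non_leaf_tensors(net)   # now reports the cached output
print(before, after)
```

The reported path names the submodule and attribute to inspect, which plays the same role as the key and value printed from the breakpoint.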
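What I found in my case:
One of my model's submodules stored its intermediate outputs in self.features inside forward() and returned that attribute. Those outputs are non-leaf tensors (they carry autograd history), and once assigned to self they become part of the module's state, which copy.deepcopy() then tries to copy. After I changed the code to return a local variable instead, the error went away.
Code before the change: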
def forward(self, input_image):
    self.features = []
    x = (input_image - 0.45) / 0.225
    x = self.encoder.conv1(x)
    x = self.encoder.bn1(x)
    self.features.append(self.encoder.relu(x))
    self.features.append(self.encoder.layer1(self.encoder.maxpool(self.features[-1])))
    self.features.append(self.encoder.layer2(self.features[-1]))
    self.features.append(self.encoder.layer3(self.features[-1]))
    self.features.append(self.encoder.layer4(self.features[-1]))
    return self.features
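The same failure can be reproduced with a much smaller module. TinyEncoder and deepcopy_error below are hypothetical names for this sketch. The single conv layer stands in for the real encoder and is not the original network.

```python
import copy

import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the encoder above, with a single conv layer."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, input_image):
        # Same pattern as the failing code: outputs are stored on self.
        self.features = []
        self.features.append(self.conv1(input_image))
        return self.features


def deepcopy_error(model):
    """Return the deepcopy error message, or None if the copy succeeds."""
    try:
        copy.deepcopy(model)
    except RuntimeError as exc:
        return str(exc)
    return None


model = TinyEncoder()
error_before = deepcopy_error(model)   # no forward pass yet: copy succeeds

model(torch.randn(1, 3, 8, 8))         # forward with autograd enabled
error_after = deepcopy_error(model)    # self.features now holds a non-leaf tensor

with torch.no_grad():
    model(torch.randn(1, 3, 8, 8))     # outputs created without autograd history
error_no_grad = deepcopy_error(model)  # copy succeeds again

print(error_before, error_after, error_no_grad, sep="\n")
```

The copy only fails after a forward pass with gradients enabled, which is why the error tends to show up in a validation callback that runs in the middle of training.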
Code after the change:
def forward(self, input_image):
    features = []
    x = (input_image - 0.45) / 0.225
    x = self.encoder.conv1(x)
    x = self.encoder.bn1(x)
    features.append(self.encoder.relu(x))
    features.append(self.encoder.layer1(self.encoder.maxpool(features[-1])))
    features.append(self.encoder.layer2(features[-1]))
    features.append(self.encoder.layer3(features[-1]))
    features.append(self.encoder.layer4(features[-1]))
    return features
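A small check of why this works, again using hypothetical single-layer stand-ins rather than the real encoder: with a local list, nothing from the forward pass ends up in the module's state, so deepcopy succeeds even after a forward pass with gradients enabled. The second class sketches one possible variant for when the features do need to stay on the module, which stores detached tensors. This variant is an assumption of this example, not part of the original fix.

```python
import copy

import torch
from torch import nn


class FixedEncoder(nn.Module):
    """Hypothetical stand-in using the corrected pattern (local variable)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, input_image):
        features = []
        features.append(self.conv1(input_image))
        return features


class CachingEncoder(FixedEncoder):
    """Variant (an assumption of this sketch): cache detached copies on self."""

    def forward(self, input_image):
        features = super().forward(input_image)
        # detach() returns tensors without autograd history, which are leaves.
        self.features = [f.detach() for f in features]
        return features


image = torch.randn(1, 3, 8, 8)

fixed = FixedEncoder()
outputs = fixed(image)              # outputs are still non-leaf tensors...
fixed_copy = copy.deepcopy(fixed)   # ...but they are not stored on the module

caching = CachingEncoder()
caching(image)
caching_copy = copy.deepcopy(caching)  # cached tensors are leaves, so this works
```

The returned features still carry gradients for the training loss in both versions; only what is kept on the module differs.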
Reference: "Fixing the 'only Tensors created explicitly by the user are supported' error when copying a Tensor or model with copy.deepcopy()", Arnold-FY-Chen's blog, CSDN