A bug I ran into while using BERT: I had not moved the input tensors from CPU to GPU. See issue 227.
Problem:
Here is the complete error message:
Traceback (most recent call last):
File "app/set_expantion_eval.py", line 118, in <module>
map_n=flags.map_n)
File "app/set_expantion_eval.py", line 62, in Eval
expansionWithScores = BE.set_expansion_tensorized(seeds, ["1"])
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/app/bert_expansion.py", line 109, in set_expansion_tensorized
gold_repr_list.append(self.extract_representation(" ".join(seed), x, dim))
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/app/bert_expansion.py", line 317, in extract_representation
output_all_encoded_layers=output_all_encoded_layers)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 626, in forward
embedding_output = self.embeddings(input_ids, token_type_ids)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 193, in forward
words_embeddings = self.word_embeddings(input_ids)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/mnt/castor/seas_home/d/danielkh/ideaProjects/bert-analogies/env3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'
Here is a summary of what I do in my code:
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.model = BertModel.from_pretrained(bert_model) # loading the model
self.model.to(self.device) # without this there is no error, but it runs on the CPU (instead of the GPU).
self.model.eval() # declaring to the system that we're only doing 'forward' calculations
# creating the input tensors here
...
# move the tensors to the target device
input_ids_tensor.to(self.device)
segment_ids_tensor.to(self.device)
input_mask_tensor.to(self.device)
output_all_encoded_layers.to(self.device)
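# (note: each of these .to() calls returns a new tensor that is discarded here — this is the bug)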
encoded_layers, _ = self.model(input_ids_tensor, segment_ids_tensor, input_mask_tensor, output_all_encoded_layers=output_all_encoded_layers)
- When I don’t have model.to(device), the code works fine, but I think it then runs only on the CPU. When I add it, it fails with the above error.
I did a little investigation and printed the inputs to self.model(...) to see whether they were properly copied to the device:
print("\n * input_ids_tensor \n ")
print(input_ids_tensor)
print(input_ids_tensor.device)
print("\n * segment_ids_tensor \n ")
print(segment_ids_tensor)
print(segment_ids_tensor.device)
print("\n * input_mask_tensor \n ")
print(input_mask_tensor)
print(input_mask_tensor.device)
print("\n * self.device \n ")
print(self.device)
which outputs:
* input_ids_tensor
tensor([[ 101, 5334, 2148, 1035, 3792, 3146, 102, 5334, 102, 0, 0],
[ 101, 5334, 2148, 1035, 3792, 3146, 102, 2148, 1035, 3792, 102],
[ 101, 5334, 2148, 1035, 3792, 3146, 102, 3146, 102, 0, 0]])
cpu
* segment_ids_tensor
tensor([[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]])
cpu
* input_mask_tensor
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])
cpu
* self.device
cuda:0
As can be seen, the tensors are still on cpu, even after running .to(device).
Any thoughts on where things are going wrong?
Solution:
You failed to move the tensors to the GPU.
Replace your code with this:
input_ids_tensor = input_ids_tensor.to(self.device)
segment_ids_tensor = segment_ids_tensor.to(self.device)
input_mask_tensor = input_mask_tensor.to(self.device)
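With these assignments, the input tensors end up on the same device as the model parameters, so the embedding lookup at the bottom of the traceback (torch.embedding) receives a CUDA tensor for the 'index' argument instead of a CPU one.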
Cause:
I didn’t realize these calls don’t work in place (unlike model.to(device) for modules): Tensor.to(device) returns a new tensor on the target device and leaves the original untouched, so the result must be assigned back.
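A minimal sketch of the difference (the variable names here are illustrative, not from the original code):
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor.to() is out of place: it returns a copy on the target device.
t = torch.tensor([1, 2, 3])
t.to(device)      # result discarded — t is unchanged
print(t.device)   # cpu
t = t.to(device)  # assign the result back
print(t.device)   # cuda:0 when a GPU is available, otherwise cpu

# nn.Module.to() works in place: it moves the module's parameters directly.
model = torch.nn.Linear(3, 3)
model.to(device)
print(next(model.parameters()).device)  # cuda:0 when a GPU is available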