In my PyTorch DQN implementation, one problem bothered me for a long time: the batch-sampling/learning step kept raising a dimension-mismatch error.
sample_index = np.random.choice(self.memory_size, self.batch_size)  # sample a batch of indices
b_memory = self.memory[sample_index, :]
b_s = torch.FloatTensor(b_memory[:, :self.s_dim])                   # states
b_a = torch.LongTensor(b_memory[:, self.s_dim].astype(int))         # actions
b_r = torch.FloatTensor(b_memory[:, self.s_dim + 1])                # rewards
b_s_ = torch.FloatTensor(b_memory[:, -self.s_dim:])                 # next states
q_eval = self.eval_net(b_s).gather(1, b_a)  # error raised here
q_next = self.target_net(b_s_).detach()
q_target = b_r + self.gamma * q_next.max(1)[0].view(self.batch_size, 1)
loss = self.loss_func(q_eval, q_target)
The error:
q_eval = self.eval_net(b_s).gather(1, b_a)
RuntimeError: Index tensor must have the same number of dimensions as input tensor
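The rule behind this error is that `gather`'s index tensor must have the same number of dimensions as its input: a 1-D index next to a 2-D Q-value matrix is rejected outright. NumPy's `np.take_along_axis` enforces the exact same rule, so here is a minimal NumPy sketch of the failure and the fix (NumPy is used only so the example stands alone; the shapes mirror `q_eval`/`b_a` above):

```python
import numpy as np

q = np.arange(12, dtype=float).reshape(4, 3)   # like eval_net output: (batch, n_actions)
idx_1d = np.array([0, 2, 1, 0])                # 1-D indices: wrong ndim, like the original b_a

try:
    np.take_along_axis(q, idx_1d, axis=1)      # ndim mismatch, analogous to the gather error
except ValueError as e:
    print("raised:", type(e).__name__)

idx_2d = idx_1d.reshape(-1, 1)                 # (4, 1): one column index per row
picked = np.take_along_axis(q, idx_2d, axis=1) # shape (4, 1), like gather(1, b_a)
print(picked.ravel())                          # one chosen Q-value per row
```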
After some trial and error, I changed line 4 (the `b_a` line) to:
b_a = torch.LongTensor(b_memory[:, self.s_dim].astype(int)).view(self.batch_size, 1)
That turned the error into a warning, but at least the code ran:
D:\Python3.7\lib\site-packages\torch\nn\modules\loss.py:530: UserWarning: Using a target size (torch.Size([32, 32])) that is different to the input size (torch.Size([32, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.mse_loss(input, target, reduction=self.reduction)
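The warning comes from `b_r`: slicing with a single column index leaves it 1-D with shape (32,), and adding a (32,) tensor to the (32, 1) `q_next.max` column broadcasts the sum out to (32, 32), which MSE then silently compares against a (32, 1) `q_eval`. A NumPy sketch of the same shape arithmetic (batch size 32 as in the warning; the zeros are placeholders, only the shapes matter):

```python
import numpy as np

batch_size = 32
b_r = np.zeros(batch_size)              # 1-D rewards, shape (32,) -- the buggy slice
q_next_max = np.zeros((batch_size, 1))  # max-Q column, shape (32, 1)

q_target = b_r + 0.9 * q_next_max       # broadcasts to (32, 32): the warning's "target size"
print(q_target.shape)

b_r_col = b_r.reshape(-1, 1)            # keep the rewards 2-D: shape (32, 1)
q_target_fixed = b_r_col + 0.9 * q_next_max
print(q_target_fixed.shape)             # (32, 1), now matches q_eval
```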
Today I compared my code against versions online and found that the problem was actually in the array slicing. Here is an example:
>>> import numpy as np
>>> a = np.array([[_ for _ in range(10)] for _ in range(10)])
>>> a[:, 4]
array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
>>> a[:, 4:5]
array([[4],
[4],
[4],
[4],
[4],
[4],
[4],
[4],
[4],
[4]])
Shaky fundamentals really do shake the whole building! I had assumed the two were equivalent, and deleted the `:5` myself just to make the line a bit shorter, never expecting it would break the code.
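For the record, the range slice `a[:, 4:5]` is not the only way to keep the column 2-D; fancy indexing with a list, or re-adding the axis afterwards, give the same shape. A quick sketch using the same array as above:

```python
import numpy as np

a = np.array([[_ for _ in range(10)] for _ in range(10)])

col_range = a[:, 4:5]          # range slice keeps the axis -> (10, 1)
col_list = a[:, [4]]           # fancy indexing with a list also keeps it -> (10, 1)
col_newax = a[:, 4][:, None]   # take the 1-D slice, then re-add the axis -> (10, 1)

assert col_range.shape == col_list.shape == col_newax.shape == (10, 1)
print(col_range.ravel())       # the column of 4s, flattened back to 1-D
```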