It started with noticing the loss was nan. Tracing back, sometimes the InceptionTime output would grow larger and larger (on the order of e+05) until it became nan, and sometimes it would jump straight to nan.
Stepping into InceptionTime and printing the bottleneck features:
X tensor([[[0.0000e+00, 8.4134e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[2.6956e-02, 3.2876e-02, 2.3775e-02, ..., 2.7116e-02,
2.3666e-02, 2.3102e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[3.5079e-02, 3.3822e-02, 1.9600e-02, ..., 2.4089e-02,
2.6073e-02, 2.7331e-02],
...,
[0.0000e+00, 3.8615e-03, 0.0000e+00, ..., 0.0000e+00,
1.3945e-03, 0.0000e+00],
[3.7243e-02, 3.1020e-02, 4.0676e-02, ..., 3.1539e-02,
3.0726e-02, 3.2928e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[3.0309e-02, 2.6952e-02, 2.4554e-02, ..., 2.8092e-02,
2.7221e-02, 2.6474e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[3.2792e-02, 3.2758e-02, 2.9992e-02, ..., 2.9984e-02,
3.0544e-02, 2.9964e-02],
...,
[0.0000e+00, 1.8699e-03, 0.0000e+00, ..., 1.4679e-03,
1.2632e-03, 2.1972e-03],
[3.5546e-02, 3.9769e-02, 2.6883e-02, ..., 3.3340e-02,
3.3460e-02, 3.3515e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
...,
[[2.2720e-02, 2.3188e-02, 2.1630e-02, ..., 1.4369e-02,
1.1494e-02, 6.6759e-03],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[2.7578e-02, 2.3448e-02, 2.3861e-02, ..., 2.0423e-02,
1.7828e-02, 1.4557e-02],
...,
[0.0000e+00, 0.0000e+00, 2.8628e-03, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[2.9680e-02, 2.2322e-02, 2.3976e-02, ..., 1.2416e-04,
8.3968e-04, 7.1795e-03],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[0.0000e+00, 0.0000e+00, 4.8755e-03, ..., 1.9459e-02,
3.0625e-02, 1.4635e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[8.4134e-01, 0.0000e+00, 1.8862e-02, ..., 1.4544e-02,
1.6730e-02, 2.1881e-02],
...,
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 1.0887e-02, 1.2057e-02, ..., 0.0000e+00,
1.0251e-02, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[2.7130e-02, 2.6661e-02, 2.5167e-02, ..., 2.7895e-02,
3.1589e-02, 2.9081e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[2.6086e-02, 2.7430e-02, 2.5378e-02, ..., 3.3693e-02,
2.9286e-02, 3.1024e-02],
...,
[0.0000e+00, 0.0000e+00, 6.3589e-04, ..., 0.0000e+00,
0.0000e+00, 6.0121e-05],
[3.2935e-02, 2.9841e-02, 2.8617e-02, ..., 3.1719e-02,
3.1900e-02, 2.9626e-02],
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]]], device='cuda:0', grad_fn=<GeluBackward>)
Z tensor([[[ 1.1727e-01, -6.8307e-02, -1.3049e-01, ..., 4.0559e-03,
1.0677e-01, -3.8412e-01],
[ 6.6208e-02, -7.8646e-02, -1.0040e-01, ..., 1.8195e-03,
-1.2445e-01, 5.9656e-02],
[-4.1223e-02, -1.3704e-01, 4.7077e-02, ..., 2.8726e-03,
-2.1524e-02, 1.2737e-01],
...,
[-1.2249e-01, 1.1354e-01, -4.1623e-02, ..., 5.5492e-03,
6.5915e-02, -4.2294e-01],
[ 2.3783e-02, 6.1486e-02, -1.3743e-01, ..., -5.5904e-03,
1.1536e-01, 2.3736e-01],
[-3.6811e-02, -7.9950e-02, 1.3481e-01, ..., -3.9804e-03,
8.9611e-02, -1.0136e-01]],
[[ 4.4767e-04, 8.0333e-03, 9.9244e-03, ..., 7.7955e-03,
7.8259e-03, 6.8006e-03],
[ 1.0243e-02, 5.2339e-03, 6.9350e-03, ..., 5.8020e-03,
7.0756e-03, 7.4157e-03],
[-1.5006e-02, -1.2603e-02, -1.3146e-02, ..., -1.0840e-02,
-1.0480e-02, -8.9154e-03],
...,
[ 8.4532e-04, 6.6614e-03, 5.6583e-03, ..., 5.1764e-03,
3.7338e-03, 4.9341e-03],
[-1.0710e-03, -2.4653e-03, 1.1634e-04, ..., -1.5318e-04,
9.2580e-04, -1.3004e-03],
[ 5.2839e-03, 5.3386e-03, 7.2368e-03, ..., 3.8365e-03,
4.6890e-03, 1.7439e-03]],
[[ 1.8744e-04, 8.2691e-03, 5.7251e-03, ..., 7.4143e-03,
7.8818e-03, 8.2190e-03],
[ 1.0309e-02, 7.7091e-03, 9.7910e-03, ..., 7.1014e-03,
7.2604e-03, 7.1639e-03],
[-1.4933e-02, -1.0547e-02, -8.0334e-03, ..., -1.0211e-02,
-1.0591e-02, -1.0415e-02],
...,
[ 1.3299e-03, 6.3572e-03, 3.4338e-03, ..., 4.7710e-03,
4.8170e-03, 5.0268e-03],
[-3.9221e-04, -2.9519e-03, -4.3005e-03, ..., -1.2994e-03,
-1.2756e-03, -1.4953e-03],
[ 5.6489e-03, 4.2581e-03, 1.7798e-03, ..., 3.5494e-03,
3.8125e-03, 3.9882e-03]],
...,
[[ 2.7914e-03, 8.1265e-03, 6.8793e-03, ..., 4.4897e-03,
4.8445e-03, 4.7809e-03],
[ 8.1443e-03, 7.6398e-03, 7.2464e-03, ..., 4.9623e-03,
5.5897e-03, 3.2725e-03],
[-1.4763e-02, -8.6637e-03, -9.0448e-03, ..., -6.4009e-03,
-3.1855e-03, -5.2497e-03],
...,
[ 2.4940e-03, 5.4965e-03, 3.9640e-03, ..., 3.2910e-03,
3.6254e-03, 4.2723e-03],
[-6.0344e-04, -2.8389e-03, -1.9753e-03, ..., 4.5337e-04,
-5.4313e-03, -6.3991e-04],
[ 5.9189e-03, 2.8660e-03, 4.6337e-03, ..., -2.6405e-03,
-3.7874e-03, 1.7362e-03]],
[[-1.8993e-01, -1.9882e-03, 9.2843e-03, ..., 4.7575e-03,
2.0764e-03, 6.5778e-03],
[ 1.9527e-01, 7.1894e-04, 2.4501e-03, ..., 2.9432e-03,
-3.2938e-04, 5.2494e-03],
[-2.7657e-01, 4.9967e-05, -2.3743e-03, ..., -4.7178e-03,
-7.1980e-03, -6.1970e-03],
...,
[-3.4539e-02, 9.7203e-04, 8.0449e-03, ..., 3.6396e-03,
8.4694e-03, 6.5102e-03],
[-1.5724e-01, 6.1951e-04, -5.3126e-03, ..., -3.8831e-03,
-8.6429e-04, -5.4043e-03],
[-2.8071e-02, 3.0281e-03, -3.6805e-03, ..., -2.2675e-03,
-1.7542e-03, -1.4777e-03]],
[[ 2.8666e-03, 6.6682e-03, 7.5580e-03, ..., 7.2541e-03,
7.0417e-03, 6.6543e-03],
[ 1.1153e-02, 8.4934e-03, 6.8678e-03, ..., 6.6640e-03,
6.6891e-03, 7.5735e-03],
[-1.5920e-02, -9.7284e-03, -9.5786e-03, ..., -1.0091e-02,
-1.0816e-02, -1.0166e-02],
...,
[ 1.7095e-03, 4.2530e-03, 4.4291e-03, ..., 4.1423e-03,
5.3699e-03, 4.5481e-03],
[-2.8099e-04, -1.5712e-03, -1.9634e-03, ..., -1.8063e-03,
-1.6008e-03, -1.5634e-03],
[ 6.7537e-03, 3.1014e-03, 3.7098e-03, ..., 2.8872e-03,
2.7731e-03, 2.5619e-03]]], device='cuda:0',
grad_fn=<SqueezeBackward1>)
X tensor([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], device='cuda:0',
grad_fn=<PermuteBackward>)
Z tensor([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], device='cuda:0',
grad_fn=<SqueezeBackward1>)
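Printing tensors by hand narrows things down, but forward hooks can flag the first module whose output goes non-finite automatically. A minimal sketch, assuming a PyTorch model; the tiny Sequential here is just a stand-in for the real InceptionTime:

```python
import torch
import torch.nn as nn

nonfinite_modules = []  # names of modules whose output contained nan/inf

def add_nan_hooks(model):
    # Register a forward hook on every submodule that records its name
    # the moment its output stops being finite, localizing the blow-up.
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                nonfinite_modules.append(name)
        return hook
    for name, module in model.named_modules():
        if name:  # skip the root container itself
            module.register_forward_hook(make_hook(name))

# Stand-in model, not the real network:
model = nn.Sequential(nn.Linear(8, 8), nn.GELU())
add_nan_hooks(model)
_ = model(torch.full((2, 8), float("nan")))  # nan input poisons every layer
print(nonfinite_modules)  # both submodules ('0' and '1') are recorded
```

The first name appended tells you which layer went bad first, which is exactly the question the manual prints above were answering.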
So the input X itself becomes nan. An encoding problem with the data, perhaps?
It turned out not to be: the weights were simply growing too large as training went on. Adding an L2 regularization penalty fixed it:
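A quick way to rule the data in or out is a finiteness check on each batch before it enters the model. A sketch (`check_batch` is a hypothetical helper, not from the original code):

```python
import torch

def check_batch(x):
    # Sanity-check an input batch before blaming the model:
    # count nan/inf entries and report the largest finite magnitude.
    finite = torch.isfinite(x)
    nan_count = int(torch.isnan(x).sum())
    inf_count = int((~finite & ~torch.isnan(x)).sum())
    max_abs = float(x[finite].abs().max()) if finite.any() else float("nan")
    return {"nan": nan_count, "inf": inf_count, "max_abs": max_abs}

batch = torch.tensor([[1.0, float("nan"), 2.0],
                      [float("inf"), 0.5, -3.0]])
print(check_batch(batch))  # {'nan': 1, 'inf': 1, 'max_abs': 3.0}
```

If every batch comes back clean, the non-finite values must be produced inside the network, which is what happened here.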
model_optim = optim.Adam(self.model.parameters(), lr=self.args.learning_rate, weight_decay=0.01)
Here weight_decay=0.01 is the L2 regularization: PyTorch's Adam implements it by adding weight_decay * param to each gradient before the moment updates (coupled L2, not the decoupled decay of AdamW).
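Because Adam's weight_decay modifies the gradient, it behaves exactly like adding 0.5 * wd * ||w||^2 to the loss. A small sketch verifying the equivalence on a throwaway linear model (not the actual training code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Two identical copies of the same tiny model.
m1 = nn.Linear(4, 1)
m2 = nn.Linear(4, 1)
m2.load_state_dict(m1.state_dict())

x = torch.randn(8, 4)
y = torch.randn(8, 1)
wd = 0.01

# (a) weight_decay in the optimizer: Adam adds wd * param to each gradient.
opt1 = optim.Adam(m1.parameters(), lr=1e-3, weight_decay=wd)
loss1 = F.mse_loss(m1(x), y)
opt1.zero_grad()
loss1.backward()
opt1.step()

# (b) explicit L2 penalty added to the loss; its gradient is also wd * param.
opt2 = optim.Adam(m2.parameters(), lr=1e-3)
l2 = sum((p ** 2).sum() for p in m2.parameters())
loss2 = F.mse_loss(m2(x), y) + 0.5 * wd * l2
opt2.zero_grad()
loss2.backward()
opt2.step()

# After one step the two parameterizations match.
for p1, p2 in zip(m1.parameters(), m2.parameters()):
    assert torch.allclose(p1, p2, atol=1e-6)
```

Either form shrinks the weights toward zero each step, which is what keeps them from blowing up into the e+05 range seen earlier.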
Addendum:
Only the EthanolConcentration dataset triggers this behavior.