I recently ran into a problem: with the same network, a tiny change somewhere led to wildly different results. From the loss curve you can roughly tell that model 1 is broken, but you can also diagnose this from another angle, namely the distribution of the parameters. This post walks through my approach (tensorboardX for PyTorch; tensorboard for TensorFlow).
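For context: tensorboardX's SummaryWriter.add_histogram(tag, values, global_step) records the distribution of an array at a given step, and TensorBoard renders the steps as stacked histogram slices. A minimal, self-contained sketch (the random tensor here is synthetic, purely for illustration):

import torch
from tensorboardX import SummaryWriter

writer = SummaryWriter()  # writes event files to ./runs by default
for step in range(3):
    # log the value distribution of a (synthetic) weight tensor at each step
    weights = torch.randn(256) * (step + 1)
    writer.add_histogram("demo/weights", weights.numpy(), step)
writer.close()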
There are four small experiments in total.
Experiment 1: compare the parameter histograms of model 1 and model 2.
Example code:
The trained models live on the server under "models_2_1" and "models_2_2".
Code on the server:
#debug.py
import os

import torch
from tensorboardX import SummaryWriter

from MyNet import MyNet

epochs = 10
train_parts = '1_3'
test_part = '2'
save_dir1 = './models_' + test_part + '_1'
save_dir2 = './models_' + test_part + '_2'
cuda_num = 0

if not os.path.exists(save_dir1):
    os.mkdir(save_dir1)
if not os.path.exists(save_dir2):
    os.mkdir(save_dir2)

net = MyNet()
if torch.cuda.is_available():
    net.cuda(cuda_num)

writer = SummaryWriter()

# Only the final epoch is inspected here; use range(epochs) to sweep every checkpoint.
for epoch in range(9, epochs):
    # Load model 1's checkpoint and log a histogram of every parameter tensor.
    model_path = save_dir1 + '/combine_' + train_parts + '_params_epoch_' + str(epoch) + '.pkl'
    net.load_state_dict(torch.load(model_path))
    net.eval()
    for name, param in net.named_parameters():
        writer.add_histogram(name + "_model1", param.clone().cpu().data.numpy(), epoch)

    # Load model 2's checkpoint into the same net and log under a "_model2" suffix.
    model_path = save_dir2 + '/combine_' + train_parts + '_params_epoch_' + str(epoch) + '.pkl'
    net.load_state_dict(torch.load(model_path))
    net.eval()
    for name, param in net.named_parameters():
        writer.add_histogram(name + "_model2", param.clone().cpu().data.numpy(), epoch)
writer.close()
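Two practical notes. First, the script assumes MyNet.py defines the same MyNet architecture the checkpoints were trained with; if the class does not match the saved state_dicts, load_state_dict will raise a key or shape mismatch. Second, since SummaryWriter() writes to ./runs by default, the histograms can be viewed with:

tensorboard --logdir runs

In TensorBoard's HISTOGRAMS tab each parameter then appears twice, under its "_model1" and "_model2" suffixes, so a layer whose weights have exploded, collapsed toward zero, or simply drifted apart between the two models stands out immediately.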