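This was my first attempt at reproducing the paper "Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks". Instead of the pix2pix segmentation model, I used a U-Net to segment the BraTS data synthesized with pix2pix. The original U-Net trained with BCE loss performed poorly, with a very low Dice score on the validation set, so I started down the road of trial-and-error tuning.
Paper: https://arxiv.org/abs/1807.10225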
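1. Learning rate
A good learning rate gets far better results for the same effort, yet many people train without tuning it at all, and searching for the best value by trial and error can take a lot of time. The article "Finding the best learning rate" describes a simple procedure for finding a reasonable learning rate: sweep the learning rate exponentially from a very small to a very large value over one pass through the training data, record the smoothed loss at each step, and stop once the loss explodes.
The code below is based on the U-Net code.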
import math

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, random_split

# UNet, BasicDataset, dir_img and dir_mask come from the U-Net codebase this post builds on.


def find_lr(init_value=1e-8, final_value=10., beta=0.98):
    # Uses the module-level train_loader, optimizer, net, criterion and device
    num = len(train_loader) - 1
    # Multiplicative step so that lr grows from init_value to final_value
    mult = (final_value / init_value) ** (1 / num)
    lr = init_value
    # Adjust the learning rate on the fly: param_groups[0] is a dict of
    # hyperparameters such as 'params', 'lr', 'weight_decay' and 'eps'
    optimizer.param_groups[0]['lr'] = lr
    avg_loss = 0.
    best_loss = 0.
    batch_num = 0
    losses = []
    log_lrs = []
    for data in train_loader:
        batch_num += 1
        # As before, get the loss for this mini-batch of inputs/outputs
        inputs = data['image']
        labels = data['mask']
        inputs = inputs.to(device=device, dtype=torch.float32)
        labels = labels.to(device=device, dtype=torch.float32)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # Compute the smoothed loss (exponential moving average with bias correction)
        avg_loss = beta * avg_loss + (1 - beta) * loss.item()
        smoothed_loss = avg_loss / (1 - beta ** batch_num)
        # Stop if the loss is exploding
        if batch_num > 1 and smoothed_loss > 4 * best_loss:
            return log_lrs, losses
        # Record the best loss
        if smoothed_loss < best_loss or batch_num == 1:
            best_loss = smoothed_loss
        # Store the values
        losses.append(smoothed_loss)
        log_lrs.append(math.log10(lr))
        # Do the optimizer step
        loss.backward()
        optimizer.step()
        # Update the lr for the next step
        lr *= mult
        optimizer.param_groups[0]['lr'] = lr
    return log_lrs, losses


if __name__ == '__main__':
    img_scale = 0.5
    val_percent = 0.1
    batch_size = 5
    lr = 0.1  # placeholder only; find_lr overwrites it at the start of the sweep

    # Split the dataset into training and validation subsets
    dataset = BasicDataset(dir_img, dir_mask, img_scale)
    n_val = int(len(dataset) * val_percent)
    n_train = len(dataset) - n_val
    train, val = random_split(dataset, [n_train, n_val])
    train_loader = DataLoader(train, batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=True)
    val_loader = DataLoader(val, batch_size=batch_size, shuffle=False, num_workers=1, pin_memory=True, drop_last=True)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    net = UNet(n_channels=3, n_classes=1, bilinear=True)
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.RMSprop(net.parameters(), lr=lr, weight_decay=1e-8, momentum=0.9)
    net.to(device=device)  # move the model to the selected device

    # Run the sweep and plot smoothed loss against log10(lr),
    # dropping the noisy first 10 and last 5 points
    logs, losses = find_lr()
    plt.plot(logs[10:-5], losses[10:-5])
    plt.show()
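The plot shows the loss reaching its minimum at a learning rate of about 0.01 and rising sharply again by 0.1. To be safe, we pick 0.001, one order of magnitude below the minimum, as the learning rate. In practice, 0.001 did turn out to be the best learning rate.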
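The rule of thumb used above (take the learning rate at the lowest smoothed loss, then step back one order of magnitude) can be written as a small helper. This is a minimal sketch and not part of the original code: `suggest_lr` and its `safety_factor` parameter are hypothetical names, and the toy sweep values are made up. It only assumes the `log_lrs` and `losses` lists returned by `find_lr`.

```python
def suggest_lr(log_lrs, losses, safety_factor=10.0):
    """Return (lr_at_min_loss, suggested_lr) from a learning-rate sweep.

    log_lrs: log10 of the learning rates tried, as returned by find_lr.
    losses:  smoothed losses recorded at those learning rates.
    safety_factor: hypothetical knob; 10 means "one order of magnitude below".
    """
    if not losses or len(log_lrs) != len(losses):
        raise ValueError("log_lrs and losses must be non-empty and equally long")
    # Index of the lowest smoothed loss in the sweep
    best_idx = min(range(len(losses)), key=losses.__getitem__)
    lr_at_min = 10 ** log_lrs[best_idx]
    return lr_at_min, lr_at_min / safety_factor


# Toy sweep shaped like the curve described above (values are made up)
toy_log_lrs = [-4.0, -3.0, -2.0, -1.0]
toy_losses = [0.70, 0.55, 0.40, 0.90]
lr_min, lr_pick = suggest_lr(toy_log_lrs, toy_losses)
print(lr_min, lr_pick)
```

With a real sweep you would call `suggest_lr(*find_lr())` and still check the plot by eye, since a noisy curve can put the minimum in a misleading place.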
2. Improving the loss
A brief summary of segmentation losses:
2.1 Log loss
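For a pixel with label y and predicted probability y', the log loss (binary cross-entropy) is

$$L = -\left[\, y \log y' + (1 - y) \log (1 - y') \,\right]$$

which can be rewritten as

$$L = \begin{cases} -\log y', & y = 1 \\ -\log (1 - y'), & y = 0 \end{cases}$$

First line: when y = 1, the larger y' is, the closer it is to y, i.e. the more accurate the prediction and the smaller the loss.
Second line: when y = 0, the smaller y' is, the closer it is to y, i.e. the more accurate the prediction and the smaller the loss.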
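The final loss is the sum of the losses of the two classes, y = 0 and y = 1. This has an obvious drawback: when positive samples are far fewer than negative samples, i.e. there are many more pixels with y = 0 than with y = 1, the y = 0 terms dominate the loss and the model becomes heavily biased toward the background.
So for segmentation tasks where the background is much larger than the target, log loss works very poorly.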
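A toy calculation makes the imbalance argument concrete. It is a sketch under made-up assumptions: an image with 990 background pixels and 10 foreground pixels, and a model that predicts the same foreground probability p for every pixel. `log_loss_terms` is a hypothetical helper name.

```python
import math


def log_loss_terms(n_pos, n_neg, p):
    """Summed log loss of the y=1 and y=0 pixels when every pixel is predicted p."""
    pos_term = -n_pos * math.log(p)        # y = 1 pixels contribute -log(y')
    neg_term = -n_neg * math.log(1.0 - p)  # y = 0 pixels contribute -log(1 - y')
    return pos_term, neg_term


n_pos, n_neg = 10, 990
for p in (0.5, 0.1, 0.01):
    pos_term, neg_term = log_loss_terms(n_pos, n_neg, p)
    mean_loss = (pos_term + neg_term) / (n_pos + n_neg)
    print(f"p={p}: foreground={pos_term:.1f} background={neg_term:.1f} mean={mean_loss:.4f}")
```

At p = 0.5 the background term is 99 times the foreground term, and the mean loss keeps falling as p is pushed toward 0 even though every foreground pixel is then missed. That is the bias toward background described above.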
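2.2 Dice Loss
First define the similarity between two contour regions. With A and B denoting the sets of points contained in the two regions, the Dice coefficient is

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

and the Dice loss is 1 - Dice(A, B). See "Dice explained in detail" for more.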
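As a quick illustration of the set-based definition, here is a minimal sketch that computes Dice for two regions given as sets of pixel coordinates. The function name `dice_coefficient`, the example regions, and the convention of returning 1.0 for two empty sets are assumptions made for this example.

```python
def dice_coefficient(a, b):
    """Dice similarity of two point sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty regions count as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Two regions given as sets of (row, col) pixel coordinates
A = {(0, 0), (0, 1), (1, 0), (1, 1)}
B = {(0, 1), (1, 1), (2, 1)}
print(dice_coefficient(A, B))      # 2 * 2 / (4 + 3)
print(1 - dice_coefficient(A, B))  # the corresponding Dice loss
```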
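2.2.1 The problem Dice loss addresses
2.2.2 Dice loss code
Along the way I found several Dice implementations that are simpler and more concise than the Dice code that ships with the U-Net.
Implementation in PyTorch:
Code 1 (the version most commonly found online; it needs to be modified for this project):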
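def dice_coef_loss(logits, targets):
    num = targets.size(0)  # batch size
    smooth = 1
    probs = torch.sigmoid(logits)
    # Flatten each sample to a vector: shape (batch, pixels)
    m1 = probs.view(num, -1)
    m2 = targets.view(num, -1)
    intersection = m1 * m2
    # Common online version (one score per sample):
    # score = 2. * (intersection.sum(1) + smooth) / (m1.sum(1) + m2.sum(1) + smooth)
    # Modified version: one score over the whole batch. smooth is added outside
    # the factor 2 so that the score cannot exceed 1 (and the loss stays >= 0).
    score = (2. * intersection.sum() + smooth) / (m1.sum() + m2.sum() + smooth)
    # score = 1 - score.sum() / num
    score = 1 - score
    return score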
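Change 1: sum() adds up all elements, whereas sum(1) sums a 2-D tensor along dimension 1 and returns one value per row, i.e. one value per sample.
Change 2: once score has been computed there is no need to divide it by num (the batch size), because with sum() the loss is already a single value computed over the whole batch. Dividing by the batch size again would make the Dice value too small.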
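The difference between the two reductions is easiest to see on a tiny batch. The sketch below uses NumPy rather than PyTorch so that it runs without a GPU stack; `axis=1` plays the role of `sum(1)`. The function names and the toy batch are assumptions made for this example.

```python
import numpy as np


def dice_global(probs, targets, smooth=1.0):
    """One Dice score over the whole batch (the sum() variant)."""
    m1 = probs.reshape(len(probs), -1)
    m2 = targets.reshape(len(targets), -1)
    inter = (m1 * m2).sum()
    return (2.0 * inter + smooth) / (m1.sum() + m2.sum() + smooth)


def dice_per_sample(probs, targets, smooth=1.0):
    """One Dice score per sample, then averaged (the sum(1) variant)."""
    m1 = probs.reshape(len(probs), -1)
    m2 = targets.reshape(len(targets), -1)
    inter = (m1 * m2).sum(axis=1)  # shape (batch,)
    scores = (2.0 * inter + smooth) / (m1.sum(axis=1) + m2.sum(axis=1) + smooth)
    return scores.mean()


# Batch of two 2x2 masks: sample 0 is predicted perfectly, sample 1 is missed entirely
targets = np.array([[[1, 1], [1, 1]], [[1, 0], [0, 0]]], dtype=np.float64)
probs = np.array([[[1, 1], [1, 1]], [[0, 0], [0, 0]]], dtype=np.float64)
print(dice_global(probs, targets))
print(dice_per_sample(probs, targets))
```

The batch-level score is dominated by the sample with the larger object, while the per-sample average gives the missed small object equal weight. Which behaviour is preferable depends on the data, so treat this as an illustration rather than a recommendation.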
The code above only implements a two-class (binary) Dice loss. What should a multi-class Dice loss look like?
Multi-class Dice loss, based on reference code.
Code 2:
import numpy as np


def dice_coef(preds, targets, backprop=True):
    smooth = 1.0
    class_num = 1  # number of classes (channels) to average over
    if backprop:
        # Training: soft Dice on the raw tensors so gradients can flow
        loss = 0.0
        for i in range(class_num):
            pred = preds[:, i, :, :]
            target = targets[:, i, :, :]
            intersection = (pred * target).sum()
            loss_ = 1 - ((2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth))
            loss = loss + loss_
        loss = loss / class_num
        return loss
    else:
        # Validation: hard Dice on class labels. Needs to be generalized.
        # Assumes CPU tensors or NumPy arrays. argmax is only meaningful
        # when there is more than one channel (class_num > 1).
        targets = np.array(targets.argmax(1))
        if len(preds.shape) > 3:
            preds = np.array(preds).argmax(1)
        # argmax turns per-class scores into hard labels, the role that
        # sigmoid plus thresholding plays in the binary case
        loss = 0.0
        for i in range(class_num):
            pred = (preds == i).astype(np.uint8)
            target = (targets == i).astype(np.uint8)
            intersection = (pred * target).sum()
            loss_ = 1 - ((2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth))
            loss = loss + loss_
        loss = loss / class_num
        return loss
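This code handles training and validation differently, switched by the backprop flag.
Notes on computing Dice:
- smooth = 1 is added to the numerator and the denominator so that the ratio stays defined when both the prediction and the target are empty (no division by zero).
- When computing Dice on the training set, try not to use the sigmoid-processed masks_pred, so that as much training signal as possible is kept. On the validation set the goal is the best Dice value the model output allows, so minimise the effect of negative values (raw outputs can be negative) on the Dice computation.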
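One common way to handle the validation case is to pass the raw outputs through a sigmoid and a hard threshold before computing Dice. The sketch below shows that approach in NumPy; it is not the original author's code, and `dice_from_logits`, the threshold of 0.5, and the toy arrays are assumptions made for this example. It also shows what smooth does when both masks are empty.

```python
import numpy as np


def dice_from_logits(logits, target, threshold=0.5, smooth=1.0):
    """Validation-style Dice: sigmoid, hard threshold, then Dice with smoothing."""
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid maps logits into (0, 1)
    pred = (probs > threshold).astype(np.float64)  # binarize the prediction
    inter = (pred * target).sum()
    return (2.0 * inter + smooth) / (pred.sum() + target.sum() + smooth)


logits = np.array([[-3.0, 2.0], [4.0, -1.0]])  # raw network outputs, may be negative
target = np.array([[0.0, 1.0], [1.0, 1.0]])
print(dice_from_logits(logits, target))

# Empty prediction and empty target: smooth keeps the ratio defined
empty = np.zeros((2, 2))
print(dice_from_logits(np.full((2, 2), -5.0), empty))
```

After thresholding, negative logits simply become background pixels, so they no longer pull the intersection or the sums below zero.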