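1. Background
PICC catheters
A PICC (peripherally inserted central catheter) is a central venous catheter placed through a peripheral vein. The catheter is inserted through a vein in the arm and advanced until its tip lies in a large vessel close to the heart. This keeps chemotherapy drugs from coming into direct contact with the arm veins. Blood returns quickly through the large veins, so the drugs are diluted rapidly and irritate the vessel wall less. PICCs therefore protect the veins of the upper limb, reduce the incidence of phlebitis, relieve patients' pain, and improve their quality of life.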
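PICC tip positioning
Most studies consider the optimal tip position to be one of two locations: the middle-to-lower third of the superior vena cava (SVC), or 2-4 cm above the junction of the SVC and the right atrium. Other reports place the optimal tip position at the level of the 6th to 7th thoracic vertebrae (T6-T7) on the X-ray.
The frontal chest X-ray is currently a common, simple and effective way to check where a PICC lies. If the catheter is misidentified on the radiograph, the patient's subsequent treatment is put at risk. Researchers have therefore proposed using deep learning to help clinicians locate PICCs on chest X-rays.
Both of the following papers first segment the PICC and then locate it. This shows that an accurate segmentation of the catheter is essential for localization: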
《Detection of peripherally inserted central catheter (PICC) in chest X-ray images: A multi-task deep learning model》
《A Deep-Learning System for Fully-Automated Peripherally Inserted Central Catheter (PICC) Tip Detection》
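This project annotates 118 images and compares two training schemes:
- Fixed split: the data are split once into fixed training and test sets before training.
- 5-fold cross-validation: five models are trained, one per fold.

Fusing the segmentation results of the five cross-validation models gave the better segmentation.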
# Install the paddleseg library
!pip install paddleseg==2.5
# Unzip the data
!unzip -o /home/aistudio/data/data162374/picc.zip -d /home/aistudio/work/
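2. Data
Source: the public ChestX-ray8 chest X-ray dataset
Format: png
Count: 118 annotated images and 4 unannotated images
The 118 images were annotated for segmentation with 精灵标注助手 (Colabeler). Pixel value 0 is background and pixel value 1 is the PICC.
Sample data:
# Build a DataFrame of image/mask paths to make 5-fold cross-validation easier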
import paddleseg
import cv2
import sklearn
import pandas as pd
import os
import matplotlib.pyplot as plt
import glob
# Sorting both lists pairs each image <name>.png with its mask <name>_1.png
image_paths = sorted(glob.glob('/home/aistudio/work/picc/image/*.png'))
mask_paths = sorted(glob.glob('/home/aistudio/work/picc/mask/*.png'))
seg_data = {'image_path':image_paths,'mask_path':mask_paths}
df = pd.DataFrame(seg_data)
print(len(df))
df.head()
118
| | image_path | mask_path |
|---|---|---|
| 0 | /home/aistudio/work/picc/image/00000039_004.png | /home/aistudio/work/picc/mask/00000039_004_1.png |
| 1 | /home/aistudio/work/picc/image/00000091_002.png | /home/aistudio/work/picc/mask/00000091_002_1.png |
| 2 | /home/aistudio/work/picc/image/00000121_006.png | /home/aistudio/work/picc/mask/00000121_006_1.png |
| 3 | /home/aistudio/work/picc/image/00000157_000.png | /home/aistudio/work/picc/mask/00000157_000_1.png |
| 4 | /home/aistudio/work/picc/image/00000250_004.png | /home/aistudio/work/picc/mask/00000250_004_1.png |
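The labels are expected to be 0 (background) and 1 (PICC), so it can be worth checking each mask before training. PaddleSeg's `Dataset` treats 255 as `ignore_index` by default. A mask exported with 255 for the catheter would therefore be silently ignored rather than learned. The sketch below is minimal and assumes masks are loaded as 2-D numpy arrays (e.g. with `cv2.imread(path, 0)`). `check_mask` is a hypothetical helper, not a PaddleSeg function.

```python
import numpy as np

def check_mask(mask: np.ndarray, allowed=(0, 1)) -> bool:
    """Return True if a 2-D label mask only contains the allowed class ids."""
    if mask.ndim != 2:
        # Label images are expected to be single-channel
        return False
    values = np.unique(mask)
    return bool(np.isin(values, allowed).all())

# Synthetic masks; real ones would come from cv2.imread(path, 0)
good = np.zeros((4, 4), dtype=np.uint8)
good[1, 1:3] = 1
bad = good.copy()
bad[2, 2] = 255  # e.g. a mask exported with 255 for the foreground
print(check_mask(good), check_mask(bad))
```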
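3. Creating the Dataset
The PICC occupies a very small region of each image, so the images are kept at a relatively large size (872 × 872) during augmentation. Contrast and brightness also play an important role in how visible the PICC is on a chest X-ray. For that reason, brightness, contrast and sharpness are randomly jittered.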
from sklearn.model_selection import KFold
from paddleseg.models import UNet
import paddleseg.transforms as T
from paddleseg.datasets import Dataset
import paddle
from paddleseg.models.losses import CrossEntropyLoss,DiceLoss,MixedLoss
from paddleseg.core import train
num_classes = 2
iters = 12000
train_transforms = [
    T.RandomHorizontalFlip(),
    # Jitter brightness, contrast and sharpness; saturation and hue jitter are turned off
    T.RandomDistort(brightness_range=0.5,
                    brightness_prob=0.5,
                    contrast_range=0.5,
                    contrast_prob=0.5,
                    saturation_prob=0,
                    hue_prob=0,
                    sharpness_range=0.5,
                    sharpness_prob=0.5),
    T.RandomRotation(max_rotation=30, im_padding_value=(0, 0, 0), label_padding_value=0),  # random rotation
    T.RandomScaleAspect(min_scale=0.8, aspect_ratio=0.8),  # random crop by area/aspect ratio, scaled back to the original size
    T.Resize(target_size=(872, 872)),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.2, 0.2, 0.2])
]
val_transforms = [
T.Resize(target_size=(872, 872)),
T.Normalize(mean=[0.5,0.5,0.5],std=[0.2,0.2,0.2])
]
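The augmentation choices above rest on the PICC covering only a tiny fraction of each image. The following minimal numpy sketch measures that fraction on a label mask. `foreground_ratio` is a hypothetical helper. The synthetic 5 x 600 pixel line is illustrative only and was not measured from this dataset.

```python
import numpy as np

def foreground_ratio(mask: np.ndarray, fg_label: int = 1) -> float:
    """Fraction of pixels in a label mask that belong to the foreground class."""
    return float((mask == fg_label).sum()) / mask.size

# Synthetic 1024x1024 mask with a 5-pixel-wide, 600-pixel-long "catheter"
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[200:800, 510:515] = 1
ratio = foreground_ratio(mask)
print(f"foreground ratio: {ratio:.4%}")
```

Applied to the real masks (e.g. loaded with `cv2.imread(path, 0)`), the same function gives the actual class imbalance of the annotated data.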
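4. K-fold cross-validation training
The dataset is small, only 118 images. An 8:2 split would hold out roughly 23 images for validation, which means 23 images the model never learns from. Instead, the dataset is split into K parts. One part is used for validation and the remaining K-1 parts for training. Repeating this K times yields K models, and their predictions are fused at inference. This way, every one of the 118 images is used for training by some of the models, which should help the fused model generalize better.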
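The following example makes the split concrete. With 118 images and 5 folds, each validation fold holds 23 or 24 images, and every image appears in exactly one validation fold. This minimal pure-Python sketch follows the fold-size rule of scikit-learn's `KFold`: the first `n_samples % n_splits` folds get one extra sample. `kfold_indices` is a hypothetical helper. Its shuffle uses Python's `random` module, so fold membership will not match `KFold(random_state=2022)` in the notebook.

```python
import random

def kfold_indices(n_samples: int, n_splits: int = 5, seed: int = 2022):
    """Yield (train_idx, val_idx) pairs with KFold-style fold sizes."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Equal fold sizes, with the remainder spread over the first folds
    fold_sizes = [n_samples // n_splits] * n_splits
    for k in range(n_samples % n_splits):
        fold_sizes[k] += 1
    start = 0
    for size in fold_sizes:
        val_idx = sorted(indices[start:start + size])
        val_set = set(val_idx)
        train_idx = [i for i in range(n_samples) if i not in val_set]
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(118, n_splits=5))
print([len(v) for _, v in folds])  # [24, 24, 24, 23, 23]
```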
kf = KFold(n_splits=5, shuffle=True, random_state=2022)
for fold, (trn_idx, val_idx) in enumerate(kf.split(df)):
    df.loc[val_idx, 'fold'] = fold
for fold in range(5):
    # Loss: CrossEntropyLoss + DiceLoss (weights 1:1) to improve segmentation of small targets
    mixtureLosses = [CrossEntropyLoss(), DiceLoss()]
    mixtureCoef = [1, 1]
    losses = {}
    losses['types'] = [MixedLoss(mixtureLosses, mixtureCoef)]
    losses['coef'] = [1]
    # A fresh model, learning-rate schedule and optimizer for every fold
    model = UNet(num_classes=num_classes)
    base_lr = 0.05
    lr = paddle.optimizer.lr.PolynomialDecay(base_lr, power=0.9, decay_steps=iters, end_lr=0)
    optimizer = paddle.optimizer.Momentum(lr, parameters=model.parameters(), momentum=0.9, weight_decay=4.0e-5)
    print('#' * 40, flush=True)
    print(f'###### Fold: {fold}', flush=True)
    print('#' * 40, flush=True)
    # Images of the current fold are used for validation, the rest for training
    train_df = df.query("fold != @fold").reset_index(drop=True)
    valid_df = df.query("fold == @fold").reset_index(drop=True)
    # Write PaddleSeg file lists: one "image_path mask_path" pair per line
    with open('/home/aistudio/work/picc/train.txt', 'w') as f:
        for index in range(len(train_df)):
            context = str(train_df.iloc[index, 0]) + ' ' + str(train_df.iloc[index, 1]) + '\n'
            f.write(context)
    with open('/home/aistudio/work/picc/val.txt', 'w') as f:
        for index in range(len(valid_df)):
            context = str(valid_df.iloc[index, 0]) + ' ' + str(valid_df.iloc[index, 1]) + '\n'
            f.write(context)
    dataset_root = '/home/aistudio/work/picc/'
    train_path = '/home/aistudio/work/picc/train.txt'
    val_path = '/home/aistudio/work/picc/val.txt'
    # Training set
    train_dataset = Dataset(
        transforms=train_transforms,
        dataset_root=dataset_root,
        num_classes=num_classes,
        train_path=train_path,
        mode='train'
    )
    # Validation set
    val_dataset = Dataset(
        transforms=val_transforms,
        dataset_root=dataset_root,
        num_classes=num_classes,
        val_path=val_path,
        mode='val'
    )
    # Each fold is saved to its own directory; the inference step loads
    # /home/aistudio/output/<fold>/Unet/best_model/model.pdparams
    train(
        model=model,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        optimizer=optimizer,
        save_dir='/home/aistudio/output/' + str(fold) + '/Unet',
        iters=iters,
        batch_size=2,
        save_interval=500,
        log_iters=30,
        losses=losses,
        use_vdl=True)
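The training loop mixes CrossEntropyLoss and DiceLoss 1:1. To show what that mix rewards, here is a minimal numpy sketch of both terms for a single (C, H, W) logit map. It is a simplified re-derivation, not PaddleSeg's implementation. The smoothing constant `eps` and the averaging of Dice over all classes are assumptions, and PaddleSeg's `DiceLoss` may differ in these details.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits: np.ndarray, label: np.ndarray) -> float:
    """Mean pixel-wise cross-entropy; logits: (C, H, W), label: (H, W) class ids."""
    probs = softmax(logits)
    h, w = label.shape
    # Pick the probability of the true class at every pixel
    picked = probs[label, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-np.log(picked + 1e-12).mean())

def dice_loss(logits: np.ndarray, label: np.ndarray, eps: float = 1.0) -> float:
    """1 - soft Dice, averaged over classes, against one-hot targets."""
    probs = softmax(logits)
    num_classes = logits.shape[0]
    onehot = np.eye(num_classes)[label].transpose(2, 0, 1)  # (C, H, W)
    inter = (probs * onehot).sum(axis=(1, 2))
    union = probs.sum(axis=(1, 2)) + onehot.sum(axis=(1, 2))
    dice = (2 * inter + eps) / (union + eps)
    return float(1 - dice.mean())

def mixed_loss(logits, label, coefs=(1.0, 1.0)) -> float:
    """Weighted sum of the two terms, mirroring mixtureCoef = [1, 1]."""
    return coefs[0] * cross_entropy(logits, label) + coefs[1] * dice_loss(logits, label)

# Synthetic example: a 1-pixel-wide vertical "catheter" on a 64x64 image
label = np.zeros((64, 64), dtype=np.int64)
label[:, 32] = 1
# A model that confidently predicts background everywhere
all_background = np.stack([np.full((64, 64), 4.0), np.zeros((64, 64))])
print(f"CE:   {cross_entropy(all_background, label):.3f}")
print(f"Dice: {dice_loss(all_background, label):.3f}")
```

In this synthetic case the all-background prediction misses the whole catheter. The mean cross-entropy still stays small, because almost every pixel is background. The Dice term stays close to 0.5, because the PICC class gets almost no overlap. That contrast is the usual argument for adding a Dice term on thin, small targets.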
5. Inference
Load the model of each fold in turn and keep each fold's prediction.
import numpy as np
from paddleseg.models import UNet
import paddleseg.transforms as T
import cv2
import paddle
import matplotlib.pyplot as plt
transforms = T.Compose([
T.Resize(target_size=(872, 872)),
T.Normalize(mean=[0.5,0.5,0.5],std=[0.2,0.2,0.2])]
)
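def predict(model, model_path, im_path):
    # Load one fold's weights and return its logits with shape (num_classes, 872, 872)
    para_state_dict = paddle.load(model_path)
    model.set_dict(para_state_dict)
    model.eval()
    im = cv2.imread(im_path)
    im, _ = transforms(im)      # resize + normalize, returns a CHW array
    im = im[np.newaxis, ...]    # add the batch dimension
    im = paddle.to_tensor(im)
    with paddle.no_grad():      # no gradients are needed at inference
        output = model(im)[0]   # the model returns a list of logits; take the main output
    output = output.numpy()
    output = output[0]          # drop the batch dimension
    return output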
im_path = '/home/aistudio/test/00009163_000.png'
model = UNet(num_classes=2)
outputs = np.zeros((2, 872, 872))  # running sum of the logits of all folds
fold_outputs = list()  # logits of each individual fold
for fold in range(5):
    model_path = '/home/aistudio/output/' + str(fold) + '/Unet/best_model/model.pdparams'
    output = predict(model, model_path, im_path)
    fold_outputs.append(output)
    outputs += output
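5.1 Visualizing the prediction of each fold
The images below show that the predictions of different folds differ considerably. With a single fixed split that trains on only part of the data, one might end up with just one of the results shown.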
plt.figure(figsize=(18,12))
for i, output in enumerate(fold_outputs):
    output = np.argmax(output, axis=0)
    plt.subplot(2, 3, i + 1), plt.imshow(output, 'gray'), plt.title(str(i) + '_fold'), plt.xticks([]), plt.yticks([])
plt.show()
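The visual impression that folds disagree can also be measured by computing the pairwise Dice coefficient between the per-fold binary predictions. The sketch below is minimal; `dice_score` and `pairwise_fold_agreement` are hypothetical helpers. The synthetic masks only mimic a "broken" line and a false positive and are not this project's results. With the notebook's variables, the masks could be built as `[np.argmax(o, axis=0) == 1 for o in fold_outputs]`.

```python
import itertools
import numpy as np

def dice_score(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2 * np.logical_and(a, b).sum() / denom)

def pairwise_fold_agreement(masks):
    """Dice between every pair of per-fold binary predictions."""
    return {(i, j): dice_score(masks[i], masks[j])
            for i, j in itertools.combinations(range(len(masks)), 2)}

# Synthetic per-fold predictions on a 10x10 image
m0 = np.zeros((10, 10), dtype=np.uint8)
m0[2:8, 4] = 1                 # a continuous 6-pixel line
m1 = m0.copy()
m1[5:8, 4] = 0                 # a "broken" line
m2 = m0.copy()
m2[0, 0] = 1                   # an extra false positive
scores = pairwise_fold_agreement([m0, m1, m2])
print(scores)
```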
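5.2 Fusing the models of all folds
On this test image, the fused prediction looks better than every single-fold prediction. Compared with fold 3, it has fewer false positives. Compared with fold 1, the PICC is no longer "broken".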
outputs = outputs / 5.0  # divide by the number of folds to get the averaged logits
outputs = np.argmax(outputs, axis=0)
image = cv2.imread(im_path, 0)
h, w = image.shape
# Map PICC pixels (class 1) to gray level 127; uint8 keeps cv2.resize and cv2.addWeighted happy
segmentation = (outputs == 1).astype(np.uint8) * 127
dim = (w, h)  # cv2.resize expects (width, height)
# Resize the 872x872 mask back to the original resolution without blending labels
segmentation = cv2.resize(segmentation, dim, interpolation=cv2.INTER_NEAREST)
combine = cv2.addWeighted(image, 0.5, segmentation, 0.5, 0)  # 50/50 overlay of image and mask
plt.figure(figsize=(18, 18))
plt.subplot(1, 3, 1), plt.imshow(image, 'gray'), plt.title('origin'), plt.xticks([]), plt.yticks([])
plt.subplot(1, 3, 2), plt.imshow(segmentation, 'gray'), plt.title('predict'), plt.xticks([]), plt.yticks([])
plt.subplot(1, 3, 3), plt.imshow(combine), plt.title('combine'), plt.xticks([]), plt.yticks([])
plt.show()
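The notebook fuses the folds by summing raw logits and dividing by 5. Since dividing does not change the argmax, this is the same as averaging logits before taking the argmax. Majority voting over the per-fold masks is a common alternative. The sketch below contrasts the two rules on synthetic two-class logits. `fuse_mean_logits`, `fuse_majority_vote` and `make_logits` are hypothetical names. The voting rule as written assumes two classes, and this project did not measure which rule works better on PICC data.

```python
import numpy as np

def fuse_mean_logits(fold_logits):
    """Average (C, H, W) logit maps over folds, then take the per-pixel argmax."""
    return np.argmax(np.mean(np.stack(fold_logits), axis=0), axis=0)

def fuse_majority_vote(fold_logits):
    """Two-class vote: a pixel is PICC (1) if more than half of the folds predict 1."""
    votes = np.stack([np.argmax(logits, axis=0) for logits in fold_logits])  # (K, H, W)
    return (votes.sum(axis=0) * 2 > len(fold_logits)).astype(np.int64)

def make_logits(margins):
    """Build (2, 1, W) logits: channel 0 background = 0, channel 1 PICC = margin."""
    m = np.array(margins, dtype=float).reshape(1, -1)
    return np.stack([np.zeros_like(m), m])

# Pixel 0: two folds weakly say PICC, one strongly says background.
# Pixel 1: one fold strongly says PICC, two weakly say background.
folds = [make_logits([0.1, 3.0]), make_logits([0.1, -0.2]), make_logits([-5.0, -0.2])]
print(fuse_mean_logits(folds))    # [[0 1]]
print(fuse_majority_vote(folds))  # [[1 0]]
```

Averaging lets one confident fold outweigh several hesitant ones, while voting ignores confidence entirely. This difference is why the two rules can disagree on thin, low-contrast structures.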
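6. Summary
In this project, training models with k-fold cross-validation and fusing the k models at inference gave better results than a single model. Medical segmentation datasets are usually small. K-fold cross-validation makes use of every sample, and fused inference yields a better result.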
Note
This project is a repost.
Original project link