LUNA16数据集小知识
LUNA16数据集包括888低剂量肺部CT影像(mhd格式)数据,每个影像包含一系列胸腔的多个轴向切片。每个影像包含的切片数量会随着扫描机器、扫描层厚和患者的不同而有差异。原始图像为三维图像。每个图像包含一系列胸腔的多个轴向切片。这个三维图像由不同数量的二维图像组成。
LUNA16数据集的由来
LUNA16数据集是最大公用肺结节数据集LIDC-IDRI的子集,LIDC-IDRI它包括1018个低剂量的肺部CT影像。LIDC-IDRI删除了切片厚度大于3mm和肺结节小于3mm的CT影像,剩下的就是LUNA16数据集了。
CT影像简介
CT 采集后所得的数值为 X 射线衰减值,单位为亨氏单位(Hounsfield unit, HU). 水的 HU 值为 0,空气的 HU 值为-1000, 而其他物体的 HU 值计算公式为
其中 , 为线性衰减系数 ,和 X 射线强度有关 。
亨氏单位值经过线性变换成为图像中的像素值 .。不同设备由于变换标准不同 , 所获得的图像像素值有一定差别 ; 但是在相同的 X 射线条件下 , CT照射人体所获得的亨氏单位值却是相同的 , 如表 1所示
在对肺部 CT 图像处理之中 , 由于肺的 HU 值为 -500 左右 , 一般做法是将 HU 值在 [-1000,+ 400]内的区域保留 ( 从空气到骨骼 ), 超出此范围的区域就可以认为与肺部疾病监测无关而舍去 。
这里面写一个luna16的数据处理吧
# -*- coding:utf-8 -*-
'''
this script is used for basic process of lung 2017 in Data Science Bowl
'''
import SimpleITK as sitk
from skimage.morphology import ball, disk, dilation, binary_erosion, remove_small_objects, erosion, closing, \
reconstruction, binary_closing
from skimage.measure import label, regionprops
from skimage.filters import roberts
from skimage.segmentation import clear_border
from scipy import ndimage as ndi
import matplotlib.pyplot as plt
# numpyImage[numpyImage > -600] = 1
# numpyImage[numpyImage <= -600] = 0
def get_segmented_lungs(im, plot=False):
'''
This funtion segments the lungs from the given 2D slice.
'''
if plot == True:
f, plots = plt.subplots(8, 1, figsize=(5, 40))
'''
Step 1: Convert into a binary image.
'''
binary = im < -600
if plot == True:
plots[0].axis('off')
plots[0].imshow(binary, cmap=plt.cm.bone)
'''
Step 2: Remove the blobs connected to the border of the image.
'''
cleared = clear_border(binary)
if plot == True:
plots[1].axis('off')
plots[1].imshow(cleared, cmap=plt.cm.bone)
'''
Step 3: Label the image.
'''
label_image = label(cleared)
if plot == True:
plots[2].axis('off')
plots[2].imshow(label_image, cmap=plt.cm.bone)
'''
Step 4: Keep the labels with 2 largest areas.
'''
areas = [r.area for r in regionprops(label_image)]
areas.sort()
if len(areas) > 2:
for region in regionprops(label_image):
if region.area < areas[-2]:
for coordinates in region.coords:
label_image[coordinates[0], coordinates[1]] = 0
binary = label_image > 0
if plot == True:
plots[3].axis('off')
plots[3].imshow(binary, cmap=plt.cm.bone)
'''
Step 5: Erosion operation with a disk of radius 2. This operation is
seperate the lung nodules attached to the blood vessels.
'''
selem = disk(2)
binary = binary_erosion(binary, selem)
if plot == True:
plots[4].axis('off')
plots[4].imshow(binary, cmap=plt.cm.bone)
'''
Step 6: Closure operation with a disk of radius 10. This operation is
to keep nodules attached to the lung wall.
'''
selem = disk(10)
binary = binary_closing(binary, selem)
if plot == True:
plots[5].axis('off')
plots[5].imshow(binary, cmap=plt.cm.bone)
'''
Step 7: Fill in the small holes inside the binary mask of lungs.
'''
edges = roberts(binary)
binary = ndi.binary_fill_holes(edges)
if plot == True:
plots[6].axis('off')
plots[6].imshow(binary, cmap=plt.cm.bone)
'''
Step 8: Superimpose the binary mask on the input image.
'''
get_high_vals = binary == 0
im[get_high_vals] = 0
if plot == True:
plots[7].axis('off')
plots[7].imshow(im, cmap=plt.cm.bone)
plt.show()
return im
if __name__ == '__main__':
filename = './raw_data/1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492.mhd'
itkimage = sitk.ReadImage(filename) # 读取.mhd文件
numpyImage = sitk.GetArrayFromImage(itkimage) # 获取数据,自动从同名的.raw文件读取
data = numpyImage[50]
plt.figure(50)
plt.imshow(data, cmap='gray')
im = get_segmented_lungs(data, plot=True)
plt.figure(200)
plt.imshow(im, cmap='gray')
plt.show()