SimpleITK等解析医学图像

最新推荐文章于 2023-07-19 20:21:20 发布

dreamandgo

最新推荐文章于 2023-07-19 20:21:20 发布

阅读量2.8k

点赞数 4

分类专栏：医学知识 Numpy 文章标签：人工智能

本文链接：https://blog.csdn.net/weixin_40051325/article/details/104970600

版权

医学知识同时被 2 个专栏收录

8 篇文章 8 订阅

订阅专栏

Numpy

2 篇文章 0 订阅

订阅专栏

前言

通过学习以下文章，开始入门医学图像。现附上原文链接，感谢博主们的帮助！
同时附上学习总结（多为博主原文），以方便后续复习补充。

正文一

一. Luna数据集简介(摘自官网)

Luna16数据集

简介：来自于公开的LIDC/IDRI数据集。该数据集剔除了LIDC/IDRI数据集中切片厚度大于2.5mm的扫描数据，共产生了888套CT。Luna16数据集中结节的判定标准为四名放射科专家中至少有三名认定该结节半径大于3mm。因此在数据集的注释中，非结节、半径小于3mm的结节和被1名或两名放射科专家认为是半径大于3mm的结节被认定为无关的发现。
Luna16数据集的组成

subset0-subset10：10个zip文件中包含所有的CT图像。
annotation.csv：csv文件中包含用于肺结节检测比赛作为参考标准使用的注释。注释中一共包含1186个结节。
mhd和.raw文件

subset0-10中每一套CT扫描都是由.mhd和.raw共同给出的。.mhd会给出CT图像的一些基本信息。.raw图像用来存储CT的具体数据。
annotation.csv文件

该文件给出了不同CT中，结节的世界质心坐标和半径。

二. Luna数据集图片解析

mhd文件解析
使用notepad打开.mhd文件,会看到如下键值对。我们对其中重要的几个键值对进行解释。

ObjectType = Image
NDims = 3
BinaryData = True
BinaryDataByteOrderMSB = False
CompressedData = False
TransformMatrix = 1 0 0 0 1 0 0 0 1
Offset = -161.699997 -169.5 -316.46499599999999
CenterOfRotation = 0 0 0
AnatomicalOrientation = RAI
ElementSpacing = 0.6640620231628418 0.6640620231628418 0.625
DimSize = 512 512 471
ElementType = MET_SHORT
ElementDataFile = 1.3.6.1.4.1.14519.5.2.1.6279.6001.100530488926682752765845212286.raw

# ObjectType是文件类型
# NDims是raw数据的维度
# offset是质心坐标。（第一次看到这个坐标的时候觉得很奇怪，还记得annotation中结节对应的坐标吗。两种坐标都是这种形式的。）其实这种表示也同样应用与dicom文件。结节在矩阵中的位置的计算公式如下
x= (x_ano-x_offset)/x_ElementSpacing 
y= (y_ano-y_offset)/y_ElementSpacing 
z= (z_ano-z_offset)/z_ElementSpacing 
# 其中，x是实际对应三维矩阵中的坐标。
# x_ano是肺结节在annotation.csv中的坐标.
# x_offset是质心坐标.
# x_ElementSpacing是在x轴方向上的步长。相当于每一个像素对应现实世界中的长度。

TransformMatric:图像矩阵是否翻转的标志。(在现实中CT扫描的时候有的人是正卧，有的人是仰卧的，所以会导致图像会出现翻转的情况。)
DimSize：三个数字分别表示x,y,z方向上的维度。
ElementSpacing:x,y,z方向上的步长
ElementDataFile:该mhd对应的raw文件，raw中存放了CT值矩阵，也就是所有CT扫描断层。

读取CT值矩阵。CT是按层扫描的，每一层CT图像罗列形成三维矩阵。
读取数据的流程是先读取mhd，通过MatricTransform判别三维CT值矩阵在x,y方向上是否需要翻转。然后返回三维CT矩阵。

def load_itk_image(filename):
    with open(filename) as f:
        contents = f.readlines()
        line = [k for k in contents if k.startswith('TransformMatrix')][0]
        transform = np.array(line.split(' = ')[1].split(' ')).astype('float')
        transform = np.round(transform)
        if np.any(transform != np.array([1, 0, 0, 0, 1, 0, 0, 0, 1])):
            isflip = True
        else:
            isflip = False
    itkimage = sitk.ReadImage(filename)
    numpyimage = sitk.GetArrayFromImage(itkimage)
    if(isflip == True):
        numpyimage = numpyimage[:,::-1,::-1]
    return numpyimage

CT值矩阵的预处理与归一化

CT值与我们平时看到的像素值是不同的。CT值是测定人体某器官密度的计量单位单位是HU。人体组织的范围大约为-1000~+1000.因此如果我们想可视化一层CT，那么我们就要对这一层的二维CT矩阵进行预处理与归一化。

具体成像的原理可以找详细资料，比如医学中的肺窗和纵膈窗。不同窗位和算法形成的图像也是不同的。这里我们只是从可视化的角度找了一种最简单的处理方法。

# 预处理：首先就是将CT值过大和过小的数据置为0。
def truncate_hu(image_array):
    image_array[image_array > 400] = 0
    image_array[image_array <-1000] = 0

# 归一化
def normalazation(image_array):
    max = image_array.max()
    min = image_array.min()
    image_array = (image_array-min)/(max-min) 
    avg = image_array.mean()
    image_array = image_array-avg
    return image_array

这里得到的image_array就是归一化后的图像矩阵了。我们可以用matplotlib可视化一层。

可视化一层

# 下面的程序挑取一层进行可视化，并标注结节位置。
a = image_array.transpose(1,2,0)[:,:,0]  # transpose是将(z,x,y)的三维矩阵转为(x,y,z)的矩阵
plt.gca().add_patch( plt.Rectangle((147,297), 24,24, fill=False,edgecolor='r', linewidth=3))
plt.imshow(a[:,:,1]*255)  # 在图中画框
plt.show()

在这里插入图片描述
原文链接：https://blog.csdn.net/qq_22152499/article/details/89913821

正文二

一. 数据格式

1.1 dicom格式

DICOM是医学图像中标准文件，这些文件包含了诸多的元数据信息（比如像素尺寸，每个维度的一像素代表真实世界里的长度）。后缀为 .dcm。
每个病人的一次扫描CT(scan)可能有几十到一百多个dcm数据文件(slices)。可以使用 python的dicom包读取，读取示例代码如下:

 dicom.read_file('xxx.dcm')

其中，pixl_array包含了真实数据。

 slices = [dicom.read_file(os.path.join(folder_name,filename)) for filename in os.listdir(folder_name)]
 slices = np.stack([s.pixel_array for s in slices])

1.2 mhd格式

mhd格式是另外一种数据格式，来源于(LUNA2016)[https://luna16.grand-challenge.org/data/]。每个病人一个mhd文件和一个同名的raw文件。
一个mhd通常有几百兆，对应的raw文件只有1kb。mhd文件需要借助python的SimpleITK包来处理。SimpleITK示例代码如下:

import SimpleITK as sitk
itk_img = sitk.ReadImage(img_file)
img_array = sitk.GetArrayFromImage(itk_img)  # indexes are z,y,x (notice the ordering)
num_z, height, width = img_array.shape  
origin = np.array(itk_img.GetOrigin())  # x,y,z Origin in world coordinates (mm)
spacing = np.array(itk_img.GetSpacing())  # spacing of voxels in world coor. (mm)

需要注意的是，SimpleITK的img_array的数组不是直接的像素值，而是相对于CT扫描中原点位置的差值，需要做进一步转换。转换需*spacing。

二. dicom格式数据处理过程

2.1 处理思路

首先，需要明白的是医学扫描图像(scan)其实是三维图像，使用代码读取之后开源查看不同的切面的切片(slices)，可以从不同轴切割dicom格式的图像
其次，CT扫描图是包含了所有组织的，如果直接去看，看不到任何有用信息。需要做一些预处理,预处理中一个重要的概念是放射剂量，衡量单位为HU(Hounsfield Unit)，下表是不同放射剂量对应的组织器官
substance HU
空气 -1000
肺 -500
脂肪 -100到-50
水 0
CSF 15
肾 30
血液 +30到+45
肌肉 +10到+40
灰质 +37到+45
白质 +20到+30
Liver +40到+60
软组织,constrast +100到+300
骨头 +700(软质骨)到+3000(皮质骨)

Hounsfield Unit = pixel_value * rescale_slope + rescale_intercept

一般情况rescale slope = 1, intercept = -1024。

上表中肺部组织的HU数值为-500,但通常是大于这个值，比如-320、-400。挑选出这些区域，然后做其他变换抽取出肺部像素点。

2.2 先载入必要的包

# -*- coding:utf-8 -*-
'''
this script is used for basic process of lung 2017 in Data Science Bowl
'''
import glob
import os
import pandas as pd
import SimpleITK as sitk
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import skimage, os
from skimage.morphology import ball, disk, dilation, binary_erosion, remove_small_objects, erosion, closing, reconstruction, binary_closing
from skimage.measure import label,regionprops, perimeter
from skimage.morphology import binary_dilation, binary_opening
from skimage.filters import roberts, sobel
from skimage import measure, feature
from skimage.segmentation import clear_border
from skimage import data
from scipy import ndimage as ndi
import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import dicom
import scipy.misc
import numpy as np

如下代码是载入一个扫描面，包含了多个切片(slices)，我们仅简化地将其存储为python列表。数据集中每个目录都是一个扫描面集（一个病人）。有个元数据域丢失，即Z轴方向上的像素尺寸，也即切片的厚度。所幸，我们可以用其他值推测出来，并加入到元数据中。

# Load the scans in given folder path
def load_scan(path):
    slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
    slices.sort(key = lambda x: int(x.ImagePositionPatient[2]))
    try:
    	slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except:
    	slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
    	s.SliceThickness = slice_thickness
    return slices

2.3 灰度值转换为HU单元

首先去除灰度值为 -2000的pixl_array，CT扫描边界之外的灰度值固定为-2000(dicom和mhd都是这个值)。第一步是设定这些值为0，当前对应为空气（值为0）

回到HU单元，乘以rescale比率并加上intercept(存储在扫描面的元数据中)。

def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices])
    # Convert to int16 (from sometimes int16),
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)
    # Set outside-of-scan pixels to 0
    # The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    # Convert to Hounsfield units (HU)
    for slice_number in range(len(slices)):
    intercept = slices[slice_number].RescaleIntercept
    slope = slices[slice_number].RescaleSlope
    if slope != 1:
    image[slice_number] = slope * image[slice_number].astype(np.float64)
    image[slice_number] = image[slice_number].astype(np.int16)
    image[slice_number] += np.int16(intercept)
    return np.array(image, dtype=np.int16)

可以查看病人的扫描HU值分布情况：

first_patient = load_scan(INPUT_FOLDER + patients[0])
first_patient_pixels = get_pixels_hu(first_patient)
plt.hist(first_patient_pixels.flatten(), bins=80, color='c')
plt.xlabel("Hounsfield Units (HU)")
plt.ylabel("Frequency")
plt.show()

2.4 重采样

不同扫描面的像素尺寸、粗细粒度是不同的。这不利于我们进行CNN任务，我们可以使用同构采样。

一个扫描面的像素区间可能是[2.5,0.5,0.5],即切片之间的距离为2.5mm。可能另外一个扫描面的范围是[1.5,0.725,0.725]。这可能不利于自动分析。

常见的处理方法是从全数据集中以固定的同构分辨率重新采样，将所有的东西采样为1mmx1mmx1mm像素。

def resample(image, scan, new_spacing=[1,1,1]):
    # Determine current pixel spacing
    spacing = map(float, ([scan[0].SliceThickness] + scan[0].PixelSpacing))
    spacing = np.array(list(spacing))
    resize_factor = spacing / new_spacing
    new_real_shape = image.shape * resize_factor
    new_shape = np.round(new_real_shape)
    real_resize_factor = new_shape / image.shape
    new_spacing = spacing / real_resize_factor
    image = scipy.ndimage.interpolation.zoom(image, real_resize_factor, mode='nearest')
    return image, new_spacing

现在重新取样病人的像素，将其映射到一个同构分辨率 1mm x1mm x1mm。

pix_resampled, spacing = resample(first_patient_pixels, first_patient, [1,1,1])

使用matplotlib输出肺部扫描的3D图像方法。可能需要一两分钟。

def plot_3d(image, threshold=-300):
    # Position the scan upright,
    # so the head of the patient would be at the top facing the camera
    p = image.transpose(2,1,0)
    verts, faces = measure.marching_cubes(p, threshold)
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')
    # Fancy indexing: `verts[faces]` to generate a collection of triangles
    mesh = Poly3DCollection(verts[faces], alpha=0.1)
    face_color = [0.5, 0.5, 1]
    mesh.set_facecolor(face_color)
    ax.add_collection3d(mesh)
    ax.set_xlim(0, p.shape[0])
    ax.set_ylim(0, p.shape[1])
    ax.set_zlim(0, p.shape[2])
    plt.show()

打印函数有个阈值参数，来打印特定的结构，比如tissue或者骨头。400是一个仅仅打印骨头的阈值(HU对照表)

plot_3d(pix_resampled, 400)

2.5 输出一个病人scans中所有切面slices

def plot_ct_scan(scan):
    '''
    plot a few more images of the slices
    :param scan:
    :return:
    '''
    f, plots = plt.subplots(int(scan.shape[0] / 20) + 1, 4, figsize=(50, 50))
    for i in range(0, scan.shape[0], 5):
    plots[int(i / 20), int((i % 20) / 5)].axis('off')
    plots[int(i / 20), int((i % 20) / 5)].imshow(scan[i], cmap=plt.cm.bone)

2.6 定义分割出CT切面里面肺部组织的函数

下面的代码使用了pythonde 的图像形态学操作。具体可以参考python高级形态学操作

def get_segmented_lungs(im, plot=False):
    '''
    This funtion segments the lungs from the given 2D slice.
    '''
    if plot == True:
    	f, plots = plt.subplots(8, 1, figsize=(5, 40))
    '''
    Step 1: Convert into a binary image.
    '''
    binary = im < 604
    if plot == True:
    	plots[0].axis('off')
    	plots[0].set_title('binary image')
    	plots[0].imshow(binary, cmap=plt.cm.bone)
    '''
    Step 2: Remove the blobs connected to the border of the image.
    '''
    cleared = clear_border(binary)
    if plot == True:
    	plots[1].axis('off')
    	plots[1].set_title('after clear border')
    	plots[1].imshow(cleared, cmap=plt.cm.bone)
    '''
    Step 3: Label the image.
    '''
    label_image = label(cleared)
    if plot == True:
    	plots[2].axis('off')
    	plots[2].set_title('found all connective graph')
    	plots[2].imshow(label_image, cmap=plt.cm.bone)
    '''
    Step 4: Keep the labels with 2 largest areas.
    '''
    areas = [r.area for r in regionprops(label_image)]
    areas.sort()
    if len(areas) > 2:
    	for region in regionprops(label_image):
    		if region.area < areas[-2]:
    			for coordinates in region.coords:
    				label_image[coordinates[0], coordinates[1]] = 0
    				binary = label_image > 0
    if plot == True:
    	plots[3].axis('off')
    	plots[3].set_title(' Keep the labels with 2 largest areas')
    	plots[3].imshow(binary, cmap=plt.cm.bone)
    '''
    Step 5: Erosion operation with a disk of radius 2. This operation is
    seperate the lung nodules attached to the blood vessels.
    '''
    selem = disk(2)
    binary = binary_erosion(binary, selem)
    if plot == True:
    	plots[4].axis('off')
    	plots[4].set_title('seperate the lung nodules attached to the blood vessels')
    	plots[4].imshow(binary, cmap=plt.cm.bone)
    '''
    Step 6: Closure operation with a disk of radius 10. This operation is
    to keep nodules attached to the lung wall.
    '''
    selem = disk(10)
    binary = binary_closing(binary, selem)
    if plot == True:
    	plots[5].axis('off')
    	plots[5].set_title('keep nodules attached to the lung wall')
    	plots[5].imshow(binary, cmap=plt.cm.bone)
    '''
    Step 7: Fill in the small holes inside the binary mask of lungs.
    '''
    edges = roberts(binary)
    binary = ndi.binary_fill_holes(edges)
    if plot == True:
    	plots[6].axis('off')
    	plots[6].set_title('Fill in the small holes inside the binary mask of lungs')
    	plots[6].imshow(binary, cmap=plt.cm.bone)
    '''
    Step 8: Superimpose the binary mask on the input image.
    '''
    get_high_vals = binary == 0
    im[get_high_vals] = 0
    if plot == True:
    	plots[7].axis('off')
    	plots[7].set_title('Superimpose the binary mask on the input image')
    	plots[7].imshow(im, cmap=plt.cm.bone)
    	return im

此方法每个步骤对图像做不同的处理，依次为二值化、清除边界、连通区域标记、腐蚀操作、闭合运算、孔洞填充。

2.7 肺部图像分割

为了减少有问题的空间，我们可以分割肺部图像（有时候是附近的组织）。这包含一些步骤，包括区域增长和形态运算，此时，我们只分析相连组件。

步骤如下：
阈值图像（-320HU是个极佳的阈值，但是此方法中不是必要）
处理相连的组件，以决定当前患者的空气的标签，以1填充这些二值图像
可选：当前扫描的每个轴上的切片，选定最大固态连接的组织（当前患者的肉体和空气），并且其他的为0。以掩码的方式填充肺部结构。
只保留最大的气袋（人类躯体内到处都有气袋）

def largest_label_volume(im, bg=-1):
    vals, counts = np.unique(im, return_counts=True)
    counts = counts[vals != bg]
    vals = vals[vals != bg]
    if len(counts) > 0:
    	return vals[np.argmax(counts)]
    else:
    	return None
def segment_lung_mask(image, fill_lung_structures=True):
  # not actually binary, but 1 and 2.
  # 0 is treated as background, which we do not want
    binary_image = np.array(image > -320, dtype=np.int8)+1
    labels = measure.label(binary_image)
    # Pick the pixel in the very corner to determine which label is air.
    # Improvement: Pick multiple background labels from around the patient
    # More resistant to "trays" on which the patient lays cutting the air
    # around the person in half
    background_label = labels[0,0,0]
    #Fill the air around the person
    binary_image[background_label == labels] = 2
    # Method of filling the lung structures (that is superior to something like
    # morphological closing)
    if fill_lung_structures:
    	# For every slice we determine the largest solid structure
    	for i, axial_slice in enumerate(binary_image):
    		axial_slice = axial_slice - 1
    		labeling = measure.label(axial_slice)
    		l_max = largest_label_volume(labeling, bg=0)
    if l_max is not None: #This slice contains some lung
    	binary_image[i][labeling != l_max] = 1
    	binary_image -= 1 #Make the image actual binary
    	binary_image = 1-binary_image # Invert it, lungs are now 1
    # Remove other air pockets insided body
    labels = measure.label(binary_image, background=0)
    l_max = largest_label_volume(labels, bg=0)
    if l_max is not None: # There are air pockets
    binary_image[labels != l_max] = 0
    return binary_image

查看切割效果

segmented_lungs = segment_lung_mask(pix_resampled, False)
segmented_lungs_fill = segment_lung_mask(pix_resampled, True)
plot_3d(segmented_lungs, 0)

我们可以将肺内的结构也包含进来（结节是固体），不仅仅只是肺部内的空气

plot_3d(segmented_lungs_fill, 0)

使用mask时，要注意首先进行形态扩充(python的skimage的skimage.morphology)操作（即使用圆形kernel，结节是球体），参考 python形态操作。这会在所有方向（维度）上扩充mask。仅仅肺部的空气+结构将不会包含所有结节，事实上有可能遗漏黏在肺部一侧的结节（这会经常出现，所以建议最好是扩充mask）。

2.8 数据标准化处理

归一化处理

当前的值范围是[-1024,2000]。而任意大于400的值并不是处理肺结节需要考虑，因为它们都是不同反射密度下的骨头。LUNA16竞赛中常用来做归一化处理的阈值集是-1000和400.以下代码

归一化

MIN_BOUND = -1000.0
MAX_BOUND = 400.0
def normalize(image):
    image = (image - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    image[image>1] = 1.
    image[image<0] = 0.
    return image

0值中心化

简单来说就是所有像素值减去均值。LUNA16竞赛中的均值大约是0.25.

不要对每一张图像做零值中心化（此处像是在kernel中完成的）CT扫描器返回的是校准后的精确HU计量。不会出现普通图像中会出现某些图像低对比度和明亮度的情况

PIXEL_MEAN = 0.25
def zero_center(image):
    image = image - PIXEL_MEAN
    return image

归一化和零值中心化的操作主要是为了后续训练网络，零值中心化是网络收敛的关键。

三. mhd格式数据处理过程

3.1 处理思路

mhd的数据只是格式与dicom不一样，其实质包含的都是病人扫描。处理MHD需要借助SimpleIKT这个包，处理思路详情可以参考Data Science Bowl2017的toturail Data Science Bowl 2017。需要注意的是MHD格式的数据没有HU值，它的值域范围与dicom很不同。

我们以LUNA2016年的数据处理流程为例。参考代码为 LUNA2016数据切割

3.2 载入必要的包

import SimpleITK as sitk
import numpy as np
import csv
from glob import glob
import pandas as pd
file_list=glob(luna_subset_path+"*.mhd")
#####################
#
# Helper function to get rows in data frame associated
# with each file
def get_filename(case):
	global file_list
	for f in file_list:
		if case in f:
			return(f)

# The locations of the nodes
df_node = pd.read_csv(luna_path+"annotations.csv")
df_node["file"] = df_node["seriesuid"].apply(get_filename)
df_node = df_node.dropna()
#####
#
# Looping over the image files
#
fcount = 0
for img_file in file_list:
	print "Getting mask for image file %s" % img_file.replace(luna_subset_path,"")
	mini_df = df_node[df_node["file"]==img_file] #get all nodules associate with file
if len(mini_df)>0: # some files may not have a nodule--skipping those
	biggest_node = np.argsort(mini_df["diameter_mm"].values)[-1] # just using the biggest node
	node_x = mini_df["coordX"].values[biggest_node]
	node_y = mini_df["coordY"].values[biggest_node]
	node_z = mini_df["coordZ"].values[biggest_node]
	diam = mini_df["diameter_mm"].values[biggest_node]

3.3 LUNA16的MHD格式数据的值

一直在寻找MHD格式数据的处理方法，对于dicom格式的CT有很多论文根据其HU值域可以轻易地分割肺、骨头、血液等，但是对于MHD没有这样的参考。从LUNA16论坛得到的解释是，LUNA16的MHD数据已经转换为HU值了，不需要再使用slope和intercept来做rescale变换了。此论坛主题下，有人提出MHD格式没有提供pixel spacing(mm) 和 slice thickness(mm) ，而标准文件annotation.csv文件中结节的半径和坐标都是mm单位，最后确认的是MHD格式文件中只保留了体素尺寸以及坐标原点位置，没有保存slice thickness。即，dicom才是原始数据格式。

3.4 坐标体系变换

MHD值的坐标体系是体素，以mm为单位（dicom的值是GV灰度值）。结节的位置是CT scanner坐标轴里面相对原点的mm值，需要将其转换到真实坐标轴位置，可以使用SimpleITK包中的 GetOrigin() GetSpacing()。图像数据是以512x512数组的形式给出的。

坐标变换如下：

相应的代码处理如下:

itk_img = sitk.ReadImage(img_file)
img_array = sitk.GetArrayFromImage(itk_img) # indexes are z,y,x (notice the ordering)
center = np.array([node_x,node_y,node_z]) # nodule center
origin = np.array(itk_img.GetOrigin()) # x,y,z Origin in world coordinates (mm)
spacing = np.array(itk_img.GetSpacing()) # spacing of voxels in world coor. (mm)
v_center =np.rint((center-origin)/spacing) # nodule center in voxel space (still x,y,z ordering)

在LUNA16的标注CSV文件中标注了结节中心的X,Y,Z轴坐标，但是实际取值的时候取的是Z轴最后三层的数组(img_array)。

下述代码只提取了包含结节的最后三个slice的数据，代码参考自 LUNA_mask_extraction.py

i = 0
for i_z in range(int(v_center[2])-1,int(v_center[2])+2):
	mask = make_mask(center,diam,i_z*spacing[2]+origin[2],width,height,spacing,origin)
	masks[i] = mask
	imgs[i] = matrix2int16(img_array[i_z])
	i+=1
np.save(output_path+"images_%d.npy" % (fcount) ,imgs)
np.save(output_path+"masks_%d.npy" % (fcount) ,masks)

3.5 查看结节

以下代码用于查看原始CT和结节mask。其实就是用matplotlib打印上一步存储的npy文件。

import matplotlib.pyplot as plt
imgs = np.load(output_path+'images_0.npy')
masks = np.load(output_path+'masks_0.npy')
for i in range(len(imgs)):
	print "image %d" % i
	fig,ax = plt.subplots(2,2,figsize=[8,8])
	ax[0,0].imshow(imgs[i],cmap='gray')
	ax[0,1].imshow(masks[i],cmap='gray')
	ax[1,0].imshow(imgs[i]*masks[i],cmap='gray')
	plt.show()
	raw_input("hit enter to cont : ")

接下来的处理和DICOM格式数据差不多，腐蚀膨胀、连通区域标记等。

参考信息：

灰度值是pixel value经过重重LUT转换得到的用来进行显示的值，而这个转换过程是不可逆的，也就是说，灰度值无法转换为ct值。只能根据窗宽窗位得到一个大概的范围。 pixel value经过modality lut得到Hu，但是怀疑pixelvalue的读取出了问题。dicom文件中存在（0028，0106）（0028，0107）两个tag，分别是最大最小pixel value，可以用来检验你读取的pixel value 矩阵是否正确。

LUT全称look up table，实际上就是一张像素灰度值的映射表，它将实际采样到的像素灰度值经过一定的变换如阈值、反转、二值化、对比度调整、线性变换等，变成了另外一 个与之对应的灰度值，这样可以起到突出图像的有用信息，增强图像的光对比度的作用。