A walkthrough of the pyradiomics YAML configuration file

一定要努力啊

Last modified 2022-01-26 11:27:49

Tags: python, deep learning, machine learning

First published 2022-01-25 20:51:32

Article link: https://blog.csdn.net/qq_40327008/article/details/119867447

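This post covers the configuration file of the pyradiomics library. pyradiomics is a library for extracting image features in medical imaging (radiomics). It is extensive and fairly complete, and it covers most common needs. General usage is easy to find online, so here I only go through the configuration file. The official documentation is linked below.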

https://pyradiomics.readthedocs.io

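pyradiomics offers two ways to customize which features are extracted. One is to configure everything in code, calling the corresponding method for whatever you want to extract; examples of that are easy to find. The other is to describe what you want in a configuration file (.yaml). The official site provides an example file, but it is incomplete and does not explain in detail how to use it. To cover it more fully, I put together the configuration file I am currently using, which includes nearly all of the options. The file is as follows: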

# This is an example of a parameters file
# It is written according to the YAML-convention (www.yaml.org) and is checked by the code for consistency.
# Three types of parameters are possible and reflected in the structure of the document:
#
# Parameter category:
#   Setting Name: <value>
#
# The three parameter categories are:
# - setting: Setting to use for preprocessing and class specific settings. if no <value> is specified, the value for
#   this setting is set to None.
# - featureClass: Feature class to enable, <value> is list of strings representing enabled features. If no <value> is
#   specified or <value> is an empty list ('[]'), all features for this class are enabled.
# - imageType: image types to calculate features on. <value> is custom kwarg settings (dictionary). if <value> is an
#   empty dictionary ('{}'), no custom settings are added for this input image.
#
# Some parameters have a limited list of possible values. Where this is the case, possible values are listed in the
# package documentation

# Settings to use, possible settings are listed in the documentation (section "Customizing the extraction").
setting:
  binWidth: 15
  label: 1
  interpolator: 'sitkBSpline' # This is an enumerated value, here None is not allowed
  resampledPixelSpacing: [0.625, 0.625, 2.4] # Target spacing in x, y, z; leave the value empty (None) to disable resampling
  weightingNorm: # If no value is specified, it is interpreted as None
  geometryTolerance: 0.0001
  normalize: False

# Image types to use: "Original" for unfiltered image, for possible filters, see documentation.
imageType:
  Original: {}
  # Square: {}
  # SquareRoot: {}
  # Logarithm: {}
  # Exponential: {}
  LoG:
  #   # If the in-plane spacing is large (> 2mm), consider removing sigma value 1.
    sigma: [2.0, 3.0, 4.0, 5.0]
  Wavelet:
    wavelet: 'rbio1.1'
    binWidth: 10
  # LBP3D:
  #   lbp3DLevels: 2
  #   lbp3DIcosphereRadius: 1
  #   lbp3DIcosphereSubdivision: 1
  # Gradient: {}
# Featureclasses, from which features must be calculated. If a featureclass is not mentioned, no features are calculated
# for that class. Otherwise, the specified features are calculated, or, if none are specified, all are calculated.
featureClass:
  shape: # no features listed, so all default shape features are enabled
  firstorder: 
  glcm:  
  glrlm: # for lists none values are allowed, in this case, all features are enabled
  glszm:
  ngtdm:
  gldm:
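
For reference, here is a minimal sketch of how a parameter file like the one above is usually handed to the extractor. It assumes pyradiomics is installed and that the file was saved as Params.yaml. The file name, the image and mask paths, and the helper names run_extraction and split_result are placeholders of mine, not part of the library.

```python
from collections import OrderedDict


def run_extraction(params_path, image_path, mask_path):
    """Run pyradiomics with a YAML parameter file and return its result mapping."""
    # Imported lazily so split_result below can be used without pyradiomics installed.
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor(params_path)
    return extractor.execute(image_path, mask_path)


def split_result(result):
    """Separate diagnostic entries from feature values.

    pyradiomics prefixes its bookkeeping entries with "diagnostics_";
    the remaining keys look like <imageType>_<featureClass>_<featureName>.
    """
    diagnostics = OrderedDict()
    features = OrderedDict()
    for key, value in result.items():
        target = diagnostics if key.startswith("diagnostics_") else features
        target[key] = value
    return diagnostics, features
```

With real data this would be called as `diagnostics, features = split_result(run_extraction("Params.yaml", "image.nrrd", "mask.nrrd"))`, after which `len(features)` is the number of extracted features.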

As the file above shows, the configuration has three parts: setting, imageType and featureClass. Let's go through them one at a time, starting with setting:

setting:
  binWidth: 15
  label: 1
  interpolator: 'sitkBSpline' # This is an enumerated value, here None is not allowed
  resampledPixelSpacing: [0.625, 0.625, 2.4] # Target spacing in x, y, z; leave the value empty (None) to disable resampling
  weightingNorm: # If no value is specified, it is interpreted as None
  geometryTolerance: 0.0001
  normalize: False

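The first parameter, binWidth, sets the width of each bin, here 15. Most of the features pyradiomics extracts are computed from gray-level histograms of one kind or another, so binWidth is the bin size used when building the histogram and discretizing the image gray levels.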
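
The effect of binWidth can be sketched in a few lines. The formula follows the fixed-bin-width discretization described in the pyradiomics documentation, X_b = floor(X / W) - floor(min(X) / W) + 1. The function name discretize and the sample intensities are made up for illustration.

```python
import math


def discretize(values, bin_width):
    """Map gray values to 1-based bin indices using a fixed bin width."""
    lowest_bin = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest_bin + 1 for v in values]


# Hypothetical intensities inside the mask, binned with binWidth: 15.
print(discretize([0, 14, 15, 44, 45], 15))  # [1, 1, 2, 3, 4]
```

A smaller binWidth gives more gray levels and therefore larger texture matrices; a larger one merges more intensities into the same level.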

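label tells pyradiomics which value in the mask file marks the region you want to use. A medical image mask often contains several labels, for example 1 for tumor and 2 for some other tissue (0 is usually background), so you have to say which one features should be extracted from.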
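
As a toy illustration of what label selects: only voxels whose mask value equals label contribute to the features. The tiny 2D arrays and the function name voxels_in_roi are invented for this sketch; pyradiomics does the equivalent internally on SimpleITK images.

```python
def voxels_in_roi(image, mask, label=1):
    """Return the image values at positions where the mask equals `label`."""
    return [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, mask_value in zip(image_row, mask_row)
        if mask_value == label
    ]


# Hypothetical 3x3 slice: 0 = background, 1 and 2 = two annotated regions.
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
mask = [[0, 1, 1], [0, 2, 2], [0, 0, 1]]

print(voxels_in_roi(image, mask, label=1))  # [20, 30, 90]
print(voxels_in_roi(image, mask, label=2))  # [50, 60]
```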

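interpolator is the interpolation method used when resampling.

resampledPixelSpacing is the target voxel spacing for resampling, given as [x, y, z]. Background on resampling itself is easy to find online.

weightingNorm and geometryTolerance are two settings I have not studied in depth; I use the values given on the official site. According to the documentation, weightingNorm selects the norm used for distance weighting in the texture matrices, and geometryTolerance is the tolerance used when checking that the image and the mask have matching geometry.

normalize controls whether the image is normalized.

Next comes the second part, imageType, which is the part I most wanted to cover.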

imageType:
  Original: {}
  # Square: {}
  # SquareRoot: {}
  # Logarithm: {}
  # Exponential: {}
  LoG:
  #   # If the in-plane spacing is large (> 2mm), consider removing sigma value 1.
    sigma: [2.0, 3.0, 4.0, 5.0]
  Wavelet:
    wavelet: 'rbio1.1'
    binWidth: 10
  # LBP3D:
  #   lbp3DLevels: 2
  #   lbp3DIcosphereRadius: 1
  #   lbp3DIcosphereSubdivision: 1
  # Gradient: {}

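imageType defines the operations applied to the image before feature extraction. You can loosely think of it as a kind of data augmentation: without it you would get a little over 100 features, but with imageType the number can grow to several hundred or more than a thousand. The first option, Original, means the image is used as is, with no filtering.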

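Square extracts features after squaring the image intensities.

SquareRoot extracts features after taking the square root of the image intensities.

Logarithm extracts features after applying a log transform to the image.

Exponential extracts features after applying an exponential transform to the image.

All of these operate directly on the pixel values of the original image. What changes is the pixel values themselves, so in effect they alter things like the brightness and contrast of the image.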
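
Here is a simplified sketch of these four intensity transforms on a handful of made-up pixel values. pyradiomics additionally rescales each result back to the original intensity range, which is left out here, so treat this as an illustration rather than the library's exact formulas. The helper names signed and transform are mine.

```python
import math


def signed(func, x):
    """Apply func to |x| and restore the sign of x, so negative values stay defined."""
    return math.copysign(func(abs(x)), x)


def transform(values, kind):
    """Apply one of the simple point-wise filters to a list of gray values."""
    if kind == "Square":
        return [v * v for v in values]
    if kind == "SquareRoot":
        return [signed(math.sqrt, v) for v in values]
    if kind == "Logarithm":
        return [signed(math.log1p, v) for v in values]  # log(|x| + 1), sign restored
    if kind == "Exponential":
        return [math.exp(v) for v in values]
    raise ValueError(f"unknown image type: {kind}")


pixels = [-4.0, 0.0, 1.0, 9.0]  # hypothetical gray values
for kind in ("Square", "SquareRoot", "Logarithm", "Exponential"):
    print(kind, transform(pixels, kind))
```

Square and Exponential stretch the bright end of the range, while SquareRoot and Logarithm compress it, which is why the texture features computed afterwards differ from those of the original image.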

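LoG stands for Laplacian of Gaussian. It works in two steps: 1. Gaussian smoothing to suppress noise, 2. the Laplacian operator. The sigma parameter is the sigma of the Gaussian kernel used in the first step, and its size determines how strongly the image is blurred. Each value in the sigma list produces its own filtered image.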
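
The two steps can be sketched in pure Python on a 1D intensity profile. This is only an illustration of the idea: pyradiomics applies the filter to the 3D image through SimpleITK, whereas here sigma is measured in samples, borders are handled by repeating the edge value, and all function names are my own.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 * sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Step 1: Gaussian smoothing, repeating the edge values at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def laplacian(signal):
    """Step 2: discrete second derivative, again repeating the edge values."""
    last = len(signal) - 1
    return [
        signal[max(i - 1, 0)] - 2 * signal[i] + signal[min(i + 1, last)]
        for i in range(len(signal))
    ]


def log_filter(signal, sigma):
    """Laplacian of Gaussian: smooth first, then take the Laplacian."""
    return laplacian(smooth(signal, sigma))


step_edge = [0.0] * 10 + [100.0] * 10  # hypothetical profile with one sharp edge
for sigma in (1.0, 3.0):
    response = log_filter(step_edge, sigma)
    print(sigma, max(abs(r) for r in response))
```

In this sketch the response changes sign at the edge, and a larger sigma gives a weaker, more spread-out response, which is the sense in which sigma controls how coarse the highlighted structures are.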

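Wavelet is the wavelet transform. The wavelet parameter is the type of wavelet basis function, and binWidth, as the name suggests, is the bin width used for these images. The "Customizing the Extraction" page of the pyradiomics v3.0.1.post13+g2e0b76e documentation on the official site lists the wavelet basis functions that can be chosen. The theory behind the wavelet transform is too involved to cover here, and detailed explanations are easy to find online.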
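
One practical consequence is worth a small sketch. For a 3D image, one level of wavelet decomposition filters each axis with either a low-pass (L) or a high-pass (H) filter, giving eight sub-band images, and the feature names carry the sub-band as a prefix such as wavelet-LLH. The function name wavelet_image_names is hypothetical.

```python
from itertools import product


def wavelet_image_names(dimensions=3):
    """List the sub-band names produced by one level of wavelet decomposition."""
    # Each axis is filtered with either a low-pass (L) or a high-pass (H) filter.
    return ["wavelet-" + "".join(bands) for bands in product("LH", repeat=dimensions)]


print(wavelet_image_names())
```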

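LBP3D extracts local texture features of the image (local binary patterns). It has three parameters, which I have not studied closely. For reference, see 人脸识别经典算法二：LBP方法 (huangxiaojie's column, CSDN blog).

Gradient extracts the gradient information of the image.

The third part specifies which feature classes to extract.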

featureClass:
  shape: # no features listed, so all default shape features are enabled
  firstorder: 
  glcm:  
  glrlm: # for lists none values are allowed, in this case, all features are enabled
  glszm:
  ngtdm:
  gldm:

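List the classes you want to extract here and leave out the ones you do not need. By default all of them are included, and I also recommend listing them all: a few extra features do no harm, and at worst you can run feature selection afterwards. I may write up the steps and methods of feature selection when I find the time. Below is what each class means. For more detail, see 【影像组学pyradiomics教程】(一)简介与安装 on 简书 (Jianshu), which describes the features in each class thoroughly. If you just want a quick overview, read on.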
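
The same selection can also be made in code instead of in the yaml file. The sketch below uses the extractor methods disableAllFeatures, enableFeatureClassByName and enableFeaturesByName; the wrapper function select_features and the particular features chosen are only an example of mine.

```python
def select_features(extractor):
    """Enable only a hand-picked subset of features on a pyradiomics extractor."""
    extractor.disableAllFeatures()
    # Whole class: every feature in glcm is enabled.
    extractor.enableFeatureClassByName("glcm")
    # Individual features: class name as keyword, feature names as a list.
    extractor.enableFeaturesByName(firstorder=["Mean", "Skewness", "Entropy"])
    return extractor
```

It would be used as `select_features(featureextractor.RadiomicsFeatureExtractor())` after `from radiomics import featureextractor`. In the yaml file the equivalent is a featureClass section containing `glcm:` with no value and `firstorder: ['Mean', 'Skewness', 'Entropy']`.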

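shape: shape features. There are 2D and 3D versions. shape means the 3D shape features; if your data is 2D, write shape2D instead. The 3D class gives 17 features and shape2D gives 10. As far as I know only one of the two can be used for a given input. If you are curious, list both and see what happens.

firstorder: first-order features, 19 in total. These are statistics of the pixel intensities such as maximum, mean, median, entropy, skewness and root mean square.

glcm: gray level co-occurrence matrix, 24 features.

glrlm: gray level run length matrix, 16 features.

glszm: gray level size zone matrix, 16 features.

gldm: gray level dependence matrix, 14 features.

ngtdm: neighbouring gray tone difference matrix, 5 features.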

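That covers essentially the whole yaml file. One last question is how to work out the total number of features. Everything is defined in the yaml file and extracted automatically, so it may not be obvious how the extraction proceeds or how many features come out.

Suppose the feature classes (shape, firstorder, glcm, glrlm, glszm, ngtdm, gldm) together give 100 features. It is not really 100; the actual number is the sum of the counts listed above, which with 3D shape is 17+19+24+16+16+14+5 = 111. These classes are then extracted again for each entry in imageType from the second part. Say Original, Square and Wavelet are defined: Original extracts the feature classes once from the unfiltered image, Square extracts them once from the squared image, and Wavelet extracts them from the wavelet-transformed image. In fact Wavelet extracts them several times, because the transform produces several decomposed images (eight sub-bands for a 3D image at one decomposition level).

The total is therefore the sum over all these images. Your own count may differ somewhat from what the library reports, because some features are not computed for some image types. Shape features, for example, do not depend on gray values and are only computed once, on the original image. None of this matters in practice: to see how many features you got, just print(xx.shape), no calculation needed. If you do want to check, use the calculation above; as long as your number is close to what the library outputs, you are fine.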
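
The counting rule can be written down as a short estimate. It uses the per-class counts quoted in this post and assumes that shape is computed once, that LoG yields one image per sigma, and that Wavelet yields eight sub-bands (3D image, one decomposition level). The real number reported by pyradiomics may be somewhat lower, for example when features marked as deprecated are not enabled. All names in the sketch are mine.

```python
FEATURES_PER_CLASS = {  # counts quoted in this post; they depend on the pyradiomics version
    "shape": 17, "firstorder": 19, "glcm": 24, "glrlm": 16,
    "glszm": 16, "ngtdm": 5, "gldm": 14,
}


def derived_image_count(image_types):
    """Number of images that the gray-value features are extracted from."""
    count = 0
    for name, settings in image_types.items():
        if name == "LoG":
            count += len(settings["sigma"])  # one filtered image per sigma
        elif name == "Wavelet":
            count += 8  # assumed: eight sub-bands for a 3D image
        else:
            count += 1
    return count


def estimate_feature_count(image_types, feature_classes):
    """Upper-bound estimate: shape once, every other class once per image."""
    per_image = sum(FEATURES_PER_CLASS[c] for c in feature_classes if c != "shape")
    shape = FEATURES_PER_CLASS["shape"] if "shape" in feature_classes else 0
    return shape + derived_image_count(image_types) * per_image


# The imageType section of the full configuration file in this post.
image_types = {
    "Original": {},
    "LoG": {"sigma": [2.0, 3.0, 4.0, 5.0]},
    "Wavelet": {"wavelet": "rbio1.1", "binWidth": 10},
}
print(estimate_feature_count(image_types, list(FEATURES_PER_CLASS)))  # 1239
```

That is 13 images times 94 gray-value features, plus 17 shape features.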

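OK, that's it for this post. I recently started an internship in medical imaging and had to learn pyradiomics. Along the way I had questions about its yaml file, searched around, and found no detailed explanation of the parameters, so I pieced things together from various blogs and the official API documentation. This post summarizes what I learned, in the hope that it helps newcomers avoid some detours and get started quickly. Thanks for taking the time to read it. I hope it helps, and if you have questions, leave a comment below so we can discuss. That's all for now, bye!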
