Contents
get_filenames_of_train_images_and_targets
get_identifiers_from_splitted_dataset_folder
create_lists_from_splitted_dataset_folder
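Reading nnUNet\nnunetv2\experiment_planning\dataset_fingerprint\fingerprint_extractor.py
The file contains only one class, DatasetFingerprintExtractor, so we start with its initializer.
The other functions called inside the initializer are explained in the second half of this article.
The __init__ function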
def __init__(self, dataset_name_or_id: Union[str, int], num_processes: int = 8, verbose: bool = False):
    dataset_name = maybe_convert_to_dataset_name(dataset_name_or_id)
    self.verbose = verbose
    self.dataset_name = dataset_name
    self.input_folder = join(nnUNet_raw, dataset_name)
    self.num_processes = num_processes
    self.dataset_json = load_json(join(self.input_folder, 'dataset.json'))
    self.dataset = get_filenames_of_train_images_and_targets(self.input_folder, self.dataset_json)
    self.num_foreground_voxels_for_intensitystats = 10e7
Parameters
- dataset_name_or_id: the dataset name or ID
- num_processes: the number of processes
- verbose: whether to print detailed information
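Some of the variables defined
- dataset_name: the dataset name, e.g. Dataset001_QQQQ
- dataset_json: the dataset metadata, loaded from dataset.json under input_folder, e.g. Dataset001_QQQQ/dataset.json
- dataset: a dictionary of the form {"identifier": {"images": [list of raw image file paths], "label": path of the segmentation mask file}}, where the identifier is the part of each file name under nnUNet_raw/Dataset001_QQQQ/imagesTr that precedes "_0000.nii.gz"
- num_foreground_voxels_for_intensitystats: not figured out yet :-D (note that 10e7 equals 1e8; judging by the name, it is presumably a budget of foreground voxels used for the intensity statistics)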
Functions
- get_filenames_of_train_images_and_targets: see "Functions involved"
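To make the shape of `dataset` concrete, here is a small sketch of the dictionary for a hypothetical two-channel case. The case name and paths are invented for illustration only; the point is that `images` holds one path per channel while `label` is a single path.

```python
# Hypothetical example of the structure stored in self.dataset.
# The identifier and all paths are made up for illustration.
dataset = {
    "QQQQ_001": {
        "images": [
            "nnUNet_raw/Dataset001_QQQQ/imagesTr/QQQQ_001_0000.nii.gz",
            "nnUNet_raw/Dataset001_QQQQ/imagesTr/QQQQ_001_0001.nii.gz",
        ],
        "label": "nnUNet_raw/Dataset001_QQQQ/labelsTr/QQQQ_001.nii.gz",
    },
}

# Each entry has one image path per channel and exactly one label path.
for identifier, entry in dataset.items():
    print(identifier, len(entry["images"]), entry["label"])
```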
-------- manual divider -----------
Functions involved
get_filenames_of_train_images_and_targets
Parameters
- raw_dataset_folder: the dataset folder under nnUNet_raw, e.g. nnUNet_raw/Dataset001_QQQQ
- dataset_json: the contents of dataset.json in the dataset folder
Functions
- get_identifiers_from_splitted_dataset_folder: see "Functions involved"
- create_lists_from_splitted_dataset_folder: see "Functions involved"
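Process
If dataset_json is not passed in, load dataset.json from raw_dataset_folder.
If `dataset` is a key of dataset.json: emmm, skipped for now :-D (from the code, this branch just converts any relative image and label paths into absolute ones).
Otherwise: collect the identifiers of all files under raw_dataset_folder/imagesTr, then build the raw image file paths and the segmentation mask path for each identifier, and finally pack identifier, image paths and mask path into a dictionary and return it.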
def get_filenames_of_train_images_and_targets(raw_dataset_folder: str, dataset_json: dict = None):
    if dataset_json is None:
        dataset_json = load_json(join(raw_dataset_folder, 'dataset.json'))
    if 'dataset' in dataset_json.keys():
        dataset = dataset_json['dataset']
        for k in dataset.keys():
            dataset[k]['label'] = os.path.abspath(join(raw_dataset_folder, dataset[k]['label'])) if not os.path.isabs(dataset[k]['label']) else dataset[k]['label']
            dataset[k]['images'] = [os.path.abspath(join(raw_dataset_folder, i)) if not os.path.isabs(i) else i for i in dataset[k]['images']]
    else:
        identifiers = get_identifiers_from_splitted_dataset_folder(join(raw_dataset_folder, 'imagesTr'), dataset_json['file_ending'])
        images = create_lists_from_splitted_dataset_folder(join(raw_dataset_folder, 'imagesTr'), dataset_json['file_ending'], identifiers)
        segs = [join(raw_dataset_folder, 'labelsTr', i + dataset_json['file_ending']) for i in identifiers]
        dataset = {i: {'images': im, 'label': se} for i, im, se in zip(identifiers, images, segs)}
    return dataset
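The branch taken when `dataset` is a key of dataset.json can be read on its own: it leaves absolute paths untouched and resolves relative ones against raw_dataset_folder. Below is a minimal standalone sketch of that step using only the standard library. `resolve_dataset_paths` is a hypothetical helper name, not part of nnUNet, and unlike the original it returns a new dictionary instead of modifying the input in place. The folder and file names are invented.

```python
import os
from os.path import join


def resolve_dataset_paths(raw_dataset_folder: str, dataset: dict) -> dict:
    """Hypothetical helper mirroring the 'dataset' branch: make every path absolute."""

    def to_abs(path: str) -> str:
        # Absolute paths are kept; relative ones are resolved against the dataset folder.
        return path if os.path.isabs(path) else os.path.abspath(join(raw_dataset_folder, path))

    return {
        key: {
            "images": [to_abs(i) for i in value["images"]],
            "label": to_abs(value["label"]),
        }
        for key, value in dataset.items()
    }


# Made-up entry with relative paths, as they might appear in dataset.json.
example = {
    "case1": {
        "images": ["imagesTr/case1_0000.nii.gz"],
        "label": "labelsTr/case1.nii.gz",
    }
}
resolved = resolve_dataset_paths("/data/nnUNet_raw/Dataset001_QQQQ", example)
print(resolved["case1"]["label"])
```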
get_identifiers_from_splitted_dataset_folder
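Parameters
- folder: the folder to read
- file_ending: the file extension of the files in folder
Process
Read all files in the folder
Strip the trailing len(file_ending) + 5 characters from each file name (12 characters for .nii.gz, i.e. "_0000.nii.gz"), e.g. QQQQQ001_0000.nii.gz --> QQQQQ001
Deduplicate and return the list of file names (identifiers)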
def get_identifiers_from_splitted_dataset_folder(folder: str, file_ending: str):
    files = subfiles(folder, suffix=file_ending, join=False)
    # all files have a 4 digit channel index (_XXXX)
    crop = len(file_ending) + 5
    files = [i[:-crop] for i in files]
    # only unique image ids
    files = np.unique(files)
    return files
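To check the cropping arithmetic, the sketch below reproduces the same idea with only the standard library on a temporary folder. `identifiers_from_folder` is a hypothetical stand-in written for this note; it uses os.listdir and a sorted set in place of subfiles and np.unique, and the file names are invented.

```python
import os
import tempfile


def identifiers_from_folder(folder: str, file_ending: str) -> list:
    """Stdlib-only sketch: strip '_XXXX' + file_ending, then deduplicate."""
    files = [f for f in os.listdir(folder) if f.endswith(file_ending)]
    # 5 = len('_0000'), so for '.nii.gz' the crop length is 7 + 5 = 12.
    crop = len(file_ending) + 5
    return sorted({f[:-crop] for f in files})


with tempfile.TemporaryDirectory() as tmp:
    # Two channels for the first case, one channel for the second.
    for name in ["QQQQQ001_0000.nii.gz", "QQQQQ001_0001.nii.gz", "QQQQQ002_0000.nii.gz"]:
        open(os.path.join(tmp, name), "w").close()
    identifiers = identifiers_from_folder(tmp, ".nii.gz")

print(identifiers)  # ['QQQQQ001', 'QQQQQ002']
```

Both channels of QQQQQ001 collapse into a single identifier, which is why the deduplication step is needed.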
create_lists_from_splitted_dataset_folder
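Parameters
- folder: the folder to read
- file_ending: the file extension of the files in folder
- identifiers: the list of identifiers; if None, it is obtained via get_identifiers_from_splitted_dataset_folder
Process
If identifiers is not given, compute it as the list of identifiers of all files under folder
Get the names of all files under folder and store them in the files list
For each identifier, collect the paths of the files whose names match "identifier + _ + four digits + extension" into a list, and append that list to list_of_lists
After the loop, return list_of_lists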
def create_lists_from_splitted_dataset_folder(folder: str, file_ending: str, identifiers: List[str] = None) -> List[List[str]]:
    """
    does not rely on dataset.json
    """
    if identifiers is None:
        identifiers = get_identifiers_from_splitted_dataset_folder(folder, file_ending)
    files = subfiles(folder, suffix=file_ending, join=False, sort=True)
    list_of_lists = []
    for f in identifiers:
        p = re.compile(re.escape(f) + r"_\d\d\d\d" + re.escape(file_ending))
        list_of_lists.append([join(folder, i) for i in files if p.fullmatch(i)])
    return list_of_lists
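The reason for the regular expression with fullmatch becomes clearer with identifiers that share a prefix. The sketch below runs the same matching logic on an in-memory list of invented file names (no folder access, no join), so it only illustrates the grouping and is not a replacement for the function above.

```python
import re

file_ending = ".nii.gz"
# Invented file names: "case1" is a prefix of both "case10" and "case1_seg".
files = sorted([
    "case1_0000.nii.gz",
    "case1_0001.nii.gz",
    "case10_0000.nii.gz",
    "case1_seg_0000.nii.gz",
])
identifiers = ["case1", "case10"]

list_of_lists = []
for ident in identifiers:
    # The whole name must be: identifier + '_' + exactly four digits + extension.
    pattern = re.compile(re.escape(ident) + r"_\d\d\d\d" + re.escape(file_ending))
    list_of_lists.append([f for f in files if pattern.fullmatch(f)])

print(list_of_lists)
# [['case1_0000.nii.gz', 'case1_0001.nii.gz'], ['case10_0000.nii.gz']]
```

A plain startswith("case1") check would also have picked up the case10 and case1_seg files; requiring a full match on the pattern keeps each channel list limited to its own case.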