After setting up the nnUNet_raw folder, the next step is to preprocess the raw data.
The entry point for the nnUNetv2_plan_and_preprocess command is the plan_and_preprocess_entry function in nnunetv2/experiment_planning/plan_and_preprocess_entrypoints.py.
The function does three things: it generates dataset_fingerprint.json, generates nnUNetPlans.json, and preprocesses the dataset based on those two files. The code structure is fairly clear.
Generating dataset_fingerprint.json
Reading plan_and_preprocess_entry, we find the entry point for generating dataset_fingerprint.json:
# fingerprint extraction
print("Fingerprint extraction...")
extract_fingerprints(args.d, args.fpe, args.npfp, args.verify_dataset_integrity, args.clean, args.verbose)
Stepping into extract_fingerprints:
def extract_fingerprints(dataset_ids: List[int], fingerprint_extractor_class_name: str = 'DatasetFingerprintExtractor',
                         num_processes: int = default_num_processes, check_dataset_integrity: bool = False,
                         clean: bool = True, verbose: bool = True):
    """
    clean = False will not actually run this. This is just a switch for use with nnUNetv2_plan_and_preprocess where
    we don't want to rerun fingerprint extraction every time.
    """
    fingerprint_extractor_class = recursive_find_python_class(join(nnunetv2.__path__[0], "experiment_planning"),
                                                              fingerprint_extractor_class_name,
                                                              current_module="nnunetv2.experiment_planning")
    # fingerprint_extractor_class defaults to
    # nnunetv2.experiment_planning.dataset_fingerprint.fingerprint_extractor.DatasetFingerprintExtractor
    for d in dataset_ids:
        extract_fingerprint_dataset(d, fingerprint_extractor_class, num_processes, check_dataset_integrity, clean,
                                    verbose)
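The recursive_find_python_class call above scans a package for a class with a given name, which is what lets a custom extractor be selected purely by its class name on the command line. A minimal stdlib-only sketch of the idea (find_class_by_name and the xml.etree demo are illustrative stand-ins, not nnU-Net code):

```python
import importlib
import pkgutil

def find_class_by_name(package_name: str, class_name: str):
    """Walk every module under a package and return the first attribute whose
    name matches class_name -- the idea behind recursive_find_python_class."""
    package = importlib.import_module(package_name)
    for _, mod_name, _ in pkgutil.walk_packages(package.__path__, package_name + "."):
        try:
            module = importlib.import_module(mod_name)
        except ImportError:
            continue  # skip modules with missing optional dependencies
        if hasattr(module, class_name):
            return getattr(module, class_name)
    return None

# demo: locate the Element class somewhere inside the stdlib xml.etree package
print(find_class_by_name("xml.etree", "Element"))
```

Because the lookup is by name only, dropping a file that defines a class called DatasetFingerprintExtractor anywhere under nnunetv2/experiment_planning makes it discoverable without touching the entry-point code.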
fingerprint_extractor_class defaults to DatasetFingerprintExtractor; its full path is given in the comment above. Stepping into extract_fingerprint_dataset:
def extract_fingerprint_dataset(dataset_id: int,
                                fingerprint_extractor_class: Type[
                                    DatasetFingerprintExtractor] = DatasetFingerprintExtractor,
                                num_processes: int = default_num_processes, check_dataset_integrity: bool = False,
                                clean: bool = True, verbose: bool = True):
    """
    Returns the fingerprint as a dictionary (additionally to saving it)
    """
    dataset_name = convert_id_to_dataset_name(dataset_id)
    print(dataset_name)
    if check_dataset_integrity:
        verify_dataset_integrity(join(nnUNet_raw, dataset_name), num_processes)
    fpe = fingerprint_extractor_class(dataset_id, num_processes, verbose=verbose)
    return fpe.run(overwrite_existing=clean)
The function first resolves the dataset name, optionally verifies that the directory layout under nnUNet_raw/<dataset name> is correct, then instantiates the fingerprint extractor class and finally calls its run method.
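The clean flag, forwarded as overwrite_existing, is essentially a cache switch: reuse the fingerprint saved on disk unless a recomputation is forced. A hedged sketch of that pattern (run_cached is an illustrative name, not nnU-Net's API):

```python
import json
import os

def run_cached(output_file: str, compute, overwrite_existing: bool = False):
    """Reuse a previously saved JSON result unless overwrite_existing is True --
    the same switch extract_fingerprints exposes as `clean`."""
    if os.path.isfile(output_file) and not overwrite_existing:
        with open(output_file) as f:
            return json.load(f)   # cached: skip the expensive computation
    result = compute()            # expensive step, e.g. scanning all images
    with open(output_file, 'w') as f:
        json.dump(result, f)
    return result
```

This is why rerunning nnUNetv2_plan_and_preprocess without --clean does not re-extract the fingerprint every time.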
For a walkthrough of DatasetFingerprintExtractor, see:
阅读nnUNet V2代码——生成dataset_fingerprint.json-CSDN博客
Generating nnUNetPlans.json
Reading plan_and_preprocess_entry, we find the entry point for generating nnUNetPlans.json:
# experiment planning
print('Experiment planning...')
plans_identifier = plan_experiments(args.d, args.pl, args.gpu_memory_target, args.preprocessor_name,
                                    args.overwrite_target_spacing, args.overwrite_plans_name)
Stepping into plan_experiments:
def plan_experiments(dataset_ids: List[int], experiment_planner_class_name: str = 'ExperimentPlanner',
                     gpu_memory_target_in_gb: float = None, preprocess_class_name: str = 'DefaultPreprocessor',
                     overwrite_target_spacing: Optional[Tuple[float, ...]] = None,
                     overwrite_plans_name: Optional[str] = None):
    """
    overwrite_target_spacing ONLY applies to 3d_fullres and 3d_cascade fullres!
    """
    if experiment_planner_class_name == 'ExperimentPlanner':
        print("\n############################\n"
              "INFO: You are using the old nnU-Net default planner. We have updated our recommendations. "
              "Please consider using those instead! "
              "Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md"
              "\n############################\n")
    experiment_planner = recursive_find_python_class(join(nnunetv2.__path__[0], "experiment_planning"),
                                                     experiment_planner_class_name,
                                                     current_module="nnunetv2.experiment_planning")
    # experiment_planner defaults to
    # nnunetv2.experiment_planning.experiment_planners.default_experiment_planner.ExperimentPlanner
    print(experiment_planner)
    plans_identifier = None
    for d in dataset_ids:
        _, plans_identifier = plan_experiment_dataset(d, experiment_planner, gpu_memory_target_in_gb,
                                                      preprocess_class_name,
                                                      overwrite_target_spacing, overwrite_plans_name)
    return plans_identifier
experiment_planner defaults to the ExperimentPlanner class; its full path is given in the comment above. Stepping into plan_experiment_dataset:
def plan_experiment_dataset(dataset_id: int,
                            experiment_planner_class: Type[ExperimentPlanner] = ExperimentPlanner,
                            gpu_memory_target_in_gb: float = None, preprocess_class_name: str = 'DefaultPreprocessor',
                            overwrite_target_spacing: Optional[Tuple[float, ...]] = None,
                            overwrite_plans_name: Optional[str] = None) -> Tuple[dict, str]:
    """
    overwrite_target_spacing ONLY applies to 3d_fullres and 3d_cascade fullres!
    """
    kwargs = {}
    if overwrite_plans_name is not None:
        kwargs['plans_name'] = overwrite_plans_name
    if gpu_memory_target_in_gb is not None:
        kwargs['gpu_memory_target_in_gb'] = gpu_memory_target_in_gb
    planner = experiment_planner_class(dataset_id,
                                       preprocessor_name=preprocess_class_name,
                                       overwrite_target_spacing=[float(i) for i in overwrite_target_spacing] if
                                       overwrite_target_spacing is not None else overwrite_target_spacing,
                                       suppress_transpose=False,  # might expose this later
                                       **kwargs
                                       )
    ret = planner.plan_experiment()
    return ret, planner.plans_identifier
The function first assembles up to two optional keyword arguments, then instantiates the experiment planner class, runs its plan_experiment method, and finally returns the plan together with the plans identifier.
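The kwargs dict is built conditionally so that, when no override is given, the planner's own default arguments stay in effect; passing plans_name=None explicitly would clobber them. A toy illustration of the pattern (make_kwargs and Planner are stand-ins, not the real nnU-Net classes):

```python
def make_kwargs(**overrides):
    """Keep only the overrides the caller actually provided."""
    return {k: v for k, v in overrides.items() if v is not None}

class Planner:
    # stand-in with the same default-argument behaviour as ExperimentPlanner
    def __init__(self, plans_name='nnUNetPlans', gpu_memory_target_in_gb=8.0):
        self.plans_name = plans_name
        self.gpu_memory_target_in_gb = gpu_memory_target_in_gb

# no overrides: the class defaults survive
p = Planner(**make_kwargs(plans_name=None, gpu_memory_target_in_gb=None))
print(p.plans_name)  # nnUNetPlans
# one override: only that default is replaced
q = Planner(**make_kwargs(plans_name='MyPlans', gpu_memory_target_in_gb=None))
print(q.plans_name, q.gpu_memory_target_in_gb)
```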
For a walkthrough of ExperimentPlanner, see:
阅读nnUNet V2代码——生成nnUNetPlans.json—__init__函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_fullres_target_spacing函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_transpose函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—static_estimate_VRAM_usage函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—get_plans_for_configuration函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_resampling函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_segmentation_softmax_export_fn函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_normalization_scheme_and_whether_mask_is_used_for_norm函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—plan_experiment函数-CSDN博客
Other functions in the ExperimentPlanner class, such as load_plans and save_plans, are not covered here.
Preprocessing the data
Entry point:
if not args.no_pp:
    print('Preprocessing...')
    preprocess(args.d, plans_identifier, args.c, args.np, args.verbose)
Stepping into preprocess:
def preprocess(dataset_ids: List[int],
               plans_identifier: str = 'nnUNetPlans',
               configurations: Union[Tuple[str], List[str]] = ('2d', '3d_fullres', '3d_lowres'),
               num_processes: Union[int, Tuple[int, ...], List[int]] = (8, 4, 8),
               verbose: bool = False):
    for d in dataset_ids:
        preprocess_dataset(d, plans_identifier, configurations, num_processes, verbose)
The parameters are the dataset IDs, the plans identifier (nnUNetPlans), the configurations (2d, 3d_fullres, etc.), the number of processes, and a verbosity flag.
The function preprocesses each dataset in turn. Stepping into preprocess_dataset:
def preprocess_dataset(dataset_id: int,
                       plans_identifier: str = 'nnUNetPlans',
                       configurations: Union[Tuple[str], List[str]] = ('2d', '3d_fullres', '3d_lowres'),
                       num_processes: Union[int, Tuple[int, ...], List[int]] = (8, 4, 8),
                       verbose: bool = False) -> None:
    if not isinstance(num_processes, list):
        num_processes = list(num_processes)
    if len(num_processes) == 1:
        num_processes = num_processes * len(configurations)
    if len(num_processes) != len(configurations):
        raise RuntimeError(
            f'The list provided with num_processes must either have len 1 or as many elements as there are '
            f'configurations (see --help). Number of configurations: {len(configurations)}, length '
            f'of num_processes: '
            f'{len(num_processes)}')
    dataset_name = convert_id_to_dataset_name(dataset_id)
    print(f'Preprocessing dataset {dataset_name}')
    plans_file = join(nnUNet_preprocessed, dataset_name, plans_identifier + '.json')
    plans_manager = PlansManager(plans_file)
    for n, c in zip(num_processes, configurations):
        print(f'Configuration: {c}...')
        if c not in plans_manager.available_configurations:
            print(
                f"INFO: Configuration {c} not found in plans file {plans_identifier + '.json'} of "
                f"dataset {dataset_name}. Skipping.")
            continue
        configuration_manager = plans_manager.get_configuration(c)
        preprocessor = configuration_manager.preprocessor_class(verbose=verbose)
        preprocessor.run(dataset_id, c, plans_identifier, num_processes=n)
    # copy the gt to a folder in the nnUNet_preprocessed so that we can do validation even if the raw data is no
    # longer there (useful for compute cluster where only the preprocessed data is available)
    from distutils.file_util import copy_file
    maybe_mkdir_p(join(nnUNet_preprocessed, dataset_name, 'gt_segmentations'))
    dataset_json = load_json(join(nnUNet_raw, dataset_name, 'dataset.json'))
    dataset = get_filenames_of_train_images_and_targets(join(nnUNet_raw, dataset_name), dataset_json)
    # only copy files that are newer than the ones already present
    for k in dataset:
        copy_file(dataset[k]['label'],
                  join(nnUNet_preprocessed, dataset_name, 'gt_segmentations', k + dataset_json['file_ending']),
                  update=True)
The function first checks that the num_processes argument is compatible with the configurations.
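That validation boils down to broadcasting: a single value is repeated once per configuration, otherwise the lengths must match. A small sketch of the logic (the function name is illustrative; the int branch is an added convenience, the real code expects a tuple or list):

```python
from typing import List, Sequence, Union

def broadcast_num_processes(num_processes: Union[int, Sequence[int]],
                            configurations: Sequence[str]) -> List[int]:
    """Repeat a single value once per configuration; otherwise require
    matching lengths -- the check at the top of preprocess_dataset."""
    if isinstance(num_processes, int):
        num_processes = [num_processes]        # convenience branch, added here
    if not isinstance(num_processes, list):
        num_processes = list(num_processes)
    if len(num_processes) == 1:
        num_processes = num_processes * len(configurations)
    if len(num_processes) != len(configurations):
        raise RuntimeError('num_processes must have length 1 or one entry per configuration')
    return num_processes

print(broadcast_num_processes((8,), ('2d', '3d_fullres', '3d_lowres')))  # [8, 8, 8]
```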
It then resolves the dataset name and loads nnUNetPlans.json by instantiating the PlansManager class.
Next, a for loop preprocesses the dataset once per configuration (2d, 3d_fullres, etc.); the internal details come later.
After the loop, dataset.json is read to determine the training cases, and the mask files under nnUNet_raw/<dataset name>/labelsTr (see the get_filenames_of_train_images_and_targets walkthrough for exactly which files) are copied into the nnUNet_preprocessed/<dataset name>/gt_segmentations folder.
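copy_file(..., update=True) only copies when the destination is missing or older than the source. Since distutils was removed in Python 3.12 (PEP 632), here is an equivalent stdlib sketch (copy_if_newer is an illustrative name):

```python
import os
import shutil

def copy_if_newer(src: str, dst: str) -> bool:
    """Mirror distutils copy_file(update=True): copy only when dst is missing
    or older than src. Returns True if a copy actually happened."""
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False              # destination is already up to date
    shutil.copy2(src, dst)        # copy2 preserves timestamps, as distutils did
    return True
```

This makes rerunning the preprocessing cheap: unchanged ground-truth masks are skipped on the second pass.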
Back inside the for loop: the three lines starting at configuration_manager involve three classes: the PlansManager class mentioned above, the ConfigurationManager class defined in the same file as PlansManager, and the DefaultPreprocessor class.
PlansManager and ConfigurationManager are service classes: they supply DefaultPreprocessor with information drawn from the two JSON files generated above (first the fingerprint statistics, then the plan derived from them). DefaultPreprocessor performs the actual data preprocessing; see the DefaultPreprocessor link above for its code.
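That division of labour can be sketched with a toy pair of classes (TinyPlansManager and TinyConfigurationManager are illustrative stand-ins, not the real nnU-Net API; the plans dict shape below only approximates nnUNetPlans.json):

```python
class TinyConfigurationManager:
    """Stand-in for ConfigurationManager: a view onto one configuration's
    sub-dictionary of the plans."""
    def __init__(self, config: dict):
        self.config = config

    @property
    def patch_size(self):
        return self.config['patch_size']


class TinyPlansManager:
    """Stand-in for PlansManager: owns the whole plans dict and hands out
    per-configuration views, as the three lines in the for loop do."""
    def __init__(self, plans: dict):
        self.plans = plans

    @property
    def available_configurations(self):
        return list(self.plans['configurations'].keys())

    def get_configuration(self, name: str) -> TinyConfigurationManager:
        return TinyConfigurationManager(self.plans['configurations'][name])


plans = {'configurations': {'2d': {'patch_size': [512, 512]},
                            '3d_fullres': {'patch_size': [128, 128, 128]}}}
pm = TinyPlansManager(plans)
print(pm.available_configurations)            # ['2d', '3d_fullres']
print(pm.get_configuration('2d').patch_size)  # [512, 512]
```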
This completes the walkthrough of the nnUNetv2_plan_and_preprocess command.
Next up is the training code.